Re: Optimizing nested ConvertRowtypeExpr execution

2018-11-06 Thread Kyotaro HORIGUCHI
Sorry for the absence.

At Sun, 4 Nov 2018 16:26:12 +0100, Dmitry Dolgov <9erthali...@gmail.com> wrote 
in 
> > On Sun, 4 Nov 2018 at 15:48, Andrew Gierth  
> > wrote:
> >
> > > "Dmitry" == Dmitry Dolgov <9erthali...@gmail.com> writes:
> >
> >  Dmitry> This patch went through the last three commit fests without any
> >  Dmitry> noticeable activity, but cfbot says it still applies and
> >  Dmitry> doesn't break any tests. The patch itself is rather small, and
> >  Dmitry> I could reproduce ~20% of performance improvements while
> >  Dmitry> running the same scripts under pgbench (although not in all
> >  Dmitry> cases), but probably we need to find someone to take over it.
> >  Dmitry> Does anyone want to do so, maybe Kyotaro?
> >
> > I'll deal with it.
> 
> Thanks!

My last comment was that the while() loop does nothing. Ashutosh
said that it is modeled on the RelabelType case.

I examined the code for T_RelabelType again; the intent is that
ece_mutator can convert something (specifically, only CollateExpr)
to RelabelType, and then reduce the nested Relabels. So the order
is not wrong from this perspective.

So I don't object to the current patch, but it needs a test like the
attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 45ef5b753069eb89ab5139828884fbdbf0958778 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 6 Nov 2018 17:12:17 +0900
Subject: [PATCH 2/2] Test code for Optimize nested ConvertRowtypeExprs

---
 src/test/regress/expected/inherit.out | 18 ++
 src/test/regress/sql/inherit.sql  |  5 +
 2 files changed, 23 insertions(+)

diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 4f29d9f891..1474ed8190 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -764,6 +764,8 @@ NOTICE:  drop cascades to table c1
 -- tables. See the pgsql-hackers thread beginning Dec. 4/04
 create table base (i integer);
 create table derived () inherits (base);
+create table more_derived (like derived, b int) inherits (derived);
+NOTICE:  merging column "i" with inherited definition
 insert into derived (i) values (0);
 select derived::base from derived;
  derived 
@@ -777,6 +779,22 @@ select NULL::derived::base;
  
 (1 row)
 
+-- remove redundant conversions.
+explain (verbose on, costs off) select row(i, b)::more_derived::derived::base from more_derived;
+QUERY PLAN 
+---
+ Seq Scan on public.more_derived
+   Output: (ROW(i, b)::more_derived)::base
+(2 rows)
+
+explain (verbose on, costs off) select (1, 2)::more_derived::derived::base;
+  QUERY PLAN   
+---
+ Result
+   Output: '(1)'::base
+(2 rows)
+
+drop table more_derived;
 drop table derived;
 drop table base;
 create table p1(ff1 int);
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index a6e541d4da..8308330fed 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -237,9 +237,14 @@ drop table p1 cascade;
 -- tables. See the pgsql-hackers thread beginning Dec. 4/04
 create table base (i integer);
 create table derived () inherits (base);
+create table more_derived (like derived, b int) inherits (derived);
 insert into derived (i) values (0);
 select derived::base from derived;
 select NULL::derived::base;
+-- remove redundant conversions.
+explain (verbose on, costs off) select row(i, b)::more_derived::derived::base from more_derived;
+explain (verbose on, costs off) select (1, 2)::more_derived::derived::base;
+drop table more_derived;
 drop table derived;
 drop table base;
 
-- 
2.16.3



Re: Support custom socket directory in pg_upgrade

2018-11-06 Thread Thomas Munro
On Wed, Oct 10, 2018 at 9:27 AM Daniel Gustafsson  wrote:
> > On 9 Oct 2018, at 16:22, Tom Lane  wrote:
> > Daniel Gustafsson  writes:
> >> Having hit the maximum socketdir length error a number of times in 
> >> pg_upgrade,
> >> especially when running tests in a deep directory hierarchy, I figured it 
> >> was
> >> time to see if anyone else has had the same problem?  The attached patch is
> >> what I run with locally to avoid the issue, it adds a --socketdir=PATH 
> >> option
> >> to pg_upgrade which overrides the default use of CWD.  Is that something 
> >> that
> >> could be considered?
> >
> > I think you could simplify matters if you installed the CWD default value
> > during option processing.
>
> The attached v2 tries to make the socketdir more like the other configurable
> directories in pg_upgrade (adding an envvar for it etc).  Is that more in line
> with what you were suggesting?  make -C src/bin/pg_upgrade check passes with
> this, both unmodified and with a -s in the test script to override it.  Also
> fixed incorrect syntax in the docs part from v1.

I think PGSOCKETDIR should be mentioned in the documentation like the
other environment variables, and I'm also wondering whether it actually
works: you set it to the current working directory first, then parse
the command line option if present, and then read the env var only if
it's not already set, but it's always going to be set by then, isn't
it?  Perhaps you should use getcwd() only if all else fails?
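
For illustration, the precedence being suggested (command-line option
first, then the environment variable, with getcwd() only as the last
resort) could look like this; a minimal standalone sketch, not the
patch itself:

	#include <limits.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	/*
	 * Resolve the socket directory: --socketdir wins, then PGSOCKETDIR,
	 * and only if neither is set fall back to the current directory.
	 */
	static char *
	resolve_socketdir(const char *cmdline_value)
	{
		char		cwd[PATH_MAX];
		const char *envval = getenv("PGSOCKETDIR");

		if (cmdline_value != NULL)
			return strdup(cmdline_value);
		if (envval != NULL && envval[0] != '\0')
			return strdup(envval);
		if (getcwd(cwd, sizeof(cwd)) == NULL)
			return NULL;		/* caller reports the error */
		return strdup(cwd);
	}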

-- 
Thomas Munro
http://www.enterprisedb.com



Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-11-06 Thread Amit Langote
Hi,

Thank you for looking at this.

On 2018/11/06 7:25, Alvaro Herrera wrote:
> On 2018-Aug-07, Amit Langote wrote:
> 
>>> But in
>>> case of partitioning there is only one parent and hence
>>> coldef->constraints is NULL and hence just overwriting it works. I
>>> think we need comments mentioning this special case.
>>
>> That's what I wrote in this comment:
>>
>>/*
>> * Parent's constraints, if any, have been saved in
>> * 'constraints', which are passed back to the caller,
>> * so it's okay to overwrite the variable like this.
>> */
> 
> What is this for?  I tried commenting out that line to see what
> test breaks, and found none.

The quoted comment is an explanation I wrote to describe why I think the
*existing* statement (shown below) is correct.

   coldef->constraints = restdef->constraints;

As you've already figured out, 'coldef' here refers (referred) to the
inherited column definition and 'restdef' to the partition's local
definition.  Ashutosh asked in his review why doing this is OK, because it
appeared that it's essentially leaking/overwriting inherited constraints.
The comment I added was to try to assure future readers that the inherited
constraints have already been linked into another output variable and
so no constraints are being leaked.

As for why ignoring partition's local constraints (restdef->constraints)
didn't break anything, I see that's because transformCreateStmt would
already have added them to CreateStmt.constraints, so they're already
taken care of.

> I tried to figure it out, so while thinking what exactly is "restdef" in
> that block, it struck me that we seem to be doing things in quite a
> strange way there by concatenating both schema lists.  I changed it so
> that that block scans only the "saved_schema" list (ie. the
> partition-local column list definition), searching the other list for
> each matching item.  This seems a much more natural implementation -- at
> least, it results in less list mangling and shorter code, so.
>
> One thing that broke was that duplicate columns in the partition-local
> definition would not be caught.  However, we already have that check a
> few hundred lines above in the same function ... which was skipped for
> partitions because of list-mangling that was done before that.  I moved
> the NIL-setting of schema one block below, and then all tests pass.

I had hit some error when I made the partition case reuse the code that
handles the OF type case, the details of which unfortunately I don't
remember.  Looking at your take, I can't now think of some case that's
being broken with it.  It's definitely nice that the same strange piece of
code is not repeated twice now.

> One thing that results is that is_from_parent becomes totally useless
> and can be removed.  It could in theory be removed from ColumnDef, if it
> wasn't for the ABI incompatibility that would cause.

:(.  That thing is never meaningful/true outside MergeAttributes().

> BTW this line:
>   coldef->is_not_null == (coldef->is_not_null || restdef->is_not_null)
> can be written more easily like:
>   coldef->is_not_null |= restdef->is_not_null;

Yeah, although there seems to be a typo above: s/==/=/g

Thanks,
Amit




RE: [Bug Fix]ECPG: cancellation of significant digits on ECPG

2018-11-06 Thread Higuchi, Daisuke
From: Dmitry Dolgov [mailto:9erthali...@gmail.com] 
> Thanks for the patches. Unfortunately, judging from the cfbot.cputube.org they
> can't be applied anymore to the current master, could you please rebase them?

Thank you for checking!
I rebased patches on the current master, so I attach them. 

Regards, 
Daisuke Higuchi


002_ecpg_numeric_test_v2.patch
Description: 002_ecpg_numeric_test_v2.patch


001_ecpg_numeric_bugfix_v2.patch
Description: 001_ecpg_numeric_bugfix_v2.patch


Performance improvements of INSERTs to a partitioned table

2018-11-06 Thread Kato, Sho
Hi,

I want to discuss performance improvements of INSERTs to a partitioned table.

When an application inserts records into a table partitioned into thousands
of partitions, find_all_inheritors() locks all of the partitions.
Updating the partition key takes locks in the same way.

So execution time becomes longer as the number of partitions increases.

* nparts 8

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) 
values(8, 5000, current_timestamp);
   QUERY PLAN
-
Insert on accounts_history  (cost=0.00..0.02 rows=1 width=20) (actual 
time=0.281..0.281 rows=0 loops=1)
   ->  Result  (cost=0.00..0.02 rows=1 width=20) (actual time=0.079..0.080 
rows=1 loops=1)
Planning Time: 0.080 ms
Execution Time: 0.362 ms
(4 rows)

* nparts 8192

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) 
values(8192, 5000, current_timestamp);
   QUERY PLAN
-
Insert on accounts_history  (cost=0.00..0.02 rows=1 width=20) (actual 
time=0.058..0.059 rows=0 loops=1)
   ->  Result  (cost=0.00..0.02 rows=1 width=20) (actual time=0.032..0.034 
rows=1 loops=1)
Planning Time: 0.032 ms
Execution Time: 12.508 ms
(4 rows)

If we lock only the target partitions, as in the patch previously proposed
by David[1], performance improves greatly.

* nparts 8192 (patched)

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) 
values(8192, 5000, current_timestamp);
   QUERY PLAN
-
Insert on accounts_history  (cost=0.00..0.02 rows=1 width=20) (actual 
time=0.415..0.416 rows=0 loops=1)
   ->  Result  (cost=0.00..0.02 rows=1 width=20) (actual time=0.140..0.141 
rows=1 loops=1)
Planning Time: 0.120 ms
Execution Time: 1.694 ms
(4 rows)

However, I am concerned that "unsafe" is included in the name of this patch.
If one session locks only the target partitions while another locks all of
the partitions at the same time, a problem may occur.
But I'm not sure what kind of problem would occur.
Is it enough to lock only the target partitions?

If a problem occurs in the above case, I think it is safer to divide lock
acquisition into two steps.

In the first step, lock only the parent table in share or exclusive mode.
In the second step, lock only the target partitions, as in the sketch below.
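
A rough illustration of that two-step idea (a hypothetical sketch only;
the lock modes and the tuple-routing step are assumptions, not code taken
from David's patch):

	#include "postgres.h"
	#include "storage/lmgr.h"

	/*
	 * Hypothetical sketch: lock the parent first, then only the one
	 * partition the new row routes to, instead of locking every
	 * partition up front as find_all_inheritors() does today.
	 */
	static void
	lock_parent_then_partition(Oid parent_relid, LOCKMODE parent_mode,
							   Oid partition_relid)
	{
		/* Step 1: lock only the parent table (share or exclusive mode). */
		LockRelationOid(parent_relid, parent_mode);

		/* Step 2: lock only the partition chosen by tuple routing. */
		LockRelationOid(partition_relid, RowExclusiveLock);
	}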

Thoughts?

[1]: 
https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=gvbwvgh4a...@mail.gmail.com

regards,
Sho Kato


Re: ON COMMIT actions and inheritance

2018-11-06 Thread Amit Langote
Hi,

On 2018/11/06 12:03, Michael Paquier wrote:
> On Mon, Nov 05, 2018 at 02:37:05PM +0900, Amit Langote wrote:
>> Michael pointed out a problem with specifying different ON COMMIT actions
>> on a temporary inheritance parent and its children:
>>
>> https://www.postgresql.org/message-id/20181102051804.GV1727%40paquier.xyz
> 
> Thanks for starting a new thread on the matter.
> 
>> One way to fix that is to remove the tables that no longer exist from
>> the list that's passed to heap_truncate(), which the attached patch
>> implements.
> 
> I don't find that much elegant as you move the responsibility to do the
> relation existence checks directly into the ON COMMIT actions, and all
> this logic exists already when doing object drops as part of
> dependency.c.  Alvaro has suggested using performMultipleDeletions()
> instead, which is a very good idea in my opinion:
> https://www.postgresql.org/message-id/20181105193725.4eluxe3xsewr65iu@alvherre.pgsql

Agreed that Alvaro's idea is better.

> So I have dug into the issue and I am finishing with the attached, which
> implements the solution suggested by Alvaro.  The approach used is
> rather close to what is done for on-commit truncation, so as all the
> to-be-dropped relation OIDs are collected at once, then processed at the
> same time.  One thing is that the truncation needs to happen before
> dropping the relations as it could be possible that a truncation is
> attempted on something which has been already dropped because of a
> previous dependency.  This can feel like a waste as it is possible that
> a relation truncated needs to be dropped afterwards if its parent is
> dropped, but I think that this keeps the code simple and more
> understandable.
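
For reference, the collect-then-drop shape described above looks roughly
like the following (a hand-written sketch with invented variable names,
not the actual patch):

	ObjectAddresses *targets = new_object_addresses();
	ListCell   *l;

	foreach(l, on_commit_items)		/* hypothetical list of pending actions */
	{
		OnCommitItem *oc = (OnCommitItem *) lfirst(l);

		if (oc->oncommit == ONCOMMIT_DROP)
		{
			ObjectAddress object;

			ObjectAddressSet(object, RelationRelationId, oc->relid);
			add_exact_object_address(&object, targets);
		}
	}

	/* Truncate first: a drop may remove an already-truncated relation. */
	heap_truncate(relids_to_truncate);
	performMultipleDeletions(targets, DROP_CASCADE, PERFORM_DELETION_INTERNAL);
	free_object_addresses(targets);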

Thanks for rewriting the patch.

> Another interesting behavior is for example the following scenario with
> partitions:
> +-- Using ON COMMIT DELETE on a partitioned table does not remove
> +-- all rows if partitions preserve their data.
> +begin;
> +create temp table temp_parted_oncommit_test (a int)
> +  partition by list (a) on commit delete rows;
> +create temp table temp_parted_oncommit_test1
> +  partition of temp_parted_oncommit_test
> +  for values in (1) on commit preserve rows;
> +create temp table temp_parted_oncommit_test2
> +  partition of temp_parted_oncommit_test
> +  for values in (2) on commit drop;
> +insert into temp_parted_oncommit_test values (1), (2);
> +commit;
> +-- Data from the remaining partition is still here as its rows are
> +-- preserved.
> +select * from temp_parted_oncommit_test;
> + a
> +---
> + 1
> +(1 row)
> 
> What happens here is that the parent needs to truncate its data, but the
> child wants to preserve them.  This can be made to work but we would
> need to call again find_all_inheritors() to grab the list of partitions
> when working on a partition to match what a manual TRUNCATE does when
> run on a partitioned table.  However, there is a point that the
> partition explicitly wants to *preserve* its rows, which feels also
> natural to me.  This also keeps the code more simple, and users willing
> to remove rows on it could just specify "on commit delete rows" to
> remove them.  So I would tend to keep the code simple, which makes the
> behavior of on commit actions less surprising depending on the table
> kind worked on.

Agree with keeping it simple.  Maybe we could (should?) document that the
only ON COMMIT action that works when specified on a partitioned parent
table is DROP (other actions are essentially ignored)?

> This stuff is too late for this week's release, but we could get
> something into the next one to fix all that.  From what I can see, this
> is handled incorrectly for inheritance trees down to 9.4 (I have not
> tested 9.3 as it is basically EOL'd this week and I am not planning to
> do so).

Seems fine to me.

I only have cosmetic comments on the patch.

+
+/*
+ * Truncate relations before dropping so as all dependencies between

so as -> so that

+ * relations are removed after they are worked on.  This is a waste as it
+ * could be possible that a relation truncated needs also to be removed
+ * per dependency games but this makes the whole logic more robust and
+ * there is no need to re-check that a relation exists afterwards.
+ */

"afterwords" seems redundant, but may I suggest rewriting the 2nd sentence as:

Doing it like this might be a waste as it's possible that a relation being
truncated will be dropped anyway due to its parent being dropped, but
this makes the code more robust because of not having to re-check that the
relation exists.

Thanks,
Amit




Re: PostgreSQL Limits and lack of documentation about them.

2018-11-06 Thread John Naylor
On 11/1/18, Andrew Gierth  wrote:
> So (with 8k blocks) the limit on the number of non-null external-toasted
> columns is about 450, while you can have the full 1600 columns if they
> are integers or smaller, or just over 1015 bigints. But you can have
> 1600 text columns if they average 4 bytes or less (excluding length
> byte).
>
> If you push too close to the limit, it may even be possible to overflow
> the tuple size by setting fields to null, since the null bitmap is only
> present if at least one field is null. So you can have 1010 non-null
> bigints, but if you try and do 1009 non-null bigints and one null, it
> won't fit (and nor will 999 non-nulls and 11 nulls, if I calculated
> right).

Thanks for that, Andrew, that was insightful. I drilled down to get
the exact values:

Non-nullable columns:
text (4 bytes each or less): 1600
toasted text: 452
int: 1600
bigint: 1017

Nullable columns with one null value:
text (4 bytes each or less): 1600
toasted text: 449
int: 1600
bigint: 1002
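
Just for the record, the arithmetic behind these numbers can be checked
with a standalone sketch (assuming the default 8192-byte block, 24-byte
page header, 4-byte line pointer, 23-byte tuple header and 8-byte
MAXALIGN; none of this is backend code):

	#include <stdio.h>

	#define BLCKSZ		8192
	#define MAXALIGN(x)	(((x) + 7) & ~7L)

	int
	main(void)
	{
		/* usable space for one tuple: block minus page header and line pointer */
		long	max_tuple = BLCKSZ - MAXALIGN(24 + 4);	/* = 8160 */
		int		n;

		/* non-null bigints: MAXALIGNed 23-byte header plus 8 bytes per column */
		for (n = 1600; MAXALIGN(23) + 8L * n > max_tuple; n--)
			;
		printf("max non-null bigints: %d\n", n);		/* 1017 */

		/* one null: the header grows by a null bitmap of one bit per column */
		for (n = 1600; MAXALIGN(23 + (n + 7) / 8) + 8L * (n - 1) > max_tuple; n--)
			;
		printf("max bigints with one null: %d\n", n);	/* 1002 */
		return 0;
	}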


-John Naylor



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-06 Thread Thomas Munro
On Wed, Oct 24, 2018 at 1:21 AM Tom Lane  wrote:
> I wrote:
> > =?utf-8?q?PG_Bug_reporting_form?=  writes:
> >> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> >> /*
> >> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> >> with exit code 1
> >> */
>
> > Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
> > shows something pretty unsurprising:
> > sh: line 1: echo: write error: Broken pipe
> > So the called program is behaving in a somewhat reasonable way: it's
> > detecting EPIPE on its stdout (after we close the pipe), reporting that,
> > and doing exit(1).
> > Unfortunately, it's not clear what we could do about that, short of
> > always reading the whole program output, which is likely to be a
> > cure worse than the disease.  If the program were failing thanks to
> > SIGPIPE, we could recognize that as a case we can ignore ... but with
> > behavior like this, I don't see a reliable way to tell it apart from
> > a generic program failure, which surely we'd better report.
>
> After a bit of thought, the problem here is blindingly obvious:
> we generally run the backend with SIGPIPE handing set to SIG_IGN,
> and evidently popen() allows the called program to inherit that,
> at least on these platforms.
>
> So what we need to do is make sure the called program inherits SIG_DFL
> handling for SIGPIPE, and then special-case that result as not being
> a failure.  The attached POC patch does that and successfully makes
> the file_fdw problem go away for me.
>
> It's just a POC because there are some things that need more thought
> than I've given them:
>
> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> launched through OpenPipeStream?  If not, what infrastructure do we
> need to add to control that?  In particular, is it sane to revert
> SIGPIPE for a pipe program that we will write to not read from?
> (I thought probably yes, because that is the normal Unix behavior,
> but it could be debated.)
>
> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> we don't intend to terminate that early.)

I'm not sure about that.  It might in theory be telling you about some
other pipe.  If you're OK with that false positive, why not ignore all
errors after you've read enough successful input and decided to close
the pipe?

> 3. Maybe this should be implemented at some higher level?

It won't work for some programs that ignore or catch the signal, so in
theory you might want to give users the power/responsibility to say
"ignore errors that occur after I decide to hang up".  Here are three
different behaviours I found in popular software, showing termination
by signal, custom error handling that we can't distinguish, and a
bonehead strategy:

$ seq 1 100 | head -5
1
2
3
4
5
... exit code indicates killed by signal

$ python -c "for i in range(100): print i" | head -5
0
1
2
3
4
Traceback (most recent call last):
  File "", line 1, in 
IOError: [Errno 32] Broken pipe
... exit code 1

$ cat Test.java
public class Test {
public static void main(String[] args) {
for (int i = 0; i < 100; ++i) {
System.out.println(Integer.toString(i));
}
}
}
$ javac Test.java
$ java Test | head -5
0
1
2
3
4
... wait a really long time with no output, exit code 0

(Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
exception, except for PrintStreams like System.out which just eat data
after an error...)

> 4. Are there any other signals we ought to be reverting to default
> behavior before launching a COPY TO/FROM PROGRAM?

On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM.  The differences were:
PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored.  Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?
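
Incidentally, the disposition inheritance at the heart of this is easy
to demonstrate outside the backend.  A minimal standalone sketch (not
Tom's POC patch) that reverts SIGPIPE to SIG_DFL around popen() and then
classifies the child's exit status:

	#include <signal.h>
	#include <stdio.h>
	#include <sys/wait.h>

	int
	main(void)
	{
		char	buf[16];
		FILE   *fp;
		int		status;

		signal(SIGPIPE, SIG_IGN);	/* the backend runs this way */

		/* revert to default so the child doesn't inherit SIG_IGN */
		signal(SIGPIPE, SIG_DFL);
		fp = popen("seq 1 1000000", "r");
		signal(SIGPIPE, SIG_IGN);	/* restore our own setting */
		if (fp == NULL)
			return 1;

		fgets(buf, sizeof(buf), fp);	/* read a little, then hang up early */
		status = pclose(fp);

		if (WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE)
			printf("child killed by SIGPIPE: ignorable for an early close\n");
		else if (status != 0)
			printf("child failed: worth reporting\n");
		else
			printf("child exited cleanly\n");
		return 0;
	}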

-- 
Thomas Munro
http://www.enterprisedb.com



Re: PostgreSQL Limits and lack of documentation about them.

2018-11-06 Thread John Naylor
On 11/1/18, David Rowley  wrote:

> I've attached an updated patch, again it's just intended as an aid for
> discussions at this stage. Also included the rendered html.

Looks good so far. Based on experimentation with toasted columns, it
seems the largest row size is 452GB, but I haven't tried that on my
laptop. :-) As for the number-of-column limits, it's a matter of how
much detail we want to include. With all the numbers in my previous
email, that could probably use its own table if we include them all.

On 11/1/18, Nasby, Jim  wrote:
> It’s a bit misleading to say “Can be increased by increasing BLKSZ and
> recompiling”, since you’d also need to re-initdb. Given that messing with
> BLKSZ is pretty uncommon I would simply put a note somewhere that mentions
> that these values assume the default BLKSZ of 8192.

+1

-John Naylor



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-06 Thread Etsuro Fujita

(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:

At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita  wrote 
in<5bdc4ba0.7050...@lab.ntt.co.jp>

(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:

At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane  wrote
in<18397.1540297...@sss.pgh.pa.us>

It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream?  If not, what infrastructure do we
need to add to control that?  In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)



ISTM that it would be OK to inherit SIG_DFL in both cases, because I
think it would be the responsibility of the called program to handle
SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
something, though.


So, I think we *should* (not just OK to) restore SIGPIPE to
SIG_DFL in any case here to prevent undetermined situation for
the program.


OK


2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)


Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.


Agreed; if ClosePipeToProgram ignores that failure, we would fail to
get a better error message in CopySendEndOfRow if the called program
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:

 if (cstate->is_program)
 {
 if (errno == EPIPE)
 {
 /*
  * The pipe will be closed automatically on error at
  * the end of transaction, but we might get a better
  * error message from the subprocess' exit code than
  * just "Broken Pipe"
  */
 ClosePipeToProgram(cstate);

 /*
  * If ClosePipeToProgram() didn't throw an error, the
  * program terminated normally, but closed the pipe
  * first. Restore errno, and throw an error.
  */
 errno = EPIPE;
 }
 ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not write to COPY program: %m")));
 }


Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
program's exit status. So it is irrelevant to called program's
SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
backend side.


My explanation might not be enough.  Let me explain.  If the called 
program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for 
some reason, ClosePipeToProgram *as-is* would create an error message 
from that program's exit code.  But if we modify ClosePipeToProgram like 
the original POC patch, that function would not create that message for 
that termination.  To avoid that, I think it would be better for 
ClosePipeToProgram to ignore the SIGPIPE failure only in the case where 
the caller is a COPY FROM PROGRAM that is allowed to terminate early. 
(Maybe we could specify that as a new argument for BeginCopyFrom.)
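
Roughly what I have in mind, as a hypothetical sketch (the flag name is
invented and this is not the actual copy.c code):

	/*
	 * Sketch: ignore a SIGPIPE exit status only when the caller has
	 * declared that terminating the program early is expected, e.g.
	 * via a new BeginCopyFrom() argument that sets this flag.
	 */
	static void
	ClosePipeToProgram(CopyState cstate)
	{
		int		pclose_rc = ClosePipeStream(cstate->copy_file);

		if (WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE &&
			cstate->allow_early_pipe_close)
			return;				/* we hung up first; not an error */

		if (pclose_rc != 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("program \"%s\" failed", cstate->filename)));
	}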



But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failures.


You mean that we should ignore other failures of the called program
when we stop reading from the pipe early?


Yes, we have received sufficient data from the pipe then closed
it successfully. The program may want to write more but we don't
want it. We rather should ignore the SIGPIPE exit code of the program
since closing our writing end of a pipe is likely to cause it and


I think so too.


even if it comes from another pipe, we can assume that the
SIGPIPE immediately stopped the program before returning any
garbage to us.


Sorry, I don't understand this.


4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?


All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much..


Not sure, but reverting SIGPIPE to default would be enough as a fix
for the original issue, if we go with the POC patch.


Agreed. I wonder why there's no API that resets all handlers to
SIG_DFL at once or some flag telling exec() that it should
start with the default signal handler set.


Such an API would be an improvement, but I think that would go beyond
the scope of this fix.



Finally in the attached PoC, I set cstate->eof_reached on failur

Re: Connection slots reserved for replication

2018-11-06 Thread Kyotaro HORIGUCHI
Hello. Thank you for the new version.

At Thu, 1 Nov 2018 11:58:52 +0100, Alexander Kukushkin  
wrote in 
> Hi,
> 
> Attached rebased version patch to the current HEAD and created commit fest 
> entry
> On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin  wrote:
> >
> > Hi,
> >
> > On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> >  wrote:
> >
> > >
> > > Instead, we can literally "reserve" connection slots for the
> > > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > > slots are never stolen for other usage. Additional walsenders
> > > beyond that grab an available "normal" slot once all the reserved
> > > slots are filled, which works the same as the current behavior
> > > when walsender_reserved_connections = 0.
> > >
> > > What do you think about this?
> >
> > Sounds reasonable, please see the updated patch.

InitializeMaxBackends()
MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-   max_worker_processes;
+   max_worker_processes + replication_reserved_connections;

This means a walsender doesn't consume a connection, which is
different from the current behavior. We should reserve a part of
MaxConnections for walsenders.  (In PostmasterMain,
max_wal_senders is counted as a part of MaxConnections.)

+   if (am_walsender && replication_reserved_connections < max_wal_senders
+   && *procgloballist == NULL)
+   procgloballist = &ProcGlobal->freeProcs;

Currently an excessive number of walsenders is rejected in
InitWalSenderSlot, which emits the following error:

>ereport(FATAL,
>(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> errmsg("number of requested standby connections "
>"exceeds max_wal_senders (currently %d)",
>max_wal_senders)));

With this patch, if max_wal_senders =
replication_reserved_connections = 3 and the fourth walreceiver
comes, we will get "FATAL: sorry, too many clients already"
instead. It should be fixed.

When r_r_conn = 2, max_wal_senders = 3 and all three walsenders
are active, in an extreme case where a new replication connection
arrives at the same time another is exiting, we could end up using
two normal slots even though one of the reserved slots is vacant.
If we want to avoid that case, we need to limit the number of
normal slots used. I don't think the attached is acceptable as is,
but basically something like it would do that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3b6c636077..f86c05e8e0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2250,6 +2250,7 @@ static void
 InitWalSenderSlot(void)
 {
 	int			i;
+	bool		reject = false;
 
 	/*
 	 * WalSndCtl should be set up already (we inherit this by fork() or
@@ -2258,41 +2259,63 @@ InitWalSenderSlot(void)
 	Assert(WalSndCtl != NULL);
 	Assert(MyWalSnd == NULL);
 
+	/* limit the maximum number of non-reserved slots to use */
+	if (MyProc->procgloballist == &ProcGlobal->freeProcs)
+	{
+		int	max_normal_slots
+			= max_wal_senders - replication_reserved_connections;
+
+		if (max_normal_slots <= 0)
+		{
+			/* we mustn't use a normal slot */
+			reject = true;
+		}
+		else if (pg_atomic_add_fetch_u32(&WalSndCtl->n_using_normal_slots, 1)
+			> max_normal_slots)
+		{
+			pg_atomic_sub_fetch_u32(&WalSndCtl->n_using_normal_slots, 1);
+			reject = true;
+		}
+	}
+
 	/*
 	 * Find a free walsender slot and reserve it. If this fails, we must be
 	 * out of WalSnd structures.
 	 */
-	for (i = 0; i < max_wal_senders; i++)
+	if (!reject)
 	{
-		WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
-
-		SpinLockAcquire(&walsnd->mutex);
-
-		if (walsnd->pid != 0)
+		for (i = 0; i < max_wal_senders; i++)
 		{
-			SpinLockRelease(&walsnd->mutex);
-			continue;
-		}
-		else
-		{
-			/*
-			 * Found a free slot. Reserve it for us.
-			 */
-			walsnd->pid = MyProcPid;
-			walsnd->sentPtr = InvalidXLogRecPtr;
-			walsnd->write = InvalidXLogRecPtr;
-			walsnd->flush = InvalidXLogRecPtr;
-			walsnd->apply = InvalidXLogRecPtr;
-			walsnd->writeLag = -1;
-			walsnd->flushLag = -1;
-			walsnd->applyLag = -1;
-			walsnd->state = WALSNDSTATE_STARTUP;
-			walsnd->latch = &MyProc->procLatch;
-			SpinLockRelease(&walsnd->mutex);
-			/* don't need the lock anymore */
-			MyWalSnd = (WalSnd *) walsnd;
+			WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
 
-			break;
+			SpinLockAcquire(&walsnd->mutex);
+
+			if (walsnd->pid != 0)
+			{
+SpinLockRelease(&walsnd->mutex);
+continue;
+			}
+			else
+			{
+/*
+ * Found a free slot. Reserve it for us.
+ */
+walsnd->pid = MyProcPid;
+walsnd->sentPtr = InvalidXLogRecPtr;
+walsnd->write = InvalidXLogRecPtr;
+walsnd->flush = InvalidXLogRecPtr;
+walsnd->apply = InvalidXLogRecPtr;
+walsnd->writeLag = -1;
+walsnd->flushLag = -1;
+walsnd->applyLag = -1;
+walsnd->state = WALSNDSTATE_STARTUP;
+walsnd-

Re: [HACKERS] generated columns

2018-11-06 Thread Peter Eisentraut
On 30/10/2018 16:14, Sergei Kornilov wrote:
> Hi
> 
> I applied this patch on top of commit 2fe42baf7c1ad96b5f9eb898161e258315298351
> and found a bug while adding a STORED column:
> 
> postgres=# create table test(i int);
> CREATE TABLE
> postgres=# insert into test values (1),(2);
> INSERT 0 2
> postgres=# alter table test add column gen_stored integer GENERATED ALWAYS AS 
> ((i * 2)) STORED;
> ALTER TABLE
> postgres=# alter table test add column gen_virt integer GENERATED ALWAYS AS 
> ((i * 2));
> ALTER TABLE
> postgres=# table test;
>  i | gen_stored | gen_virt 
> ---++--
>  1 ||2
>  2 ||4
> 
> Virtual columns were calculated on table read and that's ok, but the stored
> column does not update the table data.

This is a small bug that I will fix in my next update.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

2018-11-06 Thread Peter Eisentraut
On 31/10/2018 08:58, Erikjan Rijkers wrote:
> I have also noticed that logical replication isn't possible on tables 
> with a generated column.  That's a shame but I suppose that is as 
> expected.

This is an issue we need to discuss.  How should this work?

The simplest solution would be to exclude generated columns from the
replication stream altogether.

Similar considerations also apply to foreign tables.  What is the
meaning of a stored generated column on a foreign table?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: plruby: rb_iterate symbol clash with libruby.so

2018-11-06 Thread Pavel Raiskup
On Monday, November 5, 2018 9:06:41 PM CET Robert Haas wrote:
> On Sat, Nov 3, 2018 at 2:20 PM Tom Lane  wrote:
> > > Is it realistic we could rename red-black tree methods from 'rb_*' to e.g.
> > > 'rbt_*' to avoid this clash?
> >
> > That's not terribly appetizing, because it essentially means we're giving
> > Ruby (and potentially every other library on the planet) veto power over
> > our function namespace.  That does not scale, especially not when the
> > feedback loop has a time constant measured in years :-(
> >
> > I don't have a huge objection to renaming the rbtree functions, other
> > than the precedent it sets ...
> 
> Maybe prefixing with pg_ would better than rb_ to rbt_.  That's our
> semi-standard namespace prefix, I think.  Of course nothing keeps
> somebody else from using it, too, but we can hope that they won't.
> It's certainly not very surprising that Ruby has symbols starting with
> rb_...

I now realize that there's an rb_block_call() alternative for the
rb_iterate() Ruby call, which fortunately doesn't collide with
PostgreSQL internals.

It means that for sufficiently new Ruby there exists a solution (not
that something similar cannot re-appear elsewhere).

Pavel






Re: ON COMMIT actions and inheritance

2018-11-06 Thread Alvaro Herrera
While you're there -- I think the CCI after the heap_truncate is not
needed.  Could as well get rid of it ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

2018-11-06 Thread Simon Riggs
On Tue, 6 Nov 2018 at 04:31, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 31/10/2018 08:58, Erikjan Rijkers wrote:
> > I have also noticed that logical replication isn't possible on tables
> > with a generated column.  That's a shame but I suppose that is as
> > expected.
>
> This is an issue we need to discuss.  How should this work?
>
> The simplest solution would be to exclude generated columns from the
> replication stream altogether.
>

IMHO...

Virtual generated columns need not be WAL-logged, or sent.

Stored generated columns should be treated just like we'd treat a column
value added by a trigger.

e.g. if we had a timestamp column called LastUpdateTimestamp, we would want
to send that value


> Similar considerations also apply to foreign tables.  What is the
> meaning of a stored generated column on a foreign table?
>

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: csv format for psql

2018-11-06 Thread Daniel Verite
Michael Paquier wrote:

> Ordering them in alphabetical order is a good idea due to the high
> number of options available, and more would pile up even if this
> separates a bit "aligned" and "unaligned", so I have separated
> those diffs from the core patch and committed it, leaving the core
> portion of the patch aside for later.

Here's a rebased version following these changes.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68..98147ef 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -672,6 +672,10 @@ COPY count
 
   
CSV Format
+   
+CSV
+in COPY
+   
 

 This format option is used for importing and exporting the Comma
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a1ca940..2897486 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -152,6 +152,20 @@ EOF
 
 
 
+  --csv
+  
+  
+  
+   CSV
+   in psql
+  
+  Switches to CSV output mode. This is equivalent
+  to \pset format csv.
+  
+  
+
+
+
   -d dbname
   --dbname=dbname
   
@@ -2557,6 +2571,19 @@ lo_import 152801
   
 
   
+  fieldsep_csv
+  
+  
+  Specifies the field separator to be used in the
+  CSV format. When the separator appears in a field
+  value, that field is output inside double quotes according to
+  CSV rules. To set a tab as field separator, type
+  \pset fieldsep_csv '\t'. The default is a comma.
+  
+  
+  
+
+  
   fieldsep_zero
   
   
@@ -2584,11 +2611,16 @@ lo_import 152801
   format
   
   
-  Sets the output format to one of aligned,
-  asciidoc, html,
+  Sets the output format to one of
+  aligned,
+  asciidoc,
+  csv,
+  html,
   latex (uses tabular),
-  latex-longtable, troff-ms,
-  unaligned, or wrapped.
+  latex-longtable,
+  troff-ms,
+  unaligned,
+  or wrapped.
   Unique abbreviations are allowed.  (That would mean one letter
   is enough.)
   
@@ -2596,14 +2628,27 @@ lo_import 152801
   unaligned format writes all columns of a 
row on one
   line, separated by the currently active field separator. This
   is useful for creating output that might be intended to be read
-  in by other programs (for example, tab-separated or comma-separated
-  format).
+  in by other programs.
   
 
   aligned format is the standard, 
human-readable,
   nicely formatted text output;  this is the default.
   
 
+ csv format writes columns separated by
+ commas, applying the quoting rules described in RFC 4180.
+ Alternative separators can be selected with
+ \pset fieldsep_csv.
+ The output is compatible with the CSV format of the
+ COPY command. The header with column names
+ is output unless the tuples_only parameter is
+ on. Title and footers are not printed.
+ Each row is terminated by the system-dependent end-of-line character,
+ which is typically a single newline (\n) for
+ Unix-like systems or a carriage return and newline sequence
+ (\r\n) for Microsoft Windows.
+ 
+
   wrapped format is like 
aligned but wraps
   wide data values across lines to make the output fit in the target
   column width.  The target width is determined as described under
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 0dea54d..ea064ab 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1941,8 +1941,8 @@ exec_command_pset(PsqlScanState scan_state, bool 
active_branch)
 
int i;
static const char *const my_list[] = {
-   "border", "columns", "expanded", "fieldsep", 
"fieldsep_zero",
-   "footer", "format", "linestyle", "null",
+   "border", "columns", "expanded", "fieldsep", 
"fieldsep_csv",
+   "fieldsep_zero", "footer", "format", 
"linestyle", "null",
"numericlocale", "pager", "pager_min_lines",
"recordsep", "recordsep_zero",
"tableattr", "title", "tuples_only",
@@ -3566,6 +3566,9 @@ _align2string(enum printFormat in)
case PRINT_ASCIIDOC:
return "asciidoc";
break;
+   case PRINT_CSV:
+   return "csv";
+   

Re: [Bug Fix]ECPG: cancellation of significant digits on ECPG

2018-11-06 Thread Dmitry Dolgov
> On Tue, 6 Nov 2018 at 10:19, Higuchi, Daisuke 
>  wrote:
>
> Thank you for checking!
> I rebased patches on the current master, so I attach them.

After adding 'EXEC SQL ALLOCATE DESCRIPTOR sqlda' I've managed to reproduce the
problem you're talking about, and indeed it looks strange:

=# table testtab ;
   c1
-
1.23456
0.12345
0.01234
(3 rows)

but in ecpg program we've got from gdb:

# for the second record 0.12345
$$1 = {
  ndigits = 5,
  weight = -1,
  rscale = 5,
  dscale = 5,
  sign = 0,
  buf = 0x557636d8 "",
  digits = 0x557636da "\001\002\003\004"
}

# for the third record 0.01234
$$0 = {
  ndigits = 4,
  weight = -2,
  rscale = 5,
  dscale = 5,
  sign = 0,
  buf = 0x55763578 "",
  digits = 0x5576357b "\001\002"
}

Also what's strange for me is that after applying your patches I still got the
same output, not sure why:

./numeric_test
ndigits :6
buf :0   1   2   3   4   5   6
digits  :1   2   3   4   5   6
numeric :1.23456

ndigits :5
buf :0   0   1   2   3   4   0
digits  :1   2   3   4   0
numeric :0.12340

ndigits :4
buf :0   0   0   1   2   0   0
digits  :1   2   0   0
numeric :0.01200




Re: Optimizing nested ConvertRowtypeExpr execution

2018-11-06 Thread Andrew Gierth
> "Kyotaro" == Kyotaro HORIGUCHI  writes:

 Kyotaro> My last comment was that the while() loop does nothing.
 Kyotaro> Ashutosh said that it is modeled on the RelabelType case.

 Kyotaro> I examined the code for T_RelabelType again; the intent is
 Kyotaro> that ece_mutator can convert something (specifically, only
 Kyotaro> CollateExpr) to RelabelType, and then reduce the nested
 Kyotaro> Relabels. So the order is not wrong from this perspective.

I'm eliminating the while loop in favour of an if, because I also think
it a good idea to ensure that the resulting convertformat is explicit if
any of the component conversions is explicit (rather than relying on the
top of the stack to be the most explicit one).

 Kyotaro> So I don't object to the current patch, but it needs a test
 Kyotaro> like the attached.

Thanks.

I'm going to pull all this together and commit it shortly.

-- 
Andrew (irc:RhodiumToad)



Re: PostgreSQL vs SQL/XML Standards

2018-11-06 Thread Pavel Stehule
po 29. 10. 2018 v 11:45 odesílatel Pavel Stehule 
napsal:

>
>
> po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule 
> napsal:
>
>> Hi
>>
>> čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera <
>> alvhe...@2ndquadrant.com> napsal:
>>
>>> On 2018-Oct-25, Pavel Stehule wrote:
>>>
>>> > I am thinking so I can fix some issues related to XMLTABLE. Please,
>>> send me
>>> > more examples and test cases.
>>>
>>> Please see Markus Winand's patch that I referenced upthread.
>>>
>>
>> here is a fix of some XMLTABLE mentioned issues.
>>
>
> this update allows casting boolean to numeric types from XPath expressions
>

The attached patch solves some cast issues mentioned by Chap, and also the
issue reported by Markus. I didn't use Markus's code, but it was an
inspiration for me; I found a native solution in libxml2.

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>> Regards
>>
>> Pavel
>>
>>
>>
>>>
>>> --
>>> Álvaro Herrerahttps://www.2ndQuadrant.com/
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 37d85f71f3..8a95045b87 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4427,6 +4427,35 @@ XmlTableFetchRow(TableFuncScanState *state)
 #endif			/* not USE_LIBXML */
 }
 
+/*
+ * Copy XmlChar string to PostgreSQL memory. Ensure releasing of
+ * source xmllib string.
+ */
+static char *
+copy_and_safe_free_xmlchar(xmlChar *str)
+{
+	char *result;
+
+	if (str)
+	{
+		PG_TRY();
+		{
+			result = pstrdup((char *) str);
+		}
+		PG_CATCH();
+		{
+			xmlFree(str);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+		xmlFree(str);
+	}
+	else
+		result = NULL;
+
+	return result;
+}
+
 /*
  * XmlTableGetValue
  *		Return the value for column number 'colnum' for the current row.  If
@@ -4490,85 +4519,72 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 			{
 *isnull = true;
 			}
-			else if (count == 1 && typid == XMLOID)
-			{
-text	   *textstr;
-
-/* simple case, result is one value */
-textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[0],
-			   xtCxt->xmlerrcxt);
-cstr = text_to_cstring(textstr);
-			}
-			else if (count == 1)
+			else
 			{
-xmlChar*str;
-xmlNodePtr	node;
-
-/*
- * Most nodes (elements and even attributes) store their data
- * in children nodes. If they don't have children nodes, it
- * means that they are empty (e.g. ). Text nodes and
- * CDATA sections are an exception: they don't have children
- * but have content in the Text/CDATA node itself.
- */
-node = xpathobj->nodesetval->nodeTab[0];
-if (node->type != XML_CDATA_SECTION_NODE &&
-	node->type != XML_TEXT_NODE)
-	node = node->xmlChildrenNode;
-
-str = xmlNodeListGetString(xtCxt->doc, node, 1);
-if (str != NULL)
+if (typid == XMLOID)
 {
-	PG_TRY();
-	{
-		cstr = pstrdup((char *) str);
-	}
-	PG_CATCH();
+	text	   *textstr;
+	StringInfoData str;
+	int			i;
+
+	/* Concatenate serialized values */
+	initStringInfo(&str);
+	for (i = 0; i < count; i++)
 	{
-		xmlFree(str);
-		PG_RE_THROW();
+		textstr =
+			xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
+ xtCxt->xmlerrcxt);
+
+		appendStringInfoText(&str, textstr);
 	}
-	PG_END_TRY();
-	xmlFree(str);
+	cstr = str.data;
 }
 else
 {
-	/* Ensure mapping of empty tags to PostgreSQL values. */
-	cstr = "";
-}
-			}
-			else
-			{
-StringInfoData str;
-int			i;
-
-Assert(count > 1);
+	xmlChar	   *str;
 
-/*
- * When evaluating the XPath expression returns multiple
- * nodes, the result is the concatenation of them all. The
- * target type must be XML.
- */
-if (typid != XMLOID)
-	ereport(ERROR,
-			(errcode(ERRCODE_CARDINALITY_VIOLATION),
-			 errmsg("more than one value returned by column XPath expression")));
+	if (count > 1)
+		ereport(ERROR,
+(errcode(ERRCODE_CARDINALITY_VIOLATION),
+ errmsg("more than one value returned by column XPath expression")));
 
-/* Concatenate serialized values */
-initStringInfo(&str);
-for (i = 0; i < count; i++)
-{
-	appendStringInfoText(&str,
-		 xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
-			  xtCxt->xmlerrcxt));
+	str = xmlXPathCastNodeSetToString(xpathobj->nodesetval);
+	if (str)
+		cstr = copy_and_safe_free_xmlchar(str);
+	else
+		/* empty element */
+		cstr = "";
 }
-cstr = str.data;
 			}
 		}
 		else if (xpathobj->type == XPATH_STRING)
 		{
 			cstr = (char *) xpathobj->stringval;
 		}
+		else if (xpathobj->type == XPATH_BOOLEAN)
+		{
+			char		typcategory;
+			bool		typispreferred;
+			xmlChar	   *str;
+
+			/* Allow implicit casting from boolean to numbers */
+			get_type_category_preferred(typid, &typcategory, &typispreferred);
+
+			if (typcate

Re: Optimizing nested ConvertRowtypeExpr execution

2018-11-06 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> I'm going to pull all this together and commit it shortly.

Here's the patch with my edits (more comments and the while/if change).

I'll commit this in due course unless I hear otherwise.

-- 
Andrew (irc:RhodiumToad)

>From 9cc81cea6de41140fe361bff375190bfdd188ae9 Mon Sep 17 00:00:00 2001
From: Andrew Gierth 
Date: Tue, 6 Nov 2018 14:19:40 +
Subject: [PATCH] Optimize nested ConvertRowtypeExpr nodes.

A ConvertRowtypeExpr is used to translate a whole-row reference of a
child to that of a parent. The planner produces nested
ConvertRowtypeExpr while translating whole-row reference of a leaf
partition in a multi-level partition hierarchy. Executor then
translates the whole-row reference from the leaf partition into all
the intermediate parent's whole-row references before arriving at the
final whole-row reference. It could instead translate the whole-row
reference from the leaf partition directly to the top-most parent's
whole-row reference skipping any intermediate translations.

Ashutosh Bapat, with tests by Kyotaro Horiguchi and some
editorialization by me. Reviewed by Andres Freund, Pavel Stehule,
Kyotaro Horiguchi, Dmitry Dolgov.
---
 src/backend/optimizer/util/clauses.c  | 46 +++
 src/test/regress/expected/inherit.out | 18 ++
 src/test/regress/sql/inherit.sql  |  5 
 3 files changed, 69 insertions(+)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 21bf5dea9c..d13c3ac895 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3716,6 +3716,52 @@ eval_const_expressions_mutator(Node *node,
 	  context);
 			}
 			break;
+		case T_ConvertRowtypeExpr:
+			{
+ConvertRowtypeExpr *cre = castNode(ConvertRowtypeExpr, node);
+Node		   *arg;
+ConvertRowtypeExpr *newcre;
+
+arg = eval_const_expressions_mutator((Node *) cre->arg,
+	 context);
+
+newcre = makeNode(ConvertRowtypeExpr);
+newcre->resulttype = cre->resulttype;
+newcre->convertformat = cre->convertformat;
+newcre->location = cre->location;
+
+/*
+ * In case of a nested ConvertRowtypeExpr, we can convert the
+ * leaf row directly to the topmost row format without any
+ * intermediate conversions. (This works because
+ * ConvertRowtypeExpr is used only for child->parent
+ * conversion in inheritance trees, which works by exact match
+ * of column name, and a column absent in an intermediate
+ * result can't be present in the final result.)
+ *
+ * No need to check more than one level deep, because the
+ * above recursion will have flattened anything else.
+ */
+if (arg != NULL && IsA(arg, ConvertRowtypeExpr))
+{
+	ConvertRowtypeExpr *argcre = castNode(ConvertRowtypeExpr, arg);
+
+	arg = (Node *) argcre->arg;
+
+	/*
+	 * Make sure an outer implicit conversion can't hide an
+	 * inner explicit one.
+	 */
+	if (newcre->convertformat == COERCE_IMPLICIT_CAST)
+		newcre->convertformat = argcre->convertformat;
+}
+
+newcre->arg = (Expr *) arg;
+
+if (arg != NULL && IsA(arg, Const))
+	return ece_evaluate_expr((Node *) newcre);
+return (Node *) newcre;
+			}
 		default:
 			break;
 	}
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index d768e5df2c..1e00c849f3 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -764,6 +764,8 @@ NOTICE:  drop cascades to table c1
 -- tables. See the pgsql-hackers thread beginning Dec. 4/04
 create table base (i integer);
 create table derived () inherits (base);
+create table more_derived (like derived, b int) inherits (derived);
+NOTICE:  merging column "i" with inherited definition
 insert into derived (i) values (0);
 select derived::base from derived;
  derived 
@@ -777,6 +779,22 @@ select NULL::derived::base;
  
 (1 row)
 
+-- remove redundant conversions.
+explain (verbose on, costs off) select row(i, b)::more_derived::derived::base from more_derived;
+QUERY PLAN 
+---
+ Seq Scan on public.more_derived
+   Output: (ROW(i, b)::more_derived)::base
+(2 rows)
+
+explain (verbose on, costs off) select (1, 2)::more_derived::derived::base;
+  QUERY PLAN   
+---
+ Result
+   Output: '(1)'::base
+(2 rows)
+
+drop table more_derived;
 drop table derived;
 drop table base;
 create table p1(ff1 int);
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index e8b6448f3c..afc72f47bc 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -237,9 +237,14 @@ drop table p1 cascade;
 -- tables. See the pgsql-hackers thread beginning Dec. 4/04
 create table base (i integer);
 create table derived () inherits (base);
+create table more_derived (like de

Doc patch on psql output formats

2018-11-06 Thread Daniel Verite
  Hi,

psql's documentation has this mention about output formats:
 "Unique abbreviations are allowed. (That would mean one letter is enough.)"

but "one letter is enough" is not true since 9.3 that added
"latex-longtable" sharing the same start as "latex", and then 
9.5 added "asciidoc" with the same first letter as "aligned".

When a non-unique abbreviation is used, psql uses the first
match in an arbitrary order defined in do_pset() by
a cascade of pg_strncasecmp().
(The recent commit add9182e reorders them alphabetically,
but it turns out that this doesn't change the relative positions
of "aligned" / "asciidoc" or "latex" / "latex-longtable",
so \pset format a and \pset format l will continue to
select "aligned" and "latex" as before.)

Anyway, "Unique abbreviations are allowed" deserves to
be changed as well.

PFA a doc patch to say simply "Abbreviations are allowed".


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a1ca940..7dd934f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2589,8 +2589,7 @@ lo_import 152801
   latex (uses tabular),
   latex-longtable, troff-ms,
   unaligned, or wrapped.
-  Unique abbreviations are allowed.  (That would mean one letter
-  is enough.)
+  Abbreviations are allowed.
   
 
   unaligned format writes all columns of a 
row on one


Re: plruby: rb_iterate symbol clash with libruby.so

2018-11-06 Thread Tom Lane
Pavel Raiskup  writes:
> On Monday, November 5, 2018 9:06:41 PM CET Robert Haas wrote:
>> On Sat, Nov 3, 2018 at 2:20 PM Tom Lane  wrote:
 Is it realistic we could rename red-black tree methods from 'rb_*' to e.g.
 'rbt_*' to avoid this clash?

>>> I don't have a huge objection to renaming the rbtree functions, other
>>> than the precedent it sets ...

>> Maybe prefixing with pg_ would better than rb_ to rbt_. ...
>> It's certainly not very surprising that Ruby has symbols starting with
>> rb_...

> I now realized that there's rb_block_call() alternative for rb_iterate()
> Ruby call -- which fortunately doesn't collide with PostgreSQL internals.
> It means that for sufficiently new Ruby there exists some solution (not
> that something similar can not re-appear elsewhere).

Yeah.  The long and short of this is that we're trampling on namespace
that reasonably belongs to Ruby --- if they had some functions named
"pg_something" and complained about a collision with libpq, would we
change?  Nope.  So really we should rename these.

After looking at the code a bit I like the idea of s/rb/rbt/g
better than s/rb/pg_rb/g.  The latter seems verbose, and it would
also open the question of whether we need to rename rbtree.h/.c,
which would be an additional level of complication I don't want.
The rbt approach will allow skipping some other renamings that
would be needed for consistency if we use pg_rb.

Barring objections I'll go make this happen shortly.

It's too late for this week's releases, but Pavel could pick up the commit
once it happens and carry it as a Fedora patch until the next releases.

regards, tom lane



Re: Doc patch on psql output formats

2018-11-06 Thread Tom Lane
"Daniel Verite"  writes:
> psql's documentation has this mention about output formats:
>  "Unique abbreviations are allowed. (That would mean one letter is enough.)"

> but "one letter is enough" is not true since 9.3 that added
> "latex-longtable" sharing the same start as "latex", and then 
> 9.5 added "asciidoc" with the same first letter as "aligned".

Yeah, that text has clearly outstayed its welcome.

> When a non-unique abbreviation is used, psql uses the first
> match in an arbitrary order defined in do_pset() by
> a cascade of pg_strncasecmp().

Ugh.  Should we not fix the code so that it complains if there's
not a unique match?  I would bet that the code was also written
on the assumption that any abbreviation must be unique.
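
For example, a prefix scan over the format names that refuses ambiguous
input could look like this (a standalone sketch of the idea, not the
psql code):

	#include <stdio.h>
	#include <string.h>
	#include <strings.h>

	static const char *const formats[] = {
		"aligned", "asciidoc", "html", "latex",
		"latex-longtable", "troff-ms", "unaligned", "wrapped"
	};

	/*
	 * Return the single format matching the prefix; NULL if the prefix
	 * is unknown or ambiguous.  An exact match always wins, so "latex"
	 * still selects latex despite being a prefix of latex-longtable.
	 */
	static const char *
	match_format(const char *value)
	{
		const char *found = NULL;
		size_t		len = strlen(value);
		size_t		i;

		for (i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
		{
			if (strcasecmp(value, formats[i]) == 0)
				return formats[i];		/* exact match wins outright */
			if (strncasecmp(value, formats[i], len) == 0)
			{
				if (found != NULL)
					return NULL;		/* ambiguous abbreviation */
				found = formats[i];
			}
		}
		return found;
	}

	int
	main(void)
	{
		const char *r = match_format("a");

		printf("a -> %s\n", r ? r : "(ambiguous)");
		r = match_format("latex");
		printf("latex -> %s\n", r ? r : "(ambiguous)");
		return 0;
	}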

regards, tom lane



Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-11-06 Thread Alvaro Herrera
Looking over the stuff in gram.y (to make sure there's nothing that
could be lost), I noticed that we're losing COLLATE clauses when they
are added to columns in partitions.  I would expect part1 to end up with
collation es_CL, or alternatively that the command throws an error:

55462 10.6 138851=# create table part (a text collate "en_US") partition by 
range (a);
CREATE TABLE
Time: 23.511 ms
55462 10.6 138851=# create table part1  partition of part (a collate "es_CL") 
for values from ('ca') to ('cu');
CREATE TABLE
Time: 111.551 ms
55462 10.6 138851=# \d part
          Table "public.part"
 Column │ Type │ Collation │ Nullable │ Default 
────────┼──────┼───────────┼──────────┼─────────
 a      │ text │ en_US     │          │ 
Partition key: RANGE (a)
Number of partitions: 1 (Use \d+ to list them.)

55462 10.6 138851=# \d part1
          Table "public.part1"
 Column │ Type │ Collation │ Nullable │ Default 
────────┼──────┼───────────┼──────────┼─────────
 a      │ text │ en_US     │          │ 
Partition of: part FOR VALUES FROM ('ca') TO ('cu')


(This case is particularly bothersome because the column is the
partition key, so the partition range bounds would differ depending on
which collation is used to compare.  I assume we'd always want to use
the parent table's collation; so there's even a stronger case for
raising an error if it doesn't match the parent's.  However, this case
could arise for other columns too, where it's not *so* bad, but still
not completely correct I think.)

This happens on unpatched code, and doesn't seem covered by any tests.
However, this seems an independent bug that isn't affected by this
patch.

The only other things there are deferrability markers, which seem to be
propagated in a relatively sane fashion (but no luck if you want to add
foreign keys whose deferrability differs from the parent's -- you just
get a dupe, and there's no way in the grammar to change the flags for the
existing constraint).  But you can, for example, add a UNIQUE DEFERRED
constraint in a partition that wasn't there in the parent.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump multi VALUES INSERT

2018-11-06 Thread Surafel Temesgen
hi,

On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO  wrote:

> Patch does not seem to apply anymore, could you rebase?
>
>
The attached patch is a rebased version; it works with 'inserts=100', as
Stephen suggested.


regards
Surafel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 790e81c32c..6cc15de2d8 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -789,6 +789,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --inserts=100
+  
+   
+Dump data as INSERT commands with 100 rows per statement (rather than COPY).
+This makes the dump file smaller than --inserts and faster to restore, but an
+error during reload loses the entire 100-row statement instead of a single row.
+   
+  
+ 
+
  
   --load-via-partition-root
   
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index c51a130f43..c08cc80732 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -326,6 +326,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --inserts=100
+  
+   
+Dump data as INSERT commands with 100 rows per statement (rather than COPY).
+This makes the dump file smaller than --inserts and faster to restore, but an
+error during reload loses the entire 100-row statement instead of a single row.
+   
+  
+ 
+
  
   --load-via-partition-root
   
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index ba798213be..2fd48cf2f2 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -72,6 +72,7 @@ typedef struct _restoreOptions
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_hundred;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
@@ -145,6 +146,7 @@ typedef struct _dumpOptions
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
 	int			dump_inserts;
+	int			dump_inserts_hundred;
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c8d01ed4a4..6df1fc2409 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -359,7 +359,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, &dopt.enable_row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
-		{"inserts", no_argument, &dopt.dump_inserts, 1},
+		{"inserts", optional_argument, NULL, 8},
 		{"lock-wait-timeout", required_argument, NULL, 2},
 		{"no-tablespaces", no_argument, &dopt.outputNoTablespaces, 1},
 		{"quote-all-identifiers", no_argument, "e_all_identifiers, 1},
@@ -562,6 +562,21 @@ main(int argc, char **argv)
 dosync = false;
 break;
 
+			case 8:				/* inserts option */
+				if (optarg)
+				{
+					if (atoi(optarg) != 100)
+					{
+						write_msg(NULL, "insert values must be 100\n");
+						exit_nicely(1);
+					}
+					dopt.dump_inserts_hundred = 1;
+
+				}
+				else
+					dopt.dump_inserts = 1;
+				break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -609,9 +624,9 @@ main(int argc, char **argv)
 		exit_nicely(1);
 	}
 
-	if (dopt.dump_inserts && dopt.oids)
+	if ((dopt.dump_inserts || dopt.dump_inserts_hundred) && dopt.oids)
 	{
-		write_msg(NULL, "options --inserts/--column-inserts and -o/--oids cannot be used together\n");
+		write_msg(NULL, "options --inserts --column-inserts --inserts=100 and -o/--oids cannot be used together\n");
 		write_msg(NULL, "(The INSERT command cannot set OIDs.)\n");
 		exit_nicely(1);
 	}
@@ -619,8 +634,9 @@ main(int argc, char **argv)
 	if (dopt.if_exists && !dopt.outputClean)
 		exit_horribly(NULL, "option --if-exists requires option -c/--clean\n");
 
-	if (dopt.do_nothing && !(dopt.dump_inserts || dopt.column_inserts))
-		exit_horribly(NULL, "option --on-conflict-do-nothing requires option --inserts or --column-inserts\n");
+	if (dopt.do_nothing && !(dopt.dump_inserts || dopt.column_inserts ||
+		dopt.dump_inserts_hundred))
+		exit_horribly(NULL, "option --on-conflict-do-nothing requires option --inserts , --column-inserts or --inserts=100\n");
 
 	/* Identify archive format to emit */
 	archiveFormat = parseArchiveFormat(format, &archiveMode);
@@ -889,6 +905,7 @@ main(int argc, char **argv)
 	ropt->use_setsessauth = dopt.use_setsessauth;
 	ropt->disable_dollar_quoting = dopt.disable_dollar_quoting;
 	ropt->dump_inserts = dopt.dump_inserts;
+	ropt->dump_inserts_hundred = dopt.dump_inserts_hundred;
 	ropt->no_comments = dopt.no_comments;
 	ropt->no_publications = dopt.no_publications;
 	ropt->no_security_labels = dopt.no_security_labels;
@@ -980,6 +997,7 @@ help(const char *progname)
 	printf(_("  --exclude-table-data=TABLE   do NOT dump data for the named table(s)\n"));
 	printf(_("  --if-exists 

Re: pread() and pwrite()

2018-11-06 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Nov 6, 2018 at 6:23 AM Tom Lane  wrote:
>> What I suggest is that we *not* try to make this a completely transparent
>> substitute.  Instead, make the functions exported by src/port/ be
>> "pg_pread" and "pg_pwrite", ...

> OK.  But since we're using this from both fd.c and xlog.c, I put that
> into src/include/port.h.

LGTM.  I didn't bother to run an actual test cycle, since it's not
materially different from the previous version as far as portability
is concerned.

regards, tom lane



Re: Special role for subscriptions

2018-11-06 Thread Evgeniy Efimkin
Hi!
As a first step I suggest we allow CREATE SUBSCRIPTION for table owner only.

03.11.2018, 19:20, "Stephen Frost" :
> Greetings,
>
> * Evgeniy Efimkin (efim...@yandex-team.ru) wrote:
>>  In postgresql 10 and 11 only superuser can create/alter subscriptions.
>>  If there was a special role (like pg_monitor), it would be easier to
>>  grant control over subscriptions.
>>  I can make a patch if there are no objections against it.
>
> I think the short answer is 'yes, we should let non-superusers do that',
> but the longer answer is:
>
> What level of access makes sense for managing subscriptions? Should
> there be a way to say "user X is allowed to create a subscription for
> remote system Y, but only for tables that exist in schema Q"?
>
> My general feeling is 'yes', though, of course, I don't want to say that
> we have to have all of that before we move forward with allowing
> non-superusers to create subscriptions, but I do think we want to make
> sure that we have a well thought-out path for how to get from where we
> are now to a system which has a lot more granularity, and to do our best
> to try avoiding any paths that might paint us into a corner.
>
> Thanks!
>
> Stephen





Re: plruby: rb_iterate symbol clash with libruby.so

2018-11-06 Thread Pavel Raiskup
On Tuesday, November 6, 2018 3:49:46 PM CET Tom Lane wrote:
> Pavel Raiskup  writes:
> > On Monday, November 5, 2018 9:06:41 PM CET Robert Haas wrote:
> >> On Sat, Nov 3, 2018 at 2:20 PM Tom Lane  wrote:
>  Is it realistic we could rename red-black tree methods from 'rb_*' to 
>  e.g.
>  'rbt_*' to avoid this clash?
> 
> >>> I don't have a huge objection to renaming the rbtree functions, other
> >>> than the precedent it sets ...
> 
> >> Maybe prefixing with pg_ would be better than rb_ to rbt_. ...
> >> It's certainly not very surprising that Ruby has symbols starting with
> >> rb_...
> 
> > I now realized that there's rb_block_call() alternative for rb_iterate()
> > Ruby call -- which fortunately doesn't collide with PostgreSQL internals.
> > It means that for sufficiently new Ruby there exists some solution (not
> > that something similar can not re-appear elsewhere).
> 
> Yeah.  The long and short of this is that we're trampling on namespace
> that reasonably belongs to Ruby --- if they had some functions named
> "pg_something" and complained about a collision with libpq, would we
> change?  Nope.  So really we should rename these.
> 
> After looking at the code a bit I like the idea of s/rb/rbt/g
> better than s/rb/pg_rb/g.  The latter seems verbose, and it would
> also open the question of whether we need to rename rbtree.h/.c,
> which would be an additional level of complication I don't want.
> The rbt approach will allow skipping some other renamings that
> would be needed for consistency if we use pg_rb.
> 
> Barring objections I'll go make this happen shortly.
> 
> It's too late for this week's releases, but Pavel could pick up the commit
> once it happens and carry it as a Fedora patch until the next releases.

Thanks for looking at that, and no problem with this release.  To make
plruby work against supported PostgreSQL versions we'll first have to fix
quite a few _other_ legacy problems.

Pavel
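
For illustration, the general shape of that migration on the plruby side,
assuming Ruby's documented rb_block_call() API; a hedged sketch, not
plruby's actual code:

#include <ruby.h>

/* hypothetical callback, invoked once per element yielded by #each */
static VALUE
each_callback(RB_BLOCK_CALL_FUNC_ARGLIST(elem, ctx))
{
	/* ... process elem ... */
	return Qnil;
}

static VALUE
iterate_over(VALUE obj, VALUE ctx)
{
	/*
	 * Old style, whose symbol now resolves to PostgreSQL's rbtree
	 * rb_iterate() when both are loaded:
	 *
	 *		rb_iterate(iter_start, obj, each_callback, ctx);
	 *
	 * New style, available in sufficiently new Ruby:
	 */
	return rb_block_call(obj, rb_intern("each"), 0, NULL,
						 each_callback, ctx);
}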






Re: pread() and pwrite()

2018-11-06 Thread Jesper Pedersen

Hi,

On 11/5/18 9:10 PM, Thomas Munro wrote:

On Tue, Nov 6, 2018 at 5:07 AM Alvaro Herrera  wrote:

Please remove Tell from line 18 in fd.h.  To Küssnacht with him!


Thanks, done.  But what is this arrow sticking through my Mac laptop's
screen...?

On Tue, Nov 6, 2018 at 6:23 AM Tom Lane  wrote:

Alvaro Herrera  writes:

On 2018-Nov-04, Thomas Munro wrote:

Here's a patch to add Windows support by supplying
src/backend/port/win32/pread.c.  Thoughts?



Hmm, so how easy is it to detect that somebody runs read/write on fds where
pread/pwrite have occurred?  I guess for data files it's easy to detect
since you'd quickly end up with corrupted files, but what about other
kinds of files?  I wonder if we should be worrying about using this
interface somewhere other than fd.c and forgetting about the limitation.


Yeah.  I think the patch as presented is OK; it uses pread/pwrite only
inside fd.c, which is a reasonably non-leaky abstraction.  But there's
definitely a hazard of somebody submitting a patch that depends on
using pread/pwrite elsewhere, and then that maybe not working.

What I suggest is that we *not* try to make this a completely transparent
substitute.  Instead, make the functions exported by src/port/ be
"pg_pread" and "pg_pwrite", and inside fd.c we'd write something like

#ifdef HAVE_PREAD
#define pg_pread pread
#endif

and then refer to pg_pread/pg_pwrite in the body of that file.  That
way, if someone refers to pread and expects standard functionality
from it, they'll get a failure on platforms not supporting it.


OK.  But since we're using this from both fd.c and xlog.c, I put that
into src/include/port.h.


FWIW, I tested the given patches on HPUX 10.20; they compiled cleanly
and pass the core regression tests.




Passes check-world, and includes the feedback on this thread.

New status: Ready for Committer

Best regards,
 Jesper
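
For illustration, a minimal sketch of the indirection described above, with a
naive fallback; the fallback body is an assumption (the real Windows
implementation in src/backend/port/win32/pread.c works differently):

/* in src/include/port.h, per the discussion above */
#ifdef HAVE_PREAD
#define pg_pread pread
#else
extern ssize_t pg_pread(int fd, void *buf, size_t nbyte, off_t offset);
#endif

/* naive fallback sketch: unlike a true pread(), this moves the file
 * position, which is why callers must not mix it with plain read() and
 * write() on the same descriptor */
#ifndef HAVE_PREAD
#include <unistd.h>

ssize_t
pg_pread(int fd, void *buf, size_t nbyte, off_t offset)
{
	if (lseek(fd, offset, SEEK_SET) < 0)
		return -1;
	return read(fd, buf, nbyte);
}
#endif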




Re: jsonpath

2018-11-06 Thread Tomas Vondra

On 11/6/18 3:31 PM, Nikita Glukhov wrote:

On 29.10.2018 2:20, Tomas Vondra wrote:

> ...

Thank you for your review.


1) There are no docs, which makes it pretty much non-committable for
now. I know there is [1] and it was a good intro for the review, but we
need something as a part of our sgml docs.


I think that jsonpath part of documentation can be extracted from [2] and
added to the patch set.



Yes, please let's do that. The patch could not get into RFC without 
docs, so let's deal with it now.



2) 0001 says this:

 *typmod = -1; /* TODO implement FF1, ..., FF9 */

but I have no idea what FF1 or FF9 are. I guess it needs to be
implemented, or explained better.


FF1-FF9 are standard datetime template fields used for specifying fractional
seconds.  FF3/FF6 are analogues of our MS/US.  I decided simply to implement
this feature (see patch 0001; I can also supply it as a separate patch).



Understood. Thanks for the explanation.


But FF7-FF9 are questionable since the maximal supported precision is only 6.
They are optional by the standard:

   95) Specifications for Feature F555, “Enhanced seconds precision”:
 d) Subclause 9.44, “Datetime templates”:
   i) Without Feature F555, “Enhanced seconds precision”,
  a  shall not be FF7, FF8 or FF9.

So I decided to allow FF7-FF9 only on the output in to_char().



Hmm, isn't that against the standard then? I mean, if our precision is
only 6, that probably means we don't have the F555 feature, which means 
FF7-9 should not be available.


It also seems like a bit surprising behavior that we only allow those 
fields in to_char() and not in other places.



4) There probably should be a .gitignore rule for jsonpath_gram.h, just
like for the other generated header files.


I see 3 rules in /src/backend/utils/adt/.gitignore:

/jsonpath_gram.h
/jsonpath_gram.c
/jsonpath_scan.c



But there's a symlink in src/include/utils/jsonpath_gram.h and that's 
not in .gitignore.



FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the
codebase works with "strict" flags passed around, and it's easy to
forget to negate the flag somewhere (at least that's my experience).


The jsonpath lax/strict mode flag is used only in executeJsonPath(), where it
is saved in the "laxMode" field.  The new "strict" flag passed to the datetime
functions is unrelated to jsonpath.



OK.


7) I see src/include/utils/jsonpath_json.h adds support for plain json
by undefining various jsonb macros and redirecting them to the json
variants. I find that rather suspicious - imagine you're investigating
something in code using those jsonb macros, and wondering why it ends up
calling the json stuff. I'd be pretty confused ...


I agree, this is a rather simple but doubtful solution.  That's why json support
was in a separate patch until the 18th version of the patch set.

But if we do not want to compile jsonpath.c twice with different definitions,
then we need some kind of run-time wrapping over json strings and jsonb
containers, which seems a bit harder to implement.

To simplify debugging, I can also suggest explicitly preprocessing jsonpath.c
into jsonpath_json.c, using the redefinitions from jsonpath_json.h, before
compilation.



Not sure what's the right solution. But I agree the current approach 
probably is not it.




9) It's generally a good idea to make the individual pieces committable
separately, but that means e.g. the regression tests have to pass after
each patch. At the moment that does not seem to be the case for 0002,
see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
not sure if that's related.


This should definitely be a bug in json support, but I can't reproduce
it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY.  Could you provide
a stack trace at least?


I'll try.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Tom Lane
There's a thread on the ODBC list[1] complaining about the fact that
it's possible to set client_min_messages to FATAL or PANIC, because
that makes ODBC misbehave.  This is not terribly surprising, because
doing so arguably breaks the frontend protocol.  The simple-query
section says this:

In the event of an error, ErrorResponse is issued followed by
ReadyForQuery.

and the extended-query section says this:

Therefore, an Execute phase is always terminated by the appearance of
exactly one of these messages: CommandComplete, EmptyQueryResponse (if
the portal was created from an empty query string), ErrorResponse, or
PortalSuspended.

and both of those are lies if an ERROR response gets suppressed thanks to
client_min_messages being set too high.  It seems that libpq+psql manages
to survive the case (probably because psql is too stupid to notice that
anything is wrong), but I don't find it unreasonable that other clients
get hopelessly confused.

Hence, I propose that we should disallow setting client_min_messages
any higher than ERROR, and that this probably even amounts to a
back-patchable bug fix.

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/EE586BE92A4AFB45B03310C2A0C0565D6D0EFC17%40G01JPEXMBKW03



Issue with v11.0 within jsonb_plperl tests in 32bit on AIX

2018-11-06 Thread REIX, Tony
Hi,


I'm building PostgreSQL v11.0 on AIX (6.1, 7.1, & 7.2).


The 2 new tests jsonb_plperl fail in 32bit, not in 64bit.

All other tests are OK.


 - Same issue for all 3 versions of AIX

 - Version of Perl : 5.24.0

 - Version of Python: 2.7.12



I have traced details, but it is still unclear to me what the root cause is.

I'd like to have some help.

Has someone already built and tested PostgreSQL v11.0 in 32bit on
Linux/x86_64?


Looking at the details, it seems that the check on the conversion of infinity
to json is not performed in 32bit.


Do you think that this issue comes from inside the PostgreSQL code or from the
AIX tools used by PostgreSQL?


Thanks


Tony



1) Traces of the execution of the tests:


for extra in contrib/jsonb_plperl; do gmake -C '../..'/$extra 
DESTDIR='/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit'/tmp_install 
install 
>>'/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit'/tmp_install/log/install.log
 || exit; done
PATH="/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit/tmp_install/opt/freeware/bin:$PATH"
 
LIBPATH="/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit/tmp_install/opt/freeware/lib:$LIBPATH"
 ../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. 
--bindir= --dbname=contrib_regression jsonb_plperl jsonb_plperlu
== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 60848 with PID 22610390
== creating database "contrib_regression" ==
CREATE DATABASE
ALTER DATABASE
== running regression test queries==
test jsonb_plperl ... FAILED
test jsonb_plperlu... FAILED
== shutting down postmaster   ==

==
 2 of 2 tests failed.
==


2) Details of the trace

64bit : OK


32bit : 2 failures :
== running regression test queries==
test jsonb_plperl ... FAILED
test jsonb_plperlu... FAILED


# diff ./contrib/jsonb_plperl/expected/jsonb_plperl.out 
./contrib/jsonb_plperl/results/jsonb_plperl.out
68,70c68,70
<  roundtrip
< ---
<  1
---
>   
>   
>roundtrip
>  
> -
>   
> 0.00266336279953129
74,75c74,78
< ERROR:  cannot convert infinity to jsonb
< CONTEXT:  PL/Perl function "roundtrip"
---
>   
>   
> roundtrip
>   
> -
>  
> 0.00266336287858179
> (1 row)
>
etc



cd contrib/jsonb_plperl/

sql/jsonb_plperl.sql :

...
CREATE FUNCTION roundtrip(val jsonb, ref text = '') RETURNS jsonb
LANGUAGE plperl
TRANSFORM FOR TYPE jsonb
AS $$
# can't use Data::Dumper, but let's at least check for unexpected ref type
die 'unexpected '.(ref($_[0]) || 'not a').' reference'
if ref($_[0]) ne $_[1];
return $_[0];
$$;


SELECT roundtrip('null') is null;
SELECT roundtrip('1');
SELECT roundtrip('1E+131071');
SELECT roundtrip('-1');
SELECT roundtrip('1.2');
SELECT roundtrip('-1.2');
SELECT roundtrip('"string"');
SELECT roundtrip('"NaN"');
...



How to reproduce :

su pgstbf -c "(set -x; ulimit -a; cd 
/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit/contrib/jsonb_plperl || 
exit 1; \
   ulimit -f unlimited; \
   
PATH="/opt/freeware/src/packages/BUILD/postgre

Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Andres Freund
Hi,

On 2018-11-06 11:19:40 -0500, Tom Lane wrote:
> Hence, I propose that we should disallow setting client_min_messages
> any higher than ERROR, and that this probably even amounts to a
> back-patchable bug fix.
> 
> Thoughts?

Seems reasonable. I do think it's probably sensible to backpatch,
although I wonder if we shouldn't clamp the value to ERROR at log
emission error time, rather than via guc.c, so we don't prevent old code
/ postgresql.conf that set client_min_messages to > ERROR.

Greetings,

Andres Freund
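
For illustration, the two options as minimal sketches; the hook and helper
names are assumptions, not the backend's actual code:

/* Option A: reject the value in guc.c via a check hook */
static bool
check_client_min_messages(int *newval, void **extra, GucSource source)
{
	if (*newval > ERROR)
	{
		GUC_check_errmsg("client_min_messages cannot be set higher than ERROR");
		return false;
	}
	return true;
}

/* Option B: accept the setting but clamp it when deciding whether to
 * send a message, so existing postgresql.conf files keep loading */
static bool
should_send_to_client(int elevel)
{
	return elevel >= Min(client_min_messages, ERROR);
}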



Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Tom Lane
Andres Freund  writes:
> On 2018-11-06 11:19:40 -0500, Tom Lane wrote:
>> Hence, I propose that we should disallow setting client_min_messages
>> any higher than ERROR, and that this probably even amounts to a
>> back-patchable bug fix.
>> 
>> Thoughts?

> Seems reasonable. I do think it's probably sensible to backpatch,
> although I wonder if we shouldn't clamp the value to ERROR at log
> emission error time, rather than via guc.c, so we don't prevent old code
> / postgresql.conf that set client_min_messages to > ERROR.

Hm, do you really think there is any?  And if there is, wouldn't we be
breaking it anyway thanks to the behavioral change?

regards, tom lane



Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Andres Freund
On 2018-11-06 11:37:40 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-11-06 11:19:40 -0500, Tom Lane wrote:
> >> Hence, I propose that we should disallow setting client_min_messages
> >> any higher than ERROR, and that this probably even amounts to a
> >> back-patchable bug fix.
> >> 
> >> Thoughts?
> 
> > Seems reasonable. I do think it's probably sensible to backpatch,
> > although I wonder if we shouldn't clamp the value to ERROR at log
> > emission error time, rather than via guc.c, so we don't prevent old code
> > / postgresql.conf that set client_min_messages to > ERROR.
> 
> Hm, do you really think there is any?

I'm not sure. But it sounds like it'd possibly slow adoption of the
minor releases if we said "hey, make sure that you nowhere set
client_min_messages > ERROR", even if it's not particularly meaningful
thing to do, as it'd still imply a fair bit of work for bigger
applications with not great standards.


> And if there is, wouldn't we be breaking it anyway thanks to the
> behavioral change?

Yea, possibly. I'd assume that it'd mostly have been set out of a
mistake, and never really noticed, because it's hard to notice the
consequences when things are ok.

Greetings,

Andres Freund



Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, Andres Freund wrote:

> On 2018-11-06 11:37:40 -0500, Tom Lane wrote:

> > Hm, do you really think there is any?
> 
> I'm not sure. But it sounds like it'd possibly slow adoption of the
> minor releases if we said "hey, make sure that you nowhere set
> client_min_messages > ERROR", even if it's not particularly meaningful
> thing to do, as it'd still imply a fair bit of work for bigger
> applications with not great standards.

I agree -- it seems better to have a benign no-op than to let this kind
of silly rationale block upgrades.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-11-06 Thread Alvaro Herrera
Here's the patch I intend to push to branches 10 and 11.  The patch in
master is almost identical -- the only difference is that
ColumnDef->is_from_parent is removed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 1a74c06b4ca66d10669a62c7877dd5d6525c5580 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 5 Nov 2018 12:37:37 -0300
Subject: [PATCH] Revise attribute handling code on partition creation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The original code to propagate NOT NULL and default expressions
specified when creating a partition was mostly copied from typed-tables
creation, but not being a great match it contained some duplication,
inefficiency and bugs.  This rewrite makes it simpler:

1. instead of checking for duplicate column names in its own block,
reuse the original one that already did that;

2. instead of concatenating the list of columns from parent and the one
declared in the partition and scanning the result to (incorrectly)
propagate defaults and not-null constraints, just scan the latter
searching the former for a match, and merging sensibly.  This works
because we know the list in the parent is already correct and there can
only be one parent.

This fixes the bug that NOT NULL constraints declared in the parent
table would not be honored in the partition.  There are other issues
nearby, such as DEFAULT values declared in the partition that are not
honored if inserting through the parent; but that's a larger problem
that would not be back-patched.  Also, collation declarations in the
partition are silently lost, but fixing this seems a potential landmine.

This rewrite makes ColumnDef->is_from_parent unused, so it's removed
on branch master; on released branches, it's kept as an unused field in
order not to cause ABI incompatibilities.

Amit Langote wrote a less invasive fix for this, but while I kept the
test he added, I ended up not using his code.  Ashutosh Bapat reviewed
Amit's fix.

Author: Álvaro Herrera, Amit Langote
Reviewed-by: Ashutosh Bapat, Amit Langote
Reported-by: Jürgen Strobel (bug #15212)
Discussion: https://postgr.es/m/152746742177.1291.9847032632907407...@wrigleys.postgresql.org
---
 src/backend/commands/sequence.c|  1 -
 src/backend/commands/tablecmds.c   | 99 ++
 src/backend/nodes/equalfuncs.c |  1 -
 src/backend/nodes/makefuncs.c  |  1 -
 src/backend/parser/gram.y  |  4 --
 src/backend/parser/parse_utilcmd.c |  2 -
 src/include/nodes/parsenodes.h |  2 +-
 src/test/regress/expected/create_table.out | 16 +
 src/test/regress/sql/create_table.sql  |  8 +++
 9 files changed, 71 insertions(+), 63 deletions(-)

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 2f3ee46236..74cb2e61bc 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -172,7 +172,6 @@ DefineSequence(ParseState *pstate, CreateSeqStmt *seq)
 		coldef->is_local = true;
 		coldef->is_not_null = true;
 		coldef->is_from_type = false;
-		coldef->is_from_parent = false;
 		coldef->storage = 0;
 		coldef->raw_default = NULL;
 		coldef->cooked_default = NULL;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 971a8721e1..b4a19771a2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1650,17 +1650,6 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		MaxHeapAttributeNumber)));
 
 	/*
-	 * In case of a partition, there are no new column definitions, only dummy
-	 * ColumnDefs created for column constraints.  We merge them with the
-	 * constraints inherited from the parent.
-	 */
-	if (is_partition)
-	{
-		saved_schema = schema;
-		schema = NIL;
-	}
-
-	/*
 	 * Check for duplicate names in the explicit list of attributes.
 	 *
 	 * Although we might consider merging such entries in the same way that we
@@ -1673,17 +1662,19 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		ListCell   *rest = lnext(entry);
 		ListCell   *prev = entry;
 
-		if (coldef->typeName == NULL)
-
+		if (!is_partition && coldef->typeName == NULL)
+		{
 			/*
 			 * Typed table column option that does not belong to a column from
 			 * the type.  This works because the columns from the type come
-			 * first in the list.
+			 * first in the list.  (We omit this check for partition column
+			 * lists; those are processed separately below.)
 			 */
 			ereport(ERROR,
 	(errcode(ERRCODE_UNDEFINED_COLUMN),
 	 errmsg("column \"%s\" does not exist",
 			coldef->colname)));
+		}
 
 		while (rest != NULL)
 		{
@@ -1717,6 +1708,17 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 	}
 
 	/*
+	 * In case of a partition, there are no new column definitions, only dummy
+	 * ColumnDe

Re: pg_dump multi VALUES INSERT

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, Surafel Temesgen wrote:

> hi,
> 
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO  wrote:
> 
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version and work by ‘inserts=100’ as
> Stephen suggest

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100.  It shouldn't take much more code to handle
it that way, which makes more sense to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
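
For illustration, a sketch of the patch's "case 8" block reworked along those
lines; the dump_inserts_multiple field and the message wording are
assumptions, not the final patch:

			case 8:				/* --inserts[=N] */
				if (optarg)
				{
					int			rows = atoi(optarg);

					if (rows <= 0)
					{
						write_msg(NULL, "rows per insert must be a positive number\n");
						exit_nicely(1);
					}
					/* hypothetical field replacing dump_inserts_hundred */
					dopt.dump_inserts_multiple = rows;
				}
				dopt.dump_inserts = 1;
				break;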



Re: Doc patch on psql output formats

2018-11-06 Thread Daniel Verite
Tom Lane wrote:

> > When a non-unique abbreviation is used, psql uses the first
> > match in an arbitrary order defined in do_pset() by
> > a cascade of pg_strncasecmp().
> 
> Ugh.  Should we not fix the code so that it complains if there's
> not a unique match?  I would bet that the code was also written
> on the assumption that any abbrevation must be unique.

There's a backward compatibility issue, since a script issuing
  \pset format a
would now fail instead of setting the format to "aligned".
The doc patch is meant to describe the current behavior.

OTOH if we don't fail with non-unique matches, there is the risk that
in the future, a new format matching /^a/ will come alphabetically
before "aligned", and will be picked up instead of "aligned".

In both cases using abbreviations in scripts is not a great
idea. Anyway I will look into the changes required in do_pset to
implement the error on multiple matches, if that's the preferred
behavior.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: New Defects reported by Coverity Scan for PostgreSQL

2018-11-06 Thread Emre Hasegeli
> Good idea.  I haven't though about that, but now such names makes more
> sense to me.  I will prepare another patch to improve the naming and
> comments to be applied on top of my current patches.

The patch is attached to improve the comments and variable names.  I
explained the functions with the same signature on the file header.  I
can duplicate those on the function headers if you find that better.


geo-ops-comments-v00.patch
Description: Binary data


Re: New Defects reported by Coverity Scan for PostgreSQL

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, Emre Hasegeli wrote:

> The patch is attached to improve the comments and variable names.  I
> explained the functions with the same signature on the file header.  I
> can duplicate those on the function headers if you find that better.

Surely the comment in line 3839 deserves an update :-)

This seems good material.  I would put the detailed conventions comment 
separately from the head of the file, like this (where I also changed
"Type1 *type1" into "Type1 *obj1", and a few "has" to "have")

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index e7c1160131..620d8a8c80 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -3,6 +3,16 @@
  * geo_ops.c
  *	  2D geometric operations
  *
+ * This module implements the geometric functions and operators.  The
+ * geometric types are (from simple to more complicated):
+ *
+ * - point
+ * - line
+ * - line segment
+ * - box
+ * - circle
+ * - polygon
+ *
  * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -25,6 +35,39 @@
 #include "utils/fmgrprotos.h"
 #include "utils/geo_decls.h"
 
+/*
+ * The functions to implement the types have the signature:
+ *
+ *		void type_construct(Type *result, ...);
+ *
+ * The functions to implement operators usually have signatures like:
+ *
+ *		void type1_operator_type2(Type *result, Type1 *obj1, Type2 *obj2);
+ *
+ * There are certain operators between different types.  The functions
+ * that return the intersection point between 2 types has signature:
+ *
+ *		bool type1_interpt_type2(Point *result, Type1 *obj1, Type2 *obj2);
+ *
+ * These return whether the two objects intersect, and set the intersection
+ * point to pre-allocated *result when it is not NULL.  Those functions may be
+ * used to implement multiple SQL level operators.  For example, determining
+ * whether two lines are parallel is done by checking whether they intersect.
+ *
+ * The functions that return whether an object contains another have
+ * the signature:
+ *
+ *		bool type1_contain_type2(Type1 *obj1, Type2 *obj2);
+ *
+ * The functions to get the closest point on an object to the second object
+ * have the signature:
+ *
+ *		float8 type1_closept_type2(Point *result, Type1 *obj1, Type2 *obj2);
+ *
+ * They return the shortest distance between the two objects, and set
+ * the closest point on the first object to the second object to pre-allocated
+ * *result when it is not NULL.
+ */
 
 /*
  * Internal routines
@@ -64,7 +107,7 @@ static int	lseg_crossing(float8 x, float8 y, float8 px, float8 py);
 static bool	lseg_contain_point(LSEG *lseg, Point *point);
 static float8 lseg_closept_point(Point *result, LSEG *lseg, Point *pt);
 static float8 lseg_closept_line(Point *result, LSEG *lseg, LINE *line);
-static float8 lseg_closept_lseg(Point *result, LSEG *l1, LSEG *l2);
+static float8 lseg_closept_lseg(Point *result, LSEG *on_lseg, LSEG *to_lseg);
 
 /* Routines for boxes */
 static inline void box_construct(BOX *result, Point *pt1, Point *pt2);
@@ -74,7 +117,7 @@ static float8 box_ar(BOX *box);
 static float8 box_ht(BOX *box);
 static float8 box_wd(BOX *box);
 static bool box_contain_point(BOX *box, Point *point);
-static bool box_contain_box(BOX *box1, BOX *box2);
+static bool box_contain_box(BOX *contains_box1, BOX *contained_box2);
 static bool box_contain_lseg(BOX *box, LSEG *lseg);
 static bool box_interpt_lseg(Point *result, BOX *box, LSEG *lseg);
 static float8 box_closept_point(Point *result, BOX *box, Point *point);
@@ -87,7 +130,7 @@ static float8 circle_ar(CIRCLE *circle);
 static void make_bound_box(POLYGON *poly);
 static void poly_to_circle(CIRCLE *result, POLYGON *poly);
 static bool lseg_inside_poly(Point *a, Point *b, POLYGON *poly, int start);
-static bool poly_contain_poly(POLYGON *polya, POLYGON *polyb);
+static bool poly_contain_poly(POLYGON *contains_poly, POLYGON *contained_poly);
 static bool plist_same(int npts, Point *p1, Point *p2);
 static float8 dist_ppoly_internal(Point *pt, POLYGON *poly);
 
@@ -651,12 +694,12 @@ box_contain(PG_FUNCTION_ARGS)
  * Check whether the box is in the box or on its border
  */
 static bool
-box_contain_box(BOX *box1, BOX *box2)
+box_contain_box(BOX *contains_box, BOX *contained_box)
 {
-	return FPge(box1->high.x, box2->high.x) &&
-		   FPle(box1->low.x, box2->low.x) &&
-		   FPge(box1->high.y, box2->high.y) &&
-		   FPle(box1->low.y, box2->low.y);
+	return FPge(contains_box->high.x, contained_box->high.x) &&
+		   FPle(contains_box->low.x, contained_box->low.x) &&
+		   FPge(contains_box->high.y, contained_box->high.y) &&
+		   FPle(contains_box->low.y, contained_box->low.y);
 }
 
 
@@ -1223,10 +1266,6 @@ line_interpt(PG_FUNCTION_ARGS)
 /*
  * Internal version of line_interpt
  *
- * Th

Re: valgrind issues on Fedora 28

2018-11-06 Thread Andres Freund
On 2018-11-06 18:24:55 +0100, Tomas Vondra wrote:
> I've recently updated to Fedora 28, and in that environment I get quite a
> few new valgrind issues (see the attached log).
> 
> Essentially, all the reports start with either
> 
> ==5971== Invalid read of size 32
> ==5971==at 0x5957EB1: __wcsnlen_avx2 (in /usr/lib64/libc-2.27.so)
> ==5971==by 0x589E871: wcsrtombs (in /usr/lib64/libc-2.27.so)
> ==5971==by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
> ==5971==by 0x97DD82: wchar2char (pg_locale.c:1641)


I think this isn't actually a bug, just a missing suppression. The avx2
code uses instructions to scan for 0 bytes in multiple bytes at the same
time. Therefore it can encounter a byte marked as undefined, even if it
never actually uses that value.

> or
> 
> ==5971== Conditional jump or move depends on uninitialised value(s)
> ==5971==at 0x5822123: __gconv_transform_internal_utf8 (in
> /usr/lib64/libc-2.27.so)
> ==5971==by 0x589E8A4: wcsrtombs (in /usr/lib64/libc-2.27.so)
> ==5971==by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
> ==5971==by 0x97DD82: wchar2char (pg_locale.c:1641)
> 
> or some other combination of that. In all cases the call stack is
> 
>   wchar2char > wcstombs > wcsrtombs > something

I think I came to the same conclusion here, but I'm not quite sure.

FWIW, I've suppressed these on my valgrind animal a while ago.

Greetings,

Andres Freund
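
For illustration, the shape such suppressions might take, with the frames
copied from the reports above; treat this as a hedged sketch -- a real entry
may need wildcarding tuned to the local glibc, and older valgrind versions
may not know the Addr32 kind:

{
   wcsnlen_avx2_scans_past_defined_bytes
   Memcheck:Addr32
   fun:__wcsnlen_avx2
   fun:wcsrtombs
   fun:wcstombs
   fun:wchar2char
}
{
   gconv_internal_utf8_reads_whole_words
   Memcheck:Cond
   fun:__gconv_transform_internal_utf8
   fun:wcsrtombs
   fun:wcstombs
   fun:wchar2char
}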



RE: Issue with v11.0 within jsonb_plperl tests in 32bit on AIX

2018-11-06 Thread REIX, Tony
Hummm. The buildfarm does show that these tests are OK on AIX machines in
32bit, with GCC 4.8.1.

Nuts !


Attached is the full diff between the expected results and the real results.


Cordialement,

Tony Reix

tony.r...@atos.net

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net



De : REIX, Tony
Envoyé : mardi 6 novembre 2018 17:26
À : pgsql-hack...@postgresql.org
Cc : REIX, Tony
Objet : Issue with v11.0 within jsonb_plperl tests in 32bit on AIX


Hi,


I'm building PostgreSQL v11.0 on AIX (6.1, 7.1, & 7.2).


The 2 new tests jsonb_plperl fail in 32bit, not in 64bit.

All other tests are OK.


 - Same issue for all 3 versions of AIX

 - Version of Perl : 5.24.0

 - Version of Python: 2.7.12



I have traced details, but it is still unclear to me what the root cause is.

I'd like to have some help.

Has someone already built and tested PostgreSQL v11.0 in 32bit on
Linux/x86_64?


Looking at the details, it seems that the check on the conversion of infinity
to json is not performed in 32bit.


Do you think that this issue comes from inside the PostgreSQL code or from the
AIX tools used by PostgreSQL?


Thanks


Tony



1) Traces of the execution of the tests:


for extra in contrib/jsonb_plperl; do gmake -C '../..'/$extra 
DESTDIR='/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit'/tmp_install 
install 
>>'/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit'/tmp_install/log/install.log
 || exit; done
PATH="/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit/tmp_install/opt/freeware/bin:$PATH"
 
LIBPATH="/opt/freeware/src/packages/BUILD/postgresql-11.0/32bit/tmp_install/opt/freeware/lib:$LIBPATH"
 ../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. 
--bindir= --dbname=contrib_regression jsonb_plperl jsonb_plperlu
== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 60848 with PID 22610390
== creating database "contrib_regression" ==
CREATE DATABASE
ALTER DATABASE
== running regression test queries==
test jsonb_plperl ... FAILED
test jsonb_plperlu... FAILED
== shutting down postmaster   ==

==
 2 of 2 tests failed.
==


2) Details of the trace

64bit : OK


32bit : 2 failures :
== running regression test queries==
test jsonb_plperl ... FAILED
test jsonb_plperlu... FAILED


# diff ./contrib/jsonb_plperl/expected/jsonb_plperl.out 
./contrib/jsonb_plperl/results/jsonb_plperl.out
68,70c68,70
<  roundtrip
< ---
<  1
---
>   
>   
>roundtrip
>  
> -
>   
> 0.00266336279953129
74,75c74,78
< ERROR:  cannot convert infinity to jsonb
< CONTEXT:  PL/Perl function "roundtrip"
---
>   
>   
> roundtrip
>   
> -
>  
> 0.00266336287858179
> (1 row)
>
etc



cd contrib/jsonb_plperl/

sql/jsonb_plperl.sql :

...

Re: valgrind issues on Fedora 28

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, Tomas Vondra wrote:

> Hi,
> 
> I've recently updated to Fedora 28, and in that environment I get quite a
> few new valgrind issues (see the attached log).

Hmm, related to https://postgr.es/m/20180220150838.GD18315@e733.localdomain ?
Apparently the right answer is to add local suppressions ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Issue with v11.0 within jsonb_plperl tests in 32bit on AIX

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, REIX, Tony wrote:

> Hummm The buildfarm does show that these tests are OK on AIX machines in 
> 32bit, with GCC 4.8.1 .
> 
> Nuts !
> 
> 
> Attached is the full diff between the expected results and the real results.

Standard diffs are awful to read.  Would you mind using context or
unified diffs please (diff -c or diff -u)?  Also, keep in mind that the
regression tests save a file "regression.diffs" with all the diffs in
context format, so there's no need to create them yourself.

That said, this looks like there's an ABI mismatch somewhere, where
this:

< roundtrip
< -
<  {"1": {"2": [3, 4, 5]}, "2": 3}

ends up as

{
"1": {
"2": [

0.00531017013119972,

0.00531146529464635,

0.00530757980430645
]
},
"2": 
0.0026632835514017
}

Note that keys seem okay, but the values are corrupted.  (I wonder if
those values are actually representable in 32 bits.)  What catches my
attention is that the values for "3" in the two places where it appears
are different.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New Defects reported by Coverity Scan for PostgreSQL

2018-11-06 Thread Emre Hasegeli
> Surely the comment in line 3839 deserves an update :-)

Done.

> This seems good material.  I would put the detailed conventions comment
> separately from the head of the file, like this (where I also changed
> "Type1 *type1" into "Type1 *obj1", and a few "has" to "have")

Looks better to me.  I found one more "has" and changed it.


geo-ops-comments-v02.patch
Description: Binary data


Re: valgrind issues on Fedora 28

2018-11-06 Thread Tomas Vondra

On 11/6/18 6:35 PM, Andres Freund wrote:

On 2018-11-06 18:24:55 +0100, Tomas Vondra wrote:

I've recently updated to Fedora 28, and in that environment I get quite a
few new valgrind issues (see the attached log).

Essentially, all the reports start with either

==5971== Invalid read of size 32
==5971==at 0x5957EB1: __wcsnlen_avx2 (in /usr/lib64/libc-2.27.so)
==5971==by 0x589E871: wcsrtombs (in /usr/lib64/libc-2.27.so)
==5971==by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
==5971==by 0x97DD82: wchar2char (pg_locale.c:1641)



I think this isn't actually a bug, just a missing suppression. The avx2
code uses instructions to scan for 0 bytes in multiple bytes at the same
time. Therefore it can encounter a byte marked as undefined, even if it
never actually uses that value.



OK, my thoughts exactly.


or

==5971== Conditional jump or move depends on uninitialised value(s)
==5971==at 0x5822123: __gconv_transform_internal_utf8 (in
/usr/lib64/libc-2.27.so)
==5971==by 0x589E8A4: wcsrtombs (in /usr/lib64/libc-2.27.so)
==5971==by 0x5834000: wcstombs (in /usr/lib64/libc-2.27.so)
==5971==by 0x97DD82: wchar2char (pg_locale.c:1641)

or some other combination of that. In all cases the call stack is

   wchar2char > wcstombs > wcsrtombs > something


I think I came to the same conclusion here, but I'm not quite sure.



Looking at gconv code at [1], it seems it's reading the data as int32 
values and using shifts to extract individual bytes. I'm pretty sure 
this confuses valgrind so it thinks it's accessing all the bytes.


[1] 
https://sourceware.org/git/?p=glibc.git;a=blob;f=iconv/gconv_simple.c;h=506c92caf228d61f92986c39a2ddf9c0a134b4c0;hb=HEAD



FWIW, I've supressed these on my valgrind animal a while ago.



OK, I propose to add these suppressions into the current list.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: valgrind issues on Fedora 28

2018-11-06 Thread Tomas Vondra

On 11/6/18 6:35 PM, Alvaro Herrera wrote:

On 2018-Nov-06, Tomas Vondra wrote:


Hi,

I've recently updated to Fedora 28, and in that environment I get quite a
few new valgrind issues (see the attached log).


Hmm, related to https://postgr.es/m/20180220150838.GD18315@e733.localdomain ?
Apparently the right answer is to add local suppressions ...



Yep, seems like that. I'll check if the patch proposed by Aleksander 
resolves all the issues reported by valgrind.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_dump multi VALUES INSERT

2018-11-06 Thread Pavel Stehule
út 6. 11. 2018 v 18:18 odesílatel Alvaro Herrera 
napsal:

> On 2018-Nov-06, Surafel Temesgen wrote:
>
> > hi,
> >
> > On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO 
> wrote:
> >
> > > Patch does not seem to apply anymore, could you rebase?
> > >
> > The attached patch is a rebased version and work by ‘inserts=100’ as
> > Stephen suggest
>
> I thought the suggestion was that the number could be any positive
> integer, not hardcoded 100.  It shouldn't take much more code to handle
> it that way, which makes more sense to me.
>
>
+1

100 looks really strange

Pavel

-- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>


Re:Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-11-06 Thread chenhj
Hi,all
 
I analyzed the btree blocks where the LWLock deadlock occurred, as follows:

## insert process (ginInsertValue())

 644 (root blkno)
  |
 7054 (2. held LWLock: 0x2aaac587ae64) --rightlink--> (3. acquire
 LWLock: 0x2aaab4009564, buffer = 2119038, blkno should be 9954)
   |
 701 (1. held LWLock: 0x2aaab670dfe4)


The ginInsertValue() function above acquires the LWLocks in the order described
in the README.

src/backend/access/gin/README
---
To avoid deadlocks, B-tree pages must always be locked in the same order:
left to right, and bottom to top.
...
-


## autovacuum process (ginScanToDelete())

 644 (root blkno)
  |
 ... 9954 (1. held LWLock: 0x2aaab4009564) ...
   |
 701 (2. acquire LWLock: 0x2aaab670dfe4)

Note: according to autovacuum's core file, 701's parent is 9954, while the
insert's core shows 701's parent as 7054, and the rightlink of 7054 is 9954!

However, ginScanToDelete() scans the btree depth-first, taking EXCLUSIVE locks
from top to bottom, which can deadlock against the insert's bottom-to-top,
left-to-right ordering.
Is the above analysis correct? If so, this deadlock should be easy to hit.


static bool
ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
				DataPageDeleteStack *parent, OffsetNumber myoff)
{
...
	/* locks the page exclusively, parent before child (top to bottom) */
	if (!isRoot)
		LockBuffer(buffer, GIN_EXCLUSIVE);

...
	/* recurses depth-first, left to right, while still holding this lock */
	for (i = FirstOffsetNumber; i <= GinPageGetOpaque(page)->maxoff; i++)
	{
		PostingItem *pitem = GinDataPageGetPostingItem(page, i);

		if (ginScanToDelete(gvs, PostingItemGetBlockNumber(pitem), FALSE,
							me, i))
			i--;
	}
...
}

src/backend/access/gin/README
-
The previous paragraph's reasoning only applies to searches, and only to
posting trees. To protect from inserters following a downlink to a deleted
page, vacuum simply locks out all concurrent insertions to the posting tree,
by holding a super-exclusive lock on the parent page of subtree with deletable
pages. 
...
posting tree. To exclude interference with readers vacuum takes exclusive
locks in a depth-first scan in left-to-right order of page tuples. 
...
-

# stacks
## Stack of insert: acquiring lock 0x2aaab4009564 (blkno 9954) while holding
0x2aaab670dfe4 (blkno 701) and 0x2aaac587ae64 (blkno 7054)
-
(gdb) bt
#0  0x7fe11552379b in do_futex_wait.constprop.1 () from 
/lib64/libpthread.so.0
#1  0x7fe11552382f in __new_sem_wait_slow.constprop.0 () from 
/lib64/libpthread.so.0
#2  0x7fe1155238cb in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x0069d362 in PGSemaphoreLock (sema=0x2ac02958) at pg_sema.c:310
#4  0x007095ac in LWLockAcquire (lock=0x2aaab4009564, 
mode=LW_EXCLUSIVE) at lwlock.c:1233
#5  0x00490a32 in ginStepRight (buffer=6713826, index=, 
lockmode=lockmode@entry=2) at ginbtree.c:174
#6  0x00490c1c in ginFinishSplit (btree=btree@entry=0x7ffd81e4f950, 
stack=0x28eebf8, freestack=freestack@entry=1 '\001', 
buildStats=buildStats@entry=0x0) at ginbtree.c:701
#7  0x00491396 in ginInsertValue (btree=btree@entry=0x7ffd81e4f950, 
stack=, insertdata=insertdata@entry=0x7ffd81e4f940, 
buildStats=buildStats@entry=0x0)
at ginbtree.c:773
#8  0x0048fb42 in ginInsertItemPointers (index=, 
rootBlkno=rootBlkno@entry=644, items=items@entry=0x2916598, 
nitem=nitem@entry=353, buildStats=buildStats@entry=0x0)
at gindatapage.c:1907
#9  0x0048a9ea in ginEntryInsert (ginstate=ginstate@entry=0x28e7ef8, 
attnum=, key=42920440, category=, 
items=0x2916598, nitem=353, 
buildStats=buildStats@entry=0x0) at gininsert.c:214
#10 0x00496d94 in ginInsertCleanup (ginstate=ginstate@entry=0x28e7ef8, 
full_clean=full_clean@entry=0 '\000', fill_fsm=fill_fsm@entry=1 '\001', 
forceCleanup=forceCleanup@entry=0 '\000', stats=stats@entry=0x0) at 
ginfast.c:883
#11 0x00497727 in ginHeapTupleFastInsert 
(ginstate=ginstate@entry=0x28e7ef8, collector=collector@entry=0x7ffd81e4fe60) 
at ginfast.c:448
#12 0x0048b209 in gininsert (index=, 
values=0x7ffd81e4ff40, isnull=0x7ffd81e50040 "", ht_ctid=0x280d98c, 
heapRel=, checkUnique=, 
indexInfo=0x28b5aa8) at gininsert.c:522
#13 0x005ee8dd in ExecInsertIndexTuples (slot=slot@entry=0x28b5d58, 
tupleid=tupleid@entry=0x280d98c, estate=estate@entry=0x28b5288, 
noDupErr=noDupErr@entry=1 '\001', 
specConflict=specConflict@entry=0x7ffd81e5013b "", 
arbiterIndexes=arbiterIndexes@entry=0x28c6dd8) at execIndexing.c:386
#14 0x0060ccf5 in ExecInsert (canSetTag=1 '\001', estate=0x28b5288, 
onconflict=ONCONFLICT_UPDATE, arbiterIndexes=0x28c6dd8, planSlot=0x28b5d58, 
slot=0x28b5d58, mtstate=0x28b5628)
at nodeModifyTable.c:564
#15 ExecModifyTab

Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Robert Haas
On Thu, Oct 25, 2018 at 4:26 PM Alvaro Herrera  wrote:
> Firstly, I took Robert's advice and removed the CONCURRENTLY keyword
> from the syntax.  We just do it that way always.  When there's a default
> partition, only that partition is locked with an AEL; all the rest is
> locked with ShareUpdateExclusive only.

Check.

> Then there are a few other implementation details worth mentioning:
>
> 3. parallel query: when a worker starts on a snapshot that has a
> partition descriptor cache, we need to transmit those partdescs from
> leader via shmem ... but we cannot send the full struct, so we just send
> the OID list of partitions, then rebuild the descriptor in the worker.
> Side effect: if a partition is detached right between the leader taking
> the partdesc and the worker starting, the partition loses its
> relpartbound column, so it's not possible to reconstruct the partdesc.
> In this case, we raise an error.  Hopefully this should be rare.

I don't think it's a good idea to for parallel query to just randomly
fail in cases where a non-parallel query would have worked.  I tried
pretty hard to avoid that while working on the feature, and it would
be a shame to see that work undone.

It strikes me that it would be a good idea to break this work into two
phases.  In phase 1, let's support ATTACH and CREATE TABLE ..
PARTITION OF without requiring AccessExclusiveLock.  In phase 2, think
about concurrency for DETACH (and possibly DROP).

I suspect phase 1 actually isn't that hard.  It seems to me that the
only thing we REALLY need to ensure is that the executor doesn't blow
up if a relcache reload occurs.  There are probably a few different
approaches to that problem, but I think it basically boils down to (1)
making sure that the executor is holding onto pointers to the exact
objects it wants to use and not re-finding them through the relcache
and (2) making sure that the relcache doesn't free and rebuild those
objects but rather holds onto the existing copies.  With this
approach, already-running queries won't take into account the fact
that new partitions have been added, but that seems at least tolerable
and perhaps desirable.

For phase 2, we're not just talking about adding stuff that need not
be used immediately, but about removing stuff which may already be in
use.  Your email doesn't seem to describe what we want the *behavior*
to be in that case.  Leave aside for a moment the issue of not
crashing: what are the desired semantics?  I think it would be pretty
strange if you had a COPY running targeting a partitioned table,
detached a partition, and the COPY continued to route tuples to the
detached partition even though it was now an independent table.  It
also seems pretty strange if the tuples just get thrown away.  If the
COPY isn't trying to send any tuples to the now-detached partition,
then it's fine, but if it is, then I have trouble seeing any behavior
other than an error as sane, unless perhaps a new partition has been
attached or created for that part of the key space.

If you adopt that proposal, then the problem of parallel query
behaving differently from non-parallel query goes away.  You just get
an error in both cases, probably to the effect that there is no
(longer) a partition matching the tuple you are trying to insert (or
update).

If you're not hacking on this patch set too actively right at the
moment, I'd like to spend some time hacking on the CREATE/ATTACH side
of things and see if I can produce something committable for that
portion of the problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, Robert Haas wrote:

> If you're not hacking on this patch set too actively right at the
> moment, I'd like to spend some time hacking on the CREATE/ATTACH side
> of things and see if I can produce something committable for that
> portion of the problem.

I'm not -- feel free to hack away.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: plruby: rb_iterate symbol clash with libruby.so

2018-11-06 Thread Tom Lane
I wrote:
> Yeah.  The long and short of this is that we're trampling on namespace
> that reasonably belongs to Ruby --- if they had some functions named
> "pg_something" and complained about a collision with libpq, would we
> change?  Nope.  So really we should rename these.

> Barring objections I'll go make this happen shortly.

Done.  I realized that the immediate problem, rb_iterate(), was only
added as of PG v10, which may explain why we hadn't heard complaints
about this till now.  So I've made the change only as far back as v10.
In principle we could change the rbtree code in 9.5/9.6 as well, but
I think that's more likely to create problems than fix any.

regards, tom lane



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Simon Riggs
On Tue, 6 Nov 2018 at 10:10, Robert Haas  wrote:


> With this
> approach, already-running queries won't take into account the fact
> that new partitions have been added, but that seems at least tolerable
> and perhaps desirable.
>

Desirable, imho. No data added after a query starts would be visible.


> If the
> COPY isn't trying to send any tuples to the now-detached partition,
> then it's fine, but if it is, then I have trouble seeing any behavior
> other than an error as sane, unless perhaps a new partition has been
> attached or created for that part of the key space.
>

Error in the COPY or in the DDL? COPY preferred. Somebody with insert
rights shouldn't be able to prevent a table-owner level action. People
normally drop partitions to save space, so it could be annoying if that was
interrupted.


Supporting parallel query shouldn't make other cases more difficult from a
behavioral perspective just to avoid the ERROR. The ERROR sounds annoying,
but I'm not sure how annoying avoiding it would be.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-11-06 Thread Jesper Pedersen

On 11/4/18 5:07 AM, David Rowley wrote:

I've attached v13 which hopefully addresses these.



I ran a test against the INSERT case using a table with 64 hash partitions.

Non-partitioned table: ~73k TPS
Master: ~25k TPS
0001: ~26k TPS
0001 + 0002: ~68k TPS

The profile of 0001 shows that almost all of 
ExecSetupPartitionTupleRouting() is find_all_inheritors(), hence the 
last test with a rebase of the original v1-0002 patch.


Best regards,
 Jesper



Re: Optimizing nested ConvertRowtypeExpr execution

2018-11-06 Thread Tom Lane
Andrew Gierth  writes:
> Here's the patch with my edits (more comments and the while/if change).

A couple minor thoughts:

* I dislike using castNode() in places where the code has just explicitly
verified the node type, which is true in both places where you used it
here.  The assertion is just bulking the code up to no purpose, and it
creates an unnecessary discrepancy between older and newer code.

* As you have it here, a construct such as
ConvertRowtypeExpr(ConvertRowtypeExpr(ConvertRowtypeExpr(Const)))
will laboriously perform each of the intermediate steps, which seems
like exactly the case we're trying to prevent at runtime.  I wonder
whether it is worth stripping off ConvertRowtypeExpr's before the
recursive eval_const_expressions_mutator call to prevent that.
You'd still want the check after the call, to handle situations where
something more complex got simplified to a ConvertRowtypeExpr; and this
would also complicate getting the convertformat right.  So perhaps it's
not worth the trouble, but I thought I'd mention it.
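
(For concreteness, the stripping might look roughly like the sketch
below -- just an illustration of the idea, not the actual patch, and it
glosses over the convertformat bookkeeping mentioned above:)

/* Sketch only: collapse nested conversions before recursing. */
ConvertRowtypeExpr *cre = (ConvertRowtypeExpr *) node;
Expr	   *arg = cre->arg;

/* Peel intermediate ConvertRowtypeExprs: A::B::C collapses to A::C. */
while (arg && IsA(arg, ConvertRowtypeExpr))
	arg = ((ConvertRowtypeExpr *) arg)->arg;

arg = (Expr *) eval_const_expressions_mutator((Node *) arg, context);
/* ... then still re-check whether the result is a ConvertRowtypeExpr. */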

* I find the hardwired logic about how to merge distinct convertformat
values a bit troublesome.  Maybe use Min() instead?  Although there
is not currently any expectation about the ordering of that enum ...

regards, tom lane



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Robert Haas
On Tue, Nov 6, 2018 at 1:54 PM Simon Riggs  wrote:
> Error in the COPY or in the DDL? COPY preferred. Somebody with insert rights 
> shouldn't be able to prevent a table-owner level action. People normally drop 
> partitions to save space, so it could be annoying if that was interrupted.

Yeah, the COPY.

> Supporting parallel query shouldn't make other cases more difficult from a 
> behavioral perspective just to avoid the ERROR. The ERROR sounds annoying, 
> but not sure how annoying avoiding it would be.

In my view, it's not just a question of it being annoying, but of
whether anything else is even sensible.  I mean, you can avoid an
error when a user types SELECT 1/0 by returning NULL or 42, but that's
not usually how we roll around here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Simon Riggs
On Tue, 6 Nov 2018 at 10:56, Robert Haas  wrote:

> On Tue, Nov 6, 2018 at 1:54 PM Simon Riggs  wrote:
> > Error in the COPY or in the DDL? COPY preferred. Somebody with insert
> rights shouldn't be able to prevent a table-owner level action. People
> normally drop partitions to save space, so it could be annoying if that was
> interrupted.
>
> Yeah, the COPY.
>
> > Supporting parallel query shouldn't make other cases more difficult from
> a behavioral perspective just to avoid the ERROR. The ERROR sounds
> annoying, but I'm not sure how annoying avoiding it would be.
>
> In my view, it's not just a question of it being annoying, but of
> whether anything else is even sensible.  I mean, you can avoid an
> error when a user types SELECT 1/0 by returning NULL or 42, but that's
> not usually how we roll around here.
>

If you can remove the ERROR without any other adverse effects, that sounds
great.

Please let us know what, if any, adverse effects would be caused so we can
discuss. Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Robert Haas
On Tue, Nov 6, 2018 at 2:01 PM Simon Riggs  wrote:
> If you can remove the ERROR without any other adverse effects, that sounds 
> great.
>
> Please let us know what, if any, adverse effects would be caused so we can 
> discuss. Thanks

Well, I've already written about this in two previous emails on this
thread, so I'm not sure exactly what you think is missing.  But to
state the problem again:

If you don't throw an error when a partition is concurrently detached
and then someone routes a tuple to that portion of the key space, what
DO you do?  Continue inserting tuples into the table even though it's
no longer a partition?  Throw tuples destined for that partition away?
You can make an argument for both of those behaviors, but they're
both pretty strange.  The first one means that for an arbitrarily long
period of time after detaching a partition, the partition may continue
to receive inserts that were destined for its former parent.  The
second one means that your data can disappear into the ether.  I don't
like either of those things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Jonah H. Harris
Two options presented:

- Hard patch removes FATAL/PANIC from client_message_level_options in
guc.c, which also seems to make sense in regard to its double usage
with trace_recovery_messages.

- Soft patch keeps FATAL/PANIC in client_message_level_options but coerces
client_min_messages to ERROR when set to FATAL/PANIC and issues a warning.
This also exports error_severity from elog.c to retrieve severity -> text
mappings for the warning message.
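
For illustration, the soft behavior amounts to a GUC check hook along
these lines (a rough sketch only, not the patch itself; the hook name
and warning text are placeholders):

static bool
check_client_min_messages(int *newval, void **extra, GucSource source)
{
	if (*newval == FATAL || *newval == PANIC)
	{
		/* Accept the value for compatibility, but coerce it to ERROR. */
		ereport(WARNING,
				(errmsg("client_min_messages coerced to \"error\"")));
		*newval = ERROR;
	}
	return true;
}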

-- 
Jonah H. Harris


client_min_messages_config_hard.patch
Description: Binary data


client_min_messages_config_soft.patch
Description: Binary data


Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, Robert Haas wrote:

> If you don't throw an error when a partition is concurrently detached
> and then someone routes a tuple to that portion of the key space, what
> DO you do?  Continue inserting tuples into the table even though it's
> no longer a partition?

Yes -- the table was a partition when the query started, so it's still
a partition from the point of view of that query's snapshot.

> Throw tuples destined for that partition away?

Surely not.  (/me doesn't beat straw men anyway.)

> You can make an argument for both of those behaviors, but they're
> both pretty strange.  The first one means that for an arbitrarily long
> period of time after detaching a partition, the partition may continue
> to receive inserts that were destined for its former parent.

Not arbitrarily long -- only as long as those old snapshots live.  I
don't find this at all surprising.


(I think DETACH is not related to DROP in any way.  My proposal is that
DETACH can work concurrently, and if people want to drop the partition
later they can wait until snapshots/queries that could see that
partition are gone.)

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Robert Haas
On Tue, Nov 6, 2018 at 2:10 PM Alvaro Herrera  wrote:
> On 2018-Nov-06, Robert Haas wrote:
> > If you don't throw an error when a partition is concurrently detached
> > and then someone routes a tuple to that portion of the key space, what
> > DO you do?  Continue inserting tuples into the table even though it's
> > no longer a partition?
>
> Yes -- the table was a partition when the query started, so it's still
> a partition from the point of view of that query's snapshot.

I think it's important to point out that DDL does not in general
respect the query snapshot.  For example, you can query a table that
was created by a transaction not visible to your query snapshot.  You
cannot query a table that was dropped by a transaction not visible to
your query snapshot.  If someone runs ALTER FUNCTION on a function
your query uses, you get the latest committed version, not the version
that was current at the time your query snapshot was created.  So, if
we go with the semantics you are proposing here, we will be making
this DDL behave differently from pretty much all other DDL.

Possibly that's OK in this case, but it's easy to think of other cases
where it could cause problems.  To take an example that I believe was
discussed on-list a number of years ago, suppose that ADD CONSTRAINT
worked according to the model that you are proposing for ATTACH
PARTITION.  If it did, then one transaction could be concurrently
inserting a tuple while another transaction was adding a constraint
which the tuple fails to satisfy.  Once both transactions commit, you
have a table with a supposedly-valid constraint and a tuple inside of
it that doesn't satisfy that constraint.  Obviously, that's no good.

I'm not entirely sure whether there are any similar dangers in the
case of DETACH PARTITION.  I think it depends a lot on what can be
done with that detached partition while the overlapping transaction is
still active.  For instance, suppose you attached it to the original
table with a different set of partition bounds, or attached it to some
other table with a different set of partition bounds.  If you can do
that, then I think it effectively creates the problem described in the
previous paragraph with respect to the partition constraint.

IOW, we've got to somehow prevent this:

setup: partition is attached with bounds 1 to a million
S1: COPY begins
S2: partition is detached
S2: partition is reattached with bounds 1 to a thousand
S1: still-running copy inserts a tuple with value ten thousand

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Simon Riggs
On Tue, 6 Nov 2018 at 11:06, Robert Haas  wrote:

> On Tue, Nov 6, 2018 at 2:01 PM Simon Riggs  wrote:
> > If you can remove the ERROR without any other adverse effects, that
> sounds great.
> >
> > Please let us know what, if any, adverse effects would be caused so we
> can discuss. Thanks
>
> Well, I've already written about this in two previous emails on this
> thread, so I'm not sure exactly what you think is missing.  But to
> state the problem again:
>

I was discussing the ERROR in relation to parallel query, not COPY.

I didn't understand how that would be achieved.

Thanks for working on this.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Isaac Morland
On Tue, 6 Nov 2018 at 14:07, Jonah H. Harris  wrote:

> Two options presented:
>
> - Hard patch removes FATAL/PANIC from client_message_level_options in
> guc.c, which also seems to make sense in regard to its double usage
> with trace_recovery_messages.
>
> - Soft patch keeps FATAL/PANIC in client_message_level_options but coerces
> client_min_messages to ERROR when set to FATAL/PANIC and issues a warning.
> This also exports error_severity from elog.c to retrieve severity -> text
> mappings for the warning message.
>
>
What about no-op (soft patch) for 11.1 and backpatches, error (hard patch)
for 12?


Re: Optimizing nested ConvertRowtypeExpr execution

2018-11-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> * I dislike using castNode() in places where the code has just
 Tom> explicitly verified the node type, which is true in both places
 Tom> where you used it here. The assertion is just bulking the code up
 Tom> to no purpose, and it creates an unnecessary discrepancy between
 Tom> older and newer code.

hmm... fair point, I'll think about it

 Tom> * As you have it here, a construct such as
 Tom> ConvertRowtypeExpr(ConvertRowtypeExpr(ConvertRowtypeExpr(Const)))
 Tom> will laboriously perform each of the intermediate steps, which
 Tom> seems like exactly the case we're trying to prevent at runtime. I
 Tom> wonder whether it is worth stripping off ConvertRowtypeExpr's
 Tom> before the recursive eval_const_expressions_mutator call to
 Tom> prevent that. You'd still want the check after the call, to handle
 Tom> situations where something more complex got simplified to a
 Tom> ConvertRowtypeExpr; and this would also complicate getting the
 Tom> convertformat right. So perhaps it's not worth the trouble, but I
 Tom> thought I'd mention it.

I think it's not worth the trouble.

 Tom> * I find the hardwired logic about how to merge distinct
 Tom> convertformat values a bit troublesome. Maybe use Min() instead?
 Tom> Although there is not currently any expectation about the ordering
 Tom> of that enum ...

I considered using Min() but decided against it specifically _because_
there was no suggestion either in the enum definition, or in any other
use of CoercionForm anywhere, that the order of values was intended to
be significant. Since there's only one value for "implicit", it seemed
better to work from that.

-- 
Andrew (irc:RhodiumToad)



Re: pread() and pwrite()

2018-11-06 Thread Thomas Munro
On Wed, Nov 7, 2018 at 4:42 AM Jesper Pedersen
 wrote:
> Passes check-world, and includes the feedback on this thread.
>
> New status: Ready for Committer

Thanks!  Pushed.  I'll keep an eye on the build farm to see if
anything breaks on Cygwin or some other frankenOS.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Tom Lane
=?UTF-8?Q?Ond=c5=99ej_Bouda?=  writes:
>>> Hm, what are the data types of those columns?

> scheduletemplate_id bigint NOT NULL,
> period_day smallint NOT NULL,
> timerange timerange NOT NULL,

OK, so here's a minimal reproducer:

drop table schedulecard;

create table schedulecard (
  scheduletemplate_id bigint NOT NULL,
  period_day smallint NOT NULL
);

CREATE INDEX schedulecard_overlap_idx
 ON schedulecard USING gist
 (scheduletemplate_id, (period_day::integer % 7));

insert into schedulecard values(12, 1);

update schedulecard set period_day = period_day + 7;

Interestingly, it doesn't crash if I change the index type to btree,
which I was not expecting because the crashing code seems pretty
independent of the index type.

Haven't traced further than that yet.

regards, tom lane



Re: [HACKERS] generated columns

2018-11-06 Thread Peter Eisentraut
On 06/11/2018 14:28, Simon Riggs wrote:
> The simplest solution would be to exclude generated columns from the
> replication stream altogether.
> 
> 
> IMHO...
> 
> Virtual generated columns need not be WAL-logged, or sent.

right

> Stored generated columns should be treated just like we'd treat a column
> value added by a trigger.
> 
> e.g. if we had a Timestamp column called LastUpdateTimestamp we would
> want to send that value

Generated columns cannot have volatile expression results in them, so
this case cannot happen.

Also, we don't know whether the generation expression on the target is
the same (or even if it looks the same, consider locale issues etc.), so
we need to recompute the generated columns on the target anyway, so it's
pointless to send the already computed stored values.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Tom Lane
I wrote:
> Interestingly, it doesn't crash if I change the index type to btree,
> which I was not expecting because the crashing code seems pretty
> independent of the index type.

Oh ... duh.  The problem here is that ProjIndexIsUnchanged thinks that
the type of the index column is identical to the type of the source
datum for it, which is not true for any opclass making use of the
opckeytype property.

Ondřej, as a short-term workaround you could prevent the crash
by setting that index's recheck_on_update property to false.

regards, tom lane



Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Jonah H. Harris
On Tue, Nov 6, 2018 at 2:46 PM Isaac Morland 
wrote:

> On Tue, 6 Nov 2018 at 14:07, Jonah H. Harris 
> wrote:
>
>> Two options presented:
>>
>> - Hard patch removes FATAL/PANIC from client_message_level_options in
>> guc.c, which also seems to make sense in regard to its double usage
>> with trace_recovery_messages.
>>
>> - Soft patch keeps FATAL/PANIC in client_message_level_options but
>> coerces client_min_messages to ERROR when set to FATAL/PANIC and issues a
>> warning. This also exports error_severity from elog.c to retrieve severity
>> -> text mappings for the warning message.
>>
>>
> What about no-op (soft patch) for 11.1 and backpatches, error (hard patch)
> for 12?
>

I'm usually a fan of the hard fix... but I do see the point they've made
about upgrades.

Also, attached is the soft patch with fixed wording ("frontend protocol
requires %s or above" -> "frontend protocol requires %s or below").

-- 
Jonah H. Harris


client_min_messages_config_soft-v2.patch
Description: Binary data


Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Ondřej Bouda

Ondřej, as a short-term workaround you could prevent the crash
by setting that index's recheck_on_update property to false.


Thanks for the tip. I was unable to use it, though:

# ALTER INDEX public.schedulecard_overlap_idx SET (recheck_on_update = 
FALSE);


ERROR:  unrecognized parameter "recheck_on_update"


Creating a new index fails, too:

# CREATE INDEX schedulecard_overlap_idx2
ON public.schedulecard USING gist
(scheduletemplate_id, (period_day::integer % 7), timerange)
WITH (recheck_on_update = FALSE);

ERROR:  unrecognized parameter "recheck_on_update"


It only succeeds if not USING gist:

# CREATE INDEX schedulecard_overlap_idx2
ON public.schedulecard
(scheduletemplate_id, (period_day::integer % 7), timerange)
WITH (recheck_on_update = FALSE);

CREATE INDEX


Is there any other workaround for a gist index, please?
Maybe we will just drop the index until the bug gets fixed - better slow 
queries than crashing servers...


Thanks,
Ondřej Bouda





Re: First-draft release notes for back-branch releases

2018-11-06 Thread Andrew Gierth
Do we need to add anything in the release notes about possible
complications in upgrading caused by the 65f39408ee71 / 56535dcdc9e2
issue?

If upgrading from the immediately prior point releases to this one, then
the shutdown of the previous version might have left an incorrect min
recovery point in pg_control, yes? So the error could then occur when
starting the new version, even though the bug is now apparently fixed.

-- 
Andrew (irc:RhodiumToad)



Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Tom Lane
=?UTF-8?Q?Ond=c5=99ej_Bouda?=  writes:
>> Ondřej, as a short-term workaround you could prevent the crash
>> by setting that index's recheck_on_update property to false.

> Thanks for the tip. I am unsuccessful using it, though:
> # ALTER INDEX public.schedulecard_overlap_idx SET (recheck_on_update = 
> FALSE);
> ERROR:  unrecognized parameter "recheck_on_update"

Oh, for crying out loud.  That's yet a different bug.
I'm not sure that it's the fault of the recheck_on_update
feature proper though; it might be a pre-existing bug in
the reloptions code.  Looks like somebody forgot to list
RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
fault of commit c203d6cf8 or was it busted before?

regards, tom lane



Re: Special role for subscriptions

2018-11-06 Thread Stephen Frost
Greetings,

* Evgeniy Efimkin (efim...@yandex-team.ru) wrote:
> As a first step I suggest we allow CREATE SUBSCRIPTION for table owner only.

That's a nice idea but seems like we would want to have a way to filter
what tables a subscription follows then..?  Just failing if the
publication publishes tables that we don't have access to or are not the
owner of seems like a poor solution..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Alvaro Herrera
On 2018-Nov-06, Tom Lane wrote:

> =?UTF-8?Q?Ond=c5=99ej_Bouda?=  writes:
> >> Ondřej, as a short-term workaround you could prevent the crash
> >> by setting that index's recheck_on_update property to false.
> 
> > Thanks for the tip. I am unsuccessful using it, though:
> > # ALTER INDEX public.schedulecard_overlap_idx SET (recheck_on_update = 
> > FALSE);
> > ERROR:  unrecognized parameter "recheck_on_update"
> 
> Oh, for crying out loud.  That's yet a different bug.
> I'm not sure that it's the fault of the recheck_on_update
> feature proper though; it might be a pre-existing bug in
> the reloptions code.  Looks like somebody forgot to list
> RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
> fault of commit c203d6cf8 or was it busted before?

RELOPT_KIND_INDEX was invented by that commit, looks like :-(

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Andres Freund
On 2018-11-06 16:47:20 -0500, Tom Lane wrote:
> =?UTF-8?Q?Ond=c5=99ej_Bouda?=  writes:
> >> Ondřej, as a short-term workaround you could prevent the crash
> >> by setting that index's recheck_on_update property to false.
> 
> > Thanks for the tip. I am unsuccessful using it, though:
> > # ALTER INDEX public.schedulecard_overlap_idx SET (recheck_on_update = 
> > FALSE);
> > ERROR:  unrecognized parameter "recheck_on_update"
> 
> Oh, for crying out loud.  That's yet a different bug.
> I'm not sure that it's the fault of the recheck_on_update
> feature proper though; it might be a pre-existing bug in
> the reloptions code.  Looks like somebody forgot to list
> RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
> fault of commit c203d6cf8 or was it busted before?

Looks new:
+   RELOPT_KIND_INDEX = 
RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,

there aren't any other "for all indexes" type options, so the whole
category didn't exist before.

It also strikes me as a really bad idea, even if RELOPT_KIND_GIST
wouldn't have been omitted: It breaks index AM extensibility.

Greetings,

Andres Freund



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-06 Thread Robert Haas
On Tue, Aug 7, 2018 at 9:29 AM Andres Freund  wrote:
> One approach would be to make sure that everything relying on
> rt_partdesc staying the same stores its value in a local variable, and
> then *not* free the old version of rt_partdesc (etc) when the refcount >
> 0, but delay that to the RelationClose() that makes refcount reach
> 0. That'd be the start of a framework for more such concurrenct
> handling.

Some analysis of possible trouble spots:

- get_partition_dispatch_recurse and ExecCreatePartitionPruneState
both call RelationGetPartitionDesc.  Presumably, this means that if
the partition descriptor gets updated on the fly, the tuple routing
and partition dispatch code could end up with different ideas about
which partitions exist.  I think this should be fixed somehow, so that
we only call RelationGetPartitionDesc once per query and use the
result for everything.

- expand_inherited_rtentry checks
RelationGetPartitionDesc(oldrelation) != NULL.  If so, it calls
expand_partitioned_rtentry which fetches the same PartitionDesc again.
We can probably just do this once in the caller and pass the result
down.

- set_relation_partition_info also calls RelationGetPartitionDesc.
Off-hand, I think this code runs after expand_inherited_rtentry.  Not
sure what to do about this.  I'm not sure what the consequences would
be if this function and that one had different ideas about the
partition descriptor.

- tablecmds.c is pretty free about calling RelationGetPartitionDesc
repeatedly, but it probably doesn't matter.  If we're doing some kind
of DDL that depends on the contents of the partition descriptor, we
*had better* be holding a lock strong enough to prevent the partition
descriptor from being changed by somebody else at the same time.
Allowing a partition to be added concurrently with DML is one thing;
allowing a partition to be added concurrently with adding another
partition is a whole different level of insanity.  I think we'd be
best advised not to go down that rathole - among other concerns, how
would you even guarantee that the partitions being added didn't
overlap?

Generally:

Is it really OK to postpone freeing the old partition descriptor until
the relation reference count goes to 0?  I wonder if there are cases
where this could lead to tons of copies of the partition descriptor
floating around at the same time, gobbling up precious cache memory.
My first thought was that this would be pretty easy: just create a lot
of new partitions one by one while some long-running transaction is
open.  But the actual result in that case depends on the behavior of
the backend running the transaction.  If it just ignores the new
partitions and sticks with the partition descriptor it has got, then
probably nothing else will request the new partition descriptor either
and there will be no accumulation of memory.  However, if it tries to
absorb the updated partition descriptor, but without being certain
that the old one can be freed, then we'd have a query-lifespan memory
leak which is quadratic in the number of new partitions.

Maybe even that would be OK -- we could suppose that the number of new
partitions probably wouldn't be all THAT crazy large, and the constant
factor not too bad, so maybe you'd leak a couple of MB for the length
of the query, but no more.  However, I wonder if it would be better to
give each PartitionDescData its own refcnt, so that it can be freed
immediately when the refcnt goes to zero.  That would oblige every
caller of RelationGetPartitionDesc() to later call something like
ReleasePartitionDesc().  We could catch failures to do that by keeping
all the PartitionDesc objects so far created in a list.  When the main
entry's refcnt goes to 0, cross-check that this list is empty; if not,
then the remaining entries have non-zero refcnts that were leaked.  We
could emit a WARNING as we do in similar cases.
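
To sketch what such an API might look like (all names, and the
staleness flag in particular, are invented here just to make the shape
concrete):

typedef struct PartitionDescData
{
	int			refcnt;			/* outstanding references */
	bool		is_stale;		/* superseded by a newer descriptor? */
	int			nparts;			/* number of partitions */
	Oid		   *oids;			/* partition OIDs, in bound order */
	PartitionBoundInfo boundinfo;	/* collection of partition bounds */
} PartitionDescData;

/* Every RelationGetPartitionDesc() call must be paired with this. */
static void
ReleasePartitionDesc(PartitionDescData *partdesc)
{
	Assert(partdesc->refcnt > 0);
	if (--partdesc->refcnt == 0 && partdesc->is_stale)
		pfree(partdesc);		/* last reference to a superseded copy */
}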

In general, I think something along the lines you are suggesting here
is the right place to start attacking this problem.  Effectively, we
immunize the system against the possibility of new entries showing up
in the partition descriptor while concurrent DML is running; the
semantics are that the new partitions are ignored for the duration of
currently-running queries.  This seems to allow for painless creation
or addition of new partitions in normal cases, but not when a default
partition exists.  In that case, using the old PartitionDesc is
outright wrong, because adding a new toplevel partition changes the
default partition's partition constraint. We can't insert into the
default partition a tuple that under the updated table definition
needs to go someplace else.  It seems like the best way to account for
that is to reduce the lock level on the partitioned table to
ShareUpdateExclusiveLock, but leave the lock level on any default
partition as AccessExclusiveLock (because we are modifying a
constraint on it).  We would also need to leave the lock level on the
new partition as AccessExclusiveLock.

Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Tom Lane
Andres Freund  writes:
> On 2018-11-06 16:47:20 -0500, Tom Lane wrote:
>> Looks like somebody forgot to list
>> RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
>> fault of commit c203d6cf8 or was it busted before?

> Looks new:
> +   RELOPT_KIND_INDEX = 
> RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,
> there aren't any other "for all indexes" type options, so the whole
> category didn't exist before.

> It also strikes me as a really bad idea, even if RELOPT_KIND_GIST
> wouldn't have been omitted: It breaks index am extensibility.

Hm, well, that enum already knows about all index types, doesn't it?

regards, tom lane



Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Tomas Vondra
On 11/6/18 10:54 PM, Andres Freund wrote:
> On 2018-11-06 16:47:20 -0500, Tom Lane wrote:
>> =?UTF-8?Q?Ond=c5=99ej_Bouda?=  writes:
 Ondřej, as a short-term workaround you could prevent the crash
 by setting that index's recheck_on_update property to false.
>>
>>> Thanks for the tip. I am unsuccessful using it, though:
>>> # ALTER INDEX public.schedulecard_overlap_idx SET (recheck_on_update = 
>>> FALSE);
>>> ERROR:  unrecognized parameter "recheck_on_update"
>>
>> Oh, for crying out loud.  That's yet a different bug.
>> I'm not sure that it's the fault of the recheck_on_update
>> feature proper though; it might be a pre-existing bug in
>> the reloptions code.  Looks like somebody forgot to list
>> RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
>> fault of commit c203d6cf8 or was it busted before?
> 
> Looks new:
> +   RELOPT_KIND_INDEX = 
> RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,
> 
> there aren't any other "for all indexes" type options, so the whole
> category didn't exist before.
> 
> It also strikes me as a really bad idea, even if RELOPT_KIND_GIST
> wouldn't have been omitted: It breaks index AM extensibility.
> 

Does it? The RELOPT_KIND_* stuff is hard-coded in reloptions.h anyway,
so I'm not sure how this particular thing makes it less extensible?

That being said, we also have RELOPT_KIND_BRIN, and that seems to be
missing from RELOPT_KIND_INDEX too (and AFAICS the optimization works
for all index types).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

2018-11-06 Thread Simon Riggs
On Tue, 6 Nov 2018 at 13:16, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:


> > Stored generated columns should be treated just like we'd treat a column
> > value added by a trigger.
> >
> > e.g. if we had a Timestamp column called LastUpdateTimestamp we would
> > want to send that value
>
> Generated columns cannot have volatile expression results in them, so
> this case cannot happen.
>
> Also, we don't know whether the generation expression on the target is
> the same (or even if it looks the same, consider locale issues etc.), so
> we need to recompute the generated columns on the target anyway, so it's
> pointless to send the already computed stored values.
>

Makes sense.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: First-draft release notes for back-branch releases

2018-11-06 Thread Jonathan S. Katz
On 11/5/18 11:10 PM, Amit Langote wrote:
> Hi,
> 
> On 2018/11/06 12:49, Jonathan S. Katz wrote:
>>
>> I still feel inclined to elaborate on the "Prevent creation of a
>> partition in a trigger attached to its parent table" but perhaps in
>> future release we will just mention that this behavior is again available.
> 
> Maybe, it's enough that that detail is present in the release note item?

Yes...after sleeping on it, I'll keep the simplified explanation and go
with the path above of just mentioning when said functionality is
available again.

Thanks!

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Andres Freund
Hi,

On 2018-11-06 23:11:29 +0100, Tomas Vondra wrote:
> On 11/6/18 10:54 PM, Andres Freund wrote:
> > Looks new:
> > +   RELOPT_KIND_INDEX = 
> > RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,
> > 
> > there aren't any other "for all indexes" type options, so the whole
> > category didn't exist before.
> > 
> > It also strikes me as a really bad idea, even if RELOPT_KIND_GIST
> > wouldn't have been omitted: It breaks index AM extensibility.
> > 
> 
> Does it? The RELOPT_KIND_* stuff is hard-coded in reloptions.h anyway,
> so I'm not sure how this particular thing makes it less extensible?

Well, you can create new index AMs in extensions these days, but given
the relopt design above, the feature cannot be disabled for them. Yes,
there's *currently* probably no great way to have reloptions across all
potential index types, but that's not an excuse for adding something
broken.

Greetings,

Andres Freund



Re: First-draft release notes for back-branch releases

2018-11-06 Thread Stephen Frost
Greetings,

* Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
> Do we need to add anything in the release notes about possible
> complications in upgrading caused by the 65f39408ee71 / 56535dcdc9e2
> issue?
> 
> If upgrading from the immediately prior point releases to this one, then
> the shutdown of the previous version might have left an incorrect min
> recovery point in pg_control, yes? So the error could then occur when
> starting the new version, even though the bug is now apparently fixed.

Based on the discussion on IRC and Andrew's comments above, it seems to
me like we should really highlight this.  Would be nice if we could
include some information about what to do if someone is bit by this
also...

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Andres Freund
Hi,

On 2018-11-06 17:11:40 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-11-06 16:47:20 -0500, Tom Lane wrote:
> >> Looks like somebody forgot to list
> >> RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
> >> fault of commit c203d6cf8 or was it busted before?
> 
> > Looks new:
> > +   RELOPT_KIND_INDEX = 
> > RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,
> > there aren't any other "for all indexes" type options, so the whole
> > category didn't exist before.
> 
> > It also strikes me as a really bad idea, even if RELOPT_KIND_GIST
> > wouldn't have been omitted: It breaks index AM extensibility.
> 
> Hm, well, that enum already knows about all index types, doesn't it?

Not quite sure what you mean.

Before c203d6cf8 there weren't reloptions across index types. But after
it, if one adds a new index AM via an extension, one can't set
recheck_on_update = false for indexes using it, even though the feature
affects such indexes.  We support adding index AMs at runtime these
days, including WAL logging (even though that's a bit
voluminous). There's even a contrib index AM...

The list of AMs is supposed to be extensible at runtime, cf
add_reloption_kind().
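
For reference, the runtime path looks roughly like this from an
extension's _PG_init() (a sketch; the option name is only illustrative,
and note that core's hardwired RELOPT_KIND_INDEX bitmask can never
include a kind allocated this way, which is exactly the problem):

static relopt_kind my_am_relopt_kind;

void
_PG_init(void)
{
	my_am_relopt_kind = add_reloption_kind();
	add_bool_reloption(my_am_relopt_kind,
					   "recheck_on_update",	/* illustrative */
					   "Recheck functional index expressions on UPDATE",
					   true);
}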

Greetings,

Andres Freund



Re: First-draft release notes for back-branch releases

2018-11-06 Thread Tom Lane
Stephen Frost  writes:
> * Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
>> Do we need to add anything in the release notes about possible
>> complications in upgrading caused by the 65f39408ee71 / 56535dcdc9e2
>> issue?
>> 
>> If upgrading from the immediately prior point releases to this one, then
>> the shutdown of the previous version might have left an incorrect min
>> recovery point in pg_control, yes? So the error could then occur when
>> starting the new version, even though the bug is now apparently fixed.

> Based on the discussion on IRC and Andrew's comments above, it seems to
> me like we should really highlight this.  Would be nice if we could
> include some information about what to do if someone is bit by this
> also...

You could be bit by any shutdown of the old code, no, whether it's
part of a pg_upgrade or not?  Also, it looks like the bug only affects
standbys (or at least that's what the commit message seems to imply),
which makes it less of a data-loss hazard than it might've been.

regards, tom lane



Re: Disallow setting client_min_messages > ERROR?

2018-11-06 Thread Robert Haas
On Tue, Nov 6, 2018 at 11:48 AM Alvaro Herrera  wrote:
> I agree -- it seems better to have a benign no-op and prevent this kind
> of silly rationale from preventing upgrades.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Cache relation sizes?

2018-11-06 Thread Thomas Munro
Hello,

PostgreSQL likes to probe the size of relations with lseek(SEEK_END) a
lot.  For example, a fully prewarmed pgbench -S transaction consists
of recvfrom(), lseek(SEEK_END), lseek(SEEK_END), sendto().  I think
lseek() is probably about as cheap as a syscall can be so I doubt it
really costs us much, but it's still a context switch and it stands
out when tracing syscalls, especially now that all the lseek(SEEK_SET)
calls are gone (commit c24dcd0cfd).

If we had a different kind of buffer mapping system of the kind that
Andres has described, there might be a place in shared memory that
could track the size of the relation.  Even if we could do that, I
wonder if it would still be better to do a kind of per-backend
lock-free caching, like this:

1.  Whenever a file size has been changed by extending or truncating
(ie immediately after the syscall), bump a shared "size_change"
invalidation counter.

2.  Somewhere in SMgrRelation, store the last known size_change
counter and the last known size.  In _mdnblocks(), if the counter
hasn't moved, we can use the cached size and skip the call to
FileSize().

3.  To minimise false cache invalidations (caused by other relations'
size changes), instead of using a single size_change counter in shared
memory, use an array of N and map relation OIDs onto them.

4.  As for memory coherency, I think it might be enough to use uint32
without locks or read barriers on the read side, since you have a view
of memory at least as new as your snapshot (the taking of which
included a memory barrier).  That's good enough because we don't need
to see blocks added after our snapshot was taken (the same assumption
applies today, this just takes further advantage of it), and
truncations can't happen while we have a share lock on the relation
(the taking of which also includes memory barrier, covering the case
where the truncation happened after our snapshot and the acquisition
of the share lock on the relation).  In other words, there is heavy
locking around truncation already, and for extension we don't care
about recent extensions so we can be quite relaxed about memory.
Right?
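
To make the moving parts concrete, here is a sketch (every name and
struct member below is invented for illustration; this is not a patch):

#define NSIZECOUNTERS 64

typedef struct SizeChangeCounters
{
	pg_atomic_uint32 counters[NSIZECOUNTERS];	/* lives in shared memory */
} SizeChangeCounters;

static SizeChangeCounters *size_counters;

/* Step 1: called immediately after extending or truncating a relation. */
static void
bump_size_counter(Oid relnode)
{
	pg_atomic_fetch_add_u32(&size_counters->counters[relnode % NSIZECOUNTERS],
							1);
}

/* Steps 2-3: consult the per-backend cache in _mdnblocks(). */
static BlockNumber
cached_mdnblocks(SMgrRelation reln, File file)
{
	Oid			relnode = reln->smgr_rnode.node.relNode;
	uint32		counter = pg_atomic_read_u32(
					&size_counters->counters[relnode % NSIZECOUNTERS]);

	/* the md_* fields are hypothetical additions to SMgrRelationData */
	if (reln->md_size_valid && reln->md_size_counter == counter)
		return reln->md_cached_nblocks;	/* skip the lseek(SEEK_END) */

	reln->md_cached_nblocks = (BlockNumber) (FileSize(file) / BLCKSZ);
	reln->md_size_counter = counter;
	reln->md_size_valid = true;
	return reln->md_cached_nblocks;
}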

I don't have a patch for this (though I did once try it as a
throw-away hack and it seemed to work), but I just wanted to share the
idea and see if anyone sees a problem with the logic/interlocking, or
has a better idea for how to do this.  It occurred to me that I might
be missing something or this would have been done already...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Cache relation sizes?

2018-11-06 Thread Andres Freund
Hi,

On 2018-11-07 11:40:22 +1300, Thomas Munro wrote:
> PostgreSQL likes to probe the size of relations with lseek(SEEK_END) a
> lot.  For example, a fully prewarmed pgbench -S transaction consists
> of recvfrom(), lseek(SEEK_END), lseek(SEEK_END), sendto().  I think
> lseek() is probably about as cheap as a syscall can be so I doubt it
> really costs us much, but it's still a context switch and it stands
> out when tracing syscalls, especially now that all the lseek(SEEK_SET)
> calls are gone (commit c24dcd0cfd).

I'd really really like to see some benchmarking before embarking on a
more complex scheme.  I aesthetically dislike those lseeks, but ...


> If we had a different kind of buffer mapping system of the kind that
> Andres has described, there might be a place in shared memory that
> could track the size of the relation.  Even if we could do that, I
> wonder if it would still be better to do a kind of per-backend
> lock-free caching, like this:

Note that the reason for introducing that isn't primarily motivated
by getting rid of the size determining lseeks, but reducing the locking
for extending *and* truncating relations.


Greetings,

Andres Freund



Re: First-draft release notes for back-branch releases

2018-11-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> You could be bit by any shutdown of the old code, no, whether it's
 Tom> part of a pg_upgrade or not?

Nothing to do with pg_upgrade, this is likely to bite people just doing
an update from the previous minor release.

 Tom> Also, it looks like the bug only affects standbys (or at least
 Tom> that's what the commit message seems to imply), which makes it
 Tom> less of a data-loss hazard than it might've been.

The commit message doesn't really show the severity of the problem at
all.

The problem is this: the updating of minRecoveryPoint in the control
file is almost completely broken in the last point releases. It's not an
"incorrect calculation" as the commit message says, it's that the
bgwriter and checkpointer _do not update the value at all_ except
immediately after a checkpoint. That means that it is common to have a
situation where the recovery restartpoint is at lsn X, the
minRecoveryPoint is at a slightly later lsn Y, but there are on-disk
data pages with a _much_ later lsn Z.

If such a data page was the subject of a Btree/DELETE record, then any
attempt to do recovery will potentially PANIC with a (false) "WAL
contains references to invalid pages" error -- if, and only if, at least
one client (e.g. a monitoring system) is connected when the record is
replayed, which is possible because of the incorrect minRecoveryPoint.

The users whose case I was diagnosing on IRC were finding that their
monitoring system was sufficient to trigger the problem at least 80% of
the time. Consider that the broken minRecoveryPoint can be quite a long
way in the past relative to on-disk data pages, so the window of
vulnerability isn't necessarily small.

So while there _probably_ isn't any data corruption, the standby can get
into a state that isn't restartable unless you know to block client
connections to it until it has caught up. Rebuilding the standby from
the master will work but that may be a significant practical problem if
the data is large.

-- 
Andrew (irc:RhodiumToad)



RE: Number of buckets/partitions of dshash

2018-11-06 Thread Ideriha, Takeshi
>From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]

>> This would cause some waste of memory on DSA because some partitions
>> (buckets) are allocated but not used.
>is allocated but not used.
>>
>> So I'm thinking that current dshash design is still ok but flexible
>> size of partition is appropriate for some use cases like mine.
>>
>> Do you have any thoughts?
>
>We could do that easily, but shouldn't we find a way to reduce or eliminate 
>the impact
>of locking first? dshash needs to hold partition lock while the caller is 
>examining a
>returned entry.

Thanks for the comment.
Agreed.
It would take a long time to achieve, but as you've pointed out,
finding a way to minimize the locking time would benefit everyone and
is the first priority.

Regards,
Takeshi Ideriha




Re: ON COMMIT actions and inheritance

2018-11-06 Thread Michael Paquier
On Tue, Nov 06, 2018 at 09:53:37AM -0300, Alvaro Herrera wrote:
> While you're there -- I think the CCI after the heap_truncate is not
> needed.  Could as well get rid of it ...

Yes, I have noticed the comment saying so.  I did not really want to
bother about that part yet for this patch as that concerns only HEAD
though.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Improvements to "Getting started" tutorial for Google Code-in task

2018-11-06 Thread Andreas 'ads' Scherbaum
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

This is a documentation update, no code changes. The initial idea came from the
Google Code-in project, and the changes were modified and reviewed until the
wording was appropriate.

The new status of this patch is: Ready for Committer


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-06 Thread Kyotaro HORIGUCHI
At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita  
wrote in <5be18409.2070...@lab.ntt.co.jp>
> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> > At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro
> > Fujita wrote
> > in<5bdc4ba0.7050...@lab.ntt.co.jp>
> >> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
> >>> At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane  wrote
> >>> in<18397.1540297...@sss.pgh.pa.us>
>  It's just a POC because there are some things that need more thought
>  than I've given them:
> 
>  1. Is it OK to revert SIGPIPE to default processing for *all* programs
>  launched through OpenPipeStream?  If not, what infrastructure do we
>  need to add to control that?  In particular, is it sane to revert
>  SIGPIPE for a pipe program that we will write to not read from?
>  (I thought probably yes, because that is the normal Unix behavior,
>  but it could be debated.)
> 
> >> ISTM that it would be OK to inherit SIG_DFL in both cases, because I
> >> think it would be the responsibility of the called program to handle
> >> SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
> >> something, though.
> >
> > So, I think we *should* (not just OK to) restore SIGPIPE to
> > SIG_DFL in any case here to prevent undetermined situation for
> > the program.
> 
> OK
> 
>  2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>  SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
>  we don't intend to terminate that early.)
> >>>
> >>> Since the SIGPIPE may come from another pipe, I think we
> >>> shouldn't generally.
> >>
> >> Agreed; if ClosePipeToProgram ignores that failure, we would fail to
> >> get a better error message in CopySendEndOfRow if the called program
> >> (inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
> >>
> >>  if (cstate->is_program)
> >>  {
> >>  if (errno == EPIPE)
> >>  {
> >>  /*
> >>   * The pipe will be closed automatically on error
> >>   * at
> >>   * the end of transaction, but we might get a
> >>   * better
> >>   * error message from the subprocess' exit code
> >>   * than
> >>   * just "Broken Pipe"
> >>   */
> >>  ClosePipeToProgram(cstate);
> >>
> >>  /*
> >>   * If ClosePipeToProgram() didn't throw an error,
> >>   * the
> >>   * program terminated normally, but closed the
> >>   * pipe
> >>   * first. Restore errno, and throw an error.
> >>   */
> >>  errno = EPIPE;
> >>  }
> >>  ereport(ERROR,
> >>  (errcode_for_file_access(),
> >>   errmsg("could not write to COPY program:
> >>   %m")));
> >>  }
> >
> > Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
> > program's exit status. So it is irrelevant to called program's
> > SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
> > backend side.
> 
> My explanation might not be enough.  Let me explain.  If the called
> program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for
> some reason, ClosePipeToProgram *as-is* would create an error message
> from that program's exit code.  But if we modify ClosePipeToProgram
> like the original POC patch, that function would not create that
> message for that termination.  To avoid that, I think it would be
> better for ClosePipeToProgram to ignore the SIGPIPE failure only in
> the case where the caller is a COPY FROM PROGRAM that is allowed to
> terminate early. (Maybe we could specify that as a new argument for
> BeginCopyFrom.)

I understood. Thanks for the explanation. I agree to that.
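
(For the record, the backend side of that POC amounts to flipping the
disposition around the popen() call in OpenPipeStream -- a sketch, not
the actual patch:)

/* Let the popen()'d child inherit default SIGPIPE handling. */
pqsignal(SIGPIPE, SIG_DFL);
fp = popen(command, mode);
save_errno = errno;
pqsignal(SIGPIPE, SIG_IGN);	/* the backend keeps ignoring SIGPIPE */
errno = save_errno;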

> >>> But iff we are explicitly stop reading from
> >>> the pipe before detecting an error, it can be ignored since we
> >>> aren't interested in further failure.
> >>
> >> You mean that we should ignore other failures of the called program
> >> when we stop reading from the pipe early?
> >
> > Yes, we have received sufficient data from the pipe then closed
> > it successfully. The program may want write more but we don't
> > want it. We rather should ignore the SIGPIPE exit code of the program
> > since closing our writing end of a pipe is likely to cause it and
> 
> I think so too.
> 
> > even if it comes from another pipe, we can assume that the
> > SIGPIPE immediately stopped the program before returning any
> > garbage to us.
> 
> Sorry, I don't understand this.

Mmm. It looks confusing, sorry. In other words:

We don't care about the reason 

Re: shared-memory based stats collector

2018-11-06 Thread Kyotaro HORIGUCHI
Thank you for the comments, Antonin, Tomas.

At Tue, 30 Oct 2018 13:35:23 +0100, Tomas Vondra  
wrote in 
> 
> 
> On 10/05/2018 10:30 AM, Kyotaro HORIGUCHI wrote:
> > Hello.
> > 
> > At Tue, 02 Oct 2018 16:06:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
> >  wrote in 
> > <20181002.160651.117284090.horiguchi.kyot...@lab.ntt.co.jp>
> >> It doesn't work nor even compile since I failed to include some
> >> changes. The atached v6-0008 at least compiles and words.
> >>
> >> 0001-0007 are not attached since they are still aplicable on
> >> master head with offsets.
> > 
> > In this patchset 0001-0007 are still the same with the previous
> > version. I'll reorganize the whole patchset in the next version.
> > 
> > This is more saner version of previous v5-0008, which didn't pass
> > regression test. v6-0008 to v6-0010 are attached and they are
> > applied on top of v5-0001-0007.
> > 
> 
> BTW one more thing - I strongly recommend always attaching the whole
> patch series, even if some of the parts did not change.
> 
> Firstly, it makes the reviewer's life much easier, because it's not
> necessary to hunt through past messages for all the bits and resolve
> potential conflicts (e.g. there are two 0008 in recent messages).
> 
> Secondly, it makes http://commitfest.cputube.org/ work - it tries to
> apply patches from a single message, which fails when some of the parts
> are omitted.

Yeah, I know about the second point. About the first point, I'm
not sure which makes review easier: confirming the differences in all
files, or finding the unmodified portions from upthread. But OK, I'll
follow your suggestion. I'll attach the whole patch set and add
notes about the differences.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ON COMMIT actions and inheritance

2018-11-06 Thread Michael Paquier
On Tue, Nov 06, 2018 at 07:04:17PM +0900, Amit Langote wrote:
> Agree with keeping it simple.  Maybe, we could (should?) document that the
> only ON COMMIT action that works when specified with partitioned parent
> table is DROP (other actions are essentially ignored)?

I have been thinking about this one, and here are some ideas:
- for ON COMMIT DROP:
When used on a partitioned table or a table with inheritance children,
this drops the depending partitions and children.
- for ON COMMIT DELETE ROWS:
When used on a partitioned table, this is not cascaded to its
partitions.

> Seems fine to me.

Thanks for looking at the patch.

> +
> +/*
> + * Truncate relations before dropping so as all dependencies between
> 
> so as -> so that

Fixed.

> + * relations are removed after they are worked on.  This is a waste as it
> + * could be possible that a relation truncated needs also to be removed
> + * per dependency games but this makes the whole logic more robust and
> + * there is no need to re-check that a relation exists afterwards.
> + */
> 
> "afterwords" seems redundant, but may I suggest rewriting the 2nd sentence as:
> 
> Doing it like this might be a waste as it's possible that a relation being
> truncated will be dropped anyway due to it's parent being dropped, but
> this makes the code more robust because of not having to re-check that the
> relation exists.

Your comment sounds better, so added.
--
Michael
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10428f8ff0..98c5a01416 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1215,7 +1215,8 @@ WITH ( MODULUS numeric_literal, REM
   All rows in the temporary table will be deleted at the end
   of each transaction block.  Essentially, an automatic  is done
-  at each commit.
+  at each commit.  When used on a partitioned table, this
+  is not cascaded to its partitions.
  
 

@@ -1225,7 +1226,9 @@ WITH ( MODULUS numeric_literal, REM
 
  
   The temporary table will be dropped at the end of the current
-  transaction block.
+  transaction block.  When used on a partitioned table or a
+  table with inheritance children, this drops the depending
+  partitions and children.
  
 

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6048334c75..f966f5c431 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13300,6 +13300,7 @@ PreCommit_on_commit_actions(void)
 {
 	ListCell   *l;
 	List	   *oids_to_truncate = NIL;
+	List	   *oids_to_drop = NIL;
 
 	foreach(l, on_commits)
 	{
@@ -13326,36 +13327,66 @@ PreCommit_on_commit_actions(void)
 	oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
 break;
 			case ONCOMMIT_DROP:
-{
-	ObjectAddress object;
-
-	object.classId = RelationRelationId;
-	object.objectId = oc->relid;
-	object.objectSubId = 0;
-
-	/*
-	 * Since this is an automatic drop, rather than one
-	 * directly initiated by the user, we pass the
-	 * PERFORM_DELETION_INTERNAL flag.
-	 */
-	performDeletion(&object,
-	DROP_CASCADE, PERFORM_DELETION_INTERNAL);
-
-	/*
-	 * Note that table deletion will call
-	 * remove_on_commit_action, so the entry should get marked
-	 * as deleted.
-	 */
-	Assert(oc->deleting_subid != InvalidSubTransactionId);
-	break;
-}
+oids_to_drop = lappend_oid(oids_to_drop, oc->relid);
+break;
 		}
 	}
+
+	/*
+	 * Truncate relations before dropping so that all dependencies between
+	 * relations are removed after they are worked on.  Doing it like this
+	 * might be a waste as it is possible that a relation being truncated
+	 * will be dropped anyway due to its parent being dropped, but this
+	 * makes the code more robust because of not having to re-check that the
+	 * relation exists at truncation time.
+	 */
 	if (oids_to_truncate != NIL)
 	{
 		heap_truncate(oids_to_truncate);
 		CommandCounterIncrement();	/* XXX needed? */
 	}
+	if (oids_to_drop != NIL)
+	{
+		ObjectAddresses *targetObjects = new_object_addresses();
+		ListCell   *l;
+
+		foreach(l, oids_to_drop)
+		{
+			ObjectAddress object;
+
+			object.classId = RelationRelationId;
+			object.objectId = lfirst_oid(l);
+			object.objectSubId = 0;
+
+			Assert(!object_address_present(&object, targetObjects));
+
+			add_exact_object_address(&object, targetObjects);
+		}
+
+		/*
+		 * Since this is an automatic drop, rather than one directly initiated
+		 * by the user, we pass the PERFORM_DELETION_INTERNAL flag.
+		 */
+		performMultipleDeletions(targetObjects, DROP_CASCADE,
+ PERFORM_DELETION_INTERNAL | PERFORM_DELETION_QUIETLY);
+
+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Note that table deletion will call remove_on_commit_action, so the
+		 * entry

RE: speeding up planning with partitions

2018-11-06 Thread Imai, Yoshikazu
About inheritance_make_rel_from_joinlist(), I considered how it processes
joins for a sub-partitioned table.

Sub-partitioned table image:
part
  sub1
leaf1
leaf2

inheritance_make_rel_from_joinlist() translates join_list and join_info_list
for each leaf (leaf1 and leaf2 in the above image). To translate those two
lists for leaf1, it translates them from part to sub1 and then from sub1 to
leaf1. For leaf2, it again translates them from part to sub1 and then from
sub1 to leaf2. The translation from part to sub1 is thus performed twice,
and I think it would be better if we could eliminate that duplication.
I attached 0002-delta.patch.

In the patch, inheritance_make_rel_from_joinlist() translates the lists not
only for leaves but also for mid-nodes, in a depth-first manner, so it can
reuse a mid-node's lists when translating from that mid-node to its leaves,
which eliminates the duplicated translation.

I think it might be better if we could apply the same logic to
inheritance_planner(), which implements the same kind of logic, shown below.


foreach(lc, root->append_rel_list)
{
	AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
	...
	/*
	 * expand_inherited_rtentry() always processes a parent before any of
	 * that parent's children, so the parent_root for this relation should
	 * already be available.
	 */
	parent_root = parent_roots[appinfo->parent_relid];
	Assert(parent_root != NULL);
	parent_parse = parent_root->parse;
	...
	subroot->parse = (Query *)
		adjust_appendrel_attrs(parent_root,
							   (Node *) parent_parse,
							   1, &appinfo);


--
Yoshikazu Imai





Re: First-draft release notes for back-branch releases

2018-11-06 Thread Michael Paquier
On Tue, Nov 06, 2018 at 11:44:56PM +, Andrew Gierth wrote:
> The commit message doesn't really show the severity of the problem at
> all.

I take the blame for that.  And my apologies for what it's worth.

> The users whose case I was diagnosing on IRC were finding that their
> monitoring system was sufficient to trigger the problem at least 80% of
> the time. Consider that the broken minRecoveryPoint can be quite a long
> way in the past relative to on-disk data pages, so the window of
> vulnerability isn't necessarily small.

The first report after the last point release on the matter is here, and
those folks had exactly the same symptoms with clients aggressively
connecting to the standby:
https://postgr.es/m/153492341830.1368.3936905691758473...@wrigleys.postgresql.org

And this came out pretty quickly.

> So while there _probably_ isn't any data corruption, the standby can get
> into a state that isn't restartable unless you know to block client
> connections to it until it has caught up. Rebuilding the standby from
> the master will work but that may be a significant practical problem if
> the data is large.

The problem shows up when the standby goes through crash recovery at
restart, not when it was shut down cleanly beforehand.  Corruption could
actually happen if you promote the standby before it has replayed up to the
actual recovery LSN, in the case where it failed to update minRecoveryPoint
after performing crash recovery.  However, this proves to be a problem only
if a standby does a crash recovery and then a promotion immediately
afterwards, which does not happen when recovering from a backup, as the
minimum recovery LSN then comes from the backup-end record, not from the
control file.
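
To illustrate the invariant at stake with a toy (this is only a conceptual
sketch, not PostgreSQL code): a data page must never reach disk with an LSN
beyond what minRecoveryPoint covers, otherwise a later restart can claim
consistency too early.

#include <stdio.h>

/* toy invariant: any page flushed during recovery must have its LSN
 * covered by minRecoveryPoint before the flush happens */
static int	minRecoveryPoint = 3;	/* stale control-file value */

static void
flush_page(int pageLSN, int update_first)
{
	if (update_first && pageLSN > minRecoveryPoint)
		minRecoveryPoint = pageLSN;	/* the step the buggy path skipped */

	printf("flushed page LSN %d, minRecoveryPoint %d -> %s after a crash\n",
		   pageLSN, minRecoveryPoint,
		   pageLSN > minRecoveryPoint ? "consistency claimed too early"
		   : "safe");
}

int
main(void)
{
	flush_page(7, 0);			/* buggy: control file left behind the data */
	flush_page(9, 1);			/* correct: advance minRecoveryPoint first */
	return 0;
}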
--
Michael




Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-11-06 Thread Amit Langote
Hi,

On 2018/11/07 0:10, Alvaro Herrera wrote:
> Looking over the stuff in gram.y (to make sure there's nothing that
> could be lost), I noticed that we're losing the COLLATE clause when they
> are added to columns in partitions.  I would expect part1 to end up with
> collation es_CL, or alternatively that the command throws an error:
> 
> 55462 10.6 138851=# create table part (a text collate "en_US") partition by range (a);
> CREATE TABLE
> Duración: 23,511 ms
> 55462 10.6 138851=# create table part1 partition of part (a collate "es_CL") for values from ('ca') to ('cu');
> CREATE TABLE
> Duración: 111,551 ms
> 55462 10.6 138851=# \d part
>Tabla «public.part»
>  Columna │ Tipo │ Collation │ Nullable │ Default 
> ─┼──┼───┼──┼─
>  a   │ text │ en_US │  │ 
> Partition key: RANGE (a)
> Number of partitions: 1 (Use \d+ to list them.)
> 
> 55462 10.6 138851=# \d part1
>   Tabla «public.part1»
>  Columna │ Tipo │ Collation │ Nullable │ Default 
> ─┼──┼───┼──┼─
>  a   │ text │ en_US │  │ 
> Partition of: part FOR VALUES FROM ('ca') TO ('cu')
> 
> 
> (This case is particularly bothersome because the column is the
> partition key, so the partition range bounds would differ depending on
> which collation is used to compare.  I assume we'd always want to use
> the parent table's collation; so there's even a stronger case for
> raising an error if it doesn't match the parent's.  However, this case
> could arise for other columns too, where it's not *so* bad, but still
> not completely correct I think.)
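
To see concretely why the comparison rule matters for the range bounds, here
is a minimal self-contained C demo (not PostgreSQL code; it assumes the
en_US.UTF-8 locale is installed on the system) showing the same pair of
strings ordering differently under a byte-wise and a linguistic collation:

#include <stdio.h>
#include <locale.h>
#include <string.h>

static void
compare(const char *loc, const char *a, const char *b)
{
	if (setlocale(LC_COLLATE, loc) == NULL)
	{
		printf("%s: locale not installed, skipping\n", loc);
		return;
	}
	printf("%-12s: \"%s\" %s \"%s\"\n", loc, a,
		   strcoll(a, b) < 0 ? "<" : ">=", b);
}

int
main(void)
{
	/* byte order and linguistic order disagree for these two strings */
	compare("C", "a", "B");				/* 'a' (0x61) sorts after 'B' (0x42) */
	compare("en_US.UTF-8", "a", "B");	/* linguistically, a < B */
	return 0;
}

A partition bound comparison is exactly this kind of collation-aware
comparison, so parent and partition must agree on which rule applies.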

Thank you for investigating.

I think the result in this case should be an error, just as it would in
the regular inheritance case.

create table parent (a text);
create table child (a text collate "en_US") inherits (parent);
NOTICE:  merging column "a" with inherited definition
ERROR:  column "a" has a collation conflict
DETAIL:  "default" versus "en_US"

In fact, I see that ATTACH PARTITION rejects a partition if collations
don't match.

create table part (a text collate "en_US") partition by range (a);
create table part1 (a text collate "es_CL");
alter table part attach partition part1 for values from ('ca') to ('cu');
ERROR:  child table "part1" has different collation for column "a"

> This happens on unpatched code, and doesn't seem covered by any tests.
> However, this seems an independent bug that isn't affected by this
> patch.
> 
> The only other things there are deferrability markers, which seem to be
> propagated in a relatively sane fashion (but no luck if you want to add
> foreign keys with mismatching deferrability than the parent's -- you
> just get a dupe, and there's no way in the grammar to change the flags
> for the existing constraint).  But you can add a UNIQUE DEFERRED
> constraint in a partition that wasn't there in the parent, for example.

Looking again at MergeAttributes, it seems that the fix for disallowing
mismatching collations is not that invasive.  PFA a patch that applies on
top of your 0001-Revise-attribute-handling-code-on-partition-creation.patch.

I haven't looked closely at the cases involving deferrability markers though.

Thanks,
Amit
From d2d2489e4978366b4fb489c60d85540264ba1064 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 7 Nov 2018 10:32:14 +0900
Subject: [PATCH 2/2] Disallow creating partitions with mismatching collations
 for columns

---
 src/backend/commands/tablecmds.c   | 21 +
 src/test/regress/expected/create_table.out |  6 ++
 src/test/regress/sql/create_table.sql  |  5 +
 3 files changed, 32 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a0279ae383..fbf902143b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2446,6 +2446,10 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 
 			if (strcmp(coldef->colname, restdef->colname) == 0)
 			{
+				Oid			defTypeId;
+				int32		deftypmod;
+				Oid			newCollId;
+
 				found = true;
 				coldef->is_not_null |= restdef->is_not_null;
 
@@ -2465,6 +2469,23 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 				coldef->raw_default = restdef->raw_default;
 				coldef->cooked_default = NULL;
 			}
+
+			/*
+			 * Collation must be same, so error out if a different one
+			 * specified for the partition.
+			 */

Re: Doc patch on psql output formats

2018-11-06 Thread Michael Paquier
On Tue, Nov 06, 2018 at 06:18:37PM +0100, Daniel Verite wrote:
> In both cases using abbreviations in scripts is not a great
> idea. Anyway I will look into the changes required in do_pset to
> implement the error on multiple matches, if that's the preferred
> behavior.

You would also break compatibility for any script using just "a" by
starting to issue an error.  Anyway, complaining about an ambiguous match
sounds better than a silent failure.
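
For illustration, a minimal sketch (invented here, not psql's actual
do_pset() code) of abbreviation matching where an exact name always wins and
an ambiguous prefix is reported instead of silently taking the first hit:

#include <stdio.h>
#include <string.h>

static const char *formats[] = {
	"aligned", "asciidoc", "html", "latex",
	"latex-longtable", "troff-ms", "unaligned", "wrapped"
};

/* returns the matched name, or NULL when the input is ambiguous/unknown */
static const char *
match_format(const char *input)
{
	const char *found = NULL;

	for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++)
	{
		if (strcmp(formats[i], input) == 0)
			return formats[i];	/* exact match always wins */
		if (strncmp(formats[i], input, strlen(input)) == 0)
		{
			if (found != NULL)
				return NULL;	/* second prefix hit: ambiguous */
			found = formats[i];
		}
	}
	return found;
}

int
main(void)
{
	const char *tests[] = {"a", "al", "latex", "t"};

	for (int i = 0; i < 4; i++)
	{
		const char *m = match_format(tests[i]);

		printf("\\pset format %s -> %s\n", tests[i],
			   m ? m : "error: ambiguous or unknown abbreviation");
	}
	return 0;
}

With this rule, "latex" still resolves exactly even though it is a prefix of
"latex-longtable", while a bare "a" is reported as ambiguous instead of
silently picking "aligned".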
--
Michael




  1   2   >