Re: some more pg_dump refactoring

2020-07-07 Thread Peter Eisentraut

On 2020-06-30 16:56, Fabien COELHO wrote:

Would it make sense to accumulate in the other direction, older to newer,
so that new attributes are added at the end of the select?

I think that could make sense, but the current style was introduced by
daa9fe8a5264a3f192efa5ddee8fb011ad9da365.  Should we revise that?

It seems to me more logical to do it while you're at it, but you are the
one writing the patches:-)


What do you think about this patch to reorganize the existing code from 
that old commit?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From eca6fd80b6ca850a0f5e40c40d7cb1be48ab2e2a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 7 Jul 2020 08:58:16 +0200
Subject: [PATCH] pg_dump: Further reorganize getTableAttrs()

After further discussion after
daa9fe8a5264a3f192efa5ddee8fb011ad9da365, reorder the version-specific
sections from oldest to newest.  Also, remove the variable assignments
from PQfnumber() to reduce vertical space.
---
 src/bin/pg_dump/pg_dump.c | 149 ++
 1 file changed, 54 insertions(+), 95 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a41a3db876..aa0725627a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -8431,35 +8431,14 @@ void
 getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 {
DumpOptions *dopt = fout->dopt;
-   int i,
-   j;
PQExpBuffer q = createPQExpBuffer();
-   int i_attnum;
-   int i_attname;
-   int i_atttypname;
-   int i_atttypmod;
-   int i_attstattarget;
-   int i_attstorage;
-   int i_typstorage;
-   int i_attnotnull;
-   int i_atthasdef;
-   int i_attidentity;
-   int i_attgenerated;
-   int i_attisdropped;
-   int i_attlen;
-   int i_attalign;
-   int i_attislocal;
-   int i_attoptions;
-   int i_attcollation;
-   int i_attfdwoptions;
-   int i_attmissingval;
-   PGresult   *res;
-   int ntups;
-   bool        hasdefaults;
 
-   for (i = 0; i < numTables; i++)
+   for (int i = 0; i < numTables; i++)
{
TableInfo  *tbinfo = &tblinfo[i];
+   PGresult   *res;
+   int ntups;
+   bool        hasdefaults;
 
/* Don't bother to collect info for sequences */
if (tbinfo->relkind == RELKIND_SEQUENCE)
@@ -8497,27 +8476,27 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 "a.attislocal,\n"
 "pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,\n");
 
-   if (fout->remoteVersion >= 12)
-   appendPQExpBufferStr(q,
-"a.attgenerated,\n");
-   else
-   appendPQExpBufferStr(q,
-"'' AS attgenerated,\n");
-
-   if (fout->remoteVersion >= 11)
+   if (fout->remoteVersion >= 9)
appendPQExpBufferStr(q,
-"CASE WHEN a.atthasmissing AND NOT a.attisdropped "
-"THEN a.attmissingval ELSE null END AS attmissingval,\n");
+"array_to_string(a.attoptions, ', ') AS attoptions,\n");
else
appendPQExpBufferStr(q,
-"NULL AS attmissingval,\n");
+"'' AS attoptions,\n");
 
-   if (fout->remoteVersion >= 10)
+   if (fout->remoteVersion >= 90100)
+   {
+   /*
+    * Since we only want to dump COLLATE clauses for attributes whose
+    * collation is different from their type's default, we use a CASE
+    * here to suppress uninteresting attcollations cheaply.
+    */
appendPQExpBufferStr(q,
-"a.attidentity,\n");
+  

Re: Yet another fast GiST build (typo)

2020-07-07 Thread Andrey M. Borodin


> On 6 Jul 2020, at 17:55, Andrey M. Borodin  wrote:
> 
> 
> 
>> On 1 Jul 2020, at 17:05, Daniel Gustafsson  wrote:
>> 
>>> On 29 Feb 2020, at 18:24, Andrey M. Borodin  wrote:
>> 
>>> I've fixed this and added patch with GiST reloption to provide sort support 
>>> function.
>> 
>> 0002 in this patchset fails to apply, please submit a rebased version.  I've
>> marked the entry Waiting for Author in the meantime.
> 
> Thanks, Daniel!
> 
> PFA v7.

Oops. I mismerged the docs and did not notice it with check-world. PFA v8 with
fixed docs.

Best regards, Andrey Borodin.


v8-0001-Add-sort-support-for-point-gist_point_sortsupport.patch
Description: Binary data


v8-0002-Implement-GiST-build-using-sort-support.patch
Description: Binary data


Quick doc patch

2020-07-07 Thread Guillaume Lelarge
Hi,

Here is a quick issue I found in the BRIN documentation. I'm not 100%
sure I'm right, but it looks like a failed copy/paste from the GIN
documentation.

Cheers.


-- 
Guillaume.
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 4c5eeb875f..55b6272db6 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -585,7 +585,7 @@ typedef struct BrinOpcInfo
 
   
Since both key extraction of indexed values and representation of the
-   key in GIN are flexible, they may depend on
+   key in BRIN are flexible, they may depend on
user-specified parameters.
   
  


Re: Question: PostgreSQL on Amazon linux EC2

2020-07-07 Thread Flavio Henrique Araque Gurgel
> Thank you Flavio, this is helpful.
>
> Have you faced any other challenges or any other problems after installing
> Postgres?
>

No problem related to software installation whatsoever; you'll need to take
care of the same points as with any Postgres usage, plus some cloud-specific
concerns like replication between geographically separated instances (what they
call availability zones) and disk performance.
I suggest you try it and read a bit; you'll find a lot of feedback on the
internet from people who have already done it, and the AWS documentation is key
when dealing with their hardware specifics. I only learned how to do some
stuff their way when I tried, even with AWS premium support at hand, because
every database is different.
Questions on this list are handled better when you have a specific question
about a problem you're experiencing (like your first one), and the hackers
list is aimed at Postgres development; please post user questions to
pgsql-general.

Flavio Gurgel


Re: Implementing Incremental View Maintenance

2020-07-07 Thread Tatsuo Ishii
>> Query checks for following restrictions are added:
> 
> 
> Are all known supported cases listed below?

They are "restrictions" and are not supported.
> 
>> - inheritance parent table
>> ...
>> - targetlist containing IVM column
>> - simple subquery is only supported
>>
> 
> How to understand 3 items above?

The best way to understand them is to look at the regression tests:
src/test/regress/expected/incremental_matview.out.

>> - inheritance parent table
-- inheritance parent is not supported with IVM
BEGIN;
CREATE TABLE parent (i int, v int);
CREATE TABLE child_a(options text) INHERITS(parent);
CREATE INCREMENTAL MATERIALIZED VIEW  mv_ivm21 AS SELECT * FROM parent;
ERROR:  inheritance parent is not supported on incrementally maintainable 
materialized view

>> - targetlist containing IVM column

-- tartget list cannot contain ivm clumn that start with '__ivm'
CREATE INCREMENTAL MATERIALIZED VIEW  mv_ivm28 AS SELECT i AS "__ivm_count__" 
FROM mv_base_a;
ERROR:  column name __ivm_count__ is not supported on incrementally 
maintainable materialized view

>> - simple subquery is only supported
-- subquery is not supported with outer join
CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a 
a LEFT JOIN (SELECT * FROM mv_base_b) b ON a.i=b.i;
ERROR:  this query is not allowed on incrementally maintainable materialized 
view
HINT:  subquery is not supported with outer join

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-07 Thread Amit Langote
Hi Bharath,

On Mon, Jun 29, 2020 at 2:21 PM Bharath Rupireddy
 wrote:
> For Copy From Binary files, there exists below information for each tuple/row.
> 1. field count(number of columns)
> 2. for every field, field size(column data length)
> 3. field data of field size(actual column data)
>
> Currently, all the above data required at each step is read directly from 
> file using fread() and this happens for all the tuples/rows.
>
> One observation is that in the total execution time of a copy from a binary
> file, fread() takes up to 20% of the time, and the fread() function call
> count is also very high.
>
> For instance, with a dataset of size 5.3GB, 10 million tuples with 10 columns:
>
> total exec time (sec)   total fread() time (sec)   fread() call count
> 101.193                 21.33                      21005
> 101.345                 21.436                     21005
>
> The total time taken for fread() and the corresponding function call count
> may increase further if we have a larger number of columns, for instance 1000.
>
> One solution to this problem is to read data from the binary file in
> RAW_BUF_SIZE (64KB) chunks to avoid repeatedly calling fread() (thus possibly
> avoiding a few disk I/Os). This is similar to the approach followed for
> csv/text files.

I agree that having the buffer in front of the file makes sense,
although we do now have an extra memcpy, that is, from raw_buf to
attribute_buf.data.  Currently, fread() reads directly into
attribute_buf.data.  But maybe that's okay as I don't see the new copy
being all that bad.
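
For readers following along, here is a minimal, self-contained sketch of the
general buffering idea under discussion (it is not the copy.c code itself):
reads are served out of a fixed-size buffer that is refilled with fread()
only when it runs dry, so many small logical reads translate into a few
large physical ones.

#include <stdio.h>
#include <string.h>

#define RAW_BUF_SIZE 65536      /* 64KB, matching the size mentioned above */

typedef struct BufferedFile
{
    FILE       *fp;
    char        buf[RAW_BUF_SIZE];
    size_t      len;            /* number of valid bytes in buf */
    size_t      pos;            /* next unread byte in buf */
} BufferedFile;

/*
 * Copy nbytes from the file into dst, refilling the buffer as needed.
 * Returns the number of bytes actually copied (short count at EOF).
 */
static size_t
buffered_read(BufferedFile *bf, void *dst, size_t nbytes)
{
    size_t      copied = 0;

    while (copied < nbytes)
    {
        if (bf->pos == bf->len)
        {
            bf->len = fread(bf->buf, 1, RAW_BUF_SIZE, bf->fp);
            bf->pos = 0;
            if (bf->len == 0)
                break;          /* EOF or read error */
        }

        size_t      avail = bf->len - bf->pos;
        size_t      chunk = (nbytes - copied < avail) ? nbytes - copied : avail;

        memcpy((char *) dst + copied, bf->buf + bf->pos, chunk);
        bf->pos += chunk;
        copied += chunk;
    }
    return copied;
}

The extra memcpy() mentioned above is visible here: data always travels
file -> buf -> dst, whereas an unbuffered fread() would go file -> dst directly.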

> Attaching a patch, implementing the above solution for binary format files.
>
> Below is the improvement gained.
>
> total exec time (sec)   total fread() time (sec)   fread() call count
> 75.757                  2.73                       160884
> 75.351                  2.742                      160884
>
> Execution is 1.36X times faster, fread() time is reduced by 87%, fread() call 
> count is reduced by 99%.
>
> Request the community to take this patch for review if this approach and 
> improvement seem beneficial.
>
> Any suggestions to improve further are most welcome.

Noticed the following misbehaviors when trying to test the patch:

create table foo5 (a text, b text, c text, d text, e text);
insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',
'd', 'eee' from generate_series(1, 1000);
copy foo5 to '/tmp/foo5.bin' binary;
truncate foo5;
copy foo5 from '/tmp/foo5.bin' binary;
ERROR:  unexpected EOF in COPY data
CONTEXT:  COPY foo5, line 33, column a

create table bar (a numeric);
insert into bar select sqrt(a) from generate_series(1, 1) a;
copy bar to '/tmp/bar.bin' binary;
copy bar from '/tmp/bar.bin' binary;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Trying to figure out what was going wrong in each of these cases, I found
the new code a bit hard to navigate and debug :(. Here are a couple of
points that I think would have made things a bit easier:

* Avoid spreading the new buffering logic in multiple existing
functions, with similar pieces of code repeated in multiple places.  I
would add a single new function that takes care of the various
buffering details and call it where CopyGetData() is being used
currently.

* You could've reused CopyLoadRawBuffer()'s functionality instead of
reimplementing it.  I also see multiple instances of special case
handling, which often suggests that bugs are lurking.

Considering these points, I came up with the attached patch with a
much smaller footprint.  As far as I can see, it implements the same
basic idea as your patch.  With it, I was able to see an improvement
in loading time consistent with your numbers.  I measured the time of
loading 10 million rows into tables with 5, 10, 20 text columns as
follows:

create table foo5 (a text, b text, c text, d text, e text);
insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',
'd', 'eee' from generate_series(1, 1000);
copy foo5 to '/tmp/foo5.bin' binary;
truncate foo5;
copy foo5 from '/tmp/foo5.bin' binary;

create table foo10 (a text, b text, c text, d text, e text, f text, g
text, h text, i text, j text);
insert into foo10 select repeat('a', (random()*100)::int), 'bbb',
'cc', 'd', 'eee', 'f', 'gg', 'hh', 'i', 'jjj' from generate_series(1,
1000);
copy foo10 to '/tmp/foo10.bin' binary;
truncate foo10;
copy foo10 from '/tmp/foo10.bin' binary;

create table foo20 (a text, b text, c text, d text, e text, f numeric,
g text, h text, i text, j text, k text, l text, m text, n text, o
text, p text, q text, r text, s text, t text);
insert into foo20 select repeat('a', (random()*100)::int), 'bbb',
'cc', 'd', 'eee', '123.456', 'gg', 'hh', 'ii', '', 'kkk', '',
'mm', 'n', 'o', '', 'q', 'r', 'ss', '' from
generate_series(1, 1000);
copy foo20 to '/tmp/foo20.bin' binary;
truncate foo20;
copy foo20 from '/tmp/foo20.bin' binary;

The median times for the COPY FROM commands above, with and without
the patch, are as follows (in seconds):

        HEAD    patched
foo5    8.5     6.5
foo10   14      10
foo20   25      18

Re: Add Information during standby recovery conflicts

2020-07-07 Thread Masahiko Sawada
On Mon, 6 Jul 2020 at 15:42, Drouvot, Bertrand  wrote:
>
>
> On 7/3/20 4:20 AM, Masahiko Sawada wrote:
> >
> > On Wed, 1 Jul 2020 at 21:52, Drouvot, Bertrand  wrote:
> >> Hey,
> >>
> >> On 6/29/20 10:55 AM, Masahiko Sawada wrote:
> >>>
> >>> On Thu, 18 Jun 2020 at 16:28, Drouvot, Bertrand  
> >>> wrote:
>  Hi hackers,
> 
>  I've attached a patch to add information during standby recovery 
>  conflicts.
>  The motivation behind it is to be able to get some information:
> 
>  On the apply side
>  On the blocker(s) side
> 
>  Motivation:
> 
>  When a standby recovery conflict occurs it could be useful to get more 
>  information to be able to dive deep on the root cause and find a way to 
>  avoid/mitigate new occurrences.
> >>> I think this is a good feature. Like log_lock_waits, it will help the
> >>> users to investigate recovery conflict issues.
> >> exactly, thanks for looking at it.
>  Adding this information would make the investigations easier, it could 
>  help answering questions like:
> 
>  On which LSN was the WAL apply blocked?
>  What was the purpose of the blocked WAL record?
>  On which relation (if any) was the blocked WAL record related to?
>  What was the blocker(s) doing?
>  When did the blocker(s) start their queries (if any)?
>  What was the blocker(s) waiting for? on which wait event?
> 
>  Technical context and proposal:
> 
>  There are 2 points in this patch:
> 
>  Add the information about the blocked WAL record. This is done in 
>  standby.c (ResolveRecoveryConflictWithVirtualXIDs, 
>  ResolveRecoveryConflictWithDatabase, StandbyTimeoutHandler)
> >>> I think we already have the information about the WAL record being
> >>> applied in errcontext.
> >> right, but it won’t be displayed in case log_error_verbosity is set to
> >> terse.
> > Yes. I think errcontext or errdetail is more appropriate for this
> > information in this case. Otherwise, we will end up emitting the same
> > WAL record information twice with log_error_verbosity = verbose.
> >
> > For instance, here are the server logs from when I tested a recovery
> > conflict on buffer pin, but they look redundant:
> >
> > 2020-07-03 11:01:15.339 JST [60585] LOG:  wal record apply is blocked
> > by 1 connection(s), reason: User query might have needed to see row
> > versions that must be removed.
> > 2020-07-03 11:01:15.339 JST [60585] CONTEXT:  WAL redo at 0/30025E0
> > for Heap2/CLEAN: remxid 505
> > 2020-07-03 11:01:15.339 JST [60585] LOG:  blocked wal record rmgr:
> > Heap2, lsn: 0/030025E0, received at: 2020-07-03 11:01:15.338997+09,
> > desc: CLEAN, relation: rel 1663/12930/16384 fork main blk 0
> > 2020-07-03 11:01:15.339 JST [60585] CONTEXT:  WAL redo at 0/30025E0
> > for Heap2/CLEAN: remxid 505
> > 2020-07-03 11:01:15.347 JST [60604] LOG:  about to be interrupted:
> > backend_type: client backend, state: active, wait_event: PgSleep,
> > query_start: 2020-07-03 11:01:14.337708+09
> > 2020-07-03 11:01:15.347 JST [60604] STATEMENT:  select pg_sleep(30);
> > 2020-07-03 11:01:15.347 JST [60604] ERROR:  canceling statement due to
> > conflict with recovery
> > 2020-07-03 11:01:15.347 JST [60604] DETAIL:  User query might have
> > needed to see row versions that must be removed.
> > 2020-07-03 11:01:15.347 JST [60604] STATEMENT:  select pg_sleep(30);
> >
> > The WAL record information appears three times and the reason for the
> > interruption twice.
>
>
> Fully makes sense, the new patch version attached is now producing:
>
> 2020-07-06 06:10:36.022 UTC [14035] LOG:  waiting for recovery conflict
> on snapshot

How about adding the subject? That is, "recovery is waiting for
recovery conflict on %s" or "recovery process is waiting for
conflict on %s".

> 2020-07-06 06:10:36.022 UTC [14035] DETAIL:  WAL record received at
> 2020-07-06 06:10:36.021963+00.
>  Tablespace/database/relation are 1663/13586/16672, fork is main
> and block is 0.
>  There is 1 blocking connection(s).

To follow the existing log message, perhaps this can be something like
"WAL record received at %s, rel %u/%u/%u, fork %s, blkno %u. %d
processes". But I'm not sure the errdetail is the best place to
display the WAL information as I mentioned in the latter part of this
email.
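
For illustration only, the two suggestions above could end up looking roughly
like the following (all variable names here are hypothetical, and the actual
patch may structure this differently):

ereport(LOG,
        (errmsg("recovery is waiting for recovery conflict on %s",
                reason_desc),
         errdetail("WAL record received at %s, rel %u/%u/%u, fork %s, blkno %u. %d processes are blocking.",
                   receipt_time, spcNode, dbNode, relNode,
                   forkNames[forknum], blkno, num_waiters)));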

> 2020-07-06 06:10:36.022 UTC [14035] CONTEXT:  WAL redo at 0/3A05708 for
> Heap2/CLEAN: remxid 972
>
> It does not provide the pid(s) of the blocking processes as they'll
> appear during the interruption(s).
>
> >> Or did you have in mind to try to avoid using the new
> >> “c

Re: Quick doc patch

2020-07-07 Thread Daniel Gustafsson
> On 7 Jul 2020, at 09:17, Guillaume Lelarge  wrote:

> Here is a quick issue I found on the BRIN documentation. I'm not a 100% sure 
> I'm right but it looks like a failed copy/paste from the GIN documentation.

I agree, it looks like a copy-pasteo in 15cb2bd2700 which introduced the
paragraph for both GIN and BRIN.  LGTM.  Adding Alexander, who committed it,
on cc.

cheers ./daniel



Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-07 Thread Amit Langote
On Tue, Jul 7, 2020 at 4:28 PM Amit Langote  wrote:
> The median times for the COPY FROM commands above, with and without
> the patch, are as follows:
>
>         HEAD    patched
> foo5    8.5     6.5
> foo10   14      10
> foo20   25      18

Sorry, I forgot to mention that these times are in seconds.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-07-07 Thread Magnus Hagander
On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila  wrote:

> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada
>  wrote:
> >
> > On Mon, 6 Jul 2020 at 20:45, Amit Kapila 
> wrote:
> > >
> > > > Second, as long as the unique identifier is the slot name there is no
> > > > convenient way to distinguish between old and new replication slots
> > > > with the same name, so the backend process or wal sender process sends
> > > > a message to the stats collector to drop the replication slot at
> > > > ReplicationSlotDropPtr(). This strategy differs from what we do for
> > > > table, index, and function statistics. It might not be a problem but
> > > > I’m thinking about a better way.
> > > >
> > >
> > > Can we rely on message ordering in the transmission mechanism (UDP)
> > > for stats?  The wiki suggests [1] we can't.  If so, then this might
> > > not work.
> >
> > Yeah, I'm also concerned about this. Another idea would be to have
> > another unique identifier to distinguish old and new replication slots
> > with the same name. For example, creation timestamp. And then we
> > reclaim the stats of unused slots later like table and function
> > statistics.
> >
>
> So, we would need to have 'creation timestamp' as persistent data for slots
> to achieve this?  I am not sure about adding creation_time as an identifying
> parameter for this case, because users can change the time on their systems,
> so it might not be a bullet-proof method, but I agree that it can work
> in general.
>

If we need them to be persistent across time like that, perhaps we simply
need to assign oids to replication slots? That might simplify this problem
quite a bit?


> On the other hand, if the ordering were to be reversed, we would miss
> > that stats but the next stat reporting would create the new entry. If
> > the problem is unlikely to happen in common case we can live with
> > that.
> >
>
> Yeah, that is a valid point and I think otherwise also some UDP
> packets can be lost so maybe we don't need to worry too much about
> this.  I guess we can add a comment in the code for such a case.
>

The fact that we may in theory lose some packets over UDP is the main
reason we're using UDP in the first place, I believe :) But I believe it's
highly unlikely to happen in the real world (and I think on some
platforms it's impossible).


> > > Aside from the above, this patch will change the most of the changes
> > > > introduced by commit 9290ad198b1 and introduce new code much. I’m
> > > > concerned whether such changes are acceptable at the time of beta 2.
> > > >
> > >
> > > I think it depends on the final patch. My initial thought was that we
> > > should do this for PG14 but if you are suggesting removing the changes
> > > done by commit 9290ad198b1 then we need to think over it.  I could
> > > think of below options:
> > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14.  We
> > > were anyway having spilling without any work in PG13 but didn’t have
> > > stats.
> > > b. Try to get your patch in PG13 if we can, otherwise, revert the
> > > feature 9290ad198b1.
> > > c. Get whatever we have in commit 9290ad198b1 for PG13 and
> > > additionally have what we are discussing here for PG14.  This means
> > > that spilled stats at slot level will be available in PG14 via
> > > pg_stat_replication_slots and for individual WAL senders it will be
> > > available via pg_stat_replication both in PG13 and PG14.  Even if we
> > > can get your patch in PG13, we can still keep those in
> > > pg_stat_replication.
> > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it
> > > for PG14.  I don't think this will be a popular approach.
> >
> > I was thinking option (a) or (b). I'm inclined to option (a) since the
> > PoC patch added a certain amount of new codes. I agree with you that
> > it depends on the final patch.
> >
>
> Magnus, Tomas, others, do you have any suggestions on the above
> options or let us know if you have any other option in mind?
>
>
I have a feeling it's far too late for (b) at this time. Regardless of the
size of the patch, it feels like this could end up being rushed and not
thought through all the way, in which case we may end up in an even
worse position.

Much as I would like to have these stats earlier, I'm also
leaning towards (a).

//Magnus


RE: archive status ".ready" files may be created too early

2020-07-07 Thread matsumura....@fujitsu.com
Hello, Horiguchi-san

At Monday, July 6, 2020 05:13:40 +,  "Kyotaro Horiguchi 
" wrote in
> > > after the WAL buffer is filled up to the requested position. So when it
> > > crosses a segment boundary we know that all past cross-segment-boundary
> > > records are stable. That means all we need to remember is only the
> > > position of the latest cross-boundary record.
> > 
> > I cannot agree. In the following case, it may not work well.
> > - record-A and record-B (record-B is the newer one) are copied, and
> > - lastSegContRecStart/End points to record-B's, and
> > - FlushPtr has advanced to the middle of record-A.
>
> IIUC, that means record-B is a cross-segment-border record and we have
> flushed beyond record-B. In that case crash recovery afterwards
> can read the complete record-B and will finish recovery *after*
> record-B. That's what we need here.

I'm sorry I didn't explain enough.

Record-A and Record-B are cross-segment-border records.
Record-A spans segments X and X+1.
Record-B spans segments X+2 and X+3.
If both records have been inserted into the WAL buffer, lastSegContRecStart/End
points to Record-B.
If a writer flushes up to the middle of segment X+1, NotifyStableSegments()
allows the writer to notify segment X.
Is my understanding correct?

Regards
Ryo Matsumura




Re: Performing partition pruning using row value

2020-07-07 Thread Etsuro Fujita
Kato-san,

On Mon, Jul 6, 2020 at 5:25 PM kato-...@fujitsu.com
 wrote:
> I would like to ask about the conditions under which partition pruning is 
> performed.
> In PostgreSQL 12, when I executed the following SQL, partition pruning was not
> performed.
>
> postgres=# explain select * from a where (c1, c2) < (99, 99);
>QUERY PLAN
> 
>  Append  (cost=0.00..60.00 rows=800 width=40)
>->  Seq Scan on a1 a_1  (cost=0.00..28.00 rows=400 width=40)
>  Filter: (ROW(c1, c2) < ROW(99, 99))
>->  Seq Scan on a2 a_2  (cost=0.00..28.00 rows=400 width=40)
>  Filter: (ROW(c1, c2) < ROW(99, 99))
> (5 rows)
>
> However, pruning is performed when I changed the SQL as follows.
>
> postgres=# explain select * from a where c1  < 99 and c2 < 99;
>QUERY PLAN
> 
>  Seq Scan on a1 a  (cost=0.00..28.00 rows=133 width=40)
>Filter: ((c1 < 99) AND (c2 < 99))
> (2 rows)

Just to be clear, the condition (c1, c2) < (99, 99) is not equivalent
to the condition c1 < 99 and c2 < 99; the row-wise comparison means
c1 < 99 OR (c1 = 99 AND c2 < 99) (see the documentation note in [1]).

> Looking at the code, "(c1, c2) < (99, 99)" is recognized as a RowCompExpr and
> "c1 < 99 and c2 < 99" is recognized as a combination of OpExprs.
>
> Currently, pruning is not performed for RowCompExpr, is this correct?

Yeah, I think so.

> Is it because it would take a long time to parse all Expr nodes that
> match_clause_to_partition_key() returns PART_CLAUSE_UNSUPPORTED when such an
> Expr node is passed?

I don't know the reason why that function doesn't support row-wise
comparison, but I don't think the main reason for that is that it
takes time to parse expressions.

> If the number of args in RowCompExpr is small, I would think that expanding 
> it would improve performance.

Yeah, I think it would be great to support row-wise comparison not only with
a small number of args but also with a large number of them.

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/docs/current/functions-comparisons.html#ROW-WISE-COMPARISON




Re: Quick doc patch

2020-07-07 Thread Michael Paquier
On Tue, Jul 07, 2020 at 09:58:59AM +0200, Daniel Gustafsson wrote:
> I agree, it looks like a copy-pasteo in 15cb2bd2700 which introduced the
> paragraph for both GIN and BRIN.  LGTM.  Adding Alexander who committed in on
> cc.

+1.
--
Michael




Re: warnings for invalid function casts

2020-07-07 Thread Peter Eisentraut

On 2020-07-04 16:16, Tom Lane wrote:

Peter Eisentraut  writes:

Do people prefer a typedef or just writing it out, like it's done in the
Python code?


I'm for a typedef.  There is *nothing* readable about "(void (*) (void))",
and the fact that it's theoretically incorrect for the purpose doesn't
exactly aid intelligibility either.  With a typedef, not only are
the uses more readable but there's a place to put a comment explaining
that this is notionally wrong but it's what gcc specifies to use
to suppress thus-and-such warnings.


Makes sense.  New patch here.


But if we prefer a typedef then I'd propose
GenericFuncPtr like in the initial patch.


That name is OK by me.


I changed that to pg_funcptr_t, to look a bit more like C and less like 
Java. ;-)
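
To make the idea concrete, here is a small stand-alone sketch (not the patch
itself) of such a typedef and of how casting through it avoids gcc's
-Wcast-function-type warning; gcc treats void (*)(void) as a wildcard type for
this purpose, so casts to and from it do not warn:

#include <stdio.h>

/* generic function pointer, used only as an intermediate cast target */
typedef void (*pg_funcptr_t) (void);

static int
add_one(int x)
{
    return x + 1;
}

int
main(void)
{
    /*
     * A direct cast between incompatible function pointer types would
     * trigger -Wcast-function-type on gcc 8 and later; going through the
     * generic pointer type first does not, and calling back through the
     * correct type remains well-defined.
     */
    pg_funcptr_t    fn = (pg_funcptr_t) add_one;
    int             (*typed_fn) (int) = (int (*) (int)) fn;

    printf("%d\n", typed_fn(41));   /* prints 42 */
    return 0;
}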


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 96149ead0af749592d3f2eb87929533ceb416d76 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 7 Jul 2020 11:41:09 +0200
Subject: [PATCH v3] Fix -Wcast-function-type warnings

Three groups of issues needed to be addressed:

load_external_function() and related functions returned PGFunction,
even though not necessarily all callers are looking for a function of
type PGFunction.  Since these functions are really just wrappers
around dlsym(), change to return void * just like dlsym().

In dynahash.c, we are using strlcpy() where a function with a signature
like memcpy() is expected.  This should be safe, as the new comment
there explains, but the cast needs to be augmented to avoid the
warning.

In PL/Python, methods all need to be cast to PyCFunction, per Python
API, but this now runs afoul of these warnings.  (This issue also
exists in core CPython.)

To fix the second and third case, we add a new type pg_funcptr_t that
is defined specifically so that gcc accepts it as a generic function
pointer that can be cast to any other function pointer.

Also add -Wcast-function-type to the standard warning flags, subject
to configure check.

Discussion: 
https://www.postgresql.org/message-id/flat/1e97628e-6447-b4fd-e230-d109cec2d584%402ndquadrant.com
---
 configure | 91 +++
 configure.in  |  2 +
 src/backend/utils/fmgr/dfmgr.c| 14 ++---
 src/backend/utils/hash/dynahash.c | 11 +++-
 src/include/c.h   |  9 +++
 src/include/fmgr.h|  6 +-
 src/pl/plpython/plpy_plpymodule.c | 14 ++---
 7 files changed, 129 insertions(+), 18 deletions(-)

diff --git a/configure b/configure
index 2feff37fe3..9907637e31 100755
--- a/configure
+++ b/configure
@@ -5643,6 +5643,97 @@ if test 
x"$pgac_cv_prog_CXX_cxxflags__Wimplicit_fallthrough_3" = x"yes"; then
 fi
 
 
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports 
-Wcast-function-type, for CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports -Wcast-function-type, for 
CFLAGS... " >&6; }
+if ${pgac_cv_prog_CC_cflags__Wcast_function_type+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${CFLAGS} -Wcast-function-type"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__Wcast_function_type=yes
+else
+  pgac_cv_prog_CC_cflags__Wcast_function_type=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$pgac_cv_prog_CC_cflags__Wcast_function_type" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__Wcast_function_type" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__Wcast_function_type" = x"yes"; then
+  CFLAGS="${CFLAGS} -Wcast-function-type"
+fi
+
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports 
-Wcast-function-type, for CXXFLAGS" >&5
+$as_echo_n "checking whether ${CXX} supports -Wcast-function-type, for 
CXXFLAGS... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__Wcast_function_type+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CXXFLAGS=$CXXFLAGS
+pgac_save_CXX=$CXX
+CXX=${CXX}
+CXXFLAGS="${CXXFLAGS} -Wcast-function-type"
+ac_save_cxx_werror_flag=$ac_cxx_werror_flag
+ac_cxx_werror_flag=yes
+ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS 
conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  pgac_cv_prog_CXX_cxxflags__Wcast_function_type=yes
+else
+  pgac_cv_prog_CXX_cxxflags__Wcast_function_type=no
+fi
+rm -f core conftes

Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Amit Kapila
On Tue, Jul 7, 2020 at 9:18 AM Jeff Davis  wrote:
>
> On Mon, 2020-07-06 at 15:59 +0530, Amit Kapila wrote:
> > I agree that it's good to wait for actual problems. But the
> > > challenge
> > > is that we can't backport an added GUC.
> > >
> >
> > Is it because we won't be able to edit existing postgresql.conf file
> > or for some other reasons?
>
> Perhaps "can't" was too strong of a word, but I think it would be
> unprecedented to introduce a GUC in a minor version.
>

I don't think this is true.  We seem to have introduced three new GUC
variables in the 9.3.3 minor release.  See the following entry in the 9.3.3
release notes [1]: "Create separate GUC parameters to control
multixact freezing.  Introduce new settings
vacuum_multixact_freeze_min_age, vacuum_multixact_freeze_table_age,
and autovacuum_multixact_freeze_max_age to control when to freeze
multixacts."

Apart from this, we have asked users not only to edit the postgresql.conf
file but also to update system catalogs.  See the fix for "Cope with the
Windows locale named "Norwegian (Bokmål)"" [2] in the 9.4.1 release.

There are other instances where we also suggest users set GUCs,
create new system objects (like views), perform DDL and DML, run REINDEX
on various indexes, etc., in a minor release.

[1] - https://www.postgresql.org/docs/release/9.3.3/
[2] - https://wiki.postgresql.org/wiki/Changes_To_Norwegian_Locale

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Global temporary tables

2020-07-07 Thread Pavel Stehule
Hi


> GTT Merge the latest PGMaster and resolves conflicts.
>
>
>
I tested it and it looks fine. I think it is very usable in its current form,
but there are still some issues:

postgres=# create global temp table foo(a int);
CREATE TABLE
postgres=# insert into foo values(10);
INSERT 0 1
postgres=# alter table foo add column x int;
ALTER TABLE
postgres=# analyze foo;
WARNING:  reloid 16400 not support update attstat after add colunm
WARNING:  reloid 16400 not support update attstat after add colunm
ANALYZE

Please, can you summarize what is done, what the limits are, what would be
hard to implement, and what could be implemented easily?



I found one open question - how table locks should be implemented - because
the data is physically separated, so we don't need table locks as protection
against race conditions.

Now, table locks are implemented at the global level, so an exclusive lock on
a GTT in one session blocks insertion in a second session. Is that expected
behaviour? It is safe, but maybe it is too strict.

We should define what a table lock means on a GTT.

Regards

Pavel


> Pavel
>
>
>> With Regards,
>> Prabhat Kumar Sahu
>> EnterpriseDB: http://www.enterprisedb.com
>>
>>
>>
>


Re: Multi-byte character case-folding

2020-07-07 Thread Daniel Verite
Tom Lane wrote:

> CREATE TABLE public."myÉclass" (
>f1 text
> );
> 
> If we start to case-fold É, then the only way to access this table will
> be by double-quoting its name, which the application probably is not
> expecting (else it would have double-quoted in the original CREATE TABLE).

This problem already exists when migrating from a mono-byte database
to a multi-byte database, since downcase_identifier()  does use
tolower() for mono-byte databases.

db9=# show server_encoding ;
 server_encoding 
-
 LATIN9
(1 row)

db9=# create table MYÉCLASS (f1 text);
CREATE TABLE

db9=# \d
  List of relations
 Schema |   Name   | Type  |  Owner   
+--+---+--
 public | myéclass | table | postgres
(1 row)

db9=# select * from MYÉCLASS;
 f1 

(0 rows)

pg_dump will dump this as

CREATE TABLE public."myéclass" (
f1 text
);

So far so good. But after importing this into an UTF-8 database,
the same "select * from MYÉCLASS" that used to work now fails:

u8=# show server_encoding ;
 server_encoding 
-
 UTF8
(1 row)

u8=# select * from MYÉCLASS;
ERROR:  relation "myÉclass" does not exist


The compromise that is mentioned in downcase_identifier() justifying
this inconsistency is not very convincing, because the issues in case
folding due to linguistic differences exist both in mono-byte and
multi-byte encodings. For instance, if it's fine to trust the locale
to downcase 'İ' in a LATIN5 db, it should be okay in a UTF-8 db too.
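
For reference, the folding rule being discussed lives in downcase_identifier()
in src/backend/parser/scansup.c. A paraphrased, self-contained sketch of what
it does per byte of an unquoted identifier (the real function operates on the
whole string and also handles truncation) would be:

#include <ctype.h>

#define IS_HIGHBIT_SET(ch)  ((unsigned char) (ch) & 0x80)   /* as in c.h */

/*
 * ASCII letters are always folded; bytes with the high bit set are folded
 * via the locale's tolower() only when the server encoding is single-byte.
 */
static unsigned char
fold_identifier_byte(unsigned char ch, int enc_is_single_byte)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
        ch = (unsigned char) tolower(ch);
    return ch;
}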


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Question: PostgreSQL on Amazon linux EC2

2020-07-07 Thread mailajaypatel
Thanks Flavio.

I believe trying is the best way forward. Thank you for the guidance.

Warm regards,

Ajay Patel



> On Jul 7, 2020, at 3:20 AM, Flavio Henrique Araque Gurgel  
> wrote:
> 
> 
> 
>> Thank you Flavio, this is helpful.
>> 
>> Have you faced any other challenges or any other problems after installing 
>> Postgres?
> 
> No problem related to software installation whatsoever, you'll need to take 
> care of the same points as any Postgres usage and some cloud specific care 
> like replication between geographically separated instances (what they call 
> availability zones) and disk performance.
> I suggest you to try it and read a bit, you'll find a lot of feedback from 
> people that already did it on the internet and AWS documentation is key when 
> dealing with their hardware specifics. I only learned how to do some stuff 
> their way when I tried, even with AWS premium support at hand. Because every 
> database is different.
> Questions on this list are better handled when you have a more specific 
> question with a problem you're experiencing (like your first one) and the 
> hackers list is more aimed at Postgres development, post user questions on 
> pgsql-general.
> 
> Flavio Gurgel
> 


Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread David Rowley
On Tue, 7 Jul 2020 at 16:57, Jeff Davis  wrote:
>
> On Sun, 2020-07-05 at 16:47 -0700, Peter Geoghegan wrote:
> > Where does that leave the hash_mem idea (or some other similar
> > proposal)?
>
> hash_mem is acceptable to me if the consensus is moving toward that,
> but I'm not excited about it.

FWIW, I'm not a fan of the hash_mem idea. It was my impression that we
aimed to provide an escape hatch for people who have become accustomed
to <= PG12 behaviour, and hash_mem sounds like it's not that. Surely a
GUC by that name would control what Hash Join does too?  Otherwise, it
would be called hashagg_mem. I'd say changing the behaviour of Hash
Join is not well aligned with the goal of allowing users to get
something closer to what PG12 did.

I know there has been talk over the years to improve how work_mem
works. I see Tomas mentioned memory grants on the other thread [1]. I
do imagine this is the long term solution to the problem where users
must choose very conservative values for work_mem. We're certainly not
going to get that for PG13, so I do think what we need here is just a
simple escape hatch. I mentioned my thoughts in [2], so won't go over
it again here. Once we've improved the situation in some future
version of postgres, perhaps along the lines of what Tomas mentioned,
then we can get rid of the escape hatch.

Here are my reasons for not liking the hash_mem idea:

1. if it also increases the amount of memory that Hash Join can use
then that makes the partition-wise hash join problem of hash_mem *
npartitions even bigger when users choose to set hash_mem higher than
work_mem to get Hash Agg doing what they're used to.
2. Someone will one day ask for sort_mem and then materialize_mem.
Maybe then cte_mem. Once those are done we might as well just add a
GUC to control every executor node that uses work_mem.
3. I'm working on a Result cache node [3]. It uses a hash table
internally. Should it constrain its memory consumption according to
hash_mem or work_mem? It's not really that obvious to people that it
internally uses a hash table. "Hash" does not appear in the node name.
Do people need to look that up in the documents?

David

[1] 
https://www.postgresql.org/message-id/20200626235850.gvl3lpfyeobu4evi@development
[2] 
https://www.postgresql.org/message-id/CAApHDvqFZikXhAGW=ukzkq1_fzhy+xzmuzajinj6rwythh4...@mail.gmail.com
[3] 
https://www.postgresql.org/message-id/caaphdvrpcqyqdwergywx8j+2dlungxu+fosbq1uscxrunyx...@mail.gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-07-07 Thread torikoshia

On 2020-07-06 15:16, Fujii Masao wrote:

On 2020/07/06 12:12, torikoshia wrote:
On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao 
 wrote:


Thanks for your review!


I like more specific name like pg_backend_memory_contexts.


Agreed.

When I was trying to add this function as a statistics function,
I thought that naming it pg_stat_getbackend_memory_context()
might make people regard it as a "per-backend statistics
function" whose parameter is a backend ID number.
So I removed "backend", but now there is no necessity to do
so.


But I'd like to hear more opinions about the name from others.


I changed the name to pg_backend_memory_contexts for the time
being.


+1



- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c



+       Identification information of the memory context. This field 
is truncated if the identification field is longer than 1024 
characters


"characters" should be "bytes"?


Fixed, but I used "characters" following the description in the
manual for pg_stat_activity.query
below.

| By default the query text is truncated at 1024 characters;

It has nothing to do with this thread, but considering
multibyte characters, it may also be better to change that
to "bytes".


Yeah, I agree we should write the separate patch fixing that. You will?
If not, I will do that later.


Thanks, I will try it!


Regarding the other comments, I revised the patch as you pointed.


Thanks for updating the patch! The patch basically looks good to me.
Here are some minor comments:

+#define MEMORY_CONTEXT_IDENT_SIZE  1024

This macro variable name sounds like the maximum allowed length of the ident
that each memory context has. But actually this limits the maximum bytes of
the ident to display. So I think that it's better to rename this macro to
something like MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thoughts?


Agreed.
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE seems more accurate.


+#define PG_GET_MEMORY_CONTEXTS_COLS9
+   Datum   values[PG_GET_MEMORY_CONTEXTS_COLS];
+   boolnulls[PG_GET_MEMORY_CONTEXTS_COLS];

This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
for the consistency with the function name?


Thanks! Fixed it.



+{ oid => '2282', descr => 'statistics: information about all memory
contexts of local backend',

Isn't it better to remove "statistics: " from the above description?


Yeah, it's my oversight.



+ 
+  role="column_definition">

+   parent text

There can be multiple memory contexts with the same name. So I'm afraid
that it's difficult to identify the actual parent memory context from this
"parent" column. This is OK when logging memory contexts by calling
MemoryContextStats() via gdb, because child memory contexts are printed
just under their parent, with indents. But this doesn't work in the view.
To identify the actual parent memory context or calculate the memory context
tree from the view, we might need to assign a unique ID to each memory context
and display it. But IMO this is overkill. So I'm fine with the current
"parent" column. Thoughts? Do you have any better idea?


Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view is displayed.

We show each context using a recursive function, and this is
a kind of depth-first search. So, as far as I understand,
we can identify a context's parent using both the "parent" column
and the order of the rows.

Documenting these things may be worthwhile for those who want to identify
the relationship between parents and children.

Of course, in the relational model the order of rows in a relation
has no meaning, so it's also unusual in this sense..
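
As a rough illustration of that ordering (not the patch code), the view rows
come from a depth-first walk over the context tree, so every context is
emitted immediately after its parent and before its parent's next sibling;
emit_row() below is a hypothetical stand-in for whatever builds one view row,
and the child/sibling links correspond to MemoryContextData's
firstchild/nextchild fields:

static void emit_row(MemoryContext context, MemoryContext parent, int level);

static void
walk_contexts(MemoryContext context, MemoryContext parent, int level)
{
    MemoryContext   child;

    emit_row(context, parent, level);       /* one row per context */

    for (child = context->firstchild; child != NULL; child = child->nextchild)
        walk_contexts(child, context, level + 1);
}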


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 7 Jul 2020 21:20:29 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

This patch implements a new SQL-callable function
pg_get_backend_memory_contexts which exposes memory usage of the
local backend.
It also adds a new view pg_backend_memory_contexts for exposing
local backend memory contexts.
---
 doc/src/sgml/catalogs.sgml   | 126 +
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 132 ++-
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 ++
 5 files changed, 279 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 003d278370..174717616b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9248,6 +9248,11 @@ SCRAM-SHA-256$:&l
   materialized views
  
 
+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_policies
   policies
@@ -10526,6 +10531,127 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_backend_memory_context

Re: Multi-byte character case-folding

2020-07-07 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> If we start to case-fold É, then the only way to access this table will
>> be by double-quoting its name, which the application probably is not
>> expecting (else it would have double-quoted in the original CREATE TABLE).

> This problem already exists when migrating from a mono-byte database
> to a multi-byte database, since downcase_identifier()  does use
> tolower() for mono-byte databases.

Sure, but that's a tiny minority of use-cases.  In particular it would
not bite you after a straight upgrade to a new PG version.

[ thinks... ]  Wait, actually the described case would occur if you
migrated *from* UTF8 (no folding) to LATINn (with folding).  That's
gotta be an even tinier minority.  Migration to UTF8 would show
different, though perhaps just as annoying, symptoms.

Anyway, I freely concede that I'm ill-equipped to judge how annoying
this is, since I don't program in any languages where it'd make a
difference.  But we mustn't fool ourselves: changing this would be
just as dangerous as the standard_conforming_strings changeover was.
I'm not really convinced it's worth it.  In particular, I don't find
the "it's required by the standard" argument convincing.  The standard
requires us to fold to upper case, too, but we've long ago decided to
just say no to that.  (Which reminds me: there are extensive threads in
the archives analyzing whether it's practical to support more than one
folding behavior.  Those discussions would likely be relevant here.)

regards, tom lane




Re: Added tab completion for the missing options in copy statement

2020-07-07 Thread ahsan hadi
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Tested the tab completion for the COPY command; it provides tab completion after
"TO|FROM filename WITH|WHERE". Does this require any doc change?

The new status of this patch is: Waiting on Author


Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Pavel Stehule
On Tue, Jul 7, 2020 at 14:55, David Rowley  wrote:

> On Tue, 7 Jul 2020 at 16:57, Jeff Davis  wrote:
> >
> > On Sun, 2020-07-05 at 16:47 -0700, Peter Geoghegan wrote:
> > > Where does that leave the hash_mem idea (or some other similar
> > > proposal)?
> >
> > hash_mem is acceptable to me if the consensus is moving toward that,
> > but I'm not excited about it.
>
> FWIW, I'm not a fan of the hash_mem idea. It was my impression that we
> aimed to provide an escape hatch for people we have become accustomed
> to <= PG12 behaviour and hash_mem sounds like it's not that. Surely a
> GUC by that name would control what Hash Join does too?  Otherwise, it
> would be called hashagg_mem. I'd say changing the behaviour of Hash
> join is not well aligned to the goal of allowing users to get
> something closer to what PG12 did.
>
> I know there has been talk over the years to improve how work_mem
> works. I see Tomas mentioned memory grants on the other thread [1]. I
> do imagine this is the long term solution to the problem where users
> must choose very conservative values for work_mem. We're certainly not
> going to get that for PG13, so I do think what we need here is just a
> simple escape hatch. I mentioned my thoughts in [2], so won't go over
> it again here. Once we've improved the situation in some future
> version of postgres, perhaps along the lines of what Tomas mentioned,
> then we can get rid of the escape hatch.
>
> Here are my reasons for not liking the hash_mem idea:
>
> 1. if it also increases the amount of memory that Hash Join can use
> then that makes the partition-wise hash join problem of hash_mem *
> npartitions even bigger when users choose to set hash_mem higher than
> work_mem to get Hash Agg doing what they're used to.
> 2. Someone will one day ask for sort_mem and then materialize_mem.
> Maybe then cte_mem. Once those are done we might as well just add a
> GUC to control every executor node that uses work_mem.
> 3. I'm working on a Result cache node [3]. It uses a hash table
> internally. Should it constraint its memory consumption according to
> hash_mem or work_mem? It's not really that obvious to people that it
> internally uses a hash table. "Hash" does not appear in the node name.
> Do people need to look that up in the documents?
>

+1

I share your opinion.



> David
>
> [1]
> https://www.postgresql.org/message-id/20200626235850.gvl3lpfyeobu4evi@development
> [2]
> https://www.postgresql.org/message-id/CAApHDvqFZikXhAGW=ukzkq1_fzhy+xzmuzajinj6rwythh4...@mail.gmail.com
> [3]
> https://www.postgresql.org/message-id/caaphdvrpcqyqdwergywx8j+2dlungxu+fosbq1uscxrunyx...@mail.gmail.com
>
>
>


Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

2020-07-07 Thread Tom Lane
Bharath Rupireddy  writes:
> Firstly, pg_start_backup registers nonexclusive_base_backup_cleanup as an
> on_shmem_exit callback, which adds this function to the
> before_shmem_exit_list array; it is supposed to be removed on pg_stop_backup
> so that we can do the pending cleanup and issue a warning for each
> pg_start_backup for which we did not call pg_stop_backup.
> Now, I executed a query for which JIT is picked up, and the llvm compiler
> inserts its own exit callback, i.e. llvm_shutdown, at the end of the
> before_shmem_exit_list array. Then I executed pg_stop_backup,
> and a call to cancel_before_shmem_exit() is made with the expectation
> that the nonexclusive_base_backup_cleanup callback is removed from the
> before_shmem_exit_list array.

I'm of the opinion that the JIT code is abusing this mechanism, and the
right thing to do is fix that.  The restriction you propose to remove is
not just there out of laziness, it's an expectation about what safe use of
this mechanism would involve.  Un-ordered removal of callbacks seems
pretty risky: it would mean that whatever cleanup is needed is not going
to be done in LIFO order.
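
To spell out the restriction being referenced: cancel_before_shmem_exit()
expects the callback being cancelled to be the most recently registered one,
i.e. registrations and cancellations are supposed to nest. A toy model of
that contract (not the actual ipc.c code, which uses different types) looks
like this:

#define MAX_EXIT_CALLBACKS 20

typedef void (*exit_callback) (int code, void *arg);

struct exit_entry
{
    exit_callback   fn;
    void           *arg;
};

static struct exit_entry before_shmem_exit_list[MAX_EXIT_CALLBACKS];
static int      before_shmem_exit_index = 0;

static void
register_callback(exit_callback fn, void *arg)
{
    /* (a real implementation would check for overflow here) */
    before_shmem_exit_list[before_shmem_exit_index].fn = fn;
    before_shmem_exit_list[before_shmem_exit_index].arg = arg;
    before_shmem_exit_index++;
}

/* Cancellation only succeeds when the callback is the latest one (LIFO). */
static void
cancel_callback(exit_callback fn, void *arg)
{
    if (before_shmem_exit_index > 0 &&
        before_shmem_exit_list[before_shmem_exit_index - 1].fn == fn &&
        before_shmem_exit_list[before_shmem_exit_index - 1].arg == arg)
        before_shmem_exit_index--;
}

In the report above, llvm_shutdown() was registered after
nonexclusive_base_backup_cleanup(), so the latter was no longer at the top of
the stack when pg_stop_backup() tried to cancel it.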

regards, tom lane




Re: Binary support for pgoutput plugin

2020-07-07 Thread Daniel Gustafsson
> On 7 Jul 2020, at 02:16, Dave Cramer  wrote:

> OK, rebased it down to 2 patches, attached.

I took a look at this patchset today.  The feature clearly seems like something
which we'd benefit from having, especially if it allows for the kind of
extensions that were discussed at the beginning of this thread.  In general I
think it's in pretty good shape, there are however a few comments:

The patch lacks any kind of test, which I think is required for it to be
considered for committing.  It also doesn't update the \dRs view in psql to
include the subbinary column, which IMO it should.  I took the liberty of
writing this as well as tests while I was playing with the patch; the attached
0003 contains this, while 0001 and 0002 are your patches, included to ensure
the CFBot can do its thing.  This was kind of thrown together to have something
while testing, so it definitely needs a once-over or two.

The comment here implies that unchanged is the default value for format, but
isn't this actually setting it to the text format value?
+   /* default is unchanged */
+   tuple->format = palloc(natts * sizeof(char));
+   memset(tuple->format, 't', natts * sizeof(char));
Also, since the values member isn't memset() with a default, this seems a bit
misleading at best, no?

For the rest of the backend we aren't including the defname in the errmsg like
what is done here.  Maybe we should, but I think that should be done
consistently if so, and we should stick to just "conflicting or redundant
options" for now.  At the very least, this needs a comma between "options" and
the defname and ideally the defname wrapped in \".
-errmsg("conflicting or redundant options")));
+errmsg("conflicting or redundant options %s already 
provided", defel->defname)));

These types of constructs are IMHO quite hard to read:
+ if(
+ #ifdef WORDS_BIGENDIAN
+ true
+ #else
+ false
+ #endif
+ != bigendian)
How about spelling out the statement completely for both cases, or perhaps
encapsulating the logic in a macro? Something like the below perhaps?
+ #ifdef WORDS_BIGENDIAN
+ if (bigendian != true)
+ #else
+ if (bigendian != false)
+ #endif
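
As a further option, a hypothetical helper macro (not part of the patch) could
keep the #ifdef out of the comparison itself; a minimal self-contained sketch:

#include <stdbool.h>

#ifdef WORDS_BIGENDIAN
#define SERVER_IS_BIGENDIAN true
#else
#define SERVER_IS_BIGENDIAN false
#endif

/* returns true when the remote endianness matches the server's */
static bool
endianness_matches(bool remote_is_bigendian)
{
    return remote_is_bigendian == SERVER_IS_BIGENDIAN;
}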

This change needs to be removed before a commit, just highlighting that here to
avoid it going unnoticed.
-/* #define WAL_DEBUG */
+#define WAL_DEBUG

Reading this I'm left wondering if we shouldn't introduce macros for the kinds,
since we now compare with 'u', 't' etc in quite a few places and add comments
explaining the types everywhere.  A descriptive name would make it easier to
grep for all occurrences, and avoid the need for the comment lines.  That's not
necessarily for this patch though, but an observation from reading it.


I found a few smaller nitpicks as well, some of these might go away by a
pg_indent run but I've included them all here regardless:

This change, and the subsequent whitespace removal later in the same function,
seems a bit pointless:
-   /* read the relation id */
relid = pq_getmsgint(in, 4);
-

Braces should go on the next line:
+   if (options->proto.logical.binary) {

This should be a C /* ...  */ comment, or perhaps just removed since the code
is quite self explanatory:
+   // default to false
+   *binary_basetypes = false;

Indentation here:
-errmsg("conflicting or redundant options")));
+   errmsg("conflicting or redundant options %s already 
provided", defel->defname)));

..as well as here (there are a few like this one):
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+   errmsg("incompatible datum size")));

Capitalization of "after" to make it a proper sentence:
+ * after we know that the subscriber is requesting binary check to make sure

Excessive whitespace and indentation in a few places, and not enough in some:
+   if (binary_given)
+   {
+   values[Anum_pg_subscription_subbinary - 1] =
...
+   if ( *binary_basetypes == true )
...
+   if (sizeof(int)  != int_size)
...
+   if( float4_byval !=
...
+ if (sizeof(long)  != long_size)
+ ereport(ERROR,
...
+   if (tupleData->format[remoteattnum] =='u')
...
+   bool   binary_basetypes;

That's all for now.

cheers ./daniel



0001-Add-binary-protocol-for-publications-and-subscriptio.patch
Description: Binary data


0002-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
Description: Binary data


0003-Add-psql-support-and-tests.patch
Description: Binary data


Re: posgres 12 bug (partitioned table)

2020-07-07 Thread Amit Langote
Hi Soumyadeep,

Thanks for picking this up.

On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
 wrote:
> Upon investigation, it seems that the problem is caused by the
> following:
>
> The slot passed to the call to ExecProcessReturning() inside
> ExecInsert() is often a virtual tuple table slot.

Actually, not that often in practice.  The slot is not virtual, for
example, when inserting into a regular non-partitioned table.  Whether
or not it is virtual depends on the following piece of code in
ExecInitModifyTable():

mtstate->mt_scans[i] =
ExecInitExtraTupleSlot(mtstate->ps.state,
ExecGetResultType(mtstate->mt_plans[i]),

table_slot_callbacks(resultRelInfo->ri_RelationDesc));

Specifically, the call to table_slot_callbacks() above determines what
kind of slot is assigned for a given target relation.  For partitioned
tables, it happens to return a virtual slot currently, per this
implementation:

if (relation->rd_tableam)
tts_cb = relation->rd_tableam->slot_callbacks(relation);
else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
/*
 * Historically FDWs expect to store heap tuples in slots. Continue
 * handing them one, to make it less painful to adapt FDWs to new
 * versions. The cost of a heap slot over a virtual slot is pretty
 * small.
 */
tts_cb = &TTSOpsHeapTuple;
}
else
{
/*
 * These need to be supported, as some parts of the code (like COPY)
 * need to create slots for such relations too. It seems better to
 * centralize the knowledge that a heap slot is the right thing in
 * that case here.
 */
Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
   relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
tts_cb = &TTSOpsVirtual;
}

If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached).  In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.
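
For context, the kind of statement at issue is along these lines (a
hypothetical reproduction sketch; the original report is not quoted here, but
the problem surfaces when RETURNING references system columns on a
partitioned table):

CREATE TABLE t (a int) PARTITION BY RANGE (a);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (0) TO (100);
-- with a virtual slot reaching ExecProcessReturning(), the system columns
-- below either raise an error or come back as garbage
INSERT INTO t VALUES (1) RETURNING xmin, cmin;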

> I have attached two alternate patches to solve the problem.

IMHO, they are solving the problem at the wrong place.  We should
really fix things so that the slot that gets passed down to
ExecProcessReturning() is of the correct type to begin with.  We could
do what I suggest above or maybe find some other way.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


partitioned-table-heap-slot.patch
Description: Binary data


Re: Proposal: Automatic partition creation

2020-07-07 Thread Robert Haas
On Mon, Jul 6, 2020 at 12:10 PM Tom Lane  wrote:
> We did indeed solve this in connection with window functions, cf
> 0a459cec9.  I may be misunderstanding what the problem is here,
> but I think trying to reuse that infrastructure might help.

Ah, nice. I didn't realize that we'd added that. But I'm not sure that
it helps here, because I think we need to compute the end of the
range, not just test whether something is in a range. Like, if someone
wants monthly range partitions starting on 2020-01-01, we need to be
able to figure out that the subsequent months start on 2020-02-01,
2020-03-01, 2020-04-01, etc. Is there a way to use in_range to achieve
that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: TLS checking in pgstat

2020-07-07 Thread Magnus Hagander
On Sun, Jun 28, 2020 at 1:39 PM Daniel Gustafsson  wrote:

> As I mentioned in [1], checking (struct Port)->ssl for NULL to determine
> whether TLS is used for connection is a bit of a leaky abstraction, as
> that's
> an OpenSSL specific struct member.  This sets the requirement that all TLS
> implementations use a pointer named SSL, and that the pointer is set to
> NULL in
> case of a failed connection, which may or may not fit.
>
> Is there a reason to not use (struct Port)->ssl_in_use flag which tracks
> just
> what we're looking for here?  This also maps against other parts of the
> abstraction in be-secure.c which do just that.  The attached implements
> this.
>

Yeah, this seems perfectly reasonable.

I would argue this is a bug, but given how internal it is I don't think it
has any user visible effects yet (since we don't have more than one
provider), and thus isn't worthy of a backpatch.

Pushed.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pg_resetwal --next-transaction-id may cause database failed to restart.

2020-07-07 Thread Alvaro Herrera
On 2020-Jul-07, movead...@highgo.ca wrote:

>  >ISTM that a reasonable compromise is that if you use -x (or -c, -m, -O)
> >and the input value is outside the range supported by existing files,
> >then it's a fatal error; unless you use --force, which turns it into
> >just a warning.
>
> I do not think '--force' is a good choice, so I add a '--test, -t' option to
> force to write a unsafe value to pg_control.
> Do you think it is an acceptable method?

The rationale for this interface is unclear to me.  Please explain what
happens in each case?

In my proposal, we'd have:

* Bad value, no --force:
  - program raises error, no work done.
* Bad value with --force:
  - program raises warning but changes anyway.
* Good value, no --force:
  - program changes value without saying anything
* Good value with --force:
  - same

The rationale for this interface is convenient knowledgeable access: the
DBA runs the program with value X, and if the value is good, then
they're done.  If the program raises an error, DBA has a choice: either
run with --force because they know what they're doing, or don't do
anything because they know that they would make a mess.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Cache lookup errors with functions manipulation object addresses

2020-07-07 Thread Alvaro Herrera
On 2020-Jul-07, Michael Paquier wrote:

> On Mon, Jul 06, 2020 at 10:56:53AM -0400, Alvaro Herrera wrote:
> > I think "routine" was more correct.  We introduced that terminology to
> > encompass both functions and procedures, back when real procedures were
> > added.  See the definitions in the glossary for example.
> 
> Okay, fine by me, while we have getProcedureTypeDescription() working
> on pg_proc entries ;)

That's a holdover from old times, when we thought functions were
procedures.  That's no longer the case.




-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Automatic partition creation

2020-07-07 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 6, 2020 at 12:10 PM Tom Lane  wrote:
>> We did indeed solve this in connection with window functions, cf
>> 0a459cec9.  I may be misunderstanding what the problem is here,
>> but I think trying to reuse that infrastructure might help.

> Ah, nice. I didn't realize that we'd added that. But I'm not sure that
> it helps here, because I think we need to compute the end of the
> range, not just test whether something is in a range.

Yeah, I was thinking about that later, and I agree that the in_range
support function doesn't quite do the job.  But we could expand on the
principle, and register addition (and subtraction?) functions as btree
support functions under the same rules as for in_range functions.

The reason in_range isn't just addition is that we wanted it to be able
to give correct answers even in cases where addition would overflow.
That's still valid for that use-case, but it doesn't apply here.

So it'd be something like "btree support function 4, registered under
amproclefttype x and amprocrighttype y, must have the signature
plus(x, y) returns x
and it gives results compatible with the opfamily's ordering of type x".
Similarly for subtraction if we think we need that.

I'm not sure if we need a formal notion of what "compatible results"
means, but it probably would be something like "if x < z according to the
opfamily sort ordering, then plus(x, y) < plus(z, y) for any given y".
Now this falls to the ground when y is a weird value like Inf or NaN,
but we'd want to exclude those as partitioning values anyway.  Do we
also need some datatype-independent way of identifying such "weird
values"?

regards, tom lane




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-07 Thread Dilip Kumar
On Mon, Jul 6, 2020 at 3:09 PM Amit Kapila  wrote:
>
> On Mon, Jul 6, 2020 at 11:44 AM Dilip Kumar  wrote:
> >
> > On Mon, Jul 6, 2020 at 11:31 AM Amit Kapila  wrote:
> > >
> >
> > > > > 10.  I have got the below failure once.  I have not investigated this
> > > > > in detail as the patch is still under progress.  See, if you have any
> > > > > idea?
> > > > > #   Failed test 'check extra columns contain local defaults'
> > > > > #   at t/013_stream_subxact_ddl_abort.pl line 81.
> > > > > #  got: '2|0'
> > > > > # expected: '1000|500'
> > > > > # Looks like you failed 1 test of 2.
> > > > > make[2]: *** [check] Error 1
> > > > > make[1]: *** [check-subscription-recurse] Error 2
> > > > > make[1]: *** Waiting for unfinished jobs
> > > > > make: *** [check-world-src/test-recurse] Error 2
> > > >
> > > > Even I got the failure once and after that, it did not reproduce.  I
> > > > have executed it multiple time but it did not reproduce again.  Are
> > > > you able to reproduce it consistently?
> > > >
> > >
> > > No, I am also not able to reproduce it consistently but I think this
> > > can fail if a subscriber sends the replay_location before actually
> > > replaying the changes.  First, I thought that extra send_feedback we
> > > have in apply_handle_stream_commit might have caused this but I guess
> > > that can't happen because we need the commit time location for that
> > > and we are storing the same at the end of apply_handle_stream_commit
> > > after applying all messages.  I am not sure what is going on here.  I
> > > think we somehow need to reproduce this or some variant of this test
> > > consistently to find the root cause.
> >
> > And I think it appeared first time for me,  so maybe either induced
> > from past few versions so some changes in the last few versions might
> > have exposed it.  I have noticed that almost 50% of the time I am able
> > to reproduce after the clean build so I can trace back from which
> > version it started appearing that way it will be easy to narrow down.
> >
>
> One more comment
> ReorderBufferLargestTopTXN
> {
> ..
> dlist_foreach(iter, &rb->toplevel_by_lsn)
>   {
>   ReorderBufferTXN *txn;
> + Size size = 0;
> + Size largest_size = 0;
>
>   txn = dlist_container(ReorderBufferTXN, node, iter.cur);
>
> - /* if the current transaction is larger, remember it */
> - if ((!largest) || (txn->size > largest->size))
> + /*
> + * If this transaction have some incomplete changes then only consider
> + * the size upto last complete lsn.
> + */
> + if (rbtxn_has_incomplete_tuple(txn))
> + size = txn->complete_size;
> + else
> + size = txn->total_size;
> +
> + /* If the current transaction is larger then remember it. */
> + if ((largest != NULL || size > largest_size) && size > 0)
>
> Here largest_size is a local variable inside the loop which is
> initialized to 0 in each iteration and that will lead to picking each
> next txn as largest.  This seems wrong to me.

You are right, will fix.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: warnings for invalid function casts

2020-07-07 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-07-04 16:16, Tom Lane wrote:
>> I'm for a typedef.  There is *nothing* readable about "(void (*) (void))",
>> and the fact that it's theoretically incorrect for the purpose doesn't
>> exactly aid intelligibility either.  With a typedef, not only are
>> the uses more readable but there's a place to put a comment explaining
>> that this is notionally wrong but it's what gcc specifies to use
>> to suppress thus-and-such warnings.

> Makes sense.  New patch here.

I don't have a compiler handy that emits these warnings, but this
passes an eyeball check.

>>> But if we prefer a typedef then I'd propose
>>> GenericFuncPtr like in the initial patch.

>> That name is OK by me.

> I changed that to pg_funcptr_t, to look a bit more like C and less like 
> Java. ;-)

I liked the first proposal better.  Not gonna fight about it though.

regards, tom lane




Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2020-07-07 Thread Zidenberg, Tsahi
On 01/07/2020, 18:40, "Zidenberg, Tsahi"  wrote:

> Outline-atomics is a gcc compilation flag that adds runtime detection of 
> whether or not the CPU
> supports atomic instructions. CPUs that don't support atomic instructions 
> will use the old 
> load-exclusive/store-exclusive instructions. If a different compilation flag 
> defined an architecture
> that unconditionally supports atomic instructions (e.g. -march=armv8.2), the 
> outline-atomic flag
> will have no effect.
>
> The patch was tested to improve pgbench simple-update by 10% and sysbench 
> write-only by 3%
> on a 64-core armv8.2 machine (AWS m6g.16xlarge). Select-only and read-only 
> benchmarks were
> not significantly affected, and neither was performance on a 16-core armv8.0 
> machine that does
> not support atomic instructions (AWS a1.4xlarge).
>
> The patch uses an existing configure.in macro to detect compiler support of 
> the flag. Checking for
> aarch64 machine is not strictly necessary, but was added for readability.

Added a commitfest entry:
https://commitfest.postgresql.org/29/2637/

Thank you!
Tsahi






Re: [HACKERS] Look-behind regular expressions

2020-07-07 Thread Thom Brown
On Tue, 29 Jun 2010 at 17:49, David E. Wheeler  wrote:
>
> On Jun 29, 2010, at 7:44 AM, Thom Brown wrote:
>
> >> No.  Or are you volunteering?
> >
> > A n00b like me volunteer for that?  It's more of a suggestion.
>
> N00bs gotta start somewhere…

10 years later, and I've noticed that both look-behind and negative
look-behind have been implemented.

Thanks to whoever did this.

Thom




Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

2020-07-07 Thread Andres Freund
Hi,

On 2020-07-07 09:44:41 -0400, Tom Lane wrote:
> Bharath Rupireddy  writes:
> > Firstly, pg_start_backup registers nonexclusive_base_backup_cleanup as
> > on_shmem_exit call back which will
> > add this function to the before_shmem_exit_list array which is
> > supposed to be removed on pg_stop_backup
> > so that we can do the pending cleanup and issue a warning for each
> > pg_start_backup for which we did not call
> > the pg_stop backup. Now, I executed a query for which JIT is picked
> > up, then the the llvm compiler inserts it's
> > own exit callback i.e. llvm_shutdown at the end of
> > before_shmem_exit_list array. Now, I executed pg_stop_backup
> > and call to cancel_before_shmem_exit() is made with the expectation
> > that the nonexclusive_base_backup_cleanup
> > callback is removed from before_shmem_exit_list array.
> 
> I'm of the opinion that the JIT code is abusing this mechanism, and the
> right thing to do is fix that.

What are you proposing? For now we could easily enough work around this
by just making it an on_proc_exit() callback, but that doesn't really
change the fundamental issue imo.


> The restriction you propose to remove is not just there out of
> laziness, it's an expectation about what safe use of this mechanism
> would involve.  Un-ordered removal of callbacks seems pretty risky: it
> would mean that whatever cleanup is needed is not going to be done in
> LIFO order.

Maybe I am confused, but isn't it <13's pg_start_backup() that's
violating the protocol much more clearly than the JIT code? Given that
it relies on there not being any callbacks registered between two SQL
function calls?  I mean, what it does is basically:

1) before_shmem_exit(nonexclusive_base_backup_cleanup...
2) arbitrary code executed for a long time
3) cancel_before_shmem_exit(nonexclusive_base_backup_cleanup...

which pretty obviously can't at all deal with any other
before_shmem_exit callbacks being registered in 2).  Won't this be a
problem for every other before_shmem_exit callback that we register
on-demand? Say Async_UnlistenOnExit, RemoveTempRelationsCallback?
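
At the SQL level, the sequence in question is roughly the following (a sketch;
the middle statement stands for anything that registers a new
before_shmem_exit callback on demand, e.g. by triggering JIT):

SELECT pg_start_backup('b', false, false);  -- registers nonexclusive_base_backup_cleanup
SELECT sum(i) FROM generate_series(1, 10000000) AS t(i);  -- may JIT-compile, registering llvm_shutdown
SELECT pg_stop_backup(false);  -- the cleanup callback is no longer the last entry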

Greetings,

Andres Freund




Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Andres Freund
Hi,

On 2020-07-03 10:08:08 -0400, Bruce Momjian wrote:
> Well, the bottom line is that we are designing features during beta.
> People are supposed to be testing PG 13 behavior during beta, including
> optimizer behavior.

I think it makes not much sense to plan to invent something like
hash_mem for v13; it's clearly too much work.  That's a separate
discussion from having something like it for v14.


> We don't even have a user report yet of a
> regression compared to PG 12, or one that can't be fixed by increasing
> work_mem.

I posted a repro, and no you can't fix it by increasing work_mem without
increasing memory usage in the whole query / all queries.


> If we add a new behavior to PG 13, we then have the pre-PG 13 behavior,
> the pre-patch behavior, and the post-patch behavior.  How are people
> supposed to test all of that?

I don't really buy this as a problem. It's not like the pre-13 behaviour
would be all new. It's how PG has behaved approximately forever.


My conclusion about this topic is that I think we'll be doing our users
a disservice by not providing an escape hatch, but that I also don't
have the energy / time to fight for it further. This is a long thread
already, and I sense little movement towards a conclusion.

Greetings,

Andres Freund




Re: OpenSSL 3.0.0 compatibility

2020-07-07 Thread Peter Eisentraut

On 2020-05-30 11:29, Peter Eisentraut wrote:

My proposal would be to introduce OPENSSL_API_COMPAT=10001 into master
after the 13/14 branching, along with any other changes to make it
compile cleanly against OpenSSL 3.0.0.  Once that has survived some
scrutiny from the buildfarm and also from folks building against
LibreSSL etc., it should probably be backpatched into PG13.  In the
immediate future, I wouldn't bother about the older branches (<=PG12) at
all.  As long as they still compile, users can just disable deprecation
warnings, and we may add some patches to that effect at some point, but
it's not like OpenSSL 3.0.0 will be adopted into production builds any
time soon.


Trying to move this along, where would be a good place to define 
OPENSSL_API_COMPAT?  The only place that's shared between frontend and 
backend code is c.h.  The attached patch does it that way.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 0c5c83c6051688a643194388000ea356a8d055d5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 7 Jul 2020 19:43:22 +0200
Subject: [PATCH] Define OPENSSL_API_COMPAT

This avoids deprecation warnings from newer OpenSSL versions (3.0.0 in
particular).

Discussion: 
https://www.postgresql.org/message-id/flat/FEF81714-D479-4512-839B-C769D2605F8A%40yesql.se
---
 src/include/c.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/include/c.h b/src/include/c.h
index a904b49a37..03e87ae3a8 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1080,6 +1080,14 @@ extern void ExceptionalCondition(const char 
*conditionName,
 #define HAVE_UNIX_SOCKETS 1
 #endif
 
+/*
+ * OpenSSL API compatibility level
+ *
+ * Set this to the minimum supported version.  This avoids deprecation
+ * warnings when building with newer OpenSSL versions.
+ */
+#define OPENSSL_API_COMPAT 10001
+
 /*
  * Invert the sign of a qsort-style comparison result, ie, exchange negative
  * and positive integer values, being careful not to get the wrong answer
-- 
2.27.0



Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762

2020-07-07 Thread Alvaro Herrera
On 2020-Jun-24, Andres Freund wrote:

> As I said before, I've utilized being able to do both over a single
> connection (among others to initialize a logical replica using a base
> backup). And I've seen at least one other codebase (developed without my
> input) doing so. I really don't understand how you just dismiss this
> without any sort of actual argument.  Yes, those uses can be fixed to
> reconnect with a different replication parameter, but that's code that
> needs to be adjusted and it requires adjustments to pg_hba.conf etc.
> 
> And obviously you'd lock out older versions of jdbc, and possibly other
> drivers.

Well, I had understood that you were talking from a hypothetical
position, not that you were already using the thing that way.  After
these arguments, I agree to leave things alone, and nobody else seems to
be arguing in that direction, so I'll mark the open item as closed.

Thanks,

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: min_safe_lsn column in pg_replication_slots view

2020-07-07 Thread Alvaro Herrera
On 2020-Jul-06, Alvaro Herrera wrote:

> On 2020-Jul-07, Kyotaro Horiguchi wrote:

> > Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and
> > use it intead of the bare expression?
> 
> I was of two minds about that, and the only reason I didn't do it is
> that we'll need to give it a better name if we do it ...  I'm open to
> suggestions.

In absence of other suggestions I gave this the name XLogMBVarToSegs,
and redefined ConvertToXSegs to use that.  Didn't touch callers in
xlog.c to avoid pointless churn.  Pushed to both master and 13.

I hope this satisfies everyone ... Masao-san, thanks for reporting the
problem, and thanks Horiguchi-san for providing the fix.  (Also thanks
to Amit and Michael for discussion.)

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: OpenSSL 3.0.0 compatibility

2020-07-07 Thread Tom Lane
Peter Eisentraut  writes:
> Trying to move this along, where would be a good place to define 
> OPENSSL_API_COMPAT?  The only place that's shared between frontend and 
> backend code is c.h.  The attached patch does it that way.

pg_config_manual.h, perhaps?

regards, tom lane




Re: Question: PostgreSQL on Amazon linux EC2

2020-07-07 Thread Christopher Browne
On Mon, 6 Jul 2020 at 15:55, Ajay Patel  wrote:

> Hi Postgres team,
>
> I would like to know if PostgreSQL can be installed and used without any
> issues on Amazon Linux EC2 machines.
>
> https://www.postgresql.org/docs/11/supported-platforms.html
>
> I was going through the documentation and couldn't find very specific
> details related to support.
>
> Any input will be much helpful.
>

 In a way, this is not a whole lot different from asking,

"I would like to know if PostgreSQL can be installed and used without any
issues on Dell server machines."

In that case, there could be questions about whether there are good drivers
for disk controllers that would vary from model to model, and some things
like that.  But there are few up-front answers the way there used to be for
how to handle (say) different versions of AIX.

Amazon EC2 provides virtualized "gear" that simulates x86-64 hardware
reasonably decently; there can certainly be performance issues relating to
how fast their simulated disk is, and how fast their simulated network is.

But there are no highly-specific-to-EC2 details related to hardware
support, as you noticed on that web page.

If you do not have performance or load requirements that are so high that
they point at edge cases where the EC2 virtualized environment starts to
break down, then it's probably mostly smooth sailing.

You need to be aware that they do not promise super-high-availability, so
you should be sure to keep good backups lest your server gets dropped on
the floor and you lose all your data.  I'm not sure there's good stats just
yet as to how often that happens.  But it isn't difficult to provision a
pgbackrest server that will capture backups into S3 cloud storage to help
protect from that.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Peter Geoghegan
On Tue, Jul 7, 2020 at 5:55 AM David Rowley  wrote:
> FWIW, I'm not a fan of the hash_mem idea. It was my impression that we
> aimed to provide an escape hatch for people who have become accustomed
> to <= PG12 behaviour and hash_mem sounds like it's not that.

The exact scope of the problem is unclear. If it was clear, then we'd
be a lot closer to a resolution than we seem to be. Determining the
scope of the problem is the hardest part of the problem.

All that is ~100% clear now is that some users will experience what
they'll call a regression. Those users will be unhappy, even if and
when they come to understand that technically they're just "living
within their means" for the first time, and were theoretically not
entitled to the performance from earlier versions all along. That's
bad, and we should try our best to avoid or mitigate it.

Sometimes the more ambitious plan (in this case hash_mem) actually has
a greater chance of succeeding, despite solving more problems than the
immediate problem. I don't think that it's reasonable to hold that
against my proposal. It may be that hash_mem is a bad idea based on
the costs and benefits, or the new risks, in which case it should be
rejected. But if it's the best proposal on the table by a clear
margin, then it shouldn't be disqualified for not satisfying the
original framing of the problem.

> Surely a
> GUC by that name would control what Hash Join does too?  Otherwise, it
> would be called hashagg_mem. I'd say changing the behaviour of Hash
> join is not well aligned to the goal of allowing users to get
> something closer to what PG12 did.

My tentative hash_mem proposal assumed that hash join would be
affected alongside hash agg, in the obvious way. Yes, that's clearly
beyond the scope of the open item.

The history of some other database systems is instructive. At least a
couple of these systems had something like a work_mem/sort_mem GUC, as
well as a separate hash_mem-like GUC that only affects hashing. It's
sloppy, but nevertheless better than completely ignoring the
fundamental ways in which hashing really is special. This is a way of
papering-over one of the main deficiencies of the general idea of a
work_mem style per-node allocation. Yes, that's pretty ugly.

I think that work_mem would be a lot easier to tune if you assume that
hash-based nodes don't exist (i.e. only merge joins and nestloop joins
are available, plus group aggregate for aggregation). You don't need
to do this as a thought experiment. That really was how things were up
until about the mid-1980s, when increasing memory capacities made hash
join and hash agg in database systems feasible for the first time.
Hashing came after most of the serious work on cost-based optimizers
had already been done. This argues for treating hash-based nodes as
special now, if only to extend work_mem beyond its natural life as a
pragmatic short-term measure. Separately, it argues for a *total
rethink* of how memory is used in the executor in the long term -- it
shouldn't be per-node in a few important cases (I'm thinking of the
"hash teams" design I mentioned on this thread recently, which seems
like a fundamentally better way of doing it).

> We're certainly not
> going to get that for PG13, so I do think what we need here is just a
> simple escape hatch. I mentioned my thoughts in [2], so won't go over
> it again here. Once we've improved the situation in some future
> version of postgres, perhaps along the lines of what Tomas mentioned,
> then we can get rid of the escape hatch.

If it really has to be a simple escape hatch in Postgres 13, then I
could live with a hard disabling of spilling at execution time. That
seems like the most important thing that is addressed by your
proposal. I'm concerned that way too many users will have to use the
escape hatch, and that that misses the opportunity to provide a
smoother experience.

> Here are my reasons for not liking the hash_mem idea:

I'm sure that your objections are valid to varying degrees. But they
could almost be thought of as problems with work_mem itself. I am
trying to come up with a practical way of ameliorating the downsides
of work_mem. I don't for a second imagine that this won't create new
problems. I think that it's the least worst thing right now. I have my
misgivings.

--
Peter Geoghegan




Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Alvaro Herrera
On 2020-Jul-07, Amit Kapila wrote:

> I don't think this is true.  We seem to have introduced three new guc
> variables in a 9.3.3 minor release.

Yeah, backporting GUCs is not a big deal.  Sure, the GUC won't appear in
postgresql.conf files generated by initdb prior to the release that
introduces it.  But users that need it can just edit their .confs and
add the appropriate line, or just do ALTER SYSTEM after the minor
upgrade.  For people that don't need it, it would have a reasonable
default (probably work_mem, so that behavior doesn't change on the minor
upgrade).
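
For illustration, assuming a hypothetical hash_mem GUC were backported, a user
who needs it could enable it without touching postgresql.conf by hand:

ALTER SYSTEM SET hash_mem = '256MB';
SELECT pg_reload_conf();
SHOW hash_mem;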

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Peter Geoghegan
On Tue, Jul 7, 2020 at 1:18 PM Alvaro Herrera  wrote:
> Yeah, backporting GUCs is not a big deal.  Sure, the GUC won't appear in
> postgresql.conf files generated by initdb prior to the release that
> introduces it.  But users that need it can just edit their .confs and
> add the appropriate line, or just do ALTER SYSTEM after the minor
> upgrade.

I don't buy that argument myself. At a minimum, if we do it then we
ought to feel bad about it. It should be rare.

The fact that you can have a replica on an earlier point release
enforces the idea that it ought to be broadly compatible. Technically
users are not guaranteed that this will work, just like there are no
guarantees about WAL compatibility across point releases. We
nevertheless tacitly provide a "soft" guarantee that we won't break
WAL -- and that we won't add entirely new GUCs in a point release.

-- 
Peter Geoghegan




Re: OpenSSL 3.0.0 compatibility

2020-07-07 Thread Daniel Gustafsson
> On 7 Jul 2020, at 19:53, Tom Lane  wrote:
> 
> Peter Eisentraut  writes:
>> Trying to move this along,

Thanks, this has stalled a bit on my TODO.

>> where would be a good place to define 
>> OPENSSL_API_COMPAT?  The only place that's shared between frontend and 
>> backend code is c.h.  The attached patch does it that way.
> 
> pg_config_manual.h, perhaps?

I don't have a strong preference.  When starting hacking on this I went for the
quick and simple option of adding it to CFLAGS in configure.in for the time
being since I wasn't sure where to put it.

A slightly more complicated problem arise when trying to run the pgcrypto
regress tests, and make it run the tests for the now deprecated ciphers, as
they require the legacy provider to be loaded via the openssl configuration
file.  As was mentioned upthread, this requires us to inject our own
openssl.cnf in OPENSSL_CONF, load the legacy provider there and then from that
file include the system openssl.cnf (or override the system one completely
during testing which doesn't seem like a good idea).

Hacking this up in a crude PoC I added a REGRESS_ENV option in pgxs.mk which
then pgcrypto/Makefile could use to set an OPENSSL_CONF, which in turn ends
with a .include= for my system config.  This enables pgcrypto to load
the now deprecated ciphers, but even as PoC's goes this is awfully brittle and
a significant amount of bricks shy.

Actually running the tests with the legacy provider loaded yields a fair number
of errors like these, and somewhere around there I ran out of time for now as
the CF started.

-  decrypt
-
- Lets try a longer message.
+ decrypt
+--
+ Lets try a longer messag\177\177\177\177\177\177\177\177

Memorizing the "cannot load cipher" errors in an alternative output and
documenting how to use old ciphers in pgcrypto together with OpenSSL 3.0.0+
might be the least bad option?  Anyone else have any good ideas on how to get
this into the testrunner?
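
For context, the affected tests boil down to calls like the following, which
use ciphers that OpenSSL 3.0.0 moves into the legacy provider ('bf' is
Blowfish; a sketch, not the exact regression queries):

-- under OpenSSL 3.0.0 these need the legacy provider loaded via the
-- injected openssl.cnf discussed above
SELECT encrypt('Lets try a longer message.', 'key', 'bf');
SELECT convert_from(decrypt(encrypt('Lets try a longer message.', 'key', 'bf'),
                            'key', 'bf'), 'utf8');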

cheers ./daniel



Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Alvaro Herrera
On 2020-Jul-07, Peter Geoghegan wrote:

> On Tue, Jul 7, 2020 at 1:18 PM Alvaro Herrera  
> wrote:
> > Yeah, backporting GUCs is not a big deal.  Sure, the GUC won't appear in
> > postgresql.conf files generated by initdb prior to the release that
> > introduces it.  But users that need it can just edit their .confs and
> > add the appropriate line, or just do ALTER SYSTEM after the minor
> > upgrade.
> 
> I don't buy that argument myself. At a minimum, if we do it then we
> ought to feel bad about it. It should be rare.

Judging history, it's pretty clear that it *is* rare.  I'm not
suggesting we do it now.  I'm just contesting the assertion that it's
impossible.

> The fact that you can have a replica on an earlier point release
> enforces the idea that it ought to be broadly compatible.

A replica without hash_mem is not going to fail if the primary is
upgraded to a version with hash_mem, so I'm not sure this argument
means anything in this case.  In any case, when we add WAL message types
in minor releases, users are suggested to upgrade the replicas first; if
they fail to do so, the replicas shut down when they reach a WAL point
where the primary emitted the new message.  Generally speaking, we *don't*
promise that running a replica with an older minor always works, though
obviously it does work most of the time.

> Technically
> users are not guaranteed that this will work, just like there are no
> guarantees about WAL compatibility across point releases. We
> nevertheless tacitly provide a "soft" guarantee that we won't break
> WAL -- and that we won't add entirely new GUCs in a point release.

Agreed, we do provide those guarantees.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Binary support for pgoutput plugin

2020-07-07 Thread Dave Cramer
On Tue, 7 Jul 2020 at 10:01, Daniel Gustafsson  wrote:

> > On 7 Jul 2020, at 02:16, Dave Cramer  wrote:
>
> > OK, rebased it down to 2 patches, attached.
>
> I took a look at this patchset today.  The feature clearly seems like
> something
> which we'd benefit from having, especially if it allows for the kind of
> extensions that were discussed at the beginning of this thread.  In
> general I
> think it's in pretty good shape, there are however a few comments:
>
> The patch lacks any kind of test, which I think is required for it to be
> considered for committing.  It also doesn't update the \dRs view in psql to
> include the subbinary column which IMO it should.  I took the liberty to
> write
> this as well as tests as I was playing with the patch, the attached 0003
> contains this, while 0001 and 0002 are your patches included to ensure the
> CFBot can do it's thing.  This was kind of thrown together to have
> something
> while testing, so it definitely needs a once-over or two.
>

I have put all your requests, other than the indentation (which can be
dealt with by pg_indent), into another patch, which I reordered ahead of yours.
This should make it easier to see that all of your issues have been
addressed.

I did not do the macro for updated, inserted, and deleted; I will give that a go
tomorrow.
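
As context for the feature under review, a subscription would request binary
transfer roughly like this (option name as I read it from the patch series;
subject to change):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (binary = true);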


>
> The comment here implies that unchanged is the default value for format,
> but
> isn't this actually setting it to text formatted value?
> +   /* default is unchanged */
> +   tuple->format = palloc(natts * sizeof(char));
> +   memset(tuple->format, 't', natts * sizeof(char));
> Also, since the values member isn't memset() with a default, this seems a
> bit
> misleading at best no?
>
> For the rest of the backend we aren't including the defname in the errmsg
> like
> what is done here.  Maybe we should, but I think that should be done
> consistently if so, and we should stick to just "conflicting or redundant
> options" for now.  At the very least, this needs a comma between "options"
> and
> the defname and ideally the defname wrapped in \".
> -errmsg("conflicting or redundant options")));
> +errmsg("conflicting or redundant options %s
> already provided", defel->defname)));
>

I added these since this will now be used outside of logical replication,
and getting reasonable error messages when setting up
replication is useful. I added the \" and the comma.


>
> These types of constructs are IMHO quite hard to read:
> + if(
> + #ifdef WORDS_BIGENDIAN
> + true
> + #else
> + false
> + #endif
> + != bigendian)
> How about spelling out the statement completely for both cases, or perhaps
> encapsulating the logic in a macro? Something like the below perhaps?
> + #ifdef WORDS_BIGENDIAN
> + if (bigendian != true)
> + #else
> + if (bigendian != false)
> + #endif
>
> This change needs to be removed before a commit, just highlighting that
> here to
> avoid it going unnoticed.
> -/* #define WAL_DEBUG */
> +#define WAL_DEBUG
>
Done


> Reading this I'm left wondering if we shoulnd't introduce macros for the
> kinds,
> since we now compare with 'u', 't' etc in quite a few places and add
> comments
> explaining the types everywhere.  A descriptive name would make it easier
> to
> grep for all occurrences, and avoid the need for the comment lines.  That's
> not
> necessarily for this patch though, but an observation from reading it.
>

I'll take a look at adding macros tomorrow.

I've taken care of much of this below.

>
>
> I found a few smaller nitpicks as well, some of these might go away by a
> pg_indent run but I've included them all here regardless:
>
> This change, and the subsequent whitespace removal later in the same
> function,
> seems a bit pointless:
> -   /* read the relation id */
> relid = pq_getmsgint(in, 4);
> -
>
> Braces should go on the next line:
> +   if (options->proto.logical.binary) {
>
> This should be a C /* ...  */ comment, or perhaps just removed since the
> code
> is quite self explanatory:
> +   // default to false
> +   *binary_basetypes = false;
>
> Indentation here:
> -errmsg("conflicting or redundant options")));
> +   errmsg("conflicting or redundant options %s
> already provided", defel->defname)));
>
> ..as well as here (there are a few like this one):
> +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +   errmsg("incompatible datum size")));
>
> Capitalization of "after" to make it a proper sentence:
> + * after we know that the subscriber is requesting binary check to make
> sure
>
> Excessive whitespace and indentation in a few places, and not enough in
> some:
> +   if (binary_given)
> +   {
> +   values[Anum_pg_subscription_subbinary - 1]
> =
> ...
> +   if ( *binary_basetypes == true )
> ...
> +   if (sizeof(i

Re: output columns of \dAo and \dAp

2020-07-07 Thread Alvaro Herrera
Sergey, Nikita, Alexander, if you can please see this thread and propose
a solution, that'd be very welcome.


On 2020-Jun-06, Tom Lane wrote:

> Peter Eisentraut  writes:
> > I'm also wondering whether this is fully correct.  Would it be possible for 
> > the
> > argument types of the operator/function to differ from the left arg/right 
> > arg
> > types?  (Perhaps binary compatible?)
> 
> pg_amop.h specifies that
> 
>  * The primary key for this table is <amopfamily, amoplefttype, amoprighttype,
>  * amopstrategy>.  amoplefttype and amoprighttype are just copies of the
>  * operator's oprleft/oprright, ie its declared input data types.
> 
> Perhaps it'd be a good idea for opr_sanity.sql to verify that, since
> it'd be an easy thing to mess up in handmade pg_amop entries.  But
> at least for the foreseeable future, there's no reason for \dAo to show
> amoplefttype/amoprighttype separately.
> 
> I agree that \dAo ought to be showing amopstrategy; moreover that ought
> to be much higher in the sort key than it is.  Also, if we're not going
> to show amoppurpose, then the view probably ought to hide non-search
> operators altogether.  It is REALLY misleading to not distinguish search
> and ordering operators.
>  
> The situation is different for pg_amproc: if you look for discrepancies
> you will find plenty, since in many cases a support function's signature
> has little to do with what types it is registered under.  Perhaps it'd be
> worthwhile for \dAp to show the functions as regprocedure in addition to
> showing amproclefttype/amprocrighttype explicitly.  In any case, I think
> it's rather misleading for \dAp to label amproclefttype/amprocrighttype as
> "Left arg type" and "Right arg type", because for almost everything except
> btree/hash, that's not what the support function's arguments actually are.
> Perhaps names along the lines of "Registered left type" and "Registered
> right type" would put readers in a better frame of mind to understand
> the entries.
> 
>   regards, tom lane
> 
> 


-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Peter Geoghegan
On Tue, Jul 7, 2020 at 10:12 AM Andres Freund  wrote:
> I think it makes not much sense to plan to invent something like
> hash_mem for v13; it's clearly too much work.  That's a separate
> discussion from having something like it for v14.

Can you explain why you believe that to be the case? It seems quite
possible that there is some subtlety that I missed in grouping sets or
something like that. I would like to know the specifics, if there are
any specifics.

> My conclusion about this topic is that I think we'll be doing our users
> a disservice by not providing an escape hatch, but that I also don't
> have the energy / time to fight for it further. This is a long thread
> already, and I sense little movement towards a conclusion.

An escape hatch seems necessary. I accept that a hard disabling of
spilling at execution time meets that standard, and that may be enough
for Postgres 13. But I am concerned that an uncomfortably large
proportion of our users will end up needing this. (Perhaps I should
say a large proportion of the subset of users that might be affected
either way. You get the idea.)

I have to wonder if this escape hatch is an escape hatch for our
users, or an escape hatch for us. There is a difference.

-- 
Peter Geoghegan




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2020-07-07 Thread Daniel Gustafsson
> On 25 Mar 2020, at 06:52, Fujii Masao  wrote:
> 
> On 2020/01/28 0:24, Fujii Masao wrote:
>> On 2020/01/15 19:18, Paul Guo wrote:
>>> I further fixed the last test failure (due to a small bug in the test, not 
>>> in code). Attached are the new patch series. Let's see the CI pipeline 
>>> result.
>> Thanks for updating the patches!
>> I started reading the 0003 patch.
> 
> I marked this patch as Waiting on Author in CF because there is no update
> since my last review comments. Could you mark it as Needs Review again
> if you post the updated version of the patch.

This thread has been stalled since effectively January, so I'm marking this
patch Returned with Feedback.  Feel free to open a new entry once the review
comments have been addressed.

cheers ./daniel



Re: output columns of \dAo and \dAp

2020-07-07 Thread Alexander Korotkov
On Sun, Jun 7, 2020 at 12:34 AM Tom Lane  wrote:
> Peter Eisentraut  writes:
> > I'm also wondering whether this is fully correct.  Would it be possible for 
> > the
> > argument types of the operator/function to differ from the left arg/right 
> > arg
> > types?  (Perhaps binary compatible?)
>
> pg_amop.h specifies that
>
>  * The primary key for this table is <amopfamily, amoplefttype, amoprighttype,
>  * amopstrategy>.  amoplefttype and amoprighttype are just copies of the
>  * operator's oprleft/oprright, ie its declared input data types.
>
> Perhaps it'd be a good idea for opr_sanity.sql to verify that, since
> it'd be an easy thing to mess up in handmade pg_amop entries.  But
> at least for the foreseeable future, there's no reason for \dAo to show
> amoplefttype/amoprighttype separately.

+1 for checking consistency of amoplefttype/amoprighttype in opr_sanity.sql
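
Such a check could be as simple as the following opr_sanity-style query, which
should return no rows (a sketch):

SELECT a.amopfamily, a.amopopr::regoperator
FROM pg_amop a
     JOIN pg_operator o ON o.oid = a.amopopr
WHERE a.amoplefttype <> o.oprleft OR a.amoprighttype <> o.oprright;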

> I agree that \dAo ought to be showing amopstrategy;

I agree that the strategy and purpose of an operator are valuable
information, and we probably shouldn't hide them in \dAo. If we do so,
then \dAo and \dAo+ differ only by the "sort opfamily" column.  Is it
worth keeping the \dAo+ command for a single-column difference?

> moreover that ought
> to be much higher in the sort key than it is.

Do you mean we should sort by strategy number and only then by
arg types?  The current output shows operators grouped by opclasses;
after that, cross-opclass operators are shown.  This order seems to me
more worthwhile than seeing all the variations of the same strategy
together.

> Also, if we're not going
> to show amoppurpose, then the view probably ought to hide non-search
> operators altogether.  It is REALLY misleading to not distinguish search
> and ordering operators.

+1

> The situation is different for pg_amproc: if you look for discrepancies
> you will find plenty, since in many cases a support function's signature
> has little to do with what types it is registered under.  Perhaps it'd be
> worthwhile for \dAp to show the functions as regprocedure in addition to
> showing amproclefttype/amprocrighttype explicitly.  In any case, I think
> it's rather misleading for \dAp to label amproclefttype/amprocrighttype as
> "Left arg type" and "Right arg type", because for almost everything except
> btree/hash, that's not what the support function's arguments actually are.
> Perhaps names along the lines of "Registered left type" and "Registered
> right type" would put readers in a better frame of mind to understand
> the entries.

+1 for rename "Left arg type"/"Right arg type" to "Registered left
type"/"Registered right type".

--
Regards,
Alexander Korotkov




Re: jsonpath versus NaN

2020-07-07 Thread Alexander Korotkov
On Mon, Jul 6, 2020 at 3:19 PM Alexander Korotkov  wrote:
> The patchset is attached, sorry for the delay.
>
> The first patch improves error messages, which appears to be unclear
> for me.  If one applies .double() method to a numeric value, we
> restrict that this numeric value should fit to double precision type.
> If it doesn't fit, the current error message just says the following.
>
> ERROR: jsonpath item method .double() can only be applied to a numeric value
>
> But that's confusing, because .double() method is naturally applied to
> a numeric value.  Patch makes this message explicitly report that
> numeric value is out of range for double type.  This patch also adds
> test exercising this error.  When string can't be converted to double
> precision, I think it's better to explicitly say that we expected
> valid string representation of double precision type.
>
> Second patch forbids to convert NaN using .double() method.  As I get,
> NaN can't be result of any jsonpath computations assuming there is no
> NaN input.  So, I just put an assert to convertJsonbScalar() ensuring
> there is no NaN in JsonbValue.

I'm going to push 0002 if there is no objection.

Regarding 0001, I think my new error messages need review.
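
For reference, the cases the reworded messages are aimed at look like this
(hedged examples; the exact error wording is what needs review):

-- a numeric that is valid JSON but does not fit into double precision
SELECT jsonb_path_query('1e400', '$.double()');
-- a string that is not a valid representation of a double precision value
SELECT jsonb_path_query('"not-a-number"', '$.double()');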

--
Regards,
Alexander Korotkov




Re: jsonpath versus NaN

2020-07-07 Thread Tom Lane
Alexander Korotkov  writes:
> I'm going to push 0002 if there is no objection.
> Regarding 0001, I think my new error messages need review.

I do intend to review these, just didn't get to it yet.

regards, tom lane




Re: jsonpath versus NaN

2020-07-07 Thread Alexander Korotkov
On Wed, Jul 8, 2020 at 1:16 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > I'm going to push 0002 if there is no objection.
> > Regarding 0001, I think my new error messages need review.
>
> I do intend to review these, just didn't get to it yet.

OK, thank you for noticing.  I wouldn't push anything before your review.

--
Regards,
Alexander Korotkov




Re: change a function name in a comment correctly

2020-07-07 Thread Masahiro Ikeda

There is a comment in which the related function name is not the same.
I attached a patch to fix it. Please review.


Thanks for the report and patch! LGTM.
I will commit this later.


Thanks for checking.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: posgres 12 bug (partitioned table)

2020-07-07 Thread Soumyadeep Chakraborty
Hi Amit,

Thanks for your reply!

On Tue, Jul 7, 2020 at 7:18 AM Amit Langote  wrote:
>
> Hi Soumyadeep,
>
> Thanks for picking this up.
>
> On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
>  wrote:
> > Upon investigation, it seems that the problem is caused by the
> > following:
> >
> > The slot passed to the call to ExecProcessReturning() inside
> > ExecInsert() is often a virtual tuple table slot.
>
> Actually, not that often in practice.  The slot is not virtual, for
> example, when inserting into a regular non-partitioned table.

Indeed! I meant partitioned tables are a common use case. Sorry, I
should have elaborated.

> If I change this to return a "heap" slot for partitioned tables, just
> like for foreign tables, the problem goes away (see the attached).  In
> fact, even make check-world passes, so I don't know why it isn't that
> way to begin with.
>

This is what I had thought of initially but I had taken a step back for 2
reasons:

1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1], a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.

2. This is a secondary aspect but I will mention it here for
completeness. Not knowing enough about this code: will demanding heap
tuples for partitioned tables all throughout the code have a performance
impact? At a first glance it didn't seem to be the case.  However, I did
find lots of callsites for partitioning or otherwise where we kind of
expect a virtual tuple table slot (as evidenced with the calls to
ExecStoreVirtualTuple()). With your patch, we seem to be calling
ExecStoreVirtualTuple() on a heap tuple table slot, in various places:
such as inside execute_attr_map_slot(). It seems to be harmless to do so
however, in accordance with my limited investigation.

All in all, I think we have to explicitly supply those system columns. I
heard from Daniel that one of the motivations for having table AMs
was to ensure that transaction meta-data storage is not demanded off any
AM.

[1] https://github.com/greenplum-db/postgres/tree/zedstore

Regards,
Soumyadeep




Re: Is it useful to record whether plans are generic or custom?

2020-07-07 Thread torikoshia

On 2020-07-06 22:16, Fujii Masao wrote:

On 2020/06/11 14:59, torikoshia wrote:

On 2020-06-10 18:00, Kyotaro Horiguchi wrote:



+    TupleDescInitEntry(tupdesc, (AttrNumber) 8, "last_plan",

This could be a problem if we showed the last plan in this view.  I
think "last_plan_type" would be better.

+    if (prep_stmt->plansource->last_plan_type == 
PLAN_CACHE_TYPE_CUSTOM)

+    values[7] = CStringGetTextDatum("custom");
+    else if (prep_stmt->plansource->last_plan_type == 
PLAN_CACHE_TYPE_GENERIC)

+    values[7] = CStringGetTextDatum("generic");
+    else
+    nulls[7] = true;

Using swith-case prevents future additional type (if any) from being
unhandled.  I think we are recommending that as a convension.


Thanks for your reviewing!

I've attached a patch that reflects your comments.


Thanks for the patch! Here are the comments.


Thanks for your review!


+Number of times generic plan was choosen
+Number of times custom plan was choosen

Typo: "choosen" should be "chosen"?


Thanks, fixed them.

+  role="column_definition">

+   last_plan_type text
+  
+  
+Tells the last plan type was generic or custom. If the 
prepared

+statement has not executed yet, this field is null
+  

Could you tell me how this information is expected to be used?
I think that generic_plans and custom_plans are useful when 
investigating

the cause of performance drop by cached plan mode. But I failed to get
how much useful last_plan_type is.


This may be an exceptional case, but I once had a case where I needed
to check whether a generic or custom plan was chosen for specific
queries in a development environment.

Of course, we can find that out by adding EXPLAIN and checking whether $n
appears in the plan, but I feel using the view is easier to use
and understand.
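
With the patch applied, that check would look something like this (a sketch
using the columns proposed here):

PREPARE q (int) AS SELECT * FROM pg_class WHERE relpages > $1;
EXECUTE q(10);
SELECT name, generic_plans, custom_plans, last_plan_type
  FROM pg_prepared_statements;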


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
From f2155b1a9f3b500eca490827dd044f5c0f704dd3 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Wed, 8 Jul 2020 09:17:22 +0900
Subject: [PATCH] We didn't have an easy way to find how many times generic or
 custom plans of a prepared statement have been executed.  This patch exposes
 such numbers of both plans and the last plan type in pg_prepared_statements.
---
 doc/src/sgml/catalogs.sgml| 28 +++
 src/backend/commands/prepare.c| 29 ---
 src/backend/utils/cache/plancache.c   | 23 
 src/include/catalog/pg_proc.dat   |  6 ++--
 src/include/utils/plancache.h | 12 ++-
 src/test/regress/expected/prepare.out | 51 +++
 src/test/regress/expected/rules.out   |  7 ++--
 src/test/regress/sql/prepare.sql  | 15 
 8 files changed, 155 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 003d278370..146dc2208b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10829,6 +10829,34 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
frontend/backend protocol
   
  
+
+ 
+  
+   generic_plans bigint
+  
+  
+Number of times generic plan was chosen
+  
+ 
+
+ 
+  
+   custom_plans bigint
+  
+  
+Number of times custom plan was chosen
+  
+ 
+
+ 
+  
+   last_plan_type text
+  
+  
+Tells the last plan type was generic or custom. If the prepared
+statement has not executed yet, this field is null
+  
+ 
 

   
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80d6df8ac1..b4b1865d48 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -694,7 +694,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 /*
  * This set returning function reads all the prepared statements and
- * returns a set of (name, statement, prepare_time, param_types, from_sql).
+ * returns a set of (name, statement, prepare_time, param_types, from_sql,
+ * generic_plans, custom_plans, last_plan_type).
  */
 Datum
 pg_prepared_statement(PG_FUNCTION_ARGS)
@@ -723,7 +724,7 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 	 * build tupdesc for result tuples. This must match the definition of the
 	 * pg_prepared_statements view in system_views.sql
 	 */
-	tupdesc = CreateTemplateTupleDesc(5);
+	tupdesc = CreateTemplateTupleDesc(8);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
 	   TEXTOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "statement",
@@ -734,6 +735,12 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 	   REGTYPEARRAYOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "from_sql",
 	   BOOLOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "generic_plans",
+	   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (Attr

Re: pg_resetwal --next-transaction-id may cause database failed to restart.

2020-07-07 Thread movead...@highgo.ca

>The rationale for this interface is unclear to me.  Please explain what
>happens in each case?
>In my proposal, we'd have:
>* Bad value, no --force:
>  - program raises error, no work done.
>* Bad value with --force:
>  - program raises warning but changes anyway.
>* Good value, no --force:
>  - program changes value without saying anything
>* Good value with --force:
>  - same
You have listed all the cases. Maybe you are right that it needs to raise a
warning when forcing a bad value to be written, which is missing in the patch.
And I use '--test' in the patch, not '--force', for now; maybe that needs
deeper research and discussion.

>The rationale for this interface is convenient knowledgeable access: the
>DBA runs the program with value X, and if the value is good, then
>they're done.  If the program raises an error, DBA has a choice: either
>run with --force because they know what they're doing, or don't do
>anything because they know that they would make a mess.
Yes, that's it. In addition, the raised error can tell the DBA to input a good
value.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: A patch for get origin from commit_ts.

2020-07-07 Thread movead...@highgo.ca

>Cool, thanks.  I have gone through your patch in details, and updated
>it as the attached.  Here are some comments.
 
>'8000' as OID for the new function was not really random, so to be
>fair with the other patches, I picked up the first random value
>unused_oids has given me instead.
> 
>There were some indentation issues, and pgindent got that fixed.
 
>I think that it is better to use "replication origin" in the docs
>instead of just origin.  I have kept "origin" in the functions for
>now as that sounded cleaner to me, but we may consider using something
>like "reporigin" as well as attribute name.
> 
>The tests could just use tstzrange() to make sure that the timestamps
>have valid values, so I have switched to that, and did not resist to
>do the same in the existing tests.
> 
>+-- Test when it can not find the transaction
>+SELECT * FROM pg_xact_commit_timestamp_origin((:'txid_set_origin'::text::int +
>10)::text::xid) x;
>This test could become unstable, particularly if it gets used in a
>parallel environment, so I have removed it.  Perhaps I am just
>over-pessimistic here though..
> 
>As a side note, I think that we could just remove the alternate output
>of commit_ts/, as it does not really get used because of the
>NO_INSTALLCHECK present in the module's Makefile.  That would be the
>job of a different patch, so I have updated it accordingly.  Glad to
>see that you did not forget to adapt it in your own patch.
> 
>(The change in catversion.h is a self-reminder...)
Thanks for all of that; there are so many details I still need to pay attention to when
submitting a patch.
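
For my own reference, I read the tstzrange() suggestion as a check of roughly
this shape (just a sketch, not the exact test text):

SELECT x.timestamp <@ tstzrange('-infinity', now()) AS ts_ok
  FROM pg_last_committed_xact() x;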



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


RE: Performing partition pruning using row value

2020-07-07 Thread kato-...@fujitsu.com
Fujita san

On Tuesday, July 7, 2020 6:31 PM Etsuro Fujita  wrote:
> Just to be clear, the condition (c1, c2) < (99, 99) is not equivalent to the
> condition c1 < 99 and c2 < 99 (see the documentation note in [1]).

Thanks for sharing this document. I understand now.
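
For example, the difference is easy to see directly, since row-wise comparison
is lexicographic:

SELECT (99, 50) < (99, 99);    -- true: first columns are equal, 50 < 99
SELECT 99 < 99 AND 50 < 99;    -- false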

> but I don't think the main reason for that is that it takes time to parse
> expressions.
> Yeah, I think it's great to support row-wise comparison not only with the 
> small
> number of args but with the large number of them.

These comments are very helpful.
OK, I will try to make a POC that allows row-wise comparison with partition pruning.

Regards, 
sho kato
> -Original Message-
> From: Etsuro Fujita 
> Sent: Tuesday, July 7, 2020 6:31 PM
> To: Kato, Sho/加藤 翔 
> Cc: PostgreSQL-development 
> Subject: Re: Performing partition pruning using row value
> 
> Kato-san,
> 
> On Mon, Jul 6, 2020 at 5:25 PM kato-...@fujitsu.com 
> wrote:
> > I would like to ask about the conditions under which partition pruning is
> performed.
> > In PostgreSQL 12, when I executed following SQL, partition pruning is not
> performed.
> >
> > postgres=# explain select * from a where (c1, c2) < (99, 99);
> >QUERY PLAN
> > 
> >  Append  (cost=0.00..60.00 rows=800 width=40)
> >->  Seq Scan on a1 a_1  (cost=0.00..28.00 rows=400 width=40)
> >  Filter: (ROW(c1, c2) < ROW(99, 99))
> >->  Seq Scan on a2 a_2  (cost=0.00..28.00 rows=400 width=40)
> >  Filter: (ROW(c1, c2) < ROW(99, 99))
> > (5 rows)
> >
> > However, pruning is performed when I changed the SQL as follows.
> >
> > postgres=# explain select * from a where c1  < 99 and c2 < 99;
> >QUERY PLAN
> > 
> >  Seq Scan on a1 a  (cost=0.00..28.00 rows=133 width=40)
> >Filter: ((c1 < 99) AND (c2 < 99))
> > (2 rows)
> 
> Just to be clear, the condition (c1, c2) < (99, 99) is not equivalent to the
> condition c1 < 99 and c2 < 99 (see the documentation note in [1]).
> 
> > Looking at the code, "(c1, c2) < (99, 99)" is recognized as RowCompExpr and
> "c1 < 99 and c2 < 99" is recognized combination of OpExpr.
> >
> > Currently, pruning is not performed for RowCompExpr, is this correct?
> 
> Yeah, I think so.
> 
> > Because it would take a long time to parse all Expr nodes, does
> match_cluause_to_partition_key() return PART_CLAUSE_UNSUPPORTED
> when such Expr node is passed?
> 
> I don't know the reason why that function doesn't support row-wise comparison,
> but I don't think the main reason for that is that it takes time to parse
> expressions.
> 
> > If the number of args in RowCompExpr is small, I would think that expanding
> it would improve performance.
> 
> Yeah, I think it's great to support row-wise comparison not only with the 
> small
> number of args but with the large number of them.
> 
> Best regards,
> Etsuro Fujita
> 
> [1]
> https://www.postgresql.org/docs/current/functions-comparisons.html#ROW-
> WISE-COMPARISON


Re: [HACKERS] Look-behind regular expressions

2020-07-07 Thread Andreas Karlsson

On 7/7/20 6:51 PM, Thom Brown wrote:

On Tue, 29 Jun 2010 at 17:49, David E. Wheeler  wrote:


On Jun 29, 2010, at 7:44 AM, Thom Brown wrote:


No.  Or are you volunteering?


A n00b like me volunteer for that?  It's more of a suggestion.


N00bs gotta start somewhere…


10 years later, and I've noticed that both look-behind and negative
look-behind have been implemented.

Thanks to whomever did this.


I think that it is Tom Lane that you should thank for this.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=12c9a04008870c283931d6b3b648ee21bbc2cfda
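
For anyone looking for the syntax, the lookbehind constraints added by that
commit work like this (a quick sketch):

SELECT 'postgresql' ~ '(?<=postgre)sql';   -- true, positive lookbehind
SELECT 'postgresql' ~ '(?<!postgre)sql';   -- false, negative lookbehind
SELECT 'mysql' ~ '(?<!postgre)sql';        -- true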

Andreas




Re: Quick doc patch

2020-07-07 Thread Michael Paquier
On Tue, Jul 07, 2020 at 06:36:10PM +0900, Michael Paquier wrote:
> On Tue, Jul 07, 2020 at 09:58:59AM +0200, Daniel Gustafsson wrote:
>> I agree, it looks like a copy-pasteo in 15cb2bd2700 which introduced the
>> paragraph for both GIN and BRIN.  LGTM.  Adding Alexander who committed in on
>> cc.
> 
> +1.

Alexander does not seem to be around, so I have just applied the fix.
There were more inconsistencies in gin.sgml and spgist.sgml missed in
14903f2, making the docs of GIN/SP-GiST less in line with the BRIN
equivalent, so I have fixed both while on it.
--
Michael




Re: A patch for get origin from commit_ts.

2020-07-07 Thread Michael Paquier
On Wed, Jul 08, 2020 at 09:31:24AM +0800, movead...@highgo.ca wrote:
> Thanks for all of that, so many details I still need to pay attention when 
> submit a patch.

No problem.  We are all here to learn, and nothing can be perfect, if
perfection is even possible :)

Regarding the attribute name, I was actually considering to just use
"roident" instead.  This is more consistent with pglogical, and that's
also the field name we use in ReplicationState[OnDisk].  What do you
think?
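
For reference, with the patch applied (and track_commit_timestamp = on), usage
would look roughly like this; '512' just stands in for some committed
transaction ID:

SELECT * FROM pg_last_committed_xact();
-- (xid, timestamp, roident)
SELECT * FROM pg_xact_commit_timestamp_origin('512'::xid);
-- (timestamp, roident)
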
--
Michael




[doc] modifying unit from characters to bytes

2020-07-07 Thread torikoshia

Hi,

The manual describes the size of pg_stat_activity.query
as below:

 | By default the query text is truncated at 1024 characters;

When considering multibyte characters, it seems more
accurate to change the unit from "characters" to "bytes".
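
The limit in question is the track_activity_query_size parameter, for example:

SHOW track_activity_query_size;   -- 1024 by default, and the cutoff is applied in bytes

so a query consisting of multibyte characters is truncated after fewer than
1024 characters.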

I also searched for other "[0-9] characters" occurrences in the manual.
I may have overlooked something, but they seem OK because
their contexts are limited to ASCII characters, or for other
reasons.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 211d279094..e03e612e8a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -894,7 +894,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
state is active this field shows the
currently executing query. In all other states, it shows the last query
that was executed. By default the query text is truncated at 1024
-   characters; this value can be changed via the parameter
+   bytes; this value can be changed via the parameter
.
   
  


Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread David Rowley
On Wed, 8 Jul 2020 at 07:25, Peter Geoghegan  wrote:
>
> On Tue, Jul 7, 2020 at 5:55 AM David Rowley  wrote:
> > We're certainly not
> > going to get that for PG13, so I do think what we need here is just a
> > simple escape hatch. I mentioned my thoughts in [2], so won't go over
> > it again here. Once we've improved the situation in some future
> > version of postgres, perhaps along the lines of what Tomas mentioned,
> > then we can get rid of the escape hatch.
>
> If it really has to be a simple escape hatch in Postgres 13, then I
> could live with a hard disabling of spilling at execution time. That
> seems like the most important thing that is addressed by your
> proposal. I'm concerned that way too many users will have to use the
> escape hatch, and that that misses the opportunity to provide a
> smoother experience.

Yeah. It's a valid concern. I'd rather nobody would ever have to exit
through the escape hatch either. I don't think anyone here actually
wants that to happen. It's only been proposed to allow users a method
to escape the new behaviour and get back what they're used to.

I think the smoother experience will come in some future version of
PostgreSQL with generally better memory management for work_mem all
round. It's certainly been talked about enough and I don't think
anyone here disagrees that there is a problem with N being unbounded
when it comes to N * work_mem.

I'd really like to see this thread move forward to a solution and I'm
not sure how best to do that. I started by reading back over both this
thread and the original one and tried to summarise what people have
suggested.

I understand some people did change their minds along the way, so I
may have made some mistakes. I could have assumed the latest mindset
overruled, but it was harder to determine that due to the thread being
split.

For hash_mem = Justin [16], PeterG [15], Tomas [7]
hash_mem out of scope for PG13 = Bruce [8], Andres [9]
Wait for reports from users = Amit [10]
Escape hatch that can be removed later when we get something better =
Jeff [11], David [12], Pavel [13], Andres [14], Justin [1]
Add enable_hashagg_spill = Tom [2] (I'm unclear on this proposal. Does
it affect the planner or executor or both?)
Maybe do nothing until we see how things go during beta = Bruce [3]
Just let users set work_mem = Alvaro [4] (I think he changed his mind
after Andres pointed out that it changes other nodes in the plan too)
Swap enable_hashagg for a GUC that specifies when spilling should
occur. -1 means work_mem = Robert [17], Amit [18]
hash_mem does not solve the problem = Tomas [6]

David

[1] https://www.postgresql.org/message-id/20200624031443.gv4...@telsasoft.com
[2] https://www.postgresql.org/message-id/2214502.1593019...@sss.pgh.pa.us
[3] https://www.postgresql.org/message-id/20200625182512.gc12...@momjian.us
[4] https://www.postgresql.org/message-id/20200625224422.GA9653@alvherre.pgsql
[5] 
https://www.postgresql.org/message-id/caa4ek1k0cgk_8hryxsvppgoh_z-ny+uztcfwb2we6baj9dx...@mail.gmail.com
[6] 
https://www.postgresql.org/message-id/20200627104141.gq7d3hm2tvoqgjjs@development
[7] 
https://www.postgresql.org/message-id/20200629212229.n3afgzq6xpxrr4cu@development
[8] https://www.postgresql.org/message-id/20200703030001.gd26...@momjian.us
[9] 
https://www.postgresql.org/message-id/20200707171216.jqxrld2jnxwf5...@alap3.anarazel.de
[10] 
https://www.postgresql.org/message-id/CAA4eK1KfPi6iz0hWxBLZzfVOG_NvOVJL=9UQQirWLpaN=ka...@mail.gmail.com
[11] 
https://www.postgresql.org/message-id/8bff2e4e8020c3caa16b61a46918d21b573eaf78.ca...@j-davis.com
[12] 
https://www.postgresql.org/message-id/CAApHDvqFZikXhAGW=ukzkq1_fzhy+xzmuzajinj6rwythh4...@mail.gmail.com
[13] 
https://www.postgresql.org/message-id/cafj8prbf1w4ndz-ynd+muptfizfbs7+cpjc4ob8v9d3x0ms...@mail.gmail.com
[14] 
https://www.postgresql.org/message-id/20200624191433.5gnqgrxfmucex...@alap3.anarazel.de
[15] 
https://www.postgresql.org/message-id/CAH2-WzmD+i1pG6rc1+Cjc4V6EaFJ_qSuKCCHVnH=oruqd-z...@mail.gmail.com
[16] https://www.postgresql.org/message-id/20200703024649.gj4...@telsasoft.com
[17] 
https://www.postgresql.org/message-id/ca+tgmobyv9+t-wjx-ctpdqurcgt1thz1ml3v1nxc4m4g-h6...@mail.gmail.com
[18] 
https://www.postgresql.org/message-id/caa4ek1k0cgk_8hryxsvppgoh_z-ny+uztcfwb2we6baj9dx...@mail.gmail.com




Modifying data type of slot_keep_segs from XLogRecPtr to XLogSegNo

2020-07-07 Thread torikoshia

Hi,

Currently, slot_keep_segs is defined as "XLogRecPtr" in KeepLogSeg(),
but it seems that it should be "XLogSegNo" because this variable is
a segment number.

What do you think?

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fd93bcfaeb..a8623cf996 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9603,7 +9603,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 		/* Cap by max_slot_wal_keep_size ... */
 		if (max_slot_wal_keep_size_mb >= 0)
 		{
-			XLogRecPtr	slot_keep_segs;
+			XLogSegNo	slot_keep_segs;
 
 			slot_keep_segs =
 ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);


Re: change a function name in a comment correctly

2020-07-07 Thread Fujii Masao




On 2020/07/08 8:12, Masahiro Ikeda wrote:

There is the comment which related function name is not same.
I attached the patch to fix it. Please review.


Thanks for the report and patch! LGTM.
I will commit this later.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: A patch for get origin from commit_ts.

2020-07-07 Thread movead...@highgo.ca

>Regarding the attribute name, I was actually considering to just use
>"roident" instead.  This is more consistent with pglogical, and that's
>also the field name we use in ReplicationState[OnDisk].  What do you
>think?
Yes, that's the right way; I can see it's 'roident' in the pg_replication_origin
catalog too.
What's your v6 patch based on? I cannot apply it.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca



Re: Modifying data type of slot_keep_segs from XLogRecPtr to XLogSegNo

2020-07-07 Thread Fujii Masao




On 2020/07/08 11:02, torikoshia wrote:

Hi,

Currently, slot_keep_segs is defined as "XLogRecPtr" in KeepLogSeg(),
but it seems that should be "XLogSegNo" because this variable is
segment number.

How do you think?


I agree that using XLogRecPtr for slot_keep_segs is incorrect.
But this variable indicates the number of segments rather than a
segment number, so uint64 seems better. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: posgres 12 bug (partitioned table)

2020-07-07 Thread Amit Langote
Hi Soumyadeep,

On Wed, Jul 8, 2020 at 9:37 AM Soumyadeep Chakraborty
 wrote:
> On Tue, Jul 7, 2020 at 7:18 AM Amit Langote  wrote:
> > If I change this to return a "heap" slot for partitioned tables, just
> > like for foreign tables, the problem goes away (see the attached).  In
> > fact, even make check-world passes, so I don't know why it isn't that
> > way to begin with.
>
> This is what I had thought of initially but I had taken a step back for 2
> reasons:
>
> 1. It is not mandatory for an AM to supply a heap tuple in the slot
> passed to any AM routine. With your patch, the slot passed to
> table_tuple_insert() inside ExecInsert() for instance is now expected to
> supply a heap tuple for the subsequent call to ExecProcessReturning().
> This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
> applying your patch on Zedstore [1], a columnar AM, and consequently, I
> got incorrect values for xmin, xmax etc with the query reported in this
> issue.

Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert().  So,
table_tuple_insert() always refers to a leaf partition's AM.   Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table.  How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops?  That would be the case, for
example, if the leaf partition is of Zedstore AM.  In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: [doc] modifying unit from characters to bytes

2020-07-07 Thread Fujii Masao




On 2020/07/08 10:54, torikoshia wrote:

Hi,

The manual describes the size of pg_stat_activity.query
as below:

  | By default the query text is truncated at 1024 characters;

When considering multibyte characters, it seems more
accurate to change the unit from "characters" to "bytes".


Agreed. Barring any objection, I will commit this patch.

For the record, this change was derived from the discussion about another patch [1].

Regards,

[1]
https://postgr.es/m/cd0e961fd42e5708fdea70f7420bf...@oss.nttdata.com


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [HACKERS] Look-behind regular expressions

2020-07-07 Thread David E. Wheeler
On Jul 7, 2020, at 21:32, Andreas Karlsson  wrote:

> On 7/7/20 6:51 PM, Thom Brown wrote:
> 
>> 10 years later, and I've noticed that both look-behind and negative
>> look-behind have been implemented.
>> Thanks to whomever did this.
> 
> I think that it is Tom Lane that you should thank for this.
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=12c9a04008870c283931d6b3b648ee21bbc2cfda

In 2015! Thank you, Tom. I’ll have to find a use for it one of these days.

D






Re: Performing partition pruning using row value

2020-07-07 Thread Amit Langote
Kato-san,

On Wed, Jul 8, 2020 at 10:32 AM kato-...@fujitsu.com
 wrote:
> On Tuesday, July 7, 2020 6:31 PM Etsuro Fujita  
> wrote:
> > Just to be clear, the condition (c1, c2) < (99, 99) is not equivalent to the
> > condition c1 < 99 and c2 < 99 (see the documentation note in [1]).
>
> Thanks for sharing this document. I have understood.
>
> > but I don't think the main reason for that is that it takes time to parse
> > expressions.

I think the only reason that this is not supported is that I hadn't
tested such a query when developing partition pruning, nor did anyone
else suggest doing so. :)

> > Yeah, I think it's great to support row-wise comparison not only with the 
> > small
> > number of args but with the large number of them.

+1

> These comments are very helpful.
> Ok, I try to make POC that allows row-wise comparison with partition-pruning.

That would be great, thank you.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: Modifying data type of slot_keep_segs from XLogRecPtr to XLogSegNo

2020-07-07 Thread torikoshia

On 2020-07-08 11:15, Fujii Masao wrote:

On 2020/07/08 11:02, torikoshia wrote:

Hi,

Currently, slot_keep_segs is defined as "XLogRecPtr" in KeepLogSeg(),
but it seems that should be "XLogSegNo" because this variable is
segment number.

How do you think?


I agree that using XLogRecPtr for slot_keep_segs is incorrect.
But this variable indicates the number of segments rather than
segment no, uint64 seems better. Thought?


That makes sense.
The number of segments and a segment number are different things.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Warn when parallel restoring a custom dump without data offsets

2020-07-07 Thread David Gilman
On Thu, Jul 02, 2020 at 05:25:21PM -0400, Tom Lane wrote:
> I guess I'm completely confused about the purpose of these patches.
> Far from coping with the situation of an unseekable file, they appear
> to change pg_restore so that it fails altogether if it can't seek
> its input file.  Why would we want to do this?

I'm not sure where the "fails altogether if it can't seek" is. The
"Skip tables in pg_restore" patch retains the old fread() logic. The
--disable-seeking stuff was just to support tests, and thanks to
help from Justin Pryzby the tests no longer require it. I've attached
the updated patch set.

Note that this still shouldn't be merged because of Justin's bug report
in 20200706050129.gw4...@telsasoft.com which is unrelated to this change
but will leave you with flaky CI until it's fixed.

-- 
David Gilman  :DG<
https://gilslotd.com
>From a671dc328a7defa403ceb46b5107ed4ee5f53103 Mon Sep 17 00:00:00 2001
From: David Gilman 
Date: Wed, 20 May 2020 22:49:28 -0400
Subject: [PATCH 1/5] Scan all TOCs when restoring a custom dump file without
 offsets

TOC requests are not guaranteed to come in disk order. If the custom
dump file was written with data offsets, pg_restore can seek directly to
the data, making request order irrelevant. If there are no data offsets,
pg_restore would never attempt to seek backwards, even when seeking is
possible, and would be unable to find TOCs before the current read
position in the file. 548e50976 changed how pg_restore's parallel
algorithm worked at the cost of greatly increasing out-of-order TOC
requests.

This patch changes pg_restore to scan through all TOCs to service a TOC
read request when restoring a custom dump file without data offsets.
The odds of getting a successful parallel restore go way up at the cost
of a bunch of extra tiny reads when pg_restore starts up.

The pg_restore manpage now warns against running pg_dump with an
unseekable output file and suggests that if you plan on doing a parallel
restore of a custom dump, you should run pg_dump with --file.
---
 doc/src/sgml/ref/pg_restore.sgml   |  8 
 src/bin/pg_dump/pg_backup_custom.c | 25 -
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 232f88024f..23286bb076 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -279,6 +279,14 @@ PostgreSQL documentation
 jobs cannot be used together with the
 option --single-transaction.

+
+   
+pg_restore with concurrent jobs may fail
+when restoring a custom archive format dump written to an unseekable
+output stream, like stdout. To allow for concurrent restoration of
+a custom archive format dump, use pg_dump's
+--file option to specify an output file.
+   
   
  
 
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 6ab122242c..a873ac3afa 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -413,6 +413,7 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 	lclTocEntry *tctx = (lclTocEntry *) te->formatData;
 	int			blkType;
 	int			id;
+	bool		initialScan = true;
 
 	if (tctx->dataState == K_OFFSET_NO_DATA)
 		return;
@@ -421,13 +422,28 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 	{
 		/*
 		 * We cannot seek directly to the desired block.  Instead, skip over
-		 * block headers until we find the one we want.  This could fail if we
-		 * are asked to restore items out-of-order.
+		 * block headers until we find the one we want.
 		 */
-		_readBlockHeader(AH, &blkType, &id);
 
-		while (blkType != EOF && id != te->dumpId)
+		for (;;)
 		{
+			_readBlockHeader(AH, &blkType, &id);
+
+			if (blkType == EOF && ctx->hasSeek && initialScan)
+			{
+/*
+ * This was possibly an out-of-order request. Try one extra
+ * pass over the file to find the TOC.
+ */
+initialScan = false;
+if (fseeko(AH->FH, ctx->dataStart, SEEK_SET) != 0)
+	fatal("error during file seek: %m");
+continue;
+			}
+
+			if (blkType == EOF || id == te->dumpId)
+break;
+
 			switch (blkType)
 			{
 case BLK_DATA:
@@ -443,7 +459,6 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 		  blkType);
 	break;
 			}
-			_readBlockHeader(AH, &blkType, &id);
 		}
 	}
 	else
-- 
2.27.0

>From 6e8294021482fb76cb17d6c150ec6acb6d0cac37 Mon Sep 17 00:00:00 2001
From: David Gilman 
Date: Sat, 23 May 2020 10:49:33 -0400
Subject: [PATCH 2/5] Add integration test for out-of-order TOC requests in
 pg_restore

---
 src/bin/pg_dump/t/002_pg_dump.pl | 107 ++-
 src/test/perl/TestLib.pm |  12 +++-
 2 files changed, 114 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index e116235769..d99e7a5d22 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_p

Re: Default setting for enable_hashagg_disk (hash_mem)

2020-07-07 Thread Amit Kapila
On Wed, Jul 8, 2020 at 7:28 AM David Rowley  wrote:
>
>
> I'd really like to see this thread move forward to a solution and I'm
> not sure how best to do that. I started by reading back over both this
> thread and the original one and tried to summarise what people have
> suggested.
>

Thanks, I think this might help us in reaching some form of consensus
by seeing what most people prefer.

> I understand some people did change their minds along the way, so I
> may have made some mistakes. I could have assumed the latest mindset
> overruled, but it was harder to determine that due to the thread being
> split.
>


> For hash_mem = Justin [16], PeterG [15], Tomas [7]
> hash_mem out of scope for PG13 = Bruce [8], Andres [9]
>

+1 for hash_mem out of scope for PG13.  Apart from the reasons you
have mentioned above, the other reason is that if this is a way to allow
users to get a smooth experience for hash aggregates, then I think the
idea proposed by Robert is not yet ruled out and we should see which
one is better.  OTOH, if we want to see this as a way to give a smooth
experience for current use cases for hash aggregates and improve the
situation for hash joins as well, then I think this seems to be a new
behavior which should be discussed for PG14.  Having said that, I am
not saying this is not a good idea, just that I don't think we should
pursue it for PG13.

> Wait for reports from users = Amit [10]

I think this is mostly in line with what Bruce is intending to say ("Maybe
do nothing until we see how things go during beta").  So, probably we
can combine the votes.

> Escape hatch that can be removed later when we get something better =
> Jeff [11], David [12], Pavel [13], Andres [14], Justin [1]
> Add enable_hashagg_spill = Tom [2] (I'm unclear on this proposal. Does
> it affect the planner or executor or both?)
> Maybe do nothing until we see how things go during beta = Bruce [3]
> Just let users set work_mem = Alvaro [4] (I think he changed his mind
> after Andres pointed out that changes other nodes in the plan too)
> Swap enable_hashagg for a GUC that specifies when spilling should
> occur. -1 means work_mem = Robert [17], Amit [18]
> hash_mem does not solve the problem = Tomas [6]
>

[1] - 
https://www.postgresql.org/message-id/ca+tgmobyv9+t-wjx-ctpdqurcgt1thz1ml3v1nxc4m4g-h6...@mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-07 Thread Amit Kapila
On Sun, Jul 5, 2020 at 8:37 PM Dilip Kumar  wrote:
>
> On Tue, Jun 30, 2020 at 10:13 AM Dilip Kumar  wrote:
> >
> >
> > Yeah, I have run the regression suite,  I can see a lot of failure
> > maybe we can somehow see the diff and confirm that all the failures
> > are due to rollback to savepoint only.  I will work on this.
>
> I have compared the changes logged at command end vs logged at commit
> time.  I have ignored the invalidation for the transaction which has
> any aborted subtransaction in it.  While testing this I found one
> issue, the issue is that if there are some invalidation generated
> between last command counter increment and the commit transaction then
> those were not logged.  I have fixed the issue by logging the pending
> invalidation in RecordTransactionCommit.  I will include the changes
> in the next patch set.
>

I think it would have been better if you could have given examples for
such cases where you need this extra logging.  Anyway, below are few
minor comments on this patch:

1.
+ /*
+ * Log any pending invalidations which are adding between the last
+ * command counter increment and the commit.
+ */
+ if (XLogLogicalInfoActive())
+ LogLogicalInvalidations();

I think we can change this comment slightly and extend a bit to say
for which kind of special cases we are adding this. "Log any pending
invalidations which are added between the last CommandCounterIncrement
and the commit.  Normally for DDLs, we log this at each command end,
however for certain cases where we directly update the system table
the invalidations were not logged at command end."

Something like above based on cases that are not covered by command
end WAL logging.

2.
+ * Emit WAL for invalidations.  This is currently only used for logging
+ * invalidations at the command end.
+ */
+void
+LogLogicalInvalidations()

After this is getting used at a new place, it is better to modify the
above comment to something like: "Emit WAL for invalidations.  This is
currently only used for logging invalidations at the command end or at
commit time if any invalidations are pending."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




RE: Performing partition pruning using row value

2020-07-07 Thread kato-...@fujitsu.com
Amit-san

On Wednesday, July 8, 2020 11:53 AM, Amit Langote :
> I think the only reason that this is not supported is that I hadn't tested 
> such a
> query when developing partition pruning, nor did anyone else suggest doing
> so. :)

Thanks for the information. I'm relieved to hear this reason.

Regards, 
Sho kato
> -Original Message-
> From: Amit Langote 
> Sent: Wednesday, July 8, 2020 11:53 AM
> To: Kato, Sho/加藤 翔 
> Cc: Etsuro Fujita ; PostgreSQL-development
> 
> Subject: Re: Performing partition pruning using row value
> 
> Kato-san,
> 
> On Wed, Jul 8, 2020 at 10:32 AM kato-...@fujitsu.com
>  wrote:
> > On Tuesday, July 7, 2020 6:31 PM Etsuro Fujita 
> wrote:
> > > Just to be clear, the condition (c1, c2) < (99, 99) is not
> > > equivalent to the condition c1 < 99 and c2 < 99 (see the documentation
> note in [1]).
> >
> > Thanks for sharing this document. I have understood.
> >
> > > but I don't think the main reason for that is that it takes time to
> > > parse expressions.
> 
> I think the only reason that this is not supported is that I hadn't tested 
> such a
> query when developing partition pruning, nor did anyone else suggest doing
> so. :)
> 
> > > Yeah, I think it's great to support row-wise comparison not only
> > > with the small number of args but with the large number of them.
> 
> +1
> 
> > These comments are very helpful.
> > Ok, I try to make POC that allows row-wise comparison with
> partition-pruning.
> 
> That would be great, thank you.
> 
> --
> Amit Langote
> EnterpriseDB: http://www.enterprisedb.com


Re: some more pg_dump refactoring

2020-07-07 Thread Fabien COELHO



Hallo Peter,


Would it make sense to accumulate in the other direction, older to newer,
so that new attributes are added at the end of the select?

I think that could make sense, but the current style was introduced by
daa9fe8a5264a3f192efa5ddee8fb011ad9da365.  Should we revise that?

It seems to me more logical to do it while you're at it, but you are the
one writing the patches:-)


What do you think about this patch to reorganize the existing code from that 
old commit?


I think it is a definite further improvement.

Patch applies cleanly, compiles, global & pg_dump tap test ok, looks ok to 
me.


--
Fabien.




Re: Proposal: Automatic partition creation

2020-07-07 Thread Fabien COELHO


Hello Anastasia,

My 0.02 €:


The patch implements following syntax:

CREATE TABLE ... PARTITION BY partition_method (list_of_columns)
partition_auto_create_clause

where partition_auto_create_clause is

CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec

and partition_bound_spec is:

MODULUS integer | VALUES IN (expr [,...]) [, ] |  INTERVAL range_step 
FROM range_start TO range_end


ISTM that we should avoid new specific syntaxes when possible, and prefer
free keyword option style, like it is being discussed for some other 
commands, because it reduces the impact on the parser.


That would suggest a more versatile partition_bound_spec which could look
like (<keyword> <value> [, …]):


For modulus, looks easy:

  (MODULUS 8)

For interval, maybe something like:

  (STEP ..., FROM/START ..., TO/END ...)

The key point is that for dynamic partitioning there would be no need for 
boundaries, so that it could just set a point and an interval


  (START/INIT/FROM??? ..., STEP ...)

For lists of values, probably it would make little sense to have dynamic 
partitioning? Or maybe yes, if we could partition on a column 
value/expression?! eg "MOD(id, 8)"??


What about pg_dump? Should it be able to regenerate the initial create?


[4] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements


Good point, a wiki is better than a thread for that type of thing. I'll
look at this page.


--
Fabien.

Use incremental sort paths for window functions

2020-07-07 Thread David Rowley
Over on [1] someone was asking about chained window paths making use
of already partially sorted input.  (The thread is on -general, so I
guessed they're not using PG13.)

However, on checking PG13 to see if incremental sort would help their
case, I saw it didn't. Looking at the code, I saw that
create_window_paths() and create_one_window_path() don't make any use
of incremental sort paths.

I quickly put together the attached. It's only about 15 mins of work,
but it seems worth looking at a bit more for some future commitfest.
Yeah, I'll need to add some tests, as I see no existing tests fail
after changing this.

I'll just park this here until then so I don't forget.
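
As a trivial illustration (with a hypothetical table tab(a, b, x)), the second
window below sorts on (a, b), so input already sorted on (a) for the first
window could be fed through an incremental sort instead of a full sort:

SELECT sum(x) OVER (ORDER BY a),
       sum(x) OVER (ORDER BY a, b)
FROM tab;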

David


incremental_sort_for_windowpaths.patch
Description: Binary data


Re: Proposal: Automatic partition creation

2020-07-07 Thread Amul Sul
On Wed, Jul 8, 2020 at 10:24 AM Fabien COELHO  wrote:
>
>
> Hello Anastasia,
>
> My 0.02 €:
>
> > The patch implements following syntax:
> >
> > CREATE TABLE ... PARTITION BY partition_method (list_of_columns)
> > partition_auto_create_clause
> >
> > where partition_auto_create_clause is
> >
> > CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec
> >
> > and partition_bound_spec is:
> >
> > MODULUS integer | VALUES IN (expr [,...]) [, ] |  INTERVAL range_step
> > FROM range_start TO range_end
>
> ISTM That we should avoid new specific syntaxes when possible, and prefer
> free keyword option style, like it is being discussed for some other
> commands, because it reduces the impact on the parser.
>
> That would suggest a more versatile partition_bound_spec which could look
> like  ( [, …]):
>
> For modulus, looks easy:
>
>(MODULUS 8)
>
> For interval, maybe something like:
>
>(STEP ..., FROM/START ..., TO/END ...)
>
> The key point is that for dynamic partitioning there would be no need for
> boundaries, so that it could just set a point and an interval
>
>(START/INIT/FROM??? ..., STEP ...)
>
> For lists of values, probably it would make little sense to have dynamic
> partitioning? Or maybe yes, if we could partition on a column
> value/expression?! eg "MOD(id, 8)"??
>
> What about pg_dump? Should it be able to regenerate the initial create?
>
I don't think this is needed for the proposed "Automatic partitioning (static)"
which generates a bunch of CREATE TABLE statements, IIUC.  Might be needed later
for "Automatic partitioning (dynamic)" where dynamic specifications need to be
stored.

> > [4] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements
>
> Good point, a wiki is better than a thread for that type of things. I'll
> look at this page.
+1

Regards,
Amul




Re: Global snapshots

2020-07-07 Thread Masahiko Sawada
On Tue, 7 Jul 2020 at 15:40, Amit Kapila  wrote:
>
> On Fri, Jul 3, 2020 at 12:18 PM Masahiko Sawada
>  wrote:
> >
> > On Sat, 20 Jun 2020 at 21:21, Amit Kapila  wrote:
> > >
> > > On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> > >  wrote:
> > >
> > > >Also, can you let us know if this
> > > > > supports 2PC in some way and if so how is it different from what the
> > > > > other thread on the same topic [1] is trying to achieve?
> > > > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > > > 2PC machinery. Now I'd not judge which approach is better.
> > > >
> > >
> >
> > Sorry for being late.
> >
>
> No problem, your summarization, and comparisons of both approaches are
> quite helpful.
>
> >
> > I studied this patch and did a simple comparison between this patch
> > (0002 patch) and my 2PC patch.
> >
> > In terms of atomic commit, the features that are not implemented in
> > this patch but in the 2PC patch are:
> >
> > * Crash safe.
> > * PREPARE TRANSACTION command support.
> > * Query cancel during waiting for the commit.
> > * Automatically in-doubt transaction resolution.
> >
> > On the other hand, the feature that is implemented in this patch but
> > not in the 2PC patch is:
> >
> > * Executing PREPARE TRANSACTION (and other commands) in parallel
> >
> > When the 2PC patch was proposed, IIRC it was like this patch (0002
> > patch). I mean, it changed only postgres_fdw to support 2PC. But after
> > discussion, we changed the approach to have the core manage foreign
> > transaction for crash-safe. From my perspective, this patch has a
> > minimum implementation of 2PC to work the global snapshot feature and
> > has some missing features important for supporting crash-safe atomic
> > commit. So I personally think we should consider how to integrate this
> > global snapshot feature with the 2PC patch, rather than improving this
> > patch if we want crash-safe atomic commit.
> >
>
> Okay, but isn't there some advantage with this approach (manage 2PC at
> postgres_fdw level) as well which is that any node will be capable of
> handling global transactions rather than doing them via central
> coordinator?  I mean any node can do writes or reads rather than
> probably routing them (at least writes) via coordinator node.

The postgres server where the client started the transaction works as
the coordinator node. I think this is true for both this patch and
that 2PC patch. From the perspective of atomic commit, any node will
be capable of handling global transactions in both approaches.

>  Now, I
> agree that even if this advantage is there in the current approach, we
> can't lose the crash-safety aspect of other approach.  Will you be
> able to summarize what was the problem w.r.t crash-safety and how your
> patch has dealt it?

Since this patch proceeds with 2PC without any logging, foreign
transactions prepared on foreign servers are left over without any
clues if the coordinator crashes during commit. Therefore, after
restart, the user will need to find and resolve in-doubt foreign
transactions manually.

In that 2PC patch, the information of foreign transactions is WAL
logged before PREPARE TRANSACTION. So even if the coordinator crashes
after preparing some foreign transactions, the prepared foreign
transactions are recovered during crash recovery, and then the
transaction resolver resolves them automatically, or the user can also
resolve them. The user doesn't need to check the other participant nodes
to resolve in-doubt foreign transactions. Also, since the foreign
transaction information is replicated to physical standbys, the new
master can take over resolving in-doubt transactions.

>
> > Looking at the commit procedure with this patch:
> >
> > When starting a new transaction on a foreign server, postgres_fdw
> > executes pg_global_snapshot_import() to import the global snapshot.
> > After some work, in pre-commit phase we do:
> >
> > 1. generate global transaction id, say 'gid'
> > 2. execute PREPARE TRANSACTION 'gid' on all participants.
> > 3. prepare global snapshot locally, if the local node also involves
> > the transaction
> > 4. execute pg_global_snapshot_prepare('gid') for all participants
> >
> > During step 2 to 4, we calculate the maximum CSN from the CSNs
> > returned from each pg_global_snapshot_prepare() executions.
> >
> > 5. assign global snapshot locally, if the local node also involves the
> > transaction
> > 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
> >
> > Then, we commit locally (i.g. mark the current transaction as
> > committed in clog).
> >
> > After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> > participants.
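
To spell that out, what gets sent to each participant is roughly the
following, where 'gid' and max-csn are placeholders for the generated
transaction id and the maximum CSN collected in steps 2 to 4:

PREPARE TRANSACTION 'gid';
SELECT pg_global_snapshot_prepare('gid');           -- returns this node's CSN
SELECT pg_global_snapshot_assign('gid', max-csn);
COMMIT PREPARED 'gid';
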
> >
>
> As per my current understanding, the overall idea is as follows.  For
> global transactions, pg_global_snapshot_prepare('gid') will set the
> transaction status as InDoubt and generate CSN (let's call it NodeCSN)
> at the node where that function is executed, it also returns the
> Nod

Re: Resetting spilled txn statistics in pg_stat_replication

2020-07-07 Thread Masahiko Sawada
On Tue, 7 Jul 2020 at 17:50, Magnus Hagander  wrote:
>
>
>
> On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila  wrote:
>>
>> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada
>>  wrote:
>> >
>> > On Mon, 6 Jul 2020 at 20:45, Amit Kapila  wrote:
>> > >
>> > > > Second, as long as the unique identifier is the slot name there is no
>> > > > convenient way to distinguish between the same name old and new
>> > > > replication slots, so the backend process or wal sender process sends
>> > > > a message to the stats collector to drop the replication slot at
>> > > > ReplicationSlotDropPtr(). This strategy differs from what we do for
>> > > > table, index, and function statistics. It might not be a problem but
>> > > > I’m thinking a better way.
>> > > >
>> > >
>> > > Can we rely on message ordering in the transmission mechanism (UDP)
>> > > for stats?  The wiki suggests [1] we can't.  If so, then this might
>> > > not work.
>> >
>> > Yeah, I'm also concerned about this. Another idea would be to have
>> > another unique identifier to distinguish old and new replication slots
>> > with the same name. For example, creation timestamp. And then we
>> > reclaim the stats of unused slots later like table and function
>> > statistics.
>> >
>>
>> So, we need to have 'creation timestamp' as persistent data for slots
>> to achieve this?  I am not sure of adding creation_time as a parameter
>> to identify for this case because users can change timings on systems
>> so it might not be a bullet-proof method but I agree that it can work
>> in general.
>
>
> If we need them to be persistent across time like that, perhaps we simply 
> need to assign oids to replication slots? That might simplify this problem 
> quite a bit?

Yeah, I guess assigning oids to replication slots in the same way as
oids in system catalogs might not work, because physical replication
slots can be created even during recovery. But using a
monotonically increasing integer as the id seems better and more
straightforward. This id does not necessarily need to be displayed in
the pg_replication_slots view, because the user can already use the
slot name as a unique identifier.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: A patch for get origin from commit_ts.

2020-07-07 Thread Michael Paquier
On Wed, Jul 08, 2020 at 10:11:28AM +0800, movead...@highgo.ca wrote:
> Yes that's is the right way, I can see it's 'roident' in pg_replication_origin
> catalog too.
> What's your v6 patch based on, I can not apply it.

There is a conflict in catversion.h.  If you wish to test the patch,
please feel free to use the attached where I have updated the
attribute name to roident.
--
Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d951b4a36f..c8b2baa490 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5946,12 +5946,20 @@
   prorettype => 'timestamptz', proargtypes => 'xid',
   prosrc => 'pg_xact_commit_timestamp' },
 
+{ oid => '8456',
+  descr => 'get commit timestamp and replication origin of a transaction',
+  proname => 'pg_xact_commit_timestamp_origin', provolatile => 'v',
+  prorettype => 'record', proargtypes => 'xid',
+  proallargtypes => '{xid,timestamptz,oid}', proargmodes => '{i,o,o}',
+  proargnames => '{xid,timestamp,roident}',
+  prosrc => 'pg_xact_commit_timestamp_origin' },
+
 { oid => '3583',
-  descr => 'get transaction Id and commit timestamp of latest transaction commit',
+  descr => 'get transaction Id, commit timestamp and replication origin of latest transaction commit',
   proname => 'pg_last_committed_xact', provolatile => 'v',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{xid,timestamptz}', proargmodes => '{o,o}',
-  proargnames => '{xid,timestamp}', prosrc => 'pg_last_committed_xact' },
+  proallargtypes => '{xid,timestamptz,oid}', proargmodes => '{o,o,o}',
+  proargnames => '{xid,timestamp,roident}', prosrc => 'pg_last_committed_xact' },
 
 { oid => '3537', descr => 'get identification of SQL object',
   proname => 'pg_describe_object', provolatile => 's', prorettype => 'text',
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9cdb136435..0771133868 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -417,28 +417,38 @@ pg_xact_commit_timestamp(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ * pg_last_committed_xact
+ *
+ * SQL-callable wrapper to obtain some information about the latest
+ * committed transaction: transaction ID, timestamp and replication
+ * origin.
+ */
 Datum
 pg_last_committed_xact(PG_FUNCTION_ARGS)
 {
 	TransactionId xid;
+	RepOriginId nodeid;
 	TimestampTz ts;
-	Datum		values[2];
-	bool		nulls[2];
+	Datum		values[3];
+	bool		nulls[3];
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 
 	/* and construct a tuple with our data */
-	xid = GetLatestCommitTsData(&ts, NULL);
+	xid = GetLatestCommitTsData(&ts, &nodeid);
 
 	/*
 	 * Construct a tuple descriptor for the result row.  This must match this
 	 * function's pg_proc entry!
 	 */
-	tupdesc = CreateTemplateTupleDesc(2);
+	tupdesc = CreateTemplateTupleDesc(3);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid",
 	   XIDOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "timestamp",
 	   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "roident",
+	   OIDOID, -1, 0);
 	tupdesc = BlessTupleDesc(tupdesc);
 
 	if (!TransactionIdIsNormal(xid))
@@ -452,6 +462,9 @@ pg_last_committed_xact(PG_FUNCTION_ARGS)
 
 		values[1] = TimestampTzGetDatum(ts);
 		nulls[1] = false;
+
+		values[2] = ObjectIdGetDatum(nodeid);
+		nulls[2] = false;
 	}
 
 	htup = heap_form_tuple(tupdesc, values, nulls);
@@ -459,6 +472,54 @@ pg_last_committed_xact(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
 }
 
+/*
+ * pg_xact_commit_timestamp_origin
+ *
+ * SQL-callable wrapper to obtain commit timestamp and origin of a given
+ * transaction.
+ */
+Datum
+pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
+{
+	TransactionId xid = PG_GETARG_UINT32(0);
+	RepOriginId nodeid;
+	TimestampTz ts;
+	Datum		values[2];
+	bool		nulls[2];
+	TupleDesc	tupdesc;
+	HeapTuple	htup;
+	bool		found;
+
+	found = TransactionIdGetCommitTsData(xid, &ts, &nodeid);
+
+	/*
+	 * Construct a tuple descriptor for the result row.  This must match this
+	 * function's pg_proc entry!
+	 */
+	tupdesc = CreateTemplateTupleDesc(2);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "timestamp",
+	   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "roident",
+	   OIDOID, -1, 0);
+	tupdesc = BlessTupleDesc(tupdesc);
+
+	if (!found)
+	{
+		nulls[0] = true;
+		nulls[1] = true;
+	}
+	else
+	{
+		values[0] = TimestampTzGetDatum(ts);
+		nulls[0] = false;
+		values[1] = ObjectIdGetDatum(nodeid);
+		nulls[1] = false;
+	}
+
+	htup = heap_form_tuple(tupdesc, values, nulls);
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
+}
 
 /*
  * Number of shared CommitTS buffers.
diff --git a/src/test/modules/commit_ts/expected/commit_timestamp.out b/src/test/modules/commit_ts/expected/commit_timestamp.out
index 5b7783b58f..97588c9895 100644
--- a/src/test/modules/commit_ts/expected/commit_timestamp.out
+++ b/src/test/modules/commit_ts/expected/commit_timestamp

Re: Performing partition pruning using row value

2020-07-07 Thread Fujii Masao




On 2020/07/08 13:25, kato-...@fujitsu.com wrote:

Amit-san

On Wednesday, July 8, 2020 11:53 AM, Amit Langote :

I think the only reason that this is not supported is that I hadn't tested such 
a
query when developing partition pruning, nor did anyone else suggest doing
so. :)


Seems we can do partition pruning even in Kato-san's case by doing

create type hoge as (c1 int, c2 int);
create table a( c1 int, c2 int, c3 varchar) partition by range(((c1, 
c2)::hoge));
create table a1 partition of a for values from((0, 0)) to ((100, 100));
create table a2 partition of a for values from((100, 100)) to ((200, 200));
explain select * from a where (c1, c2)::hoge < (99, 99)::hoge;

I'm not sure if this method is officially supported or not, though...

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Modifying data type of slot_keep_segs from XLogRecPtr to XLogSegNo

2020-07-07 Thread Fujii Masao



On 2020/07/08 11:55, torikoshia wrote:

On 2020-07-08 11:15, Fujii Masao wrote:

On 2020/07/08 11:02, torikoshia wrote:

Hi,

Currently, slot_keep_segs is defined as "XLogRecPtr" in KeepLogSeg(),
but it seems that should be "XLogSegNo" because this variable is
segment number.

How do you think?


I agree that using XLogRecPtr for slot_keep_segs is incorrect.
But this variable indicates the number of segments rather than
segment no, uint64 seems better. Thought?


That makes sense.
The number of segments and segment number are different.


Yes, so patch attached. I will commit it later.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index c2feb92576..3ebf8caee9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9601,7 +9601,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
/* Cap by max_slot_wal_keep_size ... */
if (max_slot_wal_keep_size_mb >= 0)
{
-   XLogRecPtr  slot_keep_segs;
+   uint64  slot_keep_segs;
 
slot_keep_segs =
ConvertToXSegs(max_slot_wal_keep_size_mb, 
wal_segment_size);


Re: Collation versioning

2020-07-07 Thread Michael Paquier
On Wed, Jul 08, 2020 at 06:12:51PM +1200, Thomas Munro wrote:
> I still wish I had a better idea than this:
> 
> +/*
> + * Returns whether the given index access method depend on a stable collation
> + * order.
> + */
> +static bool
> +index_depends_stable_coll_order(Oid amoid)
> +{
> +   return (amoid != HASH_AM_OID &&
> +   strcmp(get_am_name(amoid), "bloom") != 0);
> +}
> 
> I'm doing some more testing and looking for weird cases...  More soon.

Wouldn't the normal way to track that be a new field in IndexAmRoutine?
What you have here is not extensible.
--
Michael




RE: Performing partition pruning using row value

2020-07-07 Thread kato-...@fujitsu.com
Fujii-san

Wednesday, July 8, 2020 3:20 PM, Fujii Masao  
wrote:
> Seems we can do partition pruning even in Kato-san's case by dong
> 
> create type hoge as (c1 int, c2 int);
> create table a( c1 int, c2 int, c3 varchar) partition by range(((c1, 
> c2)::hoge));
> create table a1 partition of a for values from((0, 0)) to ((100, 100)); 
> create table
> a2 partition of a for values from((100, 100)) to ((200, 200)); explain select 
> * from
> a where (c1, c2)::hoge < (99, 99)::hoge;

I hadn't thought of it that way. Thanks.

Regards, 
Sho kato
> -Original Message-
> From: Fujii Masao 
> Sent: Wednesday, July 8, 2020 3:20 PM
> To: Kato, Sho/加藤 翔 ; 'Amit Langote'
> 
> Cc: Etsuro Fujita ; PostgreSQL-development
> 
> Subject: Re: Performing partition pruning using row value
> 
> 
> 
> On 2020/07/08 13:25, kato-...@fujitsu.com wrote:
> > Amit-san
> >
> > On Wednesday, July 8, 2020 11:53 AM, Amit Langote
> :
> >> I think the only reason that this is not supported is that I hadn't
> >> tested such a query when developing partition pruning, nor did anyone
> >> else suggest doing so. :)
> 
> Seems we can do partition pruning even in Kato-san's case by dong
> 
> create type hoge as (c1 int, c2 int);
> create table a( c1 int, c2 int, c3 varchar) partition by range(((c1, 
> c2)::hoge));
> create table a1 partition of a for values from((0, 0)) to ((100, 100)); 
> create table
> a2 partition of a for values from((100, 100)) to ((200, 200)); explain select 
> * from
> a where (c1, c2)::hoge < (99, 99)::hoge;
> 
> I'm not sure if this method is officially supported or not, though...
> 
> Regards,
> 
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION