date:20200507

Re: Implementing Incremental View Maintenance

2020-05-07 Thread Tatsuo Ishii

>> +1,  This is a smart idea.   How did you test it?  AFAIK, we can test it
> with:
> 
> 1.  For any query like SELECT xxx,  we create view like CREATE MATERIAL VIEW
> mv_name as SELECT xxx;  to test if the features in the query are supported.

No I didn't test the correctness of IVM with TPC-DS data for now.
TPC-DS comes with a data generator and we can test IVM something like:

SELECT * FROM IVM_vew EXCEPT SELECT ... (TPC-DS original query);

If this produces 0 row, then the IVM is correct for the initial data.
(of course actually we need to add appropreate ORDER BY and LIMIT
clause to the SELECT statement for IVM if neccessary).

> 2.  Update the data and then compare the result with SELECT XXX with SELECT
> * from  mv_name to test if the data is correctly sync.

I wanted to test the data updating but I am still struggling how to
extract correct updating data from TPC-DS data set.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Logical replication subscription owner

2020-05-07 Thread Kyotaro Horiguchi

At Fri, 8 May 2020 01:02:11 -0400, Alvaro Herrera  
wrote in 
> On 2020-May-07, Tom Lane wrote:
> 
> > FWIW, I would argue that LOGIN permits logging in on a regular SQL
> > connection, while REPLICATION should permit logging in on a
> > replication connection, and there's no reason for either to depend on
> > or require the other.
> 
> I agree with this.

I agree, too.  Anyway, it is unreasonable that a user is banned for
the lack of replication-attribute after a successful *replication*
login.

LOG:  replication connection authorized: user=user1 application_name=psql
FATAL:  must be superuser or replication role to start walsender

> > >> Also- what about per-database connections?  Does having REPLICATION mean
> > >> you get to override the CONNECT privileges on a database, if you're
> > >> connecting for the purposes of doing logical replication?
> > 
> > No, why would it?  Should LOGIN privilege mean you can override
> > CONNECT?  That's nonsense.  You need the respective privilege
> > to connect with the protocol you want to connect with, and you
> > also need CONNECT on the DB you want to connect to.
> 
> And this.

A user can start physical replication without needing CONNECT on any
database if it has REPLICATION attribute.  That means any user that
is allowed logical replication on a specific database (or even no
databases) can replicate the whole cluster using physical replication.
I don't think it is a proper behavior from the security perspective.

It seems to me that we need to restrict physical replication to
require CONNECT privilege on all databases, or separate physical
replication privilege from logical replication privilege.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Fix pg_buffercache document

2020-05-07 Thread Masahiko Sawada

On Fri, 8 May 2020 at 13:36, Amit Kapila  wrote:
>
> On Thu, May 7, 2020 at 4:02 PM Amit Kapila  wrote:
> >
> > On Thu, May 7, 2020 at 2:53 PM Masahiko Sawada
> >  wrote:
> > >
> > > >
> > > > There is a typo in the patch (queris/queries).  How about if slightly
> > > > reword it as "Since buffer manager locks are not taken to copy the
> > > > buffer state data that the view will display, accessing
> > > > pg_buffercache view has less impact on normal
> > > > buffer activity but it doesn't provide a consistent set of results
> > > > across all buffers.  However, we ensure that the information of each
> > > > buffer is self-consistent."?
> > >
> > > Thank you for your idea. Agreed.
> > >
> > > Attached the updated version patch.
> > >
> >
> > LGTM.  I will commit this tomorrow unless there are more comments.
> >
>
> Pushed.
>

Thank you!

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.

2020-05-07 Thread Fujii Masao





On 2020/05/07 17:57, Amit Kapila wrote:

On Thu, May 7, 2020 at 12:13 PM Fujii Masao  wrote:


On 2020/05/02 20:40, Amit Kapila wrote:


I don't see any obvious problem with the changed code but we normally
don't backpatch performance improvements.  I can see that the code
change here appears to be straight forward so it might be fine to
backpatch this.  Have we seen similar reports earlier as well?  AFAIK,
this functionality is for a long time and if people were facing this
on a regular basis then we would have seen such reports multiple
times.  I mean to say if the chances of this hitting are less then we
can even choose not to backpatch this.


I found the following two reports. ISTM there are not so many reports...
https://www.postgresql.org/message-id/16159-f5a34a3a04dc6...@postgresql.org
https://www.postgresql.org/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru



The first seems to be the same where this bug has been fixed.  It has
been moved to hackers in email [1].


Yes, that's the original report that leaded to the commit.


 Am, I missing something?
Considering it has been encountered by two different people, I think
it would not be a bad idea to back-patch this.


+1 So I will do the back-patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Implementing Incremental View Maintenance

2020-05-07 Thread Andy Fan

On Fri, May 8, 2020 at 9:13 AM Tatsuo Ishii  wrote:

> >> Hi,
> >>
> >> Attached is the latest patch (v15) to add support for Incremental
> Materialized
> >> View Maintenance (IVM). It is possible to apply to current latest
> master branch.
>
> I have tried to use IVM against TPC-DS (http://www.tpc.org/tpcds/)
> queries.  TPC-DS models decision support systems and those queries are
> modestly complex. So I thought applying IVM to those queries could
> show how IVM covers real world queries.
>
> +1,  This is a smart idea.   How did you test it?  AFAIK, we can test it
with:

1.  For any query like SELECT xxx,  we create view like CREATE MATERIAL VIEW
mv_name as SELECT xxx;  to test if the features in the query are supported.
2.  Update the data and then compare the result with SELECT XXX with SELECT
* from  mv_name to test if the data is correctly sync.

Best Regards
Andy Fan

Re: Logical replication subscription owner

2020-05-07 Thread Alvaro Herrera

On 2020-May-07, Tom Lane wrote:

> FWIW, I would argue that LOGIN permits logging in on a regular SQL
> connection, while REPLICATION should permit logging in on a
> replication connection, and there's no reason for either to depend on
> or require the other.

I agree with this.

> >> Also- what about per-database connections?  Does having REPLICATION mean
> >> you get to override the CONNECT privileges on a database, if you're
> >> connecting for the purposes of doing logical replication?
> 
> No, why would it?  Should LOGIN privilege mean you can override
> CONNECT?  That's nonsense.  You need the respective privilege
> to connect with the protocol you want to connect with, and you
> also need CONNECT on the DB you want to connect to.

And this.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fix pg_buffercache document

2020-05-07 Thread Amit Kapila

On Thu, May 7, 2020 at 4:02 PM Amit Kapila  wrote:
>
> On Thu, May 7, 2020 at 2:53 PM Masahiko Sawada
>  wrote:
> >
> > >
> > > There is a typo in the patch (queris/queries).  How about if slightly
> > > reword it as "Since buffer manager locks are not taken to copy the
> > > buffer state data that the view will display, accessing
> > > pg_buffercache view has less impact on normal
> > > buffer activity but it doesn't provide a consistent set of results
> > > across all buffers.  However, we ensure that the information of each
> > > buffer is self-consistent."?
> >
> > Thank you for your idea. Agreed.
> >
> > Attached the updated version patch.
> >
>
> LGTM.  I will commit this tomorrow unless there are more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: PG 13 release notes, first draft

2020-05-07 Thread Noah Misch

On Thu, May 07, 2020 at 09:38:34AM -0400, Bruce Momjian wrote:
> On Wed, May  6, 2020 at 10:20:57PM -0700, Noah Misch wrote:
> > On Wed, May 06, 2020 at 07:40:25PM -0400, Bruce Momjian wrote:
> > > On Tue, May  5, 2020 at 11:39:10PM -0700, Noah Misch wrote:
> > > > On Mon, May 04, 2020 at 11:16:00PM -0400, Bruce Momjian wrote:
> > > > > Allow skipping of WAL for new tables and indexes if wal_level is 
> > > > > 'minimal' (Noah Misch)
> > > > 
> > > > Kyotaro Horiguchi authored that one.  (I committed it.)  The commit 
> > > > message
> > > > noted characteristics, some of which may deserve mention in the notes:
> > > 
> > > Fixed.
> > 
> > I don't see that change pushed (but it's not urgent).
> 
> I got stuck on Amit's partition items and my head couldn't process any
> more, so I went to bed, and just committed it now.  I was afraid to have
> pending stuff uncommitted, but I am also hesitant to do a commit for
> each change.

Got it, +1 for batching such changes.

> > > > - Crash recovery was losing tuples written via COPY TO.  This fixes the 
> > > > bug.
> > > 
> > > This was not backpatched?
> > 
> > Right.
> 
> Oh.  So you are saying we could lose COPY data on a crash, even after a
> commit.  That seems bad.  Can you show me the commit info?  I can't find
> it.

commit c6b9204
Author: Noah Misch 
AuthorDate: Sat Apr 4 12:25:34 2020 -0700
Commit: Noah Misch 
CommitDate: Sat Apr 4 12:25:34 2020 -0700

Skip WAL for new relfilenodes, under wal_level=minimal.

Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.ga9...@svana.org

Re: Logical replication subscription owner

2020-05-07 Thread Tom Lane

Alvaro Herrera  writes:
> I'd welcome input from other people on this issue; only now I noticed
> that it's buried in pgsql-docs, so CCing pgsql-hackers now.

FWIW, I would argue that LOGIN permits logging in on a regular SQL
connection, while REPLICATION should permit logging in on a
replication connection, and there's no reason for either to depend on
or require the other.

> On 2020-Apr-23, Stephen Frost wrote:
>> Also- what about per-database connections?  Does having REPLICATION mean
>> you get to override the CONNECT privileges on a database, if you're
>> connecting for the purposes of doing logical replication?

No, why would it?  Should LOGIN privilege mean you can override
CONNECT?  That's nonsense.  You need the respective privilege
to connect with the protocol you want to connect with, and you
also need CONNECT on the DB you want to connect to.

regards, tom lane

Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators

2020-05-07 Thread Kyotaro Horiguchi

At Fri, 8 May 2020 11:31:42 +0900, Fujii Masao  
wrote in 
> >> You mean that pg_lsn_pli() and pg_lsn_mii() should emit an error like
> >> "the number of bytes to add/subtract cannnot be NaN" when NaN is
> >> specified?
> > The function is called while executing an expression, so "NaN cannot
> > be used in this expression" or something like that would work.
> 
> This sounds ambiguous. I like to use clearer messages like
> 
> cannot add NaN to pg_lsn
> cannot subtract NaN from pg_lsn

They works fine to me.

> >> I'm not sure if int128 is available in every environments.
> > In second thought, I found that we don't have enough substitute
> > functions for the platforms without a native implement.  Instead,
> > there are some overflow-safe uint64 math functions, that is,
> > pg_add/sub_u64_overflow. This patch defines numeric_pg_lsn which is
> > substantially numeric_uint64.  By using them, for example, we can make
> > pg_lsn_pli mainly with integer arithmetic as follows.
> 
> Sorry, I'm not sure what the benefit of this approach...

(If we don't allow negative nbytes,)
We accept numeric so that the operators can accept values out of range
of int64, but we don't need to perform all arithmetic in numeric. That
approach does less numeric arithmetic, that is, faster and simpler.
We don't need to string'ify LSN with it. That avoid stack consumption.

> > If invalid values are given as the addend, the following message would
> > make sense.
> > =# select '1/1::pg_lsn + 'NaN'::numeric;
> > ERROR:  cannot use NaN in this expression
> > =# select '1/1::pg_lsn + '-1'::numeric;
> > ERROR:  numeric value out of range for this expression
> 
> Could you tell me why we should reject this calculation?
> IMO it's ok to add the negative number, and which is possible
> with the latest patch.

Sorry, I misread the patch as it rejected -1 for *nbytes*, by seeing
numeric_pg_lsn.

Finally, I'm convinced that we lack required integer arithmetic
infrastructure to perform the objective.

The patch looks good to me except the size of buf[], but I don't
strongly object to that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PG 13 release notes, first draft

2020-05-07 Thread Amit Langote

On Fri, May 8, 2020 at 2:06 AM Bruce Momjian  wrote:
> On Fri, May  8, 2020 at 12:32:16AM +0900, Amit Langote wrote:
> > c8434d64c implements a new feature whereby, to use partitionwise join,
> > partition bounds of the tables being joined no longer have to match
> > exactly.  I think it might be better to mention this explicitly
> > because it enables partitionwise joins to be used in more partitioning
> > setups.
>
> Well, the text says:
>
> Allow partitionwise joins to happen in more cases (Ashutosh Bapat,
> Etsuro Fujita, Amit Langote, Tom Lane)
>
> Isn't that what you just said?  I just added this paragraph:
>
> For example, partitionwise joins can now happen between partitioned
> tables where the ancestors do not exactly match.
>
> Does that help?

Yes, although "ancestors do not exactly match" doesn't make clear what
about partitioned tables doesn't match.   "partition bounds do not
exactly match" would.

> > > 
> > > Previously, partitions had to be replicated individually.  Now
> > > partitioned tables can be published explicitly causing all 
> > > partitions
> > > to be automatically published.  Addition/removal of partitions 
> > > from
> > > partitioned tables are automatically added/removed on subscribers.
> > > The CREATE PUBLICATION option publish_via_partition_root controls 
> > > whether
> > > partitioned tables are published as themselves or their ancestors.
> > > 
> >
> > Thanks.  Sounds good except I think the last sentence should read:
> >
> > ...controls whether partition changes are published as their own or as
> > their ancestor's.
>
> OK, done.

Hmm, I see that you only took "as their own".

- ...controls whether partitioned tables are published as themselves
or their ancestors.
+ ...controls whether partitioned tables are published as their own or
their ancestors.

and that makes the new sentence sound less clear.  I mainly wanted
"partitioned table" replaced by "partition", because only then the
phrase "as their own or their ancestor's" would make sense.

I know our partitioning terminology can be very confusing with many
terms including at least "partitioned table", "partition", "ancestor",
"leaf partition", "parent", "child", etc. that I see used.

> > > 
> > >
> > > 
> > > 
> > >
> > > 
> > > Allow non-partitioned tables to be logically replicated to 
> > > subscribers
> > > that receive the rows into partitioned tables (Amit Langote)
> > > 
> >
> > Hmm, why it make it sound like this works only if the source table is
> > non-partitioned?  The source table can be anything, a regular
> > non-partitioned table, or a partitioned one.
>
> Well, we already covered the publish partitioned case in the above item.
>
> > How about:
> >
> > Allow logical replication into partitioned tables on subscribers
> >
> > Previously, it was allowed only into regular [ non-partitioned ] tables.
>
> OK, I used this wording:
>
> Allow logical replication into partitioned tables on subscribers (Amit
> Langote)
>
> Previously, subscribers could only receive rows into non-partitioned
> tables.

This is fine, thanks.

I have attached a patch with my suggestions above.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


partition-item-wording.patch
Description: Binary data

Re: PG 13 release notes, first draft

2020-05-07 Thread Michael Paquier

On Mon, May 04, 2020 at 11:16:00PM -0400, Bruce Momjian wrote:
> I have committed the first draft of the PG 13 release notes.  You can
> see them here:
> 
>   https://momjian.us/pgsql_docs/release-13.html
> 
> It still needs markup, word wrap, and indenting.  The community doc
> build should happen in a few hours.

Thanks Bruce for compiling the release notes.  Here are some comments
from me, after looking at the state of the notes as of f2ff203.

Should e2e02191 be added to the notes?  This commit means that we
actually dropped support for Windows 2000 (finally) at run-time.

At the same time I see no mention of 79dfa8af, which added better
error handling when backends the SSL context with incorrect bounds.

What about fc8cb94, which basically means that vacuumlo and oid2name
are able to now support coloring output for their logging?

Document color support (Peter Eisentraut)

[...]

THIS WAS NOT DOCUMENTED BEFORE?

Not sure that there is a point to add that to the release notes.
--
Michael

signature.asc
Description: PGP signature

Re: 2pc leaks fds

2020-05-07 Thread Kyotaro Horiguchi

At Thu, 7 May 2020 19:28:55 -0400, Alvaro Herrera  
wrote in 
> On 2020-Apr-27, Kyotaro Horiguchi wrote:
> 
> > At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera 
> >  wrote in 
> 
> > Sorry for the ambiguity, I didn't meant I minded that this conflicts
> > with my patch or I don't want this to be committed. It is easily
> > rebased on this patch.  What I was anxious about is that the new
> > callback struct might be too flexible than required. So I "mildly"
> > objected, and I won't be dissapointed if this patch is committed.
> 
> ... well, yeah, maybe it is too flexible.  And perhaps we could further
> tweak this interface so that the file descriptor is not part of
> XLogReader at all -- with such a change, it would make more sense to
> worry about the "close" callback not being "close" but something like
> "cleanup", as you suggest.  But right now, and thinking from the point
> of view of going into postgres 13 beta shortly, it seems to me that
> XLogReader is just a very leaky abstraction since both itself and its
> users are aware of the fact that there is a file descriptor.

Agreed.

> Maybe with your rework for encryption you'll want to remove the FD from
> XLogReader at all, and move it elsewhere.  Or maybe not.  But it seems
> to me that my suggested approach is sensible, and better than the
> current situation.  (Let's keep in mind that the primary concern here is
> that the callstack is way too complicated -- you ask XlogReader for
> data, it calls your Read callback, that one calls WALRead passing your
> openSegment callback and stuffs the FD in XLogReaderState ... a sieve it
> is, the way it leaks, not an abstraction.)

I agree that new callback functions is most sensible for getting into
13, of course.

> > > I have to admit that until today I hadn't realized that that's what your
> > > patch series was doing.  I'm not familiar with how you intend to
> > > implement WAL encryption on top of this, but on first blush I'm not
> > > liking this proposed design too much.
> > 
> > Right. I might be too much in detail, but it simplifies the call
> > tree. Anyway that is another discussion, though:)
> 
> Okay.  We can discuss further changes later, of course.
> 
> > It looks like as if the open/read/close-callbacks are generic and on
> > the same interface layer, but actually open-callback is dedicate to
> > WALRead and it is useless when the read-callback doesn't use
> > WALRead. What I was anxious about is that the open-callback is
> > uselessly exposing the secret of the read-callback.
> 
> Well, I don't think we care about that.  WALRead can be thought of as
> just a helper function that you may use to write your read callback.
> The comments I added explain this.

Thanks.

> > I meant concretely that we only have read- and cleanup- callbacks in
> > xlogreader state.  The caller of XLogReaderAllocate specifies the
> > cleanup-callback that is to be used to clean up what the
> > reader-callback left behind, in the same manner with this patch does.
> > The only reason it is not named close-callback is that it is used only
> > when the xlogreader-state is destroyed. So I'm fine with
> > read/close-callbacks.
> 
> We can revisit the current design in the future.  For example for
> encryption we might decide to remove the current-open-segment FD from
> XLogReaderState and then things might be different.  (I think the
> current design is based a lot on historical code, rather than being
> optimal.)
> 
> Since your objection isn't strong, I propose to commit same patch as
> before, only rebased as conflicted with cd123234404e and this comment
> prologuing WALRead:
> 
> /*
>  * Helper function to ease writing of XLogRoutine->page_read callbacks.
>  * If this function is used, caller must supply an open_segment callback in
>  * 'state', as that is used here.
>   [... rest is same as before ...]

I agree to the direction of this patch. Thanks for the explanation.
The patch looks good to me except the two points below.


+   /* XXX for xlogreader use, we'd call XLogBeginRead+XLogReadRecord here 
*/
+   if (!WALRead(&fake_xlogreader,
+&output_message.data[output_message.len],

I'm not sure the point of the XXX comment, but I think WALRead here is
the right thing and we aren't going to use
XLogBeginRead+XLogReadRecord here.  So it seems to me the comment is
misleading and instead we need such a comment for fake_xlogreader like
this.

+   static XLogReaderState fake_xlogreader =
+   {
+   /* fake reader state only to let WALRead to use the callbacks */


wal_segment_close(XlogReaderState *state) is setting
state->seg.ws_file to -1.  On the other hand wal_segment_close(state,..)
doesn't update ws_file and the caller sets the returned value to
(eventually) the same field.

+   seg->ws_file = state->routine.segment_open(state, 
nextSegNo,
+

Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators

2020-05-07 Thread Fujii Masao





On 2020/05/08 10:00, Kyotaro Horiguchi wrote:

At Thu, 7 May 2020 13:17:01 +0900, Fujii Masao  
wrote in



On 2020/05/07 11:21, Kyotaro Horiguchi wrote:

+static bool
+numericvar_to_uint64(const NumericVar *var, uint64 *result)
Other numricvar_to_xxx() functions return an integer value that means
success by 0 and failure by -1, which is one of standard signature of
this kind of functions.  I don't see a reason for this function to
have different signatures from them.


Unless I'm missing something, other functions also return boolean.
For example,

static bool numericvar_to_int32(const NumericVar *var, int32 *result);
static bool numericvar_to_int64(const NumericVar *var, int64 *result);


Mmm.




+   /* XXX would it be better to return NULL? */
+   if (NUMERIC_IS_NAN(num))
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("cannot convert NaN to pg_lsn")));
The ERROR seems perfect to me since NaN is out of the domain of
LSN. log(-1) results in a similar error.
On the other hand, the code above makes the + operator behave as the
follows.
=# SELECT '1/1'::pg_lsn + 'NaN'::numeric;
ERROR:  cannot convert NaN to pg_lsn
This looks somewhat different from what actually wrong is.


You mean that pg_lsn_pli() and pg_lsn_mii() should emit an error like
"the number of bytes to add/subtract cannnot be NaN" when NaN is
specified?


The function is called while executing an expression, so "NaN cannot
be used in this expression" or something like that would work.


This sounds ambiguous. I like to use clearer messages like

cannot add NaN to pg_lsn
cannot subtract NaN from pg_lsn


+   charbuf[256];
+
+   /* Convert to numeric */
+   snprintf(buf, sizeof(buf), UINT64_FORMAT, lsn);
The values larger than 2^64 is useless. So 32 (or any value larger
than 21) is enough for the buffer length.


Could you tell me what the actual problem is when buf[256] is used?


It's just a waste of stack depth by over 200 bytes. I doesn't lead to
an actual problem but it is evidently useless.


By the way coudln't we use int128 instead for internal arithmetic?  I
think that makes the code simpler.


I'm not sure if int128 is available in every environments.


In second thought, I found that we don't have enough substitute
functions for the platforms without a native implement.  Instead,
there are some overflow-safe uint64 math functions, that is,
pg_add/sub_u64_overflow. This patch defines numeric_pg_lsn which is
substantially numeric_uint64.  By using them, for example, we can make
pg_lsn_pli mainly with integer arithmetic as follows.


Sorry, I'm not sure what the benefit of this approach...



Datum
pg_lsn_pli(..)
{
 XLogRecPtr  lsn = PG_GETARG_LSN(0);
 Datum   num_nbytes = PG_GETARG_DATUM(1);
 Datum   u64_nbytes =
 DatumGetInt64(DirectFunctionCall1(numeric_pg_lsn, num_nbytes));
 XLogRecPtr  result;

 if (pg_add_u64_overflow(lsn, u64_nbytes, &result))
 elog(ERROR, "result out of range");

PG_RETURN_LSN(result);
}

If invalid values are given as the addend, the following message would
make sense.

=# select '1/1::pg_lsn + 'NaN'::numeric;
ERROR:  cannot use NaN in this expression
=# select '1/1::pg_lsn + '-1'::numeric;
ERROR:  numeric value out of range for this expression


Could you tell me why we should reject this calculation?
IMO it's ok to add the negative number, and which is possible
with the latest patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

2020-05-07 Thread Andy Fan

Hi Ashutosh:

Appreciate for your comments!

On Thu, May 7, 2020 at 7:26 PM Ashutosh Bapat 
wrote:

> Hi Andy,
> Sorry for delay in review.


I  understand no one has obligation to do that,  and it must take
reviewer's time
and more, so really appreciated for it!  Hope I can provide more kindly
help like
this in future as well.


> Your earlier patches are very large and it
> requires some time to review those. I didn't finish reviewing those
> but here are whatever comments I have till now on the previous set of
> patches. Please see if any of those are useful to the new set.
>
> Yes, I just realized the size as well, so I split them to smaller commit
and
each commit and be build and run make check successfully.

All of your comments still valid except the last one
(covert_subquery_uniquekeys)
which has been fixed v7 version.


>
> +/*
> + * Return true iff there is an equal member in target for every
> + * member in members
>
> Suggest reword: return true iff every entry in "members" list is also
> present
> in the "target" list.


Will do, thanks!


> This function doesn't care about multi-sets, so please
> mention that in the prologue clearly.
>
> +
> +if (root->parse->hasTargetSRFs)
> +return;
>
> Why? A relation's uniqueness may be useful well before we work on SRFs.
>
>
Looks I misunderstand when the SRF function is executed.  I will fix this
in v8.

+
> +if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> +/*
> + * Set UniqueKey on member rel is useless, we have to recompute
> it at
> + * upper level, see populate_partitionedrel_uniquekeys for
> reference
> + */
> +return;
>
> Handling these here might help in bottom up approach. We annotate each
> partition here and then annotate partitioned table based on the individual
> partitions. Same approach can be used for partitioned join produced by
> partitionwise join.
>
> +/*
> + * If the unique index doesn't contain partkey, then it is unique
> + * on this partition only, so it is useless for us.
> + */
>
> Not really. It could help partitionwise join.
>
> +
> +/* Now we have the unique index list which as exactly same on all
> childrels,
> + * Set the UniqueIndex just like it is non-partition table
> + */
>
> I think it's better to annotate each partition with whatever unique index
> it
> has whether or not global. That will help partitionwise join, partitionwise
> aggregate/group etc.
>

Excellent catch! All the 3 items above is partitionwise join related, I
need some time
to check how to handle that.


>
> +/* A Normal group by without grouping set */
> +if (parse->groupClause)
> +add_uniquekey_from_sortgroups(root,
> +  grouprel,
> +  root->parse->groupClause);
>
> Those keys which are part of groupClause and also form unique keys in the
> input
> relation, should be recorded as unique keys in group rel. Knowing the
> minimal
> set of keys allows more optimizations.
>

This is a very valid point now. I ignored this because I wanted to remove
the AggNode
totally if the part of groupClause is unique,  However it doesn't happen
later if there is
aggregation call in this query.


> +
> +foreach(lc,  unionrel->reltarget->exprs)
> +{
> +exprs = lappend(exprs, lfirst(lc));
> +colnos = lappend_int(colnos, i);
> +i++;
> +}
>
> This should be only possible when it's not UNION ALL. We should add some
> assert
> or protection for that.
>

OK, actually I called this function in generate_union_paths. which handle
UNION case only.  I will add the Assert anyway.


>
> +
> +/* Fast path */
> +if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
> +return;
> +
> +outer_is_onerow = relation_is_onerow(outerrel);
> +inner_is_onerow = relation_is_onerow(innerrel);
> +
> +outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
> outerrel);
> +innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
> innerrel);
> +
> +clause_list = gather_mergeable_joinclauses(joinrel, outerrel,
> innerrel,
> +   restrictlist, jointype);
>
> Something similar happens in select_mergejoin_clauses().


I didn't realized this before.  I will refactor this code accordingly if
necessary
after reading that.


> At least from the
> first reading of this patch, I get an impression that all these functions
> which
> are going through clause lists and index lists should be merged into other
> functions which go through these lists hunting for some information useful
> to
> the optimizer.
>
+
> +
> +if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list,
> false))
> +{
> +foreach(lc, innerrel_ukey_ctx)
> +{
> +UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
> +if (!list_is_subset(ctx->uniquekey->exprs,
> jo

Re: Logical replication subscription owner

2020-05-07 Thread Alvaro Herrera

I'd welcome input from other people on this issue; only now I noticed
that it's buried in pgsql-docs, so CCing pgsql-hackers now.


On 2020-Apr-23, Stephen Frost wrote:

> Greetings,
> 
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Alvaro Herrera  writes:
> > > I had it in my mind that LOGIN was for regular (SQL-based) login, and
> > > REPLICATION was for replication login, and that they were orthogonal.
> > 
> > Yeah, that's what I would've expected.  Otherwise, is REPLICATION
> > without LOGIN useful at all?
> 
> No, but it's less surprising, at least to me, for all roles that login
> to require having the LOGIN right.  Having REPLICATION ignore that would
> be surprising (and a change from today).  Maybe if we called it
> REPLICATIONLOGIN or something along those lines it would be less
> surprising, but it still seems pretty awkward.
> 
> I view REPLICATION as allowing a specific kind of connection, but you
> first need to be able to login.
> 
> Also- what about per-database connections?  Does having REPLICATION mean
> you get to override the CONNECT privileges on a database, if you're
> connecting for the purposes of doing logical replication?
> 
> I agree we could do better in these areas, but I'd argue that's mostly
> around improving the documentation rather than baking in implications
> that one privilege implies another.  We certainly get people who
> complain about getting a permission denied error when they have UPDATE
> rights on a table (but not SELECT) and they include a WHERE clause in
> their update statement, but that doesn't mean we should assume that
> having UPDATE rights means you also get SELECT rights, just because
> UPDATE is next to useless without SELECT.
> 
> Thanks,
> 
> Stephen




-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Why are wait events not reported even though it reads/writes a timeline history file?

2020-05-07 Thread Fujii Masao





On 2020/05/07 21:24, Michael Paquier wrote:

On Thu, May 07, 2020 at 04:51:16PM +0900, Fujii Masao wrote:

Yeah, so I updated the patch so that ferror() is called only
when fgets() returns NULL. Attached is the updated version of
the patch.


Thanks for the new patch.  LGTM.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Implementing Incremental View Maintenance

2020-05-07 Thread Tatsuo Ishii

>> Hi, 
>> 
>> Attached is the latest patch (v15) to add support for Incremental 
>> Materialized
>> View Maintenance (IVM). It is possible to apply to current latest master 
>> branch.

I have tried to use IVM against TPC-DS (http://www.tpc.org/tpcds/)
queries.  TPC-DS models decision support systems and those queries are
modestly complex. So I thought applying IVM to those queries could
show how IVM covers real world queries.

Since IVM does not support queries including ORDER BY and LIMIT, I
removed them from the queries before the test.

Here are some facts so far learned in this attempt.

- Number of TPC-DS query files is 99.
- IVM was successfully applied to 20 queries.
- 33 queries failed because they use WITH clause (CTE) (currenly IVM does not 
support CTE).
- Error messages from failed queries (except those using WITH) are below:
  (the number indicates how many queries failed by the same reason)

11   aggregate functions in nested query are not supported on incrementally 
maintainable materialized view
8window functions are not supported on incrementally maintainable 
materialized view
7UNION/INTERSECT/EXCEPT statements are not supported on incrementally 
maintainable materialized view
5WHERE clause only support subquery with EXISTS clause
3GROUPING SETS, ROLLUP, or CUBE clauses is not supported on 
incrementally maintainable materialized view
3aggregate function and EXISTS condition are not supported at the same 
time
2GROUP BY expression not appeared in select list is not supported on 
incrementally maintainable materialized view
2aggregate function with DISTINCT arguments is not supported on 
incrementally maintainable materialized view
2aggregate is not supported with outer join
1aggregate function stddev_samp(integer) is not supported on 
incrementally maintainable materialized view
1HAVING clause is not supported on incrementally maintainable 
materialized view
1subquery is not supported with outer join
1   column "avg" specified more than once

Attached are the queries IVM are successfully applied.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


IVM_sucessfull_queries.tar.gz
Description: Binary data

Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators

2020-05-07 Thread Kyotaro Horiguchi

At Thu, 7 May 2020 13:17:01 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/05/07 11:21, Kyotaro Horiguchi wrote:
> > +static bool
> > +numericvar_to_uint64(const NumericVar *var, uint64 *result)
> > Other numricvar_to_xxx() functions return an integer value that means
> > success by 0 and failure by -1, which is one of standard signature of
> > this kind of functions.  I don't see a reason for this function to
> > have different signatures from them.
> 
> Unless I'm missing something, other functions also return boolean.
> For example,
> 
> static bool numericvar_to_int32(const NumericVar *var, int32 *result);
> static bool numericvar_to_int64(const NumericVar *var, int64 *result);

Mmm. 

> 
> > +   /* XXX would it be better to return NULL? */
> > +   if (NUMERIC_IS_NAN(num))
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +errmsg("cannot convert NaN to pg_lsn")));
> > The ERROR seems perfect to me since NaN is out of the domain of
> > LSN. log(-1) results in a similar error.
> > On the other hand, the code above makes the + operator behave as the
> > follows.
> > =# SELECT '1/1'::pg_lsn + 'NaN'::numeric;
> > ERROR:  cannot convert NaN to pg_lsn
> > This looks somewhat different from what actually wrong is.
> 
> You mean that pg_lsn_pli() and pg_lsn_mii() should emit an error like
> "the number of bytes to add/subtract cannnot be NaN" when NaN is
> specified?

The function is called while executing an expression, so "NaN cannot
be used in this expression" or something like that would work.


> > +   charbuf[256];
> > +
> > +   /* Convert to numeric */
> > +   snprintf(buf, sizeof(buf), UINT64_FORMAT, lsn);
> > The values larger than 2^64 is useless. So 32 (or any value larger
> > than 21) is enough for the buffer length.
> 
> Could you tell me what the actual problem is when buf[256] is used?

It's just a waste of stack depth by over 200 bytes. I doesn't lead to
an actual problem but it is evidently useless.

> > By the way coudln't we use int128 instead for internal arithmetic?  I
> > think that makes the code simpler.
> 
> I'm not sure if int128 is available in every environments.

In second thought, I found that we don't have enough substitute
functions for the platforms without a native implement.  Instead,
there are some overflow-safe uint64 math functions, that is,
pg_add/sub_u64_overflow. This patch defines numeric_pg_lsn which is
substantially numeric_uint64.  By using them, for example, we can make
pg_lsn_pli mainly with integer arithmetic as follows.

Datum
pg_lsn_pli(..)
{
XLogRecPtr  lsn = PG_GETARG_LSN(0);
Datum   num_nbytes = PG_GETARG_DATUM(1);
Datum   u64_nbytes =
DatumGetInt64(DirectFunctionCall1(numeric_pg_lsn, num_nbytes));
XLogRecPtr  result;

if (pg_add_u64_overflow(lsn, u64_nbytes, &result))
elog(ERROR, "result out of range");

PG_RETURN_LSN(result);
}

If invalid values are given as the addend, the following message would
make sense.

=# select '1/1::pg_lsn + 'NaN'::numeric;
ERROR:  cannot use NaN in this expression
=# select '1/1::pg_lsn + '-1'::numeric;
ERROR:  numeric value out of range for this expression

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Improving estimates for TPC-H Q2

2020-05-07 Thread Tomas Vondra


Hi,

I've been re-running the TPC-H benchmark, to remind myself the common
issues with OLAP workloads, and one of the most annoying problems seems
to be the misestimates in Q2. The query is not particularly complex,
although it does have a correlated subquery with an aggregate, but it's
one of the queries prone to a cascade of nested loops running forever.
I wonder if there's something we could do to handle this better.

A raw Q2 looks like this:

select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 16
and p_type like '%NICKEL'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey;

and the full query plan is attached (q2-original-plan.txt).

The relevant part of the plan is the final join, which also considers
the subplan result (all estimates are for scale 10):

   ->  Merge Join  (cost=638655.36..1901120.61 rows=1 width=192)
   (actual time=7299.121..10993.517 rows=4737 loops=1)
 Merge Cond: (part.p_partkey = partsupp.ps_partkey)
 Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
 Rows Removed by Join Filter: 1661

Yeah, this is estimated as 1 row but actually returns 4737 rows. All
the other nodes are estimated very accurately, it's just this final join
that is entirely wrong.

If you tweak the costs a bit (e.g. reducing random_page_cost etc.) the
plan can easily switch to nested loops, with this join much deeper in
the plan. See the attached q2-nested-loops.txt for an example (I had to
disable merge/hash joins to trigger this on scale 10, on larger scales
it can happen much easier).

Now, the query seems a bit complex, but we can easily simplify it by
creating an extra table and reducing the number of joins:

create table foo as select
  *
from
partsupp,
supplier,
nation,
region
where
s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA';

reate index on t (ps_partkey);

which allows us to rewrite Q2 like this (this also ditches the ORDER BY

and LIMIT clauses):

select
1
from
part,
t
where
p_partkey = ps_partkey
and p_size = 16
and p_type like '%NICKEL'
and ps_supplycost = (
select
min(ps_supplycost)
from t
where
p_partkey = ps_partkey
);

in fact, we can ditch even the conditions on p_size/p_type which makes
the issue even more severe:

select
1
from
part,
t
where
p_partkey = ps_partkey
and ps_supplycost = (
select
min(ps_supplycost)
from t
where
p_partkey = ps_partkey
);

with the join estimated like this:

   Hash Join  (cost=89761.10..1239195.66 rows=17 width=4)
  (actual time=15379.356..29315.436 rows=1182889 loops=1)
 Hash Cond: ((t.ps_partkey = part.p_partkey) AND
 (t.ps_supplycost = (SubPlan 1)))

Yeah, that's underestimated by a factor of 7 :-(

An interesting observation is that if you remove the condition on supply
cost (with the correlated subquery), the estimates get perfect atain. So
this seems to be about this particular condition, or how we combine the
selectivities ...

I'm not sure I've figured all the details yet, but this seems to be due
to a dependency between the ps_partkey and ps_supplycost columns.

When estimating the second condition, we end up calling eqjoinsel()
with SubPlan and Var arguments. We clearly won't have ndistinct of MCVs
for the SubPlan, so we use

nd1 = 200;   /* default */
nd2 = 94005; /* n_distinct for t.ps_supplycost */

and end up (thanks to eqjoinsel_inner and no NULLs in data) with

selec_inner = 0.1 = Min(1/nd1, 1/nd2)

But that's entirely bogus, because while there are ~100k distinct values
in t.ps_supplycost, those are for *all* ps_partkey values combined. But
each ps_partkey value has only about ~1.4 distinct ps_supplycost values
on average:

select avg(x) from (select count(distinct ps_suppl

Re: 2pc leaks fds

2020-05-07 Thread Alvaro Herrera

On 2020-Apr-27, Kyotaro Horiguchi wrote:

> At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera  
> wrote in 

> Sorry for the ambiguity, I didn't meant I minded that this conflicts
> with my patch or I don't want this to be committed. It is easily
> rebased on this patch.  What I was anxious about is that the new
> callback struct might be too flexible than required. So I "mildly"
> objected, and I won't be dissapointed if this patch is committed.

... well, yeah, maybe it is too flexible.  And perhaps we could further
tweak this interface so that the file descriptor is not part of
XLogReader at all -- with such a change, it would make more sense to
worry about the "close" callback not being "close" but something like
"cleanup", as you suggest.  But right now, and thinking from the point
of view of going into postgres 13 beta shortly, it seems to me that
XLogReader is just a very leaky abstraction since both itself and its
users are aware of the fact that there is a file descriptor.

Maybe with your rework for encryption you'll want to remove the FD from
XLogReader at all, and move it elsewhere.  Or maybe not.  But it seems
to me that my suggested approach is sensible, and better than the
current situation.  (Let's keep in mind that the primary concern here is
that the callstack is way too complicated -- you ask XlogReader for
data, it calls your Read callback, that one calls WALRead passing your
openSegment callback and stuffs the FD in XLogReaderState ... a sieve it
is, the way it leaks, not an abstraction.)

> > I have to admit that until today I hadn't realized that that's what your
> > patch series was doing.  I'm not familiar with how you intend to
> > implement WAL encryption on top of this, but on first blush I'm not
> > liking this proposed design too much.
> 
> Right. I might be too much in detail, but it simplifies the call
> tree. Anyway that is another discussion, though:)

Okay.  We can discuss further changes later, of course.

> It looks like as if the open/read/close-callbacks are generic and on
> the same interface layer, but actually open-callback is dedicate to
> WALRead and it is useless when the read-callback doesn't use
> WALRead. What I was anxious about is that the open-callback is
> uselessly exposing the secret of the read-callback.

Well, I don't think we care about that.  WALRead can be thought of as
just a helper function that you may use to write your read callback.
The comments I added explain this.

> I meant concretely that we only have read- and cleanup- callbacks in
> xlogreader state.  The caller of XLogReaderAllocate specifies the
> cleanup-callback that is to be used to clean up what the
> reader-callback left behind, in the same manner with this patch does.
> The only reason it is not named close-callback is that it is used only
> when the xlogreader-state is destroyed. So I'm fine with
> read/close-callbacks.

We can revisit the current design in the future.  For example for
encryption we might decide to remove the current-open-segment FD from
XLogReaderState and then things might be different.  (I think the
current design is based a lot on historical code, rather than being
optimal.)

Since your objection isn't strong, I propose to commit same patch as
before, only rebased as conflicted with cd123234404e and this comment
prologuing WALRead:

/*
 * Helper function to ease writing of XLogRoutine->page_read callbacks.
 * If this function is used, caller must supply an open_segment callback in
 * 'state', as that is used here.
  [... rest is same as before ...]

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From fbe94ae201bc3963c71be65cb30f820da9591aeb Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 23 Apr 2020 18:02:20 -0400
Subject: [PATCH v2] Rework XLogReader callback system

segment_open and segment_close are passed in at XLogReaderAllocate time,
together with page_read; add comment to explain the role of WALRead.
---
 src/backend/access/transam/twophase.c |   5 +-
 src/backend/access/transam/xlog.c |  10 +-
 src/backend/access/transam/xlogreader.c   |  51 
 src/backend/access/transam/xlogutils.c|  24 ++--
 src/backend/replication/logical/logical.c |  20 +--
 .../replication/logical/logicalfuncs.c|   4 +-
 src/backend/replication/slotfuncs.c   |  10 +-
 src/backend/replication/walsender.c   |  37 --
 src/bin/pg_rewind/parsexlog.c |   9 +-
 src/bin/pg_waldump/pg_waldump.c   |  30 +++--
 src/include/access/xlogreader.h   | 114 +++---
 src/include/access/xlogutils.h|   5 +
 src/include/replication/logical.h |   4 +-
 13 files changed, 210 insertions(+), 113 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 2f7d4ed59a..e1904877fa 100644
--- a/src/backend/access/

Re: Incremental sorts and EXEC_FLAG_REWIND

2020-05-07 Thread Jonathan S. Katz

On 5/7/20 5:07 PM, Tomas Vondra wrote:
> On Thu, May 07, 2020 at 03:58:47PM -0400, James Coleman wrote:
>> On Thu, May 7, 2020 at 2:57 PM Jonathan S. Katz 
>> wrote:
>>>
>>> ...
>>>
>>> With Beta 1 just around the corner[1], I wanted to check in to see if
>>> this was closer to being committed so we could close off the open
>>> item[2] prior the beta release.
>>>
>>> Thanks!
>>>
>>> Jonathan
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/b782a4ec-5e8e-21a7-f628-624be683e...@postgresql.org
>>>
>>> [2] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
>>
>>
>> Tangential, I know, but I think we should consider [1] (includes a
>> very minor patch) part of the open items on incremental sort also:
>>
> 
> Yes, I do plan to get both those issues fixed in the next coupld of days
> (certainly in time for beta 1).

Excellent - thank you!

Jonathan



signature.asc
Description: OpenPGP digital signature

Re: pg_restore error message

2020-05-07 Thread Ranier Vilela

Em qui., 7 de mai. de 2020 às 18:54, Euler Taveira <
euler.tave...@2ndquadrant.com> escreveu:

> Hi,
>
> While investigating a pg_restore error, I stumbled upon a message that is
> not so useful.
>
> pg_restore: error: could not close data file: No such file or directory
>
> Which file? File name should be printed too like in the error check for
> cfopen_read a few lines above.
>
Can suggest improvements?

1. free (398 line) must be pg_free(buf)';
2. %m, is a format to parameter, right?
But what parameter? Both fatal call, do not pass this parameter, or is
it implied?

regards,
Ranier Vilela

pg_restore error message

2020-05-07 Thread Euler Taveira

Hi,

While investigating a pg_restore error, I stumbled upon a message that is
not so useful.

pg_restore: error: could not close data file: No such file or directory

Which file? File name should be printed too like in the error check for
cfopen_read a few lines above.

Regards,

-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f3853512e827952402020b2b0f3003ac8c5c9d96 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 7 May 2020 18:17:28 -0300
Subject: [PATCH] pg_restore failure message does not provide filename

An error message like "pg_restore: error: could not close data file: No
such file or directory" is not informative without a file name. Since
error message for 'open' provides the file name (a few lines above), we
should also add it to 'close' error check.
---
 src/bin/pg_dump/pg_backup_directory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index c9cce5ed8a..f178d6ac21 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -397,7 +397,7 @@ _PrintFileData(ArchiveHandle *AH, char *filename)
 
 	free(buf);
 	if (cfclose(cfp) !=0)
-		fatal("could not close data file: %m");
+		fatal("could not close data file \"%s\": %m", filename);
 }
 
 /*
-- 
2.20.1

Re: [PATCH] fix GIN index search sometimes losing results

2020-05-07 Thread Tom Lane

Pavel Borisov  writes:
> It appeared than GIN index sometimes lose results if simultaneously:
> 1 if query operand contains weight marks
> 2 if weight-marked operand is negated by ! operator
> 3 if there are only logical (not phrase) operators from this negation
> towards the root of query tree.

Nice catch ... but if you try it with a GIST index, that fails too.

Even if it were only GIN indexes, this patch is an utter hack.
It might accidentally work for the specific case of NOT with
a single QI_VAL node as argument, but not for anything more
complicated.

I think the root of the problem is that if we have a query using
weights, and we are testing tsvector data that lacks positions/weights,
we can never say there's definitely a match.  I don't see any decently
clean way to fix this without redefining the TSExecuteCallback API
to return a tri-state YES/NO/MAYBE result, because really we need to
decide that it's MAYBE at the level of processing the QI_VAL node,
not later on.  I'd tried to avoid that in e81e5741a, but maybe we
should just bite that bullet, and not worry about whether there's
any third-party code providing its own TSExecuteCallback routine.
codesearch.debian.net suggests that there are no external callers
of TS_execute, so maybe we can get away with that.

regards, tom lane

Re: Incremental sorts and EXEC_FLAG_REWIND

2020-05-07 Thread Tomas Vondra


On Thu, May 07, 2020 at 03:58:47PM -0400, James Coleman wrote:

On Thu, May 7, 2020 at 2:57 PM Jonathan S. Katz  wrote:


...

With Beta 1 just around the corner[1], I wanted to check in to see if
this was closer to being committed so we could close off the open
item[2] prior the beta release.

Thanks!

Jonathan

[1]
https://www.postgresql.org/message-id/b782a4ec-5e8e-21a7-f628-624be683e...@postgresql.org
[2] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items



Tangential, I know, but I think we should consider [1] (includes a
very minor patch) part of the open items on incremental sort also:



Yes, I do plan to get both those issues fixed in the next coupld of days
(certainly in time for beta 1).

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Incremental sorts and EXEC_FLAG_REWIND

2020-05-07 Thread James Coleman

On Thu, May 7, 2020 at 2:57 PM Jonathan S. Katz  wrote:
>
> On 4/24/20 6:57 PM, Tomas Vondra wrote:
> > On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
> >> On Sun, Apr 19, 2020 at 12:14 PM James Coleman  wrote:
> >>>
> >>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman  wrote:
> >>> >
> >>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman 
> >>> wrote:
> >>> > >
> >>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier
> >>>  wrote:
> >>> > > >
> >>> > > > Hi,
> >>> > > >
> >>> > > > When initializing an incremental sort node, we have the
> >>> following as
> >>> > > > of ExecInitIncrementalSort():
> >>> > > > /*
> >>> > > >  * Incremental sort can't be used with either
> >>> EXEC_FLAG_REWIND,
> >>> > > >  * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only
> >>> one of many sort
> >>> > > >  * batches in the current sort state.
> >>> > > >  */
> >>> > > >  Assert((eflags & (EXEC_FLAG_BACKWARD |
> >>> > > >EXEC_FLAG_MARK)) == 0);
> >>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be
> >>> allowed here
> >>> > > > to begin with (because incremental sorts don't support rescans
> >>> without
> >>> > > > parameter changes, right?), the comment and the assertion are
> >>> telling
> >>> > > > a different story.
> >>> > >
> >>> > > I remember changing this assertion in response to an issue I'd found
> >>> > > which led to rewriting the rescan implementation, but I must have
> >>> > > missed updating the comment.
> >>> >
> >>> > All right, here are the most relevant messages:
> >>> >
> >>> > [1]: Here I'd said:
> >>> > --
> >>> > While working on finding a test case to show rescan isn't implemented
> >>> > properly yet, I came across a bug. At the top of
> >>> > ExecInitIncrementalSort, we assert that eflags does not contain
> >>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
> >>> > disabled) breaks that assertion:
> >>> >
> >>> > select * from t join (select * from t order by a, b) s on s.a = t.a
> >>> > where t.a in (1,2);
> >>> >
> >>> > The comments about this flag in src/include/executor/executor.h say:
> >>> >
> >>> > * REWIND indicates that the plan node should try to efficiently
> >>> support
> >>> > * rescans without parameter changes. (Nodes must support ExecReScan
> >>> calls
> >>> > * in any case, but if this flag was not given, they are at liberty
> >>> to do it
> >>> > * through complete recalculation. Note that a parameter change
> >>> forces a
> >>> > * full recalculation in any case.)
> >>> >
> >>> > Now we know that except in rare cases (as just discussed recently up
> >>> > thread) we can't implement rescan efficiently.
> >>> >
> >>> > So is this a planner bug (i.e., should we try not to generate
> >>> > incremental sort plans that require efficient rewind)? Or can we just
> >>> > remove that part of the assertion and know that we'll implement the
> >>> > rescan, albeit inefficiently? We already explicitly declare that we
> >>> > don't support backwards scanning, but I don't see a way to declare the
> >>> > same for rewind.
> >>> > --
> >>> >
> >>> > So it seems to me that we can't disallow REWIND, and we have to
> >>> > support rescan, but, we could try to mitigate the effects (without a
> >>> > param change) with a materialize node, as noted below.
> >>> >
> >>> > [2]: Here, in response to my questioning above if this was a planner
> >>> > bug, I'd said:
> >>> > --
> >>> > Other nodes seem to get a materialization node placed above them to
> >>> > support this case "better". Is that something we should be doing?
> >>> > --
> >>> >
> >>> > I never got any reply on this point; if we _did_ introduce a
> >>> > materialize node here, then it would mean we could start disallowing
> >>> > REWIND again. See the email for full details of a specific plan that I
> >>> > encountered that reproduced this.
> >>> >
> >>> > Thoughts?
> >>> >
> >>> > > In the meantime, your question is primarily about making sure the
> >>> > > code/comments/etc. are consistent and not a behavioral problem or
> >>> > > failure you've seen in testing?
> >>> >
> >>> > Still want to confirm this is the case.
> >>> >
> >>> > James
> >>> >
> >>> > [1]:
> >>> https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
> >>>
> >>> > [2]:
> >>> https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
> >>>
> >>>
> >>> Looking at this more, I think this is definitely suspect. The current
> >>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
> >>> the former is definitely fine because we declare that we don't support
> >>> backwards scans. The latter seems like the same reasoning would apply,
> >>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
> >>> attached a patch to do that.
> >>>
> >>> The EXEC_FLAG_REWIND situation though I'm still not clear

Re: Incremental sorts and EXEC_FLAG_REWIND

2020-05-07 Thread Jonathan S. Katz

On 4/24/20 6:57 PM, Tomas Vondra wrote:
> On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
>> On Sun, Apr 19, 2020 at 12:14 PM James Coleman  wrote:
>>>
>>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman  wrote:
>>> >
>>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman 
>>> wrote:
>>> > >
>>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier
>>>  wrote:
>>> > > >
>>> > > > Hi,
>>> > > >
>>> > > > When initializing an incremental sort node, we have the
>>> following as
>>> > > > of ExecInitIncrementalSort():
>>> > > > /*
>>> > > >  * Incremental sort can't be used with either
>>> EXEC_FLAG_REWIND,
>>> > > >  * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only
>>> one of many sort
>>> > > >  * batches in the current sort state.
>>> > > >  */
>>> > > >  Assert((eflags & (EXEC_FLAG_BACKWARD |
>>> > > >    EXEC_FLAG_MARK)) == 0);
>>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be
>>> allowed here
>>> > > > to begin with (because incremental sorts don't support rescans
>>> without
>>> > > > parameter changes, right?), the comment and the assertion are
>>> telling
>>> > > > a different story.
>>> > >
>>> > > I remember changing this assertion in response to an issue I'd found
>>> > > which led to rewriting the rescan implementation, but I must have
>>> > > missed updating the comment.
>>> >
>>> > All right, here are the most relevant messages:
>>> >
>>> > [1]: Here I'd said:
>>> > --
>>> > While working on finding a test case to show rescan isn't implemented
>>> > properly yet, I came across a bug. At the top of
>>> > ExecInitIncrementalSort, we assert that eflags does not contain
>>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
>>> > disabled) breaks that assertion:
>>> >
>>> > select * from t join (select * from t order by a, b) s on s.a = t.a
>>> > where t.a in (1,2);
>>> >
>>> > The comments about this flag in src/include/executor/executor.h say:
>>> >
>>> > * REWIND indicates that the plan node should try to efficiently
>>> support
>>> > * rescans without parameter changes. (Nodes must support ExecReScan
>>> calls
>>> > * in any case, but if this flag was not given, they are at liberty
>>> to do it
>>> > * through complete recalculation. Note that a parameter change
>>> forces a
>>> > * full recalculation in any case.)
>>> >
>>> > Now we know that except in rare cases (as just discussed recently up
>>> > thread) we can't implement rescan efficiently.
>>> >
>>> > So is this a planner bug (i.e., should we try not to generate
>>> > incremental sort plans that require efficient rewind)? Or can we just
>>> > remove that part of the assertion and know that we'll implement the
>>> > rescan, albeit inefficiently? We already explicitly declare that we
>>> > don't support backwards scanning, but I don't see a way to declare the
>>> > same for rewind.
>>> > --
>>> >
>>> > So it seems to me that we can't disallow REWIND, and we have to
>>> > support rescan, but, we could try to mitigate the effects (without a
>>> > param change) with a materialize node, as noted below.
>>> >
>>> > [2]: Here, in response to my questioning above if this was a planner
>>> > bug, I'd said:
>>> > --
>>> > Other nodes seem to get a materialization node placed above them to
>>> > support this case "better". Is that something we should be doing?
>>> > --
>>> >
>>> > I never got any reply on this point; if we _did_ introduce a
>>> > materialize node here, then it would mean we could start disallowing
>>> > REWIND again. See the email for full details of a specific plan that I
>>> > encountered that reproduced this.
>>> >
>>> > Thoughts?
>>> >
>>> > > In the meantime, your question is primarily about making sure the
>>> > > code/comments/etc. are consistent and not a behavioral problem or
>>> > > failure you've seen in testing?
>>> >
>>> > Still want to confirm this is the case.
>>> >
>>> > James
>>> >
>>> > [1]:
>>> https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
>>>
>>> > [2]:
>>> https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>>>
>>>
>>> Looking at this more, I think this is definitely suspect. The current
>>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
>>> the former is definitely fine because we declare that we don't support
>>> backwards scans. The latter seems like the same reasoning would apply,
>>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
>>> attached a patch to do that.
>>>
>>> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
>>> the comments/docs seem to suggest it's a hint for efficiency rather
>>> than something that has to work or be declared as not implemented, so
>>> it seems like one of the following should be the outcome:
>>>
>>> 1. "Disallow" it by only generating materialize nodes above

Re: PG 13 release notes, first draft

2020-05-07 Thread Peter Geoghegan

Hi Bruce,

On Mon, May 4, 2020 at 8:16 PM Bruce Momjian  wrote:
> I have committed the first draft of the PG 13 release notes.  You can
> see them here:
>
> https://momjian.us/pgsql_docs/release-13.html

I see that you have an entry for the deduplication feature:

"More efficiently store duplicates in btree indexes (Anastasia
Lubennikova, Peter Geoghegan)"

I would like to provide some input on this. Fortunately it's much
easier to explain than the B-Tree work that went into Postgres 12. I
think that you should point out that deduplication works by storing
the duplicates in the obvious way: Only storing the key once per
distinct value (or once per distinct combination of values in the case
of multi-column indexes), followed by an array of TIDs (i.e. a posting
list). Each TID points to a separate row in the table.

It won't be uncommon for this to make indexes as much as 3x smaller
(it depends on a number of different factors that you can probably
guess). I wrote a summary of how it works for power users in the
B-Tree documentation chapter, which you might want to link to in the
release notes:

https://www.postgresql.org/docs/devel/btree-implementation.html#BTREE-DEDUPLICATION

Users that pg_upgrade will have to REINDEX to actually use the
feature, regardless of which version they've upgraded from. There are
also some limited caveats about the data types that can use
deduplication, and stuff like that -- see the documentation section I
linked to.

Finally, you might want to note that the feature is enabled by
default, and can be disabled by setting the "deduplicate_items" index
storage option to "off". (We have yet to make a final decision on
whether the feature should be enabled before the first stable release
of Postgres 13, though -- I have an open item for that.)

-- 
Peter Geoghegan

Re: pg_basebackup misses to report checksum error

2020-05-07 Thread Ashwin Agrawal

On Thu, May 7, 2020 at 10:25 AM Stephen Frost  wrote:

> Greetings,
>
> * Ashwin Agrawal (aagra...@pivotal.io) wrote:
> > On Wed, May 6, 2020 at 3:02 PM Robert Haas 
> wrote:
> > > On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal 
> wrote:
> > > > If pg_basebackup is not able to read BLCKSZ content from file, then
> it
> > > > just emits a warning "could not verify checksum in file "" block
> > > > X: read buffer size X and page size 8192 differ" currently but misses
> > > > to error with "checksum error occurred". Only if it can read 8192 and
> > > > checksum mismatch happens will it error in the end.
> > >
> > > I don't think it's a good idea to conflate "hey, we can't checksum
> > > this because the size is strange" with "hey, the checksum didn't
> > > match". Suppose the a file has 1000 full blocks and a partial block.
> > > All 1000 blocks have good checksums. With your change, ISTM that we'd
> > > first emit a warning saying that the checksum couldn't be verified,
> > > and then we'd emit a second warning saying that there was 1 checksum
> > > verification failure, which would also be reported to the stats
> > > system. I don't think that's what we want.
> >
> > I feel the intent of reporting "total checksum verification failure" is
> to
> > report corruption. Which way is the secondary piece of the puzzle. Not
> > being able to read checksum itself to verify is also corruption and is
> > checksum verification failure I think. WARNINGs will provide fine grained
> > clarity on what type of checksum verification failure it is, so I am not
> > sure we really need fine grained clarity in "total numbers" to
> > differentiate these two types.
>
> Are we absolutely sure that there's no way for a partial block to end up
> being seen by pg_basebackup, which is just doing routine filesystem
> read() calls, during normal operation though..?  Across all platforms?
>

Okay, that's a good point, I didn't think about it. This comment to skip
verifying checksum, I suppose convinces, can't be sure and hence can't
report partial blocks as corruption.

/*
 * Only check pages which have not been modified since the
  * start of the base backup. Otherwise, they might have been
  * written only halfway and the checksum would not be valid.
  * However, replaying WAL would reinstate the correct page in
  * this case. We also skip completely new pages, since they
  * don't have a checksum yet.
  */

Might be nice to have a similar comment for the partial block case to
document why we can't report it as corruption. Thanks.

Re: PG 13 release notes, first draft

2020-05-07 Thread Chapman Flack

On 05/07/20 09:46, Bruce Momjian wrote:
> Ah, very good point.  New text is:
> 
>   Allow Unicode escapes, e.g., E'\u', in databases that do not
>   use UTF-8 encoding (Tom Lane)
> 
>   The Unicode characters must be available in the database encoding.
> ...
> 
> I am only using E'\u' as an example.

Hmm, how about:

Allow Unicode escapes, e.g., E'\u' or U&'\', to represent
any character available in the database encoding, even when that
encoding is not UTF-8.

which I suggest as I recall more clearly that the former condition
was not that such escapes were always rejected in other encodings; it was
that they were rejected if they represented characters outside of ASCII.
(Yossarian let out a respectful whistle.)

My inclination is to give at least one example each of the E and U&
form, if only so the casual reader of the notes may think "say! I hadn't
heard of that other form!" and be inspired to find out about it. But
perhaps it seems too much.

Regards,
-Chap

Re: pg_basebackup misses to report checksum error

2020-05-07 Thread Stephen Frost

Greetings,

* Ashwin Agrawal (aagra...@pivotal.io) wrote:
> On Wed, May 6, 2020 at 3:02 PM Robert Haas  wrote:
> > On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal  wrote:
> > > If pg_basebackup is not able to read BLCKSZ content from file, then it
> > > just emits a warning "could not verify checksum in file "" block
> > > X: read buffer size X and page size 8192 differ" currently but misses
> > > to error with "checksum error occurred". Only if it can read 8192 and
> > > checksum mismatch happens will it error in the end.
> >
> > I don't think it's a good idea to conflate "hey, we can't checksum
> > this because the size is strange" with "hey, the checksum didn't
> > match". Suppose the a file has 1000 full blocks and a partial block.
> > All 1000 blocks have good checksums. With your change, ISTM that we'd
> > first emit a warning saying that the checksum couldn't be verified,
> > and then we'd emit a second warning saying that there was 1 checksum
> > verification failure, which would also be reported to the stats
> > system. I don't think that's what we want.
> 
> I feel the intent of reporting "total checksum verification failure" is to
> report corruption. Which way is the secondary piece of the puzzle. Not
> being able to read checksum itself to verify is also corruption and is
> checksum verification failure I think. WARNINGs will provide fine grained
> clarity on what type of checksum verification failure it is, so I am not
> sure we really need fine grained clarity in "total numbers" to
> differentiate these two types.

Are we absolutely sure that there's no way for a partial block to end up
being seen by pg_basebackup, which is just doing routine filesystem
read() calls, during normal operation though..?  Across all platforms?

We certainly don't want to end up reporting a false positive by saying
that there's been corruption when it was just that the file was getting
extended and a read() happened to catch an incomplete write(), or
something along those lines.

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: pg_basebackup misses to report checksum error

2020-05-07 Thread Ashwin Agrawal

On Wed, May 6, 2020 at 3:02 PM Robert Haas  wrote:

> On Wed, May 6, 2020 at 5:48 PM Ashwin Agrawal  wrote:
> > If pg_basebackup is not able to read BLCKSZ content from file, then it
> > just emits a warning "could not verify checksum in file "" block
> > X: read buffer size X and page size 8192 differ" currently but misses
> > to error with "checksum error occurred". Only if it can read 8192 and
> > checksum mismatch happens will it error in the end.
>
> I don't think it's a good idea to conflate "hey, we can't checksum
> this because the size is strange" with "hey, the checksum didn't
> match". Suppose the a file has 1000 full blocks and a partial block.
> All 1000 blocks have good checksums. With your change, ISTM that we'd
> first emit a warning saying that the checksum couldn't be verified,
> and then we'd emit a second warning saying that there was 1 checksum
> verification failure, which would also be reported to the stats
> system. I don't think that's what we want.

I feel the intent of reporting "total checksum verification failure" is to
report corruption. Which way is the secondary piece of the puzzle. Not
being able to read checksum itself to verify is also corruption and is
checksum verification failure I think. WARNINGs will provide fine grained
clarity on what type of checksum verification failure it is, so I am not
sure we really need fine grained clarity in "total numbers" to
differentiate these two types.

Not reporting anything to the stats system and at end reporting there is
checksum failure would be more weird, right? Or will say
ERRCODE_DATA_CORRUPTED with some other message and not checksum
verification failure.

There might be an argument
> for making this code trigger...
>
> ereport(ERROR,
> (errcode(ERRCODE_DATA_CORRUPTED),
>  errmsg("checksum verification failure during base
> backup")));
>
> ...but I wouldn't for that reason inflate the number of blocks that
> are reported as having failures.
>

When checksum verification is turned on and the issue is detected, I
strongly feel ERROR must be triggered as silently reporting success doesn't
seem right.
I can introduce one more variable just to capture and track files with such
cases. But will we report them separately to stats? How? Also, do we want
to have separate WARNING for the total number of files in this category?
Those all seem slight complications but if wind is blowing in that
direction, I am ready to fly that way.

Re: PG 13 release notes, first draft

2020-05-07 Thread Alexander Korotkov

On Thu, May 7, 2020 at 2:46 AM Bruce Momjian  wrote:
> 
> Allow CREATE INDEX to specify the GiST signature length and maximum 
> number of integer ranges (Nikita Glukhov)
> 

Should we specify which particular opclasses are affected?  Or at
least mention it affects core and particular contribs?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Fri, May  8, 2020 at 12:32:16AM +0900, Amit Langote wrote:
> On Thu, May 7, 2020 at 9:48 PM Bruce Momjian  wrote:
> > On Thu, May  7, 2020 at 12:06:33AM +0900, Amit Langote wrote:
> > > +Author: Etsuro Fujita 
> > > +2020-04-08 [c8434d64c] Allow partitionwise joins in more cases.
> > > +Author: Tom Lane 
> > > +2020-04-07 [981643dcd] Allow partitionwise join to handle nested FULL 
> > > JOIN USIN
> > > +-->
> > > +
> > > +
> > > +Allow partitionwise joins to happen in more cases (Ashutosh Bapat,
> > > Etsuro Fujita, Amit Langote)
> > > +
> > >
> > > Maybe it would be better to break this into two items, because while
> > > c8434d64c is significant new functionality that I only contributed a
> > > few review comments towards, 981643dcd is relatively minor surgery of
> >
> > What text would we use for the new item?  I thought FULL JOIN was just
> > another case that matched the description I had.
> 
> c8434d64c implements a new feature whereby, to use partitionwise join,
> partition bounds of the tables being joined no longer have to match
> exactly.  I think it might be better to mention this explicitly
> because it enables partitionwise joins to be used in more partitioning
> setups.

Well, the text says:

Allow partitionwise joins to happen in more cases (Ashutosh Bapat,
Etsuro Fujita, Amit Langote, Tom Lane)

Isn't that what you just said?  I just added this paragraph:

For example, partitionwise joins can now happen between partitioned
tables where the ancestors do not exactly match.

Does that help?

> 981643dcd fixes things so that 3-way and higher FULL JOINs can now be
> performed partitionwise.  I am okay with even omitting this if it
> doesn't sound big enough to be its own item.
> 
> > I think trying to put this all into one item is too complex, but I did
> > merge two of the items together, so we have two items now:
> >
> > 
> > 
> >
> > 
> > Allow partitioned tables to be replicated via publications (Amit 
> > Langote)
> > 
> >
> > 
> > Previously, partitions had to be replicated individually.  Now
> > partitioned tables can be published explicitly causing all 
> > partitions
> > to be automatically published.  Addition/removal of partitions from
> > partitioned tables are automatically added/removed on subscribers.
> > The CREATE PUBLICATION option publish_via_partition_root controls 
> > whether
> > partitioned tables are published as themselves or their ancestors.
> > 
> 
> Thanks.  Sounds good except I think the last sentence should read:
> 
> ...controls whether partition changes are published as their own or as
> their ancestor's.

OK, done.

> > 
> >
> > 
> > 
> >
> > 
> > Allow non-partitioned tables to be logically replicated to 
> > subscribers
> > that receive the rows into partitioned tables (Amit Langote)
> > 
> 
> Hmm, why it make it sound like this works only if the source table is
> non-partitioned?  The source table can be anything, a regular
> non-partitioned table, or a partitioned one.

Well, we already covered the publish partitioned case in the above item.

> How about:
> 
> Allow logical replication into partitioned tables on subscribers
> 
> Previously, it was allowed only into regular [ non-partitioned ] tables.

OK, I used this wording:

Allow logical replication into partitioned tables on subscribers (Amit
Langote)

Previously, subscribers could only receive rows into non-partitioned
tables.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Thu, May  7, 2020 at 11:30:40PM +0900, Amit Langote wrote:
> On Thu, May 7, 2020 at 11:12 PM Bruce Momjian  wrote:
> > Well, her name was there already for a later commit that was not
> > backpatched, so I just moved her name earlier.  The fact that her name
> > was moved earlier because of something that was backpatched is
> > inconsistent, but I don't know enough about the work that went into the
> > item to comment on that.  I will need someone to tell me, of the commits
> > that only appear in PG 13, what should be the name order.
> 
> Sorry, I misremembered that the patch to make default partition
> pruning more aggressive was not backpatched, because I thought at the
> time that the patch had turned somewhat complex, but indeed it was
> backpatched; in 11.5 release notes:
> 
>   Prune a partitioned table's default partition (that is, avoid
> uselessly scanning it) in more cases (Yuzuko Hosoya)
> 
> Sorry for the noise.
> 
> I think it's okay for her name to appear first even considering the
> commits that only appear in PG 13, because my role was mainly
> reviewing the work and perhaps posting an updated version of her
> patch.

OK, confirmed, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: Should smgrdounlink() be removed?

2020-05-07 Thread Peter Geoghegan

On Thu, May 7, 2020 at 4:33 AM Michael Paquier  wrote:
> So this gives the attached.  Any thoughts?

That seems fine.

-- 
Peter Geoghegan

Re: tablespace_map code cleanup

2020-05-07 Thread Robert Haas

On Wed, May 6, 2020 at 11:15 AM Robert Haas  wrote:
> Oh, good point. v2 attached.

Here's v3, with one more small cleanup. I noticed tblspc_map_file is
initialized to NULL and then unconditionally reset to the return value
of makeStringInfo(), and then later tested to see whether it is NULL.
It can't be, because makeStringInfo() doesn't return NULL. So the
attached version deletes the superfluous initialization and the
superfluous test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v3-0001-Don-t-export-basebackup.c-s-sendTablespace.patch
Description: Binary data


v3-0002-Minor-code-cleanup-for-perform_base_backup.patch
Description: Binary data

Re: PG 13 release notes, first draft

2020-05-07 Thread Amit Langote

On Thu, May 7, 2020 at 9:48 PM Bruce Momjian  wrote:
> On Thu, May  7, 2020 at 12:06:33AM +0900, Amit Langote wrote:
> > +Author: Etsuro Fujita 
> > +2020-04-08 [c8434d64c] Allow partitionwise joins in more cases.
> > +Author: Tom Lane 
> > +2020-04-07 [981643dcd] Allow partitionwise join to handle nested FULL JOIN 
> > USIN
> > +-->
> > +
> > +
> > +Allow partitionwise joins to happen in more cases (Ashutosh Bapat,
> > Etsuro Fujita, Amit Langote)
> > +
> >
> > Maybe it would be better to break this into two items, because while
> > c8434d64c is significant new functionality that I only contributed a
> > few review comments towards, 981643dcd is relatively minor surgery of
>
> What text would we use for the new item?  I thought FULL JOIN was just
> another case that matched the description I had.

c8434d64c implements a new feature whereby, to use partitionwise join,
partition bounds of the tables being joined no longer have to match
exactly.  I think it might be better to mention this explicitly
because it enables partitionwise joins to be used in more partitioning
setups.

981643dcd fixes things so that 3-way and higher FULL JOINs can now be
performed partitionwise.  I am okay with even omitting this if it
doesn't sound big enough to be its own item.

> I think trying to put this all into one item is too complex, but I did
> merge two of the items together, so we have two items now:
>
> 
> 
>
> 
> Allow partitioned tables to be replicated via publications (Amit 
> Langote)
> 
>
> 
> Previously, partitions had to be replicated individually.  Now
> partitioned tables can be published explicitly causing all partitions
> to be automatically published.  Addition/removal of partitions from
> partitioned tables are automatically added/removed on subscribers.
> The CREATE PUBLICATION option publish_via_partition_root controls 
> whether
> partitioned tables are published as themselves or their ancestors.
> 

Thanks.  Sounds good except I think the last sentence should read:

...controls whether partition changes are published as their own or as
their ancestor's.

> 
>
> 
> 
>
> 
> Allow non-partitioned tables to be logically replicated to subscribers
> that receive the rows into partitioned tables (Amit Langote)
> 

Hmm, why it make it sound like this works only if the source table is
non-partitioned?  The source table can be anything, a regular
non-partitioned table, or a partitioned one.

How about:

Allow logical replication into partitioned tables on subscribers

Previously, it was allowed only into regular [ non-partitioned ] tables.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-05-07 Thread Justin Pryzby

Rebased onto 1ad23335f36b07f4574906a8dc66a3d62af7c40c
>From 698026f6365a324bf52c5985d549f71b52ada567 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v17 01/10] Document historic behavior of links to
 directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d9b3598977..e0d1eff6b5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25861,7 +25861,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 Returns a record containing the file's size, last access time stamp,
 last modification time stamp, last file status change time stamp (Unix
 platforms only), file creation time stamp (Windows only), and a flag
-indicating if it is a directory.
+indicating if it is a directory (or a symbolic link to a directory).


 This function is restricted to superusers by default, but other users
-- 
2.17.0

>From ecd5654cb1b22254387d3584bbc28cb1af046a62 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 30 Mar 2020 18:59:16 -0500
Subject: [PATCH v17 02/10] pg_stat_file and pg_ls_dir_* to use lstat()..

pg_ls_dir_* will now skip (no longer show) symbolic links, same as other
non-regular file types, as we advertize we do since 8b6d94cf6.  That seems to
be the intented behavior, since irregular file types are 1) less portable; and,
2) we don't currently show a file's type except for "bool is_dir".

pg_stat_file will now 1) show metadata of links themselves, rather than their
target; and, 2) specifically, show links to directories with "is_dir=false";
and, 3) not error on broken symlinks.
---
 doc/src/sgml/func.sgml  | 2 +-
 src/backend/utils/adt/genfile.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e0d1eff6b5..d9b3598977 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25861,7 +25861,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 Returns a record containing the file's size, last access time stamp,
 last modification time stamp, last file status change time stamp (Unix
 platforms only), file creation time stamp (Windows only), and a flag
-indicating if it is a directory (or a symbolic link to a directory).
+indicating if it is a directory.


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index ceaa6180da..219ac160f8 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -370,7 +370,7 @@ pg_stat_file(PG_FUNCTION_ARGS)
 
 	filename = convert_and_check_filename(filename_t);
 
-	if (stat(filename, &fst) < 0)
+	if (lstat(filename, &fst) < 0)
 	{
 		if (missing_ok && errno == ENOENT)
 			PG_RETURN_NULL();
@@ -596,7 +596,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 
 		/* Get the file info */
 		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
-		if (stat(path, &attrib) < 0)
+		if (lstat(path, &attrib) < 0)
 		{
 			/* Ignore concurrently-deleted files, else complain */
 			if (errno == ENOENT)
-- 
2.17.0

>From c5b407dd179019a9b79c29f921f1f2366c7d52ed Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 17 Mar 2020 13:16:24 -0500
Subject: [PATCH v17 03/10] Add tests on pg_ls_dir before changing it

---
 src/test/regress/expected/misc_functions.out | 18 ++
 src/test/regress/sql/misc_functions.sql  |  5 +
 2 files changed, 23 insertions(+)

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index d3acb98d04..2e87c548eb 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -201,6 +201,24 @@ select count(*) > 0 from
  t
 (1 row)
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+ name 
+--
+ .
+(1 row)
+
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+ name 
+--
+(0 rows)
+
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+ pg_ls_dir 
+---
+(0 rows)
+
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+ERROR:  could not open directory "does not exist": No such file or directory
 --
 -- Test adding a support function to a subject function
 --
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 094e8f8296..f6857ad177 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -60,6 +60,11 @@ select count(*) > 0 from
where spcname = 'pg_default') pts
   join p

Re: PG 13 release notes, first draft

2020-05-07 Thread Amit Langote

On Thu, May 7, 2020 at 11:12 PM Bruce Momjian  wrote:
> On Thu, May  7, 2020 at 09:49:49AM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > OK, I have moved her name to be first.  FYI, this commit was backpatched
> > > back through PG 11, though the commit message doesn't mention that.
> >
> > If it was back-patched, it should not be appearing in the v13 release
> > notes at all, surely?
>
> Well, her name was there already for a later commit that was not
> backpatched, so I just moved her name earlier.  The fact that her name
> was moved earlier because of something that was backpatched is
> inconsistent, but I don't know enough about the work that went into the
> item to comment on that.  I will need someone to tell me, of the commits
> that only appear in PG 13, what should be the name order.

Sorry, I misremembered that the patch to make default partition
pruning more aggressive was not backpatched, because I thought at the
time that the patch had turned somewhat complex, but indeed it was
backpatched; in 11.5 release notes:

  Prune a partitioned table's default partition (that is, avoid
uselessly scanning it) in more cases (Yuzuko Hosoya)

Sorry for the noise.

I think it's okay for her name to appear first even considering the
commits that only appear in PG 13, because my role was mainly
reviewing the work and perhaps posting an updated version of her
patch.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Thu, May  7, 2020 at 09:49:49AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > OK, I have moved her name to be first.  FYI, this commit was backpatched
> > back through PG 11, though the commit message doesn't mention that.
> 
> If it was back-patched, it should not be appearing in the v13 release
> notes at all, surely?

Well, her name was there already for a later commit that was not
backpatched, so I just moved her name earlier.  The fact that her name
was moved earlier because of something that was backpatched is
inconsistent, but I don't know enough about the work that went into the
item to comment on that.  I will need someone to tell me, of the commits
that only appear in PG 13, what should be the name order.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: PG 13 release notes, first draft

2020-05-07 Thread Tom Lane

Bruce Momjian  writes:
> On Thu, May  7, 2020 at 08:29:55AM +0200, Fabien COELHO wrote:
>> After looking again at the release notes, I do really think that significant
>> documentation changes do not belong to the "Source code" section but should
>> be in separate "Documentation" section, and that more items should be listed
>> there, because they represent a lot of not-so-fun work, especially Tom's
>> restructuration of tables, and possibly others.

> Uh, can someone else give an opinion on this?  I am not sure how hard or
> un-fun an item is should be used as criteria.

Historically we don't document documentation changes at all, do we?
It seems (a) pointless and (b) circular.

regards, tom lane

Re: PG 13 release notes, first draft

2020-05-07 Thread Tom Lane

Bruce Momjian  writes:
> OK, I have moved her name to be first.  FYI, this commit was backpatched
> back through PG 11, though the commit message doesn't mention that.

If it was back-patched, it should not be appearing in the v13 release
notes at all, surely?

regards, tom lane

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Wed, May  6, 2020 at 04:01:44PM -0400, Chapman Flack wrote:
> On 05/05/20 10:31, Bruce Momjian wrote:
> > On Tue, May  5, 2020 at 09:20:39PM +0800, John Naylor wrote:
> >> ... This patch is
> >> about the server encoding, which formerly needed to be utf-8 for
> >> non-ascii characters. (I think the client encoding doesn't matter as
> >> long as ascii bytes are represented.)
> >>
> >> +
> >> +The UTF-8 characters must be available in the server encoding.
> >> +
> >>
> >> Same here, s/UTF-8/Unicode/.
> > 
> > OK, new text is:
> > 
> > Allow Unicode escapes, e.g., E'\u', in clients that don't use UTF-8
> > encoding (Tom Lane)
> > 
> > The Unicode characters must be available in the server encoding.
> > 
> > I kept the "UTF-8 encoding" since that is the only Unicode encoding we
> > support.
> 
> My understanding also was that it matters little to this change what the
> /client's/ encoding is.
> 
> There used to be a limitation of the server's lexer that would reject
> Unicode escapes whenever the /server's/ encoding wasn't UTF-8 (even
> if the server's encoding contained the characters the escapes represent).
> I think that limitation is what was removed.
> 
> I don't think the client encoding comes into it at all. Sure, you could
> just include the characters literally if they are in the client encoding,
> but you might still choose to express them as escapes, and if you do they
> get passed that way to the server for interpretation.

Ah, very good point.  New text is:

Allow Unicode escapes, e.g., E'\u', in databases that do not
use UTF-8 encoding (Tom Lane)

The Unicode characters must be available in the database encoding.

> I had assumed the patch applied to all of the forms U&'\',
> U&'\+##', E'\u', and E'\U##' but I don't think I read
> the patch to be sure of that.

I am only using E'\u' as an example.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Wed, May  6, 2020 at 10:20:57PM -0700, Noah Misch wrote:
> On Wed, May 06, 2020 at 07:40:25PM -0400, Bruce Momjian wrote:
> > On Tue, May  5, 2020 at 11:39:10PM -0700, Noah Misch wrote:
> > > On Mon, May 04, 2020 at 11:16:00PM -0400, Bruce Momjian wrote:
> > > > Allow skipping of WAL for new tables and indexes if wal_level is 
> > > > 'minimal' (Noah Misch)
> > > 
> > > Kyotaro Horiguchi authored that one.  (I committed it.)  The commit 
> > > message
> > > noted characteristics, some of which may deserve mention in the notes:
> > 
> > Fixed.
> 
> I don't see that change pushed (but it's not urgent).

I got stuck on Amit's partition items and my head couldn't process any
more, so I went to bed, and just committed it now.  I was afraid to have
pending stuff uncommitted, but I am also hesitant to do a commit for
each change.

> > > - Crash recovery was losing tuples written via COPY TO.  This fixes the 
> > > bug.
> > 
> > This was not backpatched?
> 
> Right.

Oh.  So you are saying we could lose COPY data on a crash, even after a
commit.  That seems bad.  Can you show me the commit info?  I can't find
it.

> > > - Out-of-tree table access methods will require changes.
> > 
> > Uh, I don't think we mention those.
> 
> Okay.  This point is relatively-important.  On the other hand, the table
> access methods known to me have maintainers who follow -hackers.  They may
> learn that way.

That was my thought.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

[PATCH] fix GIN index search sometimes losing results

2020-05-07 Thread Pavel Borisov

Hi, all

It appeared than GIN index sometimes lose results if simultaneously:


1 if query operand contains weight marks

2 if weight-marked operand is negated by ! operator

3 if there are only logical (not phrase) operators from this negation
towards the root of query tree.


e.g. '!crew:A'::tsquery refuse to find 'crew:BCD'::tsvector


Seems it is in all versions of PG.


The patch is intended to deal with the issue. Also it contains tests for
these rare condition.


Pavel Borisov.


gin-weights-patch-v3.diff
Description: Binary data

Re: PG compilation error with Visual Studio 2015/2017/2019

2020-05-07 Thread Tom Lane

Amit Kapila  writes:
> On Thu, May 7, 2020 at 11:57 AM Tom Lane  wrote:
>> Monday is a back-branch release wrap day.

> How can I get the information about release wrap day?  The minor
> release dates are mentioned on the website [1], but this information
> is not available.  Do we keep it some-fixed number of days before
> minor release?

Yes, we've been using the same release schedule for years.  The
actual tarball wrap is always on a Monday --- if I'm doing it,
as is usually the case, I try to get it done circa 2100-2300 UTC.
There's a "quiet period" where we discourage nonessential commits
both before (starting perhaps on the Saturday) and after (until
the releases are tagged in git, about 24 hours after wrap).
The delay till public announcement on the Thursday is so the
packagers can produce their builds.  Likewise, the reason for
a wrap-to-tag delay is in case the packagers find anything that
forces a re-wrap, which has happened a few times.

regards, tom lane

Re: min_wal_size > max_wal_size is accepted

2020-05-07 Thread Marc Rechté


Hi,

Le jeu. 7 mai 2020 à 11:13, Marc Rechté > a écrit :


Hello,

It is possible to startup an instance with min > max, without the
system
complaining:

mrechte=# show min_wal_size ;

2020-05-07 11:12:11.422 CEST [11098] LOG:  durée : 0.279 ms

   min_wal_size

--

   128MB

(1 ligne)



mrechte=# show max_wal_size ;

2020-05-07 11:12:12.814 CEST [11098] LOG:  durée : 0.275 ms

   max_wal_size

--

   64MB

(1 ligne)


This could be an issue ?


I don't see how this could be an issue. You'll get a checkpoint every 
time 64MB have been written before checkpoint_timeout kicked in. And WAL 
files will be removed if you have more than 128MB of them.


Not the smartest configuration, but not a damaging one either.


--
Guillaume.


I have some doubts when I see such code in 
backend/access/transam/xlog.c:2334




if (recycleSegNo < minSegNo)

recycleSegNo = minSegNo;

if (recycleSegNo > maxSegNo)

recycleSegNo = maxSegNo;

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Thu, May  7, 2020 at 11:25:47AM +0500, Andrey M. Borodin wrote:
> 
> 
> > 7 мая 2020 г., в 04:35, Bruce Momjian  написал(а):
> > 
> > On Wed, May  6, 2020 at 11:17:54AM +0500, Andrey M. Borodin wrote:
> >> 
> >> 
> >>> 5 мая 2020 г., в 08:16, Bruce Momjian  написал(а):
> >>> 
> >>> I have committed the first draft of the PG 13 release notes.  You can
> >>> see them here:
> >>> 
> >>>   https://momjian.us/pgsql_docs/release-13.html
> >>> 
> >>> It still needs markup, word wrap, and indenting.  The community doc
> >>> build should happen in a few hours.
> >> 
> >> I'm not sure, but probably it worth mentioning in "General performance" 
> >> section that TOAST (and everything pglz-compressed) decompression should 
> >> be significantly faster in v13.
> >> https://github.com/postgres/postgres/commit/c60e520f6e0e8db9618cad042df071a6752f3c06
> > 
> > OK, I read the thread mentioned in the commit message and I now see the
> > value of this change.  Attached is the release note diff.  Let me know
> > if it needs improvement.
> 
> There is one minor typo retrievel -> retrieval.
> Thanks!

Got it, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Wed, May  6, 2020 at 07:31:14PM -0500, Justin Pryzby wrote:
> On Wed, May 06, 2020 at 07:35:34PM -0400, Bruce Momjian wrote:
> > On Wed, May  6, 2020 at 11:17:54AM +0500, Andrey M. Borodin wrote:
> > > I'm not sure, but probably it worth mentioning in "General performance" 
> > > section that TOAST (and everything pglz-compressed) decompression should 
> > > be significantly faster in v13.
> > > https://github.com/postgres/postgres/commit/c60e520f6e0e8db9618cad042df071a6752f3c06
> > 
> > OK, I read the thread mentioned in the commit message and I now see the
> > value of this change.  Attached is the release note diff.  Let me know
> > if it needs improvement.
> 
> Sorry I didn't see it earlier, but:
> 
> > -Improve retrieving of only the leading bytes of TOAST values (Binguo Bao)
> > +Improve speed of TOAST decompression and the retrievel of only the leading 
> > bytes of TOAST values (Binguo Bao, Andrey Borodin)
> 
> retrieval
> 
> I will include this with my running doc patch if you don't want to make a
> separate commit.

Fixed.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: min_wal_size > max_wal_size is accepted

2020-05-07 Thread Guillaume Lelarge

Hi,

Le jeu. 7 mai 2020 à 11:13, Marc Rechté  a écrit :

> Hello,
>
> It is possible to startup an instance with min > max, without the system
> complaining:
>
> mrechte=# show min_wal_size ;
>
> 2020-05-07 11:12:11.422 CEST [11098] LOG:  durée : 0.279 ms
>
>   min_wal_size
>
> --
>
>   128MB
>
> (1 ligne)
>
>
>
> mrechte=# show max_wal_size ;
>
> 2020-05-07 11:12:12.814 CEST [11098] LOG:  durée : 0.275 ms
>
>   max_wal_size
>
> --
>
>   64MB
>
> (1 ligne)
>
>
> This could be an issue ?
>
>
I don't see how this could be an issue. You'll get a checkpoint every time
64MB have been written before checkpoint_timeout kicked in. And WAL files
will be removed if you have more than 128MB of them.

Not the smartest configuration, but not a damaging one either.


-- 
Guillaume.

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Thu, May  7, 2020 at 08:29:55AM +0200, Fabien COELHO wrote:
> 
> Hello Bruce,
> 
> > > > > * "DOCUMENT THE DEFAULT GENERATION METHOD"
> > > > >   => The default is still to generate data client-side.
> > > > 
> > > > My point is that the docs are not clear about this.
> > > 
> > > Indeed.
> > > 
> > > > Can you fix it?
> > > 
> > > Sure. Attached patch adds an explicit sentence about it, as it was only
> > > hinted about in the default initialization command string, and removes a
> > > spurious empty paragraph found nearby.
> > 
> > Thanks, patch applied.
> 
> Ok.
> 
> You might remove the "DOCUMENT THE DEFAULT…" in the release note.

Oh, yes, of course.

> I'm wondering about the commit comment: "Reported-by: Fabien COELHO",
> actually you reported it, not me!

Uh, kind of, yeah, but via email, you did.  ;-)

> After looking again at the release notes, I do really think that significant
> documentation changes do not belong to the "Source code" section but should
> be in separate "Documentation" section, and that more items should be listed
> there, because they represent a lot of not-so-fun work, especially Tom's
> restructuration of tables, and possibly others.

Uh, can someone else give an opinion on this?  I am not sure how hard or
un-fun an item is should be used as criteria.

> About pgbench, ISTM that d37ddb745be07502814635585cbf935363c8a33d is worth
> mentionning because it is a user-visible change.

Uh, that is not usually something I mention because, like error message
changes, it is nice, but few people need to know about it before they
see it.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Thu, May  7, 2020 at 08:48:49AM -0400, Bruce Momjian wrote:
> I think trying to put this all into one item is too complex, but I did
> merge two of the items together, so we have two items now:

I ended up adjusting the wording again, so please review the commit or
the website:

https://momjian.us/pgsql_docs/release-13.html

Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: PG 13 release notes, first draft

2020-05-07 Thread Bruce Momjian

On Thu, May  7, 2020 at 12:06:33AM +0900, Amit Langote wrote:
> Hi Bruce,
> 
> On Tue, May 5, 2020 at 12:16 PM Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 13 release notes.  You can
> > see them here:
> >
> > https://momjian.us/pgsql_docs/release-13.html
> >
> > It still needs markup, word wrap, and indenting.  The community doc
> > build should happen in a few hours.
> 
> Thanks for this as always.
> 
> +Author: Alvaro Herrera 
> +2019-08-07 [4e85642d9] Apply constraint exclusion more generally in 
> partitionin
> +Author: Alvaro Herrera 
> +2019-08-13 [815ef2f56] Don't constraint-exclude partitioned tables as much
> +-->
> +
> +
> +Improve cases where pruning of partitions can happen (Amit Langote,
> Yuzuko Hosoya, Álvaro Herrera)
> +
> 
> The following commit should be included with this item:
> 
> commit 489247b0e615592111226297a0564e11616361a5
> Author: Alvaro Herrera 
> Date:   Sun Aug 4 11:18:45 2019 -0400
> 
> Improve pruning of a default partition
> 
> Primary author for this commit and the person who raised various
> problems that led to these improvements is Yuzuko Hosoya. So I think
> her name should go first.

OK, I have moved her name to be first.  FYI, this commit was backpatched
back through PG 11, though the commit message doesn't mention that.

commit 8654407148
Author: Alvaro Herrera 
Date:   Sun Aug 4 11:18:45 2019 -0400

Improve pruning of a default partition

When querying a partitioned table containing a default partition, we
were wrongly deciding to include it in the scan too early in the
process, failing to exclude it in some cases.  If we reinterpret the
PruneStepResult.scan_default flag slightly, we can do a better job 
at
detecting that it can be excluded.  The change is that we avoid 
setting
the flag for that pruning step unless the step absolutely requires 
the
default partition to be scanned (in contrast with the previous
arrangement, which was to set it unless the step was able to prune 
it).
So get_matching_partitions() must explicitly check the partition 
that
each returned bound value corresponds to in order to determine 
whether
the default one needs to be included, rather than relying on the 
flag
from the final step result.

Author: Yuzuko Hosoya 
Reviewed-by: Amit Langote 
Discussion: 
https://postgr.es/m/00e601d4ca86$932b8bc0$b982a340$@lab.ntt.co.jp

FYI, I don't see backpatched commits when creating the release notes.

> +Author: Etsuro Fujita 
> +2020-04-08 [c8434d64c] Allow partitionwise joins in more cases.
> +Author: Tom Lane 
> +2020-04-07 [981643dcd] Allow partitionwise join to handle nested FULL JOIN 
> USIN
> +-->
> +
> +
> +Allow partitionwise joins to happen in more cases (Ashutosh Bapat,
> Etsuro Fujita, Amit Langote)
> +
> 
> Maybe it would be better to break this into two items, because while
> c8434d64c is significant new functionality that I only contributed a
> few review comments towards, 981643dcd is relatively minor surgery of

What text would we use for the new item?  I thought FULL JOIN was just
another case that matched the description I had.

> partitionwise join code to handle FULL JOINs correctly.  Tom's rewrite
> of my patch for the latter was pretty significant too, so maybe better
> to list his name as well.

OK, I have added Tom's name.

> +
> +
> +
> +Allow logical replication to replicate partitioned tables (Amit Langote)
> +
> +
> +
> +
> +
> +
> +
> +
> +Allow partitioned tables to be added to replicated publications (Amit 
> Langote)
> +
> +
> +
> +Partition additions/removals are replicated as well.  Previously,
> partitions had to be replicated individually.  HOW IS THIS DIFFERENT
> FROM THE ITEM ABOVE?
> +
> 
> The former is subscription-side new functionality and the latter is
> publication-side and the two are somewhat independent.
> 
> Till now, logical replication could only occur between relkind 'r'
> relations. So the only way to keep a given partitioned table in sync
> on the two servers was to manually add the leaf partitions (of relkind
> 'r') to a publication and also manually keep the list of replicated
> tables up to date as partitions come and go, that is, by
> adding/removing them to/from the publication.
> 
> 17b9e7f9f (the second item) makes it possible for the partitioned
> table (relkind 'p') to be added to the publication so that individual
> leaf partitions need not be manually kept in the publication.
> Replication still flows between the leaf partitions (relkind 'r'
> relations) though.
> 
> f1ac27bfd (the first item) makes is possible to replicate from a
> regular table (relkind 'r') into a partitioned table (relkind 'p').
> If a given row is replicated into a partitioned table, the
> subscription worker will route it to the corre

Re: Why are wait events not reported even though it reads/writes a timeline history file?

2020-05-07 Thread Michael Paquier

On Thu, May 07, 2020 at 04:51:16PM +0900, Fujii Masao wrote:
> Yeah, so I updated the patch so that ferror() is called only
> when fgets() returns NULL. Attached is the updated version of
> the patch.

Thanks for the new patch.  LGTM.
--
Michael


signature.asc
Description: PGP signature

Re: do {} while (0) nitpick

2020-05-07 Thread Andrew Dunstan



On 5/6/20 6:39 PM, David Steele wrote:
> On 5/6/20 6:28 PM, Andrew Dunstan wrote:
>> On 5/6/20 3:24 PM, Tom Lane wrote:
>>
>>> BTW, I looked around and could not find a package-provided ppport.h
>>> at all on my Red Hat systems.  What package is it in?
>>
>> perl-Devel-PPPort contains a perl module that will write the file for
>> you like this:
>>
>>  perl -MDevel::PPPort -e 'Devel::PPPort::WriteFile();'
>
> FWIW, pgBackRest always shipped with the newest version of ppport.h we
> were able to generate. This never caused any issues, but neither did
> we have problems that forced us to upgrade.
>
> The documentation seems to encourage this behavior:
>
> Don't direct the users of your module to download Devel::PPPort . They
> are most probably no XS writers. Also, don't make ppport.h optional.
> Rather, just take the most recent copy of ppport.h that you can find
> (e.g. by generating it with the latest Devel::PPPort release from
> CPAN), copy it into your project, adjust your project to use it, and
> distribute the header along with your module.
>
>


I don't think we need to keep updating it, though. plperl is essentially
pretty stable.


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Postgres Windows build system doesn't work with python installed in Program Files

2020-05-07 Thread Ranier Vilela

Em qui., 7 de mai. de 2020 às 02:04, Victor Wagner 
escreveu:

> В Thu, 7 May 2020 09:14:33 +0900
> Michael Paquier  пишет:
>
> > On Wed, May 06, 2020 at 03:58:15PM -0300, Ranier Vilela wrote:
> > > Hacking pgbison.pl, to print PATH, shows that the path inside
> > > pgbison.pl, returned to being the original, without the addition of
> > > c:\perl\bin;c:\bin. my $out = $ENV{PATH};
> > > print "Path after system call=$out\n";
> > > Path after system
> > > call=...C:\Users\ranier\AppData\Local\Microsoft\WindowsApps;;
> > > The final part lacks: c:\perl\bin;c:\bin
> > >
> > > Now I need to find out why the path is being reset, within the perl
> > > scripts.
> >
> > FWIW, we have a buildfarm animal called drongo that runs with VS 2019,
> > that uses Python, and that is now happy.  One of my own machines uses
> > VS 2019 as well and I have yet to see what you are describing here.
> > Perhaps that's related to a difference in the version of perl you are
> > using and the version of that any others?
>
>
> I doubt so. I have different machines with perl from 5.22 to 5.30 but
> none of tham exibits such weird behavoir.
>
The perl is the same,when it was working ok.


>
> Perhaps problem is that Ranier calls vcvars64.bat from the menu, and
> then calls msbuild such way that is becames unrelated process.
>
It also worked previously, using this same process, link menu and manual
path configuration.
What has changed:
1 In the environment, the python installation, which added entries to the
path.
2. Perl scripts: Use perl's $/ more idiomatically
commit beb2516e961490723fb1a2f193406afb3d71ea9c
3. Msbuild and others, have been updated.They are not the same ones that
were working before.


>
> Obvoisly buildfarm animal doesn't use menu and then starts build
> process from same CMD.EXE process, that it called vcvarsall.but into.
>
> It is same in all OSes - Windows, *nix and even MS-DOS - there is no
> way to change environment of parent process. You can change environment
> of current process (and if this process is command interpreter you can
> do so by sourcing script into it. In windows this misleadingly called
> 'CALL', but it executes commands from command file in the current
> shell, not in subshell) you can pass enivronment to the child
> processes. But you can never affect environment of the parent or
> sibling process.
>
Maybe that's what is happening, calling system, perl or msbuild, would be
creating a new environment, transferring the path that is configured in
Windows, and not the path that is in the environment that was manually
configured.


>
> The only exception is - if you know that some process would at startup
> read environment vars from some file or registry, you can modify this
> source in unrelated process.
>
> So, if you want perl in path of msbuild, started from Visual Studio,
> you should either first set path in CMD.EXE, then type command to start
> Studio from this very  command window, or set path via control panel
> dialog (which modified registry). Later is what I usially do on machines
> wher I compile postgres.
>
buidfarm aninal, uses a more secure and reliable process, the path is
already configured and does not change.
Perhaps this is the way for me and for others.

It would then remain to document, to warn that to work correctly, the path
must be configured before entering the compilation environment.

regards,
Ranier Vilela

Re: Should smgrdounlink() be removed?

2020-05-07 Thread Michael Paquier

On Thu, May 07, 2020 at 04:57:00PM +0900, Fujii Masao wrote:
> On 2020/05/07 13:49, Michael Paquier wrote:
>> On Thu, May 07, 2020 at 09:48:35AM +0530, vignesh C wrote:
>> > I could not find any code reference to smgrdounlink, I feel it can be
>> > removed.
>> 
>> The last use of smgrdounlink() was b416691.  I have just looked at
>> Debian Code Search and github, and could not find a hit with the
>> function being used in some custom extension code, so it feels like a
>> safe bet to remove it.
> 
> +1

So this gives the attached.  Any thoughts?
--
Michael
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index bb8428f27f..6566659593 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -88,7 +88,6 @@ extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
 extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
-extern void smgrdounlink(SMgrRelation reln, bool isRedo);
 extern void smgrdosyncall(SMgrRelation *rels, int nrels);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index b053a4dc76..e6fa2148fc 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -335,59 +335,6 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	smgrsw[reln->smgr_which].smgr_create(reln, forknum, isRedo);
 }
 
-/*
- *	smgrdounlink() -- Immediately unlink all forks of a relation.
- *
- *		All forks of the relation are removed from the store.  This should
- *		not be used during transactional operations, since it can't be undone.
- *
- *		If isRedo is true, it is okay for the underlying file(s) to be gone
- *		already.
- */
-void
-smgrdounlink(SMgrRelation reln, bool isRedo)
-{
-	RelFileNodeBackend rnode = reln->smgr_rnode;
-	int			which = reln->smgr_which;
-	ForkNumber	forknum;
-
-	/* Close the forks at smgr level */
-	for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
-		smgrsw[which].smgr_close(reln, forknum);
-
-	/*
-	 * Get rid of any remaining buffers for the relation.  bufmgr will just
-	 * drop them without bothering to write the contents.
-	 */
-	DropRelFileNodesAllBuffers(&rnode, 1);
-
-	/*
-	 * It'd be nice to tell the stats collector to forget it immediately, too.
-	 * But we can't because we don't know the OID (and in cases involving
-	 * relfilenode swaps, it's not always clear which table OID to forget,
-	 * anyway).
-	 */
-
-	/*
-	 * Send a shared-inval message to force other backends to close any
-	 * dangling smgr references they may have for this rel.  We should do this
-	 * before starting the actual unlinking, in case we fail partway through
-	 * that step.  Note that the sinval message will eventually come back to
-	 * this backend, too, and thereby provide a backstop that we closed our
-	 * own smgr rel.
-	 */
-	CacheInvalidateSmgr(rnode);
-
-	/*
-	 * Delete the physical file(s).
-	 *
-	 * Note: smgr_unlink must treat deletion failure as a WARNING, not an
-	 * ERROR, because we've already decided to commit or abort the current
-	 * xact.
-	 */
-	smgrsw[which].smgr_unlink(rnode, InvalidForkNumber, isRedo);
-}
-
 /*
  *	smgrdosyncall() -- Immediately sync all forks of all given relations
  *
@@ -432,9 +379,6 @@ smgrdosyncall(SMgrRelation *rels, int nrels)
  *
  *		If isRedo is true, it is okay for the underlying file(s) to be gone
  *		already.
- *
- *		This is equivalent to calling smgrdounlink for each relation, but it's
- *		significantly quicker so should be preferred when possible.
  */
 void
 smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo)


signature.asc
Description: PGP signature

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

2020-05-07 Thread Ashutosh Bapat

Hi Andy,
Sorry for delay in review. Your earlier patches are very large and it
requires some time to review those. I didn't finish reviewing those
but here are whatever comments I have till now on the previous set of
patches. Please see if any of those are useful to the new set.


+/*
+ * Return true iff there is an equal member in target for every
+ * member in members

Suggest reword: return true iff every entry in "members" list is also present
in the "target" list. This function doesn't care about multi-sets, so please
mention that in the prologue clearly.

+
+if (root->parse->hasTargetSRFs)
+return;

Why? A relation's uniqueness may be useful well before we work on SRFs.

+
+if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+/*
+ * Set UniqueKey on member rel is useless, we have to recompute it at
+ * upper level, see populate_partitionedrel_uniquekeys for reference
+ */
+return;

Handling these here might help in bottom up approach. We annotate each
partition here and then annotate partitioned table based on the individual
partitions. Same approach can be used for partitioned join produced by
partitionwise join.

+/*
+ * If the unique index doesn't contain partkey, then it is unique
+ * on this partition only, so it is useless for us.
+ */

Not really. It could help partitionwise join.

+
+/* Now we have the unique index list which as exactly same on all
childrels,
+ * Set the UniqueIndex just like it is non-partition table
+ */

I think it's better to annotate each partition with whatever unique index it
has whether or not global. That will help partitionwise join, partitionwise
aggregate/group etc.

+/* A Normal group by without grouping set */
+if (parse->groupClause)
+add_uniquekey_from_sortgroups(root,
+  grouprel,
+  root->parse->groupClause);

Those keys which are part of groupClause and also form unique keys in the input
relation, should be recorded as unique keys in group rel. Knowing the minimal
set of keys allows more optimizations.

+
+foreach(lc,  unionrel->reltarget->exprs)
+{
+exprs = lappend(exprs, lfirst(lc));
+colnos = lappend_int(colnos, i);
+i++;
+}

This should be only possible when it's not UNION ALL. We should add some assert
or protection for that.

+
+/* Fast path */
+if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+return;
+
+outer_is_onerow = relation_is_onerow(outerrel);
+inner_is_onerow = relation_is_onerow(innerrel);
+
+outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+   restrictlist, jointype);

Something similar happens in select_mergejoin_clauses(). At least from the
first reading of this patch, I get an impression that all these functions which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful to
the optimizer.

+
+
+if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+{
+foreach(lc, innerrel_ukey_ctx)
+{
+UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+{
+/* The UniqueKey on baserel is not useful on the joinrel */

A joining relation need not be a base rel always, it could be a join rel as
well.

+ctx->useful = false;
+continue;
+}
+if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+{
+/* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other functions in
this file need good comments explaining the logic esp. "why" instead of "what".

+else if (inner_is_onerow)
+{
+/* Since rows in innerrel can't be duplicated AND if
innerrel is onerow,
+ * the join result will be onerow also as well. Note:
onerow implies
+ * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

+}
+/*
+ * Calculate max_colno in subquery. In fact we can check this with
+ * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+ * is not set on UPPERREL_FINAL relation, so do it this way
+ */


Should/can we use the same logic to convert an expression in the subquery into
a Var of the outer query as in convert_subquery_pathkeys(). If the subquery
doesn't have a reltarget

Re: [Proposal] Global temporary tables

2020-05-07 Thread 曾文旌



> 2020年4月29日 下午7:46，tushar  写道：
> 
> On 4/29/20 8:52 AM, 曾文旌 wrote:
>> Fixed the error message to make the expression more accurate. In v33.
> 
> Thanks wenjing
> 
> Please refer this scenario  , where getting an error while performing cluster 
> o/p
> 
> 1)
> 
> X terminal -
> 
> postgres=# create global temp table f(n int);
> CREATE TABLE
> 
> Y Terminal -
> 
> postgres=# create index index12 on f(n);
> CREATE INDEX
> postgres=# \q
> 
> X terminal -
> 
> postgres=# reindex index  index12;
> REINDEX
> postgres=#  cluster f using index12;
> ERROR:  cannot cluster on invalid index "index12"
> postgres=# drop index index12;
> DROP INDEX
> 
> if this is an expected  , could we try  to make the error message more 
> simpler, if possible.
> 
> Another issue  -
> 
> X terminal -
> 
> postgres=# create global temp table f11(n int);
> CREATE TABLE
> postgres=# create index ind1 on f11(n);
> CREATE INDEX
> postgres=# create index ind2 on f11(n);
> CREATE INDEX
> postgres=#
> 
> Y terminal -
> 
> postgres=# drop table f11;
> ERROR:  cannot drop index ind2 or global temporary table f11
> HINT:  Because the index is created on the global temporary table and other 
> backend attached it.
> postgres=#
> 
> it is only mentioning about ind2 index but what about ind1 and what if  - 
> they have lots of indexes ?
> i  think - we should not mix index information while dropping the table and 
> vice versa.
postgres=# drop index index12;
ERROR:  cannot drop index index12 or global temporary table f
HINT:  Because the index is created on the global temporary table and other 
backend attached it.

postgres=# drop table f;
ERROR:  cannot drop index index12 or global temporary table f
HINT:  Because the index is created on the global temporary table and other 
backend attached it.
postgres=#

Dropping an index on a GTT and dropping a GTT with an index can both trigger 
this message, so the message looks like this, and it feels like there's no 
better way to do it.



Wenjing



> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company



smime.p7s
Description: S/MIME cryptographic signature

Re: PG compilation error with Visual Studio 2015/2017/2019

2020-05-07 Thread Amit Kapila

On Thu, May 7, 2020 at 11:57 AM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > Thanks, Juan and Davinder for verifying the latest patches. I think
> > this patch is ready to commit unless someone else has any comments.  I
> > will commit and backpatch this early next week (probably on Monday)
> > unless I see more comments.
>
> Monday is a back-branch release wrap day.
>

How can I get the information about release wrap day?  The minor
release dates are mentioned on the website [1], but this information
is not available.  Do we keep it some-fixed number of days before
minor release?


[1] - https://www.postgresql.org/developer/roadmap/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Fix pg_buffercache document

2020-05-07 Thread Amit Kapila

On Thu, May 7, 2020 at 2:53 PM Masahiko Sawada
 wrote:
>
> On Thu, 7 May 2020 at 18:12, Amit Kapila  wrote:
> >
> > On Thu, May 7, 2020 at 2:23 PM Masahiko Sawada
> >  wrote:
> > >
> > > Hi,
> > >
> > > The following description in pg_buffercace is no longer true.
> > >
> > > When the pg_buffercache view is accessed, internal buffer manager
> > > locks are taken for long enough to copy all the buffer state data that
> > > the view will display. This ensures that the view produces a
> > > consistent set of results, while not blocking normal buffer activity
> > > longer than necessary. Nonetheless there could be some impact on
> > > database performance if this view is read often.
> > >
> > > We changed pg_buffercache_page so that it doesn't take buffer manager
> > > locks in commit 6e654546fb6. Therefore from version 10,
> > > pg_buffercache_page has less impact on normal buffer activity less but
> > > might not return a consistent snapshot across all buffers instead.
> > >
> >
> > +1.
> >
> > There is a typo in the patch (queris/queries).  How about if slightly
> > reword it as "Since buffer manager locks are not taken to copy the
> > buffer state data that the view will display, accessing
> > pg_buffercache view has less impact on normal
> > buffer activity but it doesn't provide a consistent set of results
> > across all buffers.  However, we ensure that the information of each
> > buffer is self-consistent."?
>
> Thank you for your idea. Agreed.
>
> Attached the updated version patch.
>

LGTM.  I will commit this tomorrow unless there are more comments.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Fix pg_buffercache document

2020-05-07 Thread Masahiko Sawada

On Thu, 7 May 2020 at 18:12, Amit Kapila  wrote:
>
> On Thu, May 7, 2020 at 2:23 PM Masahiko Sawada
>  wrote:
> >
> > Hi,
> >
> > The following description in pg_buffercace is no longer true.
> >
> > When the pg_buffercache view is accessed, internal buffer manager
> > locks are taken for long enough to copy all the buffer state data that
> > the view will display. This ensures that the view produces a
> > consistent set of results, while not blocking normal buffer activity
> > longer than necessary. Nonetheless there could be some impact on
> > database performance if this view is read often.
> >
> > We changed pg_buffercache_page so that it doesn't take buffer manager
> > locks in commit 6e654546fb6. Therefore from version 10,
> > pg_buffercache_page has less impact on normal buffer activity less but
> > might not return a consistent snapshot across all buffers instead.
> >
>
> +1.
>
> There is a typo in the patch (queris/queries).  How about if slightly
> reword it as "Since buffer manager locks are not taken to copy the
> buffer state data that the view will display, accessing
> pg_buffercache view has less impact on normal
> buffer activity but it doesn't provide a consistent set of results
> across all buffers.  However, we ensure that the information of each
> buffer is self-consistent."?

Thank you for your idea. Agreed.

Attached the updated version patch.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pg_buffercache_doc_v2.patch
Description: Binary data

min_wal_size > max_wal_size is accepted

2020-05-07 Thread Marc Rechté


Hello,

It is possible to startup an instance with min > max, without the system 
complaining:


mrechte=# show min_wal_size ;

2020-05-07 11:12:11.422 CEST [11098] LOG:  durée : 0.279 ms

 min_wal_size

--

 128MB

(1 ligne)



mrechte=# show max_wal_size ;

2020-05-07 11:12:12.814 CEST [11098] LOG:  durée : 0.275 ms

 max_wal_size

--

 64MB

(1 ligne)


This could be an issue ?

Re: Fix pg_buffercache document

2020-05-07 Thread Amit Kapila

On Thu, May 7, 2020 at 2:23 PM Masahiko Sawada
 wrote:
>
> Hi,
>
> The following description in pg_buffercace is no longer true.
>
> When the pg_buffercache view is accessed, internal buffer manager
> locks are taken for long enough to copy all the buffer state data that
> the view will display. This ensures that the view produces a
> consistent set of results, while not blocking normal buffer activity
> longer than necessary. Nonetheless there could be some impact on
> database performance if this view is read often.
>
> We changed pg_buffercache_page so that it doesn't take buffer manager
> locks in commit 6e654546fb6. Therefore from version 10,
> pg_buffercache_page has less impact on normal buffer activity less but
> might not return a consistent snapshot across all buffers instead.
>

+1.

There is a typo in the patch (queris/queries).  How about if slightly
reword it as "Since buffer manager locks are not taken to copy the
buffer state data that the view will display, accessing
pg_buffercache view has less impact on normal
buffer activity but it doesn't provide a consistent set of results
across all buffers.  However, we ensure that the information of each
buffer is self-consistent."?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.

2020-05-07 Thread Amit Kapila

On Thu, May 7, 2020 at 12:13 PM Fujii Masao  wrote:
>
> On 2020/05/02 20:40, Amit Kapila wrote:
> >
> > I don't see any obvious problem with the changed code but we normally
> > don't backpatch performance improvements.  I can see that the code
> > change here appears to be straight forward so it might be fine to
> > backpatch this.  Have we seen similar reports earlier as well?  AFAIK,
> > this functionality is for a long time and if people were facing this
> > on a regular basis then we would have seen such reports multiple
> > times.  I mean to say if the chances of this hitting are less then we
> > can even choose not to backpatch this.
>
> I found the following two reports. ISTM there are not so many reports...
> https://www.postgresql.org/message-id/16159-f5a34a3a04dc6...@postgresql.org
> https://www.postgresql.org/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru
>

The first seems to be the same where this bug has been fixed.  It has
been moved to hackers in email [1].  Am, I missing something?
Considering it has been encountered by two different people, I think
it would not be a bad idea to back-patch this.

[1] - 
https://www.postgresql.org/message-id/20200129.120222.1476610231001551715.horikyota.ntt%40gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Fix pg_buffercache document

2020-05-07 Thread Masahiko Sawada

Hi,

The following description in pg_buffercace is no longer true.

When the pg_buffercache view is accessed, internal buffer manager
locks are taken for long enough to copy all the buffer state data that
the view will display. This ensures that the view produces a
consistent set of results, while not blocking normal buffer activity
longer than necessary. Nonetheless there could be some impact on
database performance if this view is read often.

We changed pg_buffercache_page so that it doesn't take buffer manager
locks in commit 6e654546fb6. Therefore from version 10,
pg_buffercache_page has less impact on normal buffer activity less but
might not return a consistent snapshot across all buffers instead.

I've attached a patch to fix the documentation.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pg_buffercache_doc.patch
Description: Binary data

Re: PG compilation error with Visual Studio 2015/2017/2019

2020-05-07 Thread Amit Kapila

On Thu, May 7, 2020 at 11:57 AM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > Thanks, Juan and Davinder for verifying the latest patches. I think
> > this patch is ready to commit unless someone else has any comments.  I
> > will commit and backpatch this early next week (probably on Monday)
> > unless I see more comments.
>
> Monday is a back-branch release wrap day.  If you push a back-patched
> change on that day (or immediately before it), it had better be a security
> fix.
>

Okay.  I'll wait in that case and will push it after that.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Feedback on table expansion hook (including patch)

2020-05-07 Thread Hans-Jürgen Schönig (PostgreSQL)

that sounds really really useful.
i can see a ton of use cases for that.
we also toyed with the idea recently of having pluggable FSM strategies.
that one could be quite useful as well.

regards,

hans


> On 07.05.2020, at 10:11, Erik Nordström  wrote:
> 
> Hi,
> 
> I am looking for feedback on the possibility of adding a table expansion hook 
> to PostgreSQL (see attached patch). The motivation for this is to allow 
> extensions to optimize table expansion. In particular, TimescaleDB does its 
> own table expansion in order to apply a number of optimizations, including 
> partition pruning (note that TimescaleDB uses inheritance since PostgreSQL 
> 9.6 rather than declarative partitioning ). There's currently no official 
> hook for table expansion, but TimescaleDB has been using the 
> get_relation_info hook for this purpose. Unfortunately, PostgreSQL 12 broke 
> this for us since it moved expansion to a later stage where we can no longer 
> control it without some pretty bad hacks. Given that PostgreSQL 12 changed 
> the expansion state of a table for the get_relation_info hook, we are 
> thinking about this as a regression and are wondering if this could be 
> considered against the head of PG 12 or maybe even PG 13 (although we realize 
> feature freeze has been reached)?
> 
> The attached patch is against PostgreSQL master (commit fb544735) and is 
> about ~10 lines of code. It doesn't change any existing behavior; it only 
> allows getting control of expand_inherited_rtentry, which would make a huge 
> difference for TimescaleDB.
> 
> Best regards,
> 
> Erik
> Engineering team lead
> Timescale
> 
> 
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

Feedback on table expansion hook (including patch)

2020-05-07 Thread Erik Nordström

Hi,

I am looking for feedback on the possibility of adding a table expansion
hook to PostgreSQL (see attached patch). The motivation for this is to
allow extensions to optimize table expansion. In particular, TimescaleDB
does its own table expansion in order to apply a number of optimizations,
including partition pruning (note that TimescaleDB uses inheritance since
PostgreSQL 9.6 rather than declarative partitioning ). There's currently no
official hook for table expansion, but TimescaleDB has been using the
get_relation_info hook for this purpose. Unfortunately, PostgreSQL 12 broke
this for us since it moved expansion to a later stage where we can no
longer control it without some pretty bad hacks. Given that PostgreSQL 12
changed the expansion state of a table for the get_relation_info hook, we
are thinking about this as a regression and are wondering if this could be
considered against the head of PG 12 or maybe even PG 13 (although we
realize feature freeze has been reached)?

The attached patch is against PostgreSQL master (commit fb544735) and is
about ~10 lines of code. It doesn't change any existing behavior; it only
allows getting control of expand_inherited_rtentry, which would make a huge
difference for TimescaleDB.

Best regards,

Erik
Engineering team lead
Timescale
From 775102469a4c7e480c167e30ffd66d6555163f40 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Erik=20Nordstr=C3=B6m?= 
Date: Thu, 16 Apr 2020 13:20:48 +0200
Subject: [PATCH] Add a table expansion hook

This change adds a table expansion hook that allows extensions to
control expansion of inheritence relations (e.g., tables using
inheritence or partitioning). Extensions might want to control
expansion to enable optimizations to constraint exclusion or for
access control on certain subtables.

In PostgreSQL 12, table expansion moved to a much later stage of
planning, which means that extensions that relied on expansion
happening prior to, e.g., get_relation_info_hook no longer have the
ability to control or editorialize table expansion.
---
 src/backend/optimizer/plan/initsplan.c | 7 ++-
 src/backend/optimizer/util/inherit.c   | 3 +++
 src/include/optimizer/inherit.h| 8 
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..358adf7413 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -159,7 +159,12 @@ add_other_rels_to_query(PlannerInfo *root)
 
 		/* If it's marked as inheritable, look for children. */
 		if (rte->inh)
-			expand_inherited_rtentry(root, rel, rte, rti);
+		{
+			if (expand_inherited_rtentry_hook)
+(*expand_inherited_rtentry_hook) (root, rel, rte, rti);
+			else
+expand_inherited_rtentry(root, rel, rte, rti);
+		}
 	}
 }
 
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 3132fd35a5..2673e1a285 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -35,6 +35,9 @@
 #include "utils/rel.h"
 
 
+/* Hook for plugins to get control in expand_inherited_rtentry() */
+expand_inherited_rtentry_hook_type expand_inherited_rtentry_hook = NULL;
+
 static void expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
 	   RangeTblEntry *parentrte,
 	   Index parentRTindex, Relation parentrel,
diff --git a/src/include/optimizer/inherit.h b/src/include/optimizer/inherit.h
index 0ba6132652..f0ccff2039 100644
--- a/src/include/optimizer/inherit.h
+++ b/src/include/optimizer/inherit.h
@@ -16,6 +16,14 @@
 
 #include "nodes/pathnodes.h"
 
+/* Hook for plugins to get control over expand_inherited_rtentry() */
+typedef void (*expand_inherited_rtentry_hook_type)(PlannerInfo *root,
+   RelOptInfo *rel,
+   RangeTblEntry *rte,
+   Index rti);
+
+extern PGDLLIMPORT expand_inherited_rtentry_hook_type expand_inherited_rtentry_hook;
+
 
 extern void expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
 	 RangeTblEntry *rte, Index rti);
-- 
2.25.1

Re: Should smgrdounlink() be removed?

2020-05-07 Thread Fujii Masao





On 2020/05/07 13:49, Michael Paquier wrote:

On Thu, May 07, 2020 at 09:48:35AM +0530, vignesh C wrote:

I could not find any code reference to smgrdounlink, I feel it can be
removed.


The last use of smgrdounlink() was b416691.  I have just looked at
Debian Code Search and github, and could not find a hit with the
function being used in some custom extension code, so it feels like a
safe bet to remove it.


+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Why are wait events not reported even though it reads/writes a timeline history file?

2020-05-07 Thread Fujii Masao




On 2020/05/02 11:24, Michael Paquier wrote:

On Fri, May 01, 2020 at 12:04:56PM +0900, Fujii Masao wrote:

I applied cosmetic changes to the patch (attached). Barring any objection,
I will push this patch (also back-patch to v10 where wait-event for timeline
file was added).


Sorry for arriving late to the party.  I have one tiny comment.


+   pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_READ);
+   res = fgets(fline, sizeof(fline), fd);
+   pgstat_report_wait_end();
+   if (ferror(fd))
+   ereport(ERROR,
+   (errcode_for_file_access(),
+errmsg("could not read file \"%s\": 
%m", path)));
+   if (res == NULL)
+   break;


It seems to me that there is no point to check ferror() if fgets()
does not return NULL, no?


Yeah, so I updated the patch so that ferror() is called only
when fgets() returns NULL. Attached is the updated version of
the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/timeline.c 
b/src/backend/access/transam/timeline.c
index de57d699af..e6a29d9a9b 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -78,7 +78,6 @@ readTimeLineHistory(TimeLineID targetTLI)
List   *result;
charpath[MAXPGPATH];
charhistfname[MAXFNAMELEN];
-   charfline[MAXPGPATH];
FILE   *fd;
TimeLineHistoryEntry *entry;
TimeLineID  lasttli = 0;
@@ -123,15 +122,30 @@ readTimeLineHistory(TimeLineID targetTLI)
 * Parse the file...
 */
prevend = InvalidXLogRecPtr;
-   while (fgets(fline, sizeof(fline), fd) != NULL)
+   for (;;)
{
-   /* skip leading whitespace and check for # comment */
+   charfline[MAXPGPATH];
+   char   *res;
char   *ptr;
TimeLineID  tli;
uint32  switchpoint_hi;
uint32  switchpoint_lo;
int nfields;
 
+   pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_READ);
+   res = fgets(fline, sizeof(fline), fd);
+   pgstat_report_wait_end();
+   if (res == NULL)
+   {
+   if (ferror(fd))
+   ereport(ERROR,
+   (errcode_for_file_access(),
+errmsg("could not read file 
\"%s\": %m", path)));
+
+   break;
+   }
+
+   /* skip leading whitespace and check for # comment */
for (ptr = fline; *ptr; ptr++)
{
if (!isspace((unsigned char) *ptr))
@@ -393,6 +407,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID 
parentTLI,
 
nbytes = strlen(buffer);
errno = 0;
+   pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_WRITE);
if ((int) write(fd, buffer, nbytes) != nbytes)
{
int save_errno = errno;
@@ -408,6 +423,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID 
parentTLI,
(errcode_for_file_access(),
 errmsg("could not write to file \"%s\": %m", 
tmppath)));
}
+   pgstat_report_wait_end();
 
pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_SYNC);
if (pg_fsync(fd) != 0)

Re: xid wraparound danger due to INDEX_CLEANUP false

2020-05-07 Thread Masahiko Sawada

On Thu, 7 May 2020 at 15:40, Masahiko Sawada
 wrote:
>
> On Thu, 7 May 2020 at 03:28, Peter Geoghegan  wrote:
> >
> > On Wed, May 6, 2020 at 2:28 AM Masahiko Sawada
> >  wrote:
> > > I've attached the patch fixes this issue.
> > >
> > > With this patch, we don't skip only index cleanup phase when
> > > performing an aggressive vacuum. The reason why I don't skip only
> > > index cleanup phase is that index vacuum phase can be called multiple
> > > times, which takes a very long time. Since the purpose of this index
> > > cleanup is to process recyclable pages it's enough to do only index
> > > cleanup phase.
> >
> > That's only true in nbtree, though. The way that I imagined we'd want
> > to fix this is by putting control in each index access method. So,
> > we'd revise the way that amvacuumcleanup() worked -- the
> > amvacuumcleanup routine for each index AM would sometimes be called in
> > a mode that means "I don't really want you to do any cleanup, but if
> > you absolutely have to for your own reasons then you can". This could
> > be represented using something similar to
> > IndexVacuumInfo.analyze_only.
> >
> > This approach has an obvious disadvantage: the patch really has to
> > teach *every* index AM to do something with that state (most will
> > simply do no work). It seems logical to have the index AM control what
> > happens, though. This allows the logic to live inside
> > _bt_vacuum_needs_cleanup() in the case of nbtree, so there is only one
> > place where we make decisions like this.
> >
> > Most other AMs don't have this problem. GiST has a similar issue with
> > recyclable pages, except that it doesn't use 32-bit XIDs so it doesn't
> > need to care about this stuff at all. Besides, it seems like it might
> > be a good idea to do other basic maintenance of the index when we're
> > "skipping" index cleanup. We probably should always do things that
> > require only a small, fixed amount of work. Things like maintaining
> > metadata in the metapage.
> >
> > There may be practical reasons why this approach isn't suitable for
> > backpatch even if it is a superior approach. What do you think?
>
> I agree this idea is better. I was thinking the same approach but I
> was concerned about backpatching. Especially since I was thinking to
> add one or two fields to IndexVacuumInfo existing index AM might not
> work with the new VacuumInfo structure.

It would be ok if we added these fields at the end of VacuumInfo structure?

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

74 matches

Mail list logo