Re: [HACKERS] WAL logging problem in 9.4.3?

2020-03-29 Thread Kyotaro Horiguchi
At Sun, 29 Mar 2020 21:41:01 -0700, Noah Misch  wrote in 
> I think attached v41nm is ready for commit.  Would anyone like to vote against
> back-patching this?  It's hard to justify lack of back-patch for a data-loss
> bug, but this is atypically invasive.  (I'm repeating the question, since some
> folks missed my 2020-02-18 question.)  Otherwise, I'll push this on Saturday.
> 
> On Mon, Mar 23, 2020 at 05:20:27PM +0900, Kyotaro Horiguchi wrote:
> > At Sat, 21 Mar 2020 15:49:20 -0700, Noah Misch  wrote in 
> > > The proximate cause is the RelFileNodeSkippingWAL() call that we added to
> > > MarkBufferDirtyHint().  MarkBufferDirtyHint() runs in parallel workers, 
> > > but
> > > parallel workers have zeroes for pendingSyncHash and rd_*Subid.
> 
> > > Kyotaro, can you look through the affected code and propose a strategy for
> > > good coexistence of parallel query with the WAL skipping mechanism?
> > 
> > Bi-directional communication between leader and workers is too much.
> > It wouldn't be acceptable to inhibit the problematic operations on
> > workers, such as heap pruning or btree pin removal.  If we do pending
> > syncs just before worker start, it won't fix the issue.
> > 
> > The attached patch passes a list of pending-sync relfilenodes at
> > worker start.
> 
> If you were to issue pending syncs and also cease skipping WAL for affected
> relations, that would fix the issue.  Your design is better, though.  I made
> two notable changes:
>
> - The patch was issuing syncs or FPIs every time a parallel worker exited.  I
>   changed it to skip most of smgrDoPendingSyncs() in parallel workers, like
>   AtEOXact_RelationMap() does.

Exactly. Thank you for fixing it.

> - PARALLEL_KEY_PENDING_SYNCS is most similar to PARALLEL_KEY_REINDEX_STATE and
>   PARALLEL_KEY_COMBO_CID.  parallel.c, not execParallel.c, owns those.  I
>   moved PARALLEL_KEY_PENDING_SYNCS to parallel.c, which also called for style
>   changes in the associated storage.c functions.

That sounds better.

Moving the responsibility for creating the pending-syncs array reduces
copying.  RestorePendingSyncs() (and AddPendingSync()) look better.

> Since pendingSyncHash is always NULL under XLogIsNeeded(), I also removed some
> XLogIsNeeded() tests that immediately preceded !pendingSyncHash tests.

Sounds reasonable.  In AddPendingSync, shouldn't we put
Assert(!XLogIsNeeded()) instead of "Assert(pendingSyncHash == NULL)"?
The former guarantees the relationship between XLogIsNeeded() and
pendingSyncHash, while the existing assertion looks redundant, since
it is placed just after "if (pendingSyncHash)".
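
For concreteness, something along these lines (just a sketch of the
suggested assertion placement, not the patch's exact code):

static void
AddPendingSync(const RelFileNode *rnode)
{
	PendingRelSync *pending;
	bool		found;

	/* The proposed invariant: we only track syncs while skipping WAL. */
	Assert(!XLogIsNeeded());

	if (pendingSyncHash == NULL)
	{
		HASHCTL		ctl;

		ctl.keysize = sizeof(RelFileNode);
		ctl.entrysize = sizeof(PendingRelSync);
		ctl.hcxt = TopTransactionContext;
		pendingSyncHash = hash_create("pending sync hash", 16, &ctl,
									  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
	}

	pending = hash_search(pendingSyncHash, rnode, HASH_ENTER, &found);
	Assert(!found);
	pending->is_truncated = false;
}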

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

2020-03-29 Thread Noah Misch
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
> >> 3. It feels like the proposed test of cutoff position against both
> >> ends of a segment is a roundabout way of fixing the problem.  I
> >> wonder whether we ought not pass *both* the cutoff and the current
> >> endpoint (latest_page_number) down to the truncation logic, and
> >> have it compare against both of those values.
> 
> > Since latest_page_number can keep changing throughout SlruScanDirectory()
> > execution, that would give a false impression of control.  Better to
> > demonstrate that the xidWrapLimit machinery keeps latest_page_number within
> > acceptable constraints than to ascribe significance to a comparison with a
> > stale latest_page_number.
> 
> Perhaps.  I'm prepared to accept that line of argument so far as the clog
> SLRU goes, but I'm not convinced that the other SLRUs have equally robust
> defenses against advancing too far.  So on the whole I'd rather that the
> SLRU logic handled this issue strictly on the basis of what it knows,
> without assumptions about what calling code may be doing.  Still, maybe
> we only really care about the risk for the clog SLRU?

PerformOffsetsTruncation() is the most at-risk, since a single VACUUM could
burn millions of multixacts via FreezeMultiXactId() calls.  (To make that
happen in single-user mode, I suspect one could use prepared transactions as
active lockers and/or in-progress updaters.)  I'm not concerned about other
SLRUs.  TruncateCommitTs() moves in lockstep with TruncateCLOG().  The other
SimpleLruTruncate() callers handle data that becomes obsolete at every
postmaster restart.

> > Exactly.
> > https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
> > uses your octagon to show the behaviors before and after this patch.
> 
> Cool, thanks for drafting that up.  (My original sketch was not of
> publishable quality ;-).)  To clarify, the upper annotations probably
> ought to read "nextXid <= xidWrapLimit"?

It diagrams the scenario of nextXid reaching xidWrapLimit, so the green dot
represents both values.

> And "cutoffPage" ought
> to be affixed to the orange dot at lower right of the center image?

No; oldestXact and cutoffPage have the same position in that diagram, because
the patch causes the cutoffPage variable to denote the page that contains
oldestXact.  I've now added an orange dot to show that.

> I agree that this diagram depicts why we have a problem right now,
> and the right-hand image shows what we want to have happen.
> What's a little less clear is whether the proposed patch achieves
> that effect.
> 
> In particular, after studying this awhile, it seems like removal
> of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
> adjustment isn't really affecting anything.

True.  The set of unlink() calls needs to be the same for oldestXact in the
first page of a segment, in the last page, or in some interior page.  Removing
the rounding neither helps nor hurts correctness.

> So I think what we're actually trying to accomplish here is to
> ensure that instead of deleting up to half of the SLRU space
> before the cutoff, we delete up to half-less-one-segment.
> Maybe it should be half-less-two-segments, just to provide some
> cushion against edge cases.  Reading the first comment in
> SetTransactionIdLimit makes one not want to trust too much in
> arguments based on the exact value of xidWrapLimit, while for
> the other SLRUs it was already unclear whether the edge cases
> were exactly right.

That could be interesting insurance.  While it would be sad for us to miss an
edge case and print "must be vacuumed within 2 transactions" when wrap has
already happened, reaching that message implies the DBA burned ~1M XIDs, all
in single-user mode.  More plausible is FreezeMultiXactId() overrunning the
limit by tens of segments.  Hence, if we do buy this insurance, let's skip far
more segments.  For example, instead of unlinking segments representing up to
2^31 past XIDs, we could divide that into an upper half that we unlink and a
lower half.  The lower half will stay in place; eventually, XID consumption
will overwrite it.  Truncation behavior won't change until the region of CLOG
for pre-oldestXact XIDs exceeds 256 MiB.  Beyond that threshold,
vac_truncate_clog() will unlink the upper 256 MiB and leave the rest.  CLOG
maximum would rise from 512 MiB to 768 MiB.  Would that be worthwhile?
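
For reference, the arithmetic behind those sizes (a sketch, assuming
CLOG's two status bits per XID):

/*
 * 2 bits/XID => 4 XIDs per byte of CLOG.
 * 2^31 retainable past XIDs / 4 = 512 MiB of CLOG today.
 * Proposal: unlink only the upper 256 MiB; keep the lower 256 MiB.
 * Worst case: 512 MiB live span + 256 MiB stale lower half = 768 MiB.
 */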




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-29 Thread Fabien COELHO



Hello Justin,


Well, the following comment says "ignore anything but regular files",
so I'm supposing that that is the behavior that we actually want here
and failed to implement correctly.  There might be scope for
additional directory-reading functions, but I'd think you'd want
more information (such as the file type) returned from anything
that doesn't act this way.


Maybe pg_stat_file() deserves similar attention?  Right now, it'll fail on a
broken link.  If we changed it to lstat(), then it'd work, but it'd also show
metadata for the *link* rather than its target.


Yep. I think this traditional answer is the rational answer.

As I wrote about an earlier version of the patch, ISTM that instead of 
reinventing, extending, and adapting various ls variants (with/without 
metadata, showing only files, showing the targets of links, showing 
directories, etc.) we would just need *one* postgres "ls" implementation, 
which would be like "ls -la arg" (returning file type and dates); then 
everything else is a wrapper around it, with appropriate filtering that 
can be done at the SQL level, like you started with recurse.


It would reduce the amount of C code and I find the SQL-level approach 
quite elegant.


--
Fabien.




Re: Reducing WaitEventSet syscall churn

2020-03-29 Thread Kyotaro Horiguchi
At Fri, 13 Mar 2020 16:21:13 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > 0007: "Use a WaitEventSet for postgres_fdw."
> 
> Continues..

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro  wrote 
in 
> 0007: "Use a WaitEventSet for postgres_fdw."
> 
> Create a single WaitEventSet and use it for all FDW connections.  By
> having our own dedicated WES, we can do the bookkeeping required to
> know when sockets have been closed or need to be removed from kernel wait
> sets explicitly (which would be much harder to achieve if
> CommonWaitSet were to be used like that; you need to know when sockets
> are closed by other code, so you can provide fd_closed to
> RemoveWaitEvent()).

It is straightforward and looks fine.  If we used the libpq-event-based
solution instead of the changecount, pgfdw_wait_for_socket and
disconnect_pg_server would be a bit simpler.

> Concretely, if you use just one postgres_fdw connection, you'll see
> just epoll_wait()/kevent() calls for waits, but whenever you switch
> between different connections, you'll see eg EPOLL_DEL/EV_DELETE
> followed by EPOLL_ADD/EV_ADD when the set is adjusted (in the kqueue
> implementation these could be collapsed into the following wait, but I

Such syscall sequences are no longer than what we issue now,
so this patch is still an improvement.

> haven't done the work for that).  An alternative would be to have one
> WES per FDW connection, but that seemed wasteful of file descriptors.

In the multi-connection case, we can save both fds and syscalls by using
one WaitEventSet containing all connection fds together.  WaitEventSetWait
should take an event-mask parameter, or ModifyWaitEvent should set the
mask, in that case.  The events are now completely level-triggered, so we
can safely ignore unwanted events.

> 0008: "Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet."
> 
> The FATAL message you get if you happen to be waiting for IO rather
> than waiting somewhere else seems arbitrarily different.  By switching
> to a standard automatic exit, it opens the possibility of using
> FeBeWaitSet in a couple more places that would otherwise need to
> create their own WES (see also [1]).  Thoughts?

Do we really not need any message on backend disconnection due to
postmaster death?  As for the authentication code, it seems to me the
rationale for not writing the log is that the connection has not been
established at that point.  (That depends on what we think the
"connection" is.)

And I suppose that the reason the authentication logic ignores
signals is that clients expect authentication to be completed
whenever possible.

> 0009: "Use FeBeWaitSet for walsender.c."
> 
> Enabled by 0008.

It works and doesn't change behavior.  But I found it a bit difficult
to see which events the WaitEventSetWait call waits for.  Maybe a comment
at the call sites would be sufficient.  I think a comment about the
bare number "0" used as the event position in ModifyWaitEvent is also
needed.
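
For example, something like this instead of the bare 0 (the constant's
name here is just a made-up suggestion):

#define FeBeWaitSetSocketPos	0	/* position of the socket event in FeBeWaitSet */

ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, WL_SOCKET_WRITEABLE, NULL);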

> 0010: "Introduce a WaitEventSet for the stats collector."

This looks fine.  The variable event is defined outside its effective
scope, but almost all variables in that function are declared the same way.



By the way, I found that pqcomm.c uses -1 instead of PGINVALID_SOCKET
for AddWaitEventToSet.
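
That is, the non-socket events there presumably ought to be registered
like this (a sketch using the existing latch.c API):

/* latch and postmaster-death events carry no socket */
AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
				  MyLatch, NULL);
AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
				  NULL, NULL);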

> [1] 
> https://www.postgresql.org/message-id/flat/CA%2BhUKGK%3Dm9dLrq42oWQ4XfK9iDjGiZVwpQ1HkHrAPfG7Kh681g%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Creating a database with a non-predefined collation

2020-03-29 Thread Laurenz Albe
On Sun, 2020-03-29 at 23:31 +0200, Shay Rojansky wrote:
> While working on first-class support for PG collations in the Entity 
> Framework Core ORM,
> I've come across an interesting problem: it doesn't seem to be possible to 
> create a
> database with a collation that isn't predefined, and there doesn't seem to be 
> a way to
> add to that list. I'm specifically looking into creating a database with an 
> ICU collation.
> 
> I've seen the discussion in
> https://www.postgresql.org/message-id/flat/99faa8eb-9de2-8ec0-0a25-1ad1276167cc%402ndquadrant.com,
> though to my understanding, that is about importing ICU rather than libc 
> predefined collations,
> and not adding an arbitrary collation to the list from which databases can be 
> created.
> 
> This seems particularly problematic as a database collation cannot be altered 
> once created,
> leading to an odd chicken-and-egg problem. My initial expectation was for 
> collations in the
> template database to be taken into account, but that doesn't seem to be the 
> case.

This is indeed a missing feature, and the thread you reference was trying to 
improve things, but enough obstacles surfaced that it didn't make it in.

It is less trivial than it looks at first glance.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Laurenz Albe
On Sat, 2020-03-28 at 19:21 +1300, David Rowley wrote:
> Thank you.  Pushed.

Thanks for your efforts on this, and thanks for working on the fallout.

How can it be that even after an explicit VACUUM, this patch can cause
unstable regression test results?

Yours,
Laurenz Albe





Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-29 Thread Fujii Masao




On 2020/03/29 15:15, Julien Rouhaud wrote:

On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote:

On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao  wrote:





So what I'd like to say is that the information users are interested
in varies with the situation and the case. At least to me it seems
enough for pgss to report only the basic information. Then users
can calculate the numbers (like total_time) they're interested in
from that basic information.

But of course, I'd like to hear more opinions about this...


+1

Unless someone chimes in by tomorrow, I'll just drop the sum, as that
seems less controversial, and it's not a blocker in userland if users
are interested.


Done in attached v11, which also fixes the s/querytext/query_text/
discrepancy noted previously.


Thanks for updating the patch! But I still think query_string is the better
name because it's used in several other places, for the sake of consistency.
So I changed the argument name that way and committed the 0001 patch.
If you think query_text is better, let's keep discussing this topic!

Anyway many thanks for your great job!


I also exported BufferUsageAccumDiff as mentioned previously, as it seems
cleaner and will avoid useless code churn later, and ran pgindent.


Many thanks!! I'm thinking of committing this part separately.
So I made that patch based on your patch. Attached.


Thanks! It looks good to me.


I also kept that part in a distinct commit for convenience.


I also pushed the 0002 patch. Thanks!

I will review the 0003 patch again.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: improve transparency of bitmap-only heap scans

2020-03-29 Thread Tom Lane
Amit Kapila  writes:
> On Sat, Mar 28, 2020 at 8:02 PM James Coleman  wrote:
>> I'm curious if Tom's objection is mostly on the grounds that we should
>> be consistent in what's displayed, or that he thinks the information
>> is likely to be useless.

> Yeah, it would be good if he clarifies his position.

Some of both: it seems like these ought to be consistent, and the
lack of complaints so far about regular index-only scans suggests
that people don't need the info.  But perhaps we ought to add
similar info in both places.

regards, tom lane




Re: snapper vs. HEAD

2020-03-29 Thread Tom Lane
I wrote:
> I'm forced to the conclusion that the important difference between
> snapper and skate is that the latter uses --enable-cassert and the
> former doesn't, because that's the only remaining difference between
> how I built a working version before and the not-working version
> I have right now.

Confirmed: building gistget with --enable-cassert, and all of snapper's
compile/link options, produces something that passes regression.

The generated asm differs in a whole lot of details, but it looks like
the compiler remembers to annul the branch delay slot in all the
relevant places:

.loc 1 163 0
addcc   %l7, -1, %l7
.L186:
be,pn   %icc, .L80
 add%l6, 48, %l6
...
.loc 1 189 0
be,a,pt %icc, .L186
 addcc  %l7, -1, %l7
...
.loc 1 183 0
lduh[%g4+12], %g4
andcc   %g4, 1, %g0
be,a,pt %icc, .L186
 addcc  %l7, -1, %l7
andcc   %o7, 0xff, %g0
bne,a,pt %icc, .L186
 addcc  %l7, -1, %l7


regards, tom lane




Re: improve transparency of bitmap-only heap scans

2020-03-29 Thread Amit Kapila
On Sat, Mar 28, 2020 at 8:02 PM James Coleman  wrote:
>
> On Fri, Mar 27, 2020 at 9:24 PM Amit Kapila  wrote:
> >
> > Yeah, I also see this information could be useful.  It seems Tom Lane
> > is not entirely convinced of this.  I am not sure if this is the right
> > time to seek more opinions as we are already near the end of CF.  So,
> > we should either decide to move this to the next CF if we think of
> > getting the opinion of others or simply reject it and see a better way
> > for EXPLAIN to identify an index-only BMS.
>
> I'm curious if Tom's objection is mostly on the grounds that we should
> be consistent in what's displayed, or that he thinks the information
> is likely to be useless.
>

Yeah, it would be good if he clarifies his position.

> If consistency is the goal you might e.g., do something that just
> changes the node type output, but in favor of changing that, it seems
> to me that showing "how well did the optimization" is actually more
> valuable than "did we do the optimization at all". Additionally I
> think showing it as an optimization of an existing node is actually
> likely less confusing anyway.
>
> One other thing: my understanding is that this actually matches the
> underlying code split too. For the index only scan case, we actually
> have a separate node (it's not just an optimization of the standard
> index scan). There are discussions about whether that's a good thing,
> but it's what we have. In contrast, the bitmap scan actually has it as
> a pure optimization of the existing bitmap heap scan node.
>

I personally see those as valid points.  Does anybody else want to
weigh in here, so that we can reach some conclusion and move ahead
with this CF entry?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Missing errcode() in ereport

2020-03-29 Thread Andres Freund
Hi,

On 2020-03-30 11:54:03 +0800, Craig Ringer wrote:
> On Fri, 20 Mar 2020, 01:59 Andres Freund,  wrote:
> > On 2020-03-17 10:09:18 -0400, Tom Lane wrote:
> > > We might want to spend some effort thinking how to find or prevent
> > > additional bugs of the same ilk ...
> >
> > Yea, that'd be good.  Trying to help people new to postgres write their
> > first patches I found that ereport is very confusing to them - largely
> > because the syntax doesn't make much sense. Made worse by the compiler
> > error messages being terrible in many cases.
> 
> 
> Very much agreed.
> 
> I'd have found it helpful to just have the docs explain clearly how it
> works by chaining the comma operator using functions with ignored return
> values.

IDK, that seems like it'll be a bit too implementation specific.


> That would also help people understand how they can make parts of an
> ereport conditional, e.g. only setting errdetail() if additional info is
> currently available, w/o duplicating the rest of the ereport.

Well, they can do whatever they like in any function argument list. I
don't see why the ereport docs are a good place to explain
"... ? ... : ..."; or am I misunderstanding what you suggest here?


> Not sure there's much we can do without changing ereport's "signature"

I think the changes we just did already improved the situation a good
bit: by obviating the need for the additional parentheses, by making
parameters outside of err*() trigger warnings, and by generally
producing better error/warning messages.


I still think we might want to do a bigger change of the logging
APIs. See the messages leading up to
https://postgr.es/m/20190805214952.dmnn2oetdquixpp4%40alap3.anarazel.de
and the message linked from there.

Greetings,

Andres Freund




Re: Missing errcode() in ereport

2020-03-29 Thread Tom Lane
Craig Ringer  writes:
> I'd have found it helpful to just have the docs explain clearly how it
> works by chaining the comma operator using functions with ignored return
> values.

Want to write some text?

> That would also help people understand how they can make parts of an
> ereport conditional, e.g. only setting errdetail() if additional info is
> currently available, w/o duplicating the rest of the ereport.

There are examples of that in the tree, of course, but maybe an
example in the docs wouldn't be bad either.
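
For instance, the usual in-tree pattern is a ternary expression that
yields 0 when the optional part is omitted (illustrative, not a
specific call site):

ereport(ERROR,
		(errcode(ERRCODE_SYNTAX_ERROR),
		 errmsg("syntax error in configuration"),
		 have_detail ? errdetail("near \"%s\"", token) : 0));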

regards, tom lane




Re: Missing errcode() in ereport

2020-03-29 Thread Craig Ringer
On Fri, 20 Mar 2020, 01:59 Andres Freund,  wrote:

> Hi,
>
> On 2020-03-17 10:09:18 -0400, Tom Lane wrote:
> > We might want to spend some effort thinking how to find or prevent
> > additional bugs of the same ilk ...
>
> Yea, that'd be good.  Trying to help people new to postgres write their
> first patches I found that ereport is very confusing to them - largely
> because the syntax doesn't make much sense. Made worse by the compiler
> error messages being terrible in many cases.


Very much agreed.

I'd have found it helpful to just have the docs explain clearly how it
works by chaining the comma operator using functions with ignored return
values.

That would also help people understand how they can make parts of an
ereport conditional, e.g. only setting errdetail() if additional info is
currently available, w/o duplicating the rest of the ereport.


Not sure there's much we can do without changing ereport's "signature"
> though :(
>
> Regards,
>
> Andres
>
>
>


Re: [PATCH] Support built control file in PGXS VPATH builds

2020-03-29 Thread Craig Ringer
On Mon, 9 Mar 2020, 17:27 Peter Eisentraut, <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-02-07 04:14, Craig Ringer wrote:
> > The attached patch fixes this by having PGXS resolve
> > $(EXTENSION).control along the VPATH.
>
> Simpler patch:
>
> diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
> index 271e7eaba8..1cd750eecd 100644
> --- a/src/makefiles/pgxs.mk
> +++ b/src/makefiles/pgxs.mk
> @@ -229,7 +229,7 @@ endif # MODULE_big
>
>  install: all installdirs
>  ifneq (,$(EXTENSION))
> -   $(INSTALL_DATA) $(addprefix $(srcdir)/, $(addsuffix .control,
> $(EXTENSION))) '$(DESTDIR)$(datadir)/extension/'
> +   $(INSTALL_DATA) $(call vpathsearch,$(addsuffix .control,
> $(EXTENSION))) '$(DESTDIR)$(datadir)/extension/'
>  endif # EXTENSION
>  ifneq (,$(DATA)$(DATA_built))
> $(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA)) $(DATA_built)
> '$(DESTDIR)$(datadir)/$(datamoduledir)/'
>
> Does that work for you?
>

It wouldn't be my preference because it relies on the VPATH variable.
AFAICS the extension cannot use finer-grained vpath directives for this.
And if anything relies on VPATH it must be set, so you can't really benefit
from vpath directives for anything else.

Currently it's possible to build extensions by unsetting VPATH after
including pgxs.mk and defining vpath directives for only the things you
want to search for. This is immensely useful. You can prevent make from
looking for build products in the srcdir, so you don't get issues with
stale files if someone does a vpath build from a dirty worktree that has
files left in it from a previous in-tree build. Lots of other things too.

So while your patch would work, it would definitely not be my preference.
Frankly I'd rather be moving in the other direction - doing away with all
this DATA vs DATA_built mess entirely and switching everything to using
make vpath directives plus automatic-variable path resolution.

Our vpathsearch function is IMO a bit of a hack we shouldn't need to use at
all. The only time it's necessary is when we absolutely need to get the
vpath-resolved path into a Make variable. Since make doesn't offer its own
vpath-directive-aware search function, there's no convenient way to get a
make var with the resolved path in it before the target is invoked.

So really I think we should be letting make resolve the targets for us
using automatic variables like $< $^ and $@ with the target-search logic.

BTW it's definitely rather frustrating that make doesn't appear to have a
$(vpathsearch foo) or $(vpathlookup foo) or whatever of its own. It seems
very silly not to have that when there are vpath directives.


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Mar 30, 2020 at 7:47 AM Tom Lane  wrote:
>> But the ones that were seemingly due to that were intermittent,
>> so we'll have to watch for awhile.

> Today, stats_ext failed on petalura [1].  Can it be due to this?  I
> have also committed a patch, but offhand I don't see how it could be
> related to my commit.

Yeah, this looks just like petalura's previous failures, so the
problem is still there :-(

regards, tom lane




Re: error context for vacuum to include block number

2020-03-29 Thread Amit Kapila
On Sun, Mar 29, 2020 at 11:35 AM Amit Kapila  wrote:
>
> On Sun, Mar 29, 2020 at 9:04 AM Masahiko Sawada
>  wrote:
> >
> > On Sat, 28 Mar 2020 at 13:23, Amit Kapila  wrote:
> > >
> > >
> > > Please find attached the updated patch with all the changes discussed.
> > > Let me know if I have missed anything?
> > >
> >
> > Thank you for updating the patch! Looks good to me.
> >
>
> Okay, I will push this tomorrow.
>

Pushed.  I see one buildfarm failure [1] but that doesn't seem to be
related to this patch.

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Amit Kapila
On Mon, Mar 30, 2020 at 7:47 AM Tom Lane  wrote:
>
> David Rowley  writes:
> > I don't believe any of the current buildfarm failures can be
> > attributed to any of the recent changes to autovacuum, but I'll
> > continue to monitor the farm to see if anything is suspect.
>
> I agree none of the failures I see right now are related to that
> (there's some "No space left on device" failures, Windows randomicity,
> snapper's compiler bug, and don't-know-what on hyrax).
>
> But the ones that were seemingly due to that were intermittent,
> so we'll have to watch for awhile.
>

Today, stats_ext failed on petalura [1].  Can it be due to this?  I
have also committed a patch, but offhand I don't see how it could be
related to my commit.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-29 Thread James Coleman
On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra
 wrote:
>
> Hi,
>
> Attached is a slightly reorganized patch series. I've merged the fixes
> into the appropriate matches, and I've also combined the two patches
> adding incremental sort paths to additional places in planner.
>
> A couple more comments:
>
>
> 1) I think the GUC documentation in src/sgml/config.sgml is a bit too
> detailed, compared to the other enable_* GUCs. I wonder if there's a
> better place to put the details. What about adding some examples
> and explanation to perform.sgml?

I'll take a look at that and include it in a patch series tomorrow.

> 2) Looking at the explain output, the verbose mode looks like this:
>
> test=# explain (verbose, analyze) select a from t order by a, b, c;
>
> QUERY PLAN
> --
>   Gather Merge  (cost=66.31..816072.71 rows=8333226 width=24) (actual 
> time=4.787..20092.555 rows=1000 loops=1)
> Output: a, b, c
> Workers Planned: 2
> Workers Launched: 2
> ->  Incremental Sort  (cost=66.28..729200.36 rows=4166613 width=24) 
> (actual time=1.308..14021.575 rows=333 loops=3)
>   Output: a, b, c
>   Sort Key: t.a, t.b, t.c
>   Presorted Key: t.a, t.b
>   Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB 
> peak=30kB
>   Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB 
> peak=138kB
>   Worker 0:  actual time=0.766..16122.368 rows=3841573 loops=1
>   Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB
> Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB 
> peak=141kB
>   Worker 1:  actual time=1.986..16189.831 rows=3845490 loops=1
>   Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB
> Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB 
> peak=139kB
>   ->  Parallel Index Scan using t_a_b_idx on public.t  
> (cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 
> rows=333 loops=3)
> Output: a, b, c
> Worker 0:  actual time=0.048..11275.178 rows=3841573 loops=1
> Worker 1:  actual time=0.041..11314.133 rows=3845490 loops=1
>   Planning Time: 0.166 ms
>   Execution Time: 25135.029 ms
> (22 rows)
>
> There seems to be missing indentation for the first line of worker info.

Working on that too.

> I'm still not quite convinced we should be printing two lines - I know
> you mentioned the lines might be too long, but see how long the other
> lines may get ...

All right, I give in :)

Do you think non-workers (both the leader and non-parallel plans)
should also move to one line?

> 3) I see the new nodes (plan state, ...) have "presortedCols", which does
> not indicate it's a "number of". I think we usually prefix the names of
> such fields with "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.)

I can fix this too.

Also I noticed a few compiler warnings that I'll fix up in tomorrow's reply.

> My TODO for this patch is this:
>
> - review the costing (I think the estimates are OK, but I recall I
>haven't been entirely happy with how it's broken into functions.)
>
> - review the tuplesort changes (the memory contexts etc.)
>
> - do more testing of performance impact on planning

Sounds good.

Thanks,
James




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Tom Lane
David Rowley  writes:
> I don't believe any of the current buildfarm failures can be
> attributed to any of the recent changes to autovacuum, but I'll
> continue to monitor the farm to see if anything is suspect.

I agree none of the failures I see right now are related to that
(there's some "No space left on device" failures, Windows randomicity,
snapper's compiler bug, and don't-know-what on hyrax).

But the ones that were seemingly due to that were intermittent,
so we'll have to watch for awhile.

regards, tom lane




Re: snapper vs. HEAD

2020-03-29 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-29 20:25:32 -0400, Tom Lane wrote:
>> I could, but it's mighty slow, so I don't especially want to try all 2^N
>> combinations.  Do you have any specific cases in mind?

> I'd be most suspicious of -fstack-protector --param=ssp-buffer-size=4
> and -D_FORTIFY_SOURCE=2. The first two have direct codegen implications,
> the latter can lead to quite different headers being included and adds a
> lot of size tracking to the optimizer.

It occurred to me that just recompiling gistget.c, rather than the whole
backend, would be enough to prove the point.  So after a few trials:

* Removing "-fstack-protector --param=ssp-buffer-size=4" does nothing;
the generated .o file is bitwise the same.

* Removing -D_FORTIFY_SOURCE=2 does change the bits, but it still
crashes.

So that eliminates all of snapper's special compile options :-(.
I'm forced to the conclusion that the important difference between
snapper and skate is that the latter uses --enable-cassert and the
former doesn't, because that's the only remaining difference between
how I built a working version before and the not-working version
I have right now.  Which means that this really is pretty much a
phase-of-the-moon compiler bug, and we've just been very lucky
that we haven't tripped over it before.

regards, tom lane




Re: backup manifests

2020-03-29 Thread Andres Freund
Hi,

On 2020-03-29 21:23:06 -0400, David Steele wrote:
> On 3/29/20 9:07 PM, Andres Freund wrote:
> > On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
> > > > What do you think of having the verification process also call 
> > > > pg_waldump to
> > > > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
> > > 
> > > I don't love calls to external binaries, but I think the thing that
> > > really bothers me is that pg_waldump is practically bound to terminate
> > > with an error, because the last WAL segment will end with a partial
> > > record.
> > 
> > I don't think that's the case here. You should know the last required
> > record, which should make it possible to specify the precise end for
> > pg_waldump. If it errors out reading up to that point, we'd be in trouble.
> 
> Exactly. All WAL generated during the backup should read fine with
> pg_waldump or there is a problem.

See the attached minimal prototype for what I am thinking of.

This would not correctly handle the case where the timeline changes
while taking a base backup. But I'm not sure that'd be all that serious
a limitation for now?

I'd personally not want to use a base backup that included a timeline
switch...

Greetings,

Andres Freund
diff --git c/src/bin/pg_basebackup/pg_basebackup.c i/src/bin/pg_basebackup/pg_basebackup.c
index c5d95958b29..b61492eba4c 100644
--- c/src/bin/pg_basebackup/pg_basebackup.c
+++ i/src/bin/pg_basebackup/pg_basebackup.c
@@ -2033,6 +2033,11 @@ BaseBackup(void)
 
 	if (verbose)
 		pg_log_info("base backup completed");
+
+	if (format == 'p')
+		pg_log_info("pg_waldump --just-parse -p \"%s\" -t %u -s %s -e %s",
+					basedir, starttli,
+					xlogstart, xlogend);
 }
 
 
diff --git c/src/bin/pg_waldump/pg_waldump.c i/src/bin/pg_waldump/pg_waldump.c
index 279acfa0440..63f4a9cba1f 100644
--- c/src/bin/pg_waldump/pg_waldump.c
+++ i/src/bin/pg_waldump/pg_waldump.c
@@ -40,6 +40,7 @@ typedef struct XLogDumpPrivate
 typedef struct XLogDumpConfig
 {
 	/* display options */
+	bool		just_parse;
 	bool		bkp_details;
 	int			stop_after_records;
 	int			already_displayed_records;
@@ -749,6 +750,7 @@ main(int argc, char **argv)
 	char	   *errormsg;
 
 	static struct option long_options[] = {
+		{"just-parse", no_argument, NULL, 'j'},
 		{"bkp-details", no_argument, NULL, 'b'},
 		{"end", required_argument, NULL, 'e'},
 		{"follow", no_argument, NULL, 'f'},
@@ -794,6 +796,7 @@ main(int argc, char **argv)
 	private.endptr = InvalidXLogRecPtr;
 	private.endptr_reached = false;
 
+	config.just_parse = false;
 	config.bkp_details = false;
 	config.stop_after_records = -1;
 	config.already_displayed_records = 0;
@@ -810,11 +813,14 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "be:fn:p:r:s:t:x:z",
+	while ((option = getopt_long(argc, argv, "be:fjn:p:r:s:t:x:z",
 				 long_options, &optindex)) != -1)
 	{
 		switch (option)
 		{
+			case 'j':
+				config.just_parse = true;
+				break;
 			case 'b':
 config.bkp_details = true;
 break;
@@ -1076,10 +1082,13 @@ main(int argc, char **argv)
 			continue;
 
 		/* process the record */
-		if (config.stats == true)
-			XLogDumpCountRecord(&config, &stats, xlogreader_state);
-		else
-			XLogDumpDisplayRecord(&config, xlogreader_state);
+		if (!config.just_parse)
+		{
+			if (config.stats == true)
+				XLogDumpCountRecord(&config, &stats, xlogreader_state);
+			else
+				XLogDumpDisplayRecord(&config, xlogreader_state);
+		}
 
 		/* check whether we printed enough */
 		config.already_displayed_records++;


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread David Rowley
On Sun, 29 Mar 2020 at 15:29, David Rowley  wrote:
> I'm considering pushing the attached to try to get some confirmation
> that additional autovacuums are the issue. However, I'm not too sure
> it's a wise idea to as I can trigger an additional auto-vacuum and
> have these new tests fail with make installcheck after setting
> autovacuum_naptime to 1s, but I'm not getting the other diffs
> experienced by lousyjack and petalura.  The patch may just cause more
> failures without proving much, especially so with slower machines.

Instead of the above, I ended up modifying the two intermittently
failing tests to change the ANALYZE into a VACUUM ANALYZE.  This
should prevent autovacuum from sneaking in a vacuum at some point
after the ANALYZE has taken place.

I don't believe any of the current buildfarm failures can be
attributed to any of the recent changes to autovacuum, but I'll
continue to monitor the farm to see if anything is suspect.

David




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-29 Thread Michael Paquier
On Fri, Mar 27, 2020 at 07:47:29AM +0900, Michael Paquier wrote:
> On Thu, Mar 26, 2020 at 06:56:36PM -0300, Alvaro Herrera wrote:
>> I don't think any such cleanup should hamper the patch at hand anyway.
> 
> I don't think so either, so I would do the split for the archive routines
> at least.  It still feels strange to me to have a different routine
> name for the frontend side of RestoreArchivedFile().  That's not
> really consistent with the current practice we have for palloc() & co,
> as well as the sync routines.

Okay.  Hearing nothing, I have rebased the patch set this morning,
improving some comments and the code format while on it.  I would like
to commit both 0001 (creation of xlogarchive.h) and 0002 attached in
the next couple of days.  If you have an objection, please feel free to
raise it.
--
Michael
From 71302a709a54a88985563c38830deb784728e682 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 30 Mar 2020 10:34:17 +0900
Subject: [PATCH v5 1/2] Move routines of xlogarchive.c into their own header

The declarations of those routines have been part of xlog_internal.h,
which is included by several frontend tools.  More could be done in this
area, but that's already a nice and simple cut.
---
 src/include/access/xlog_internal.h   | 18 
 src/include/access/xlogarchive.h | 35 
 src/backend/access/transam/timeline.c|  1 +
 src/backend/access/transam/xlog.c|  1 +
 src/backend/access/transam/xlogarchive.c |  1 +
 src/backend/replication/walreceiver.c|  1 +
 6 files changed, 39 insertions(+), 18 deletions(-)
 create mode 100644 src/include/access/xlogarchive.h

diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 27ded593ab..8e3cfcf83e 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -320,22 +320,4 @@ extern bool InArchiveRecovery;
 extern bool StandbyMode;
 extern char *recoveryRestoreCommand;
 
-/*
- * Prototypes for functions in xlogarchive.c
- */
-extern bool RestoreArchivedFile(char *path, const char *xlogfname,
-const char *recovername, off_t expectedSize,
-bool cleanupEnabled);
-extern void ExecuteRecoveryCommand(const char *command, const char *commandName,
-   bool failOnSignal);
-extern void KeepFileRestoredFromArchive(const char *path, const char *xlogfname);
-extern void XLogArchiveNotify(const char *xlog);
-extern void XLogArchiveNotifySeg(XLogSegNo segno);
-extern void XLogArchiveForceDone(const char *xlog);
-extern bool XLogArchiveCheckDone(const char *xlog);
-extern bool XLogArchiveIsBusy(const char *xlog);
-extern bool XLogArchiveIsReady(const char *xlog);
-extern bool XLogArchiveIsReadyOrDone(const char *xlog);
-extern void XLogArchiveCleanup(const char *xlog);
-
 #endif			/* XLOG_INTERNAL_H */
diff --git a/src/include/access/xlogarchive.h b/src/include/access/xlogarchive.h
new file mode 100644
index 00..1c67de2ede
--- /dev/null
+++ b/src/include/access/xlogarchive.h
@@ -0,0 +1,35 @@
+/*
+ *
+ * xlogarchive.h
+ *		Prototypes for WAL archives in the backend
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/include/access/xlogarchive.h
+ *
+ *
+ */
+
+#ifndef XLOG_ARCHIVE_H
+#define XLOG_ARCHIVE_H
+
+#include "access/xlogdefs.h"
+
+extern bool RestoreArchivedFile(char *path, const char *xlogfname,
+const char *recovername, off_t expectedSize,
+bool cleanupEnabled);
+extern void ExecuteRecoveryCommand(const char *command, const char *commandName,
+   bool failOnSignal);
+extern void KeepFileRestoredFromArchive(const char *path, const char *xlogfname);
+extern void XLogArchiveNotify(const char *xlog);
+extern void XLogArchiveNotifySeg(XLogSegNo segno);
+extern void XLogArchiveForceDone(const char *xlog);
+extern bool XLogArchiveCheckDone(const char *xlog);
+extern bool XLogArchiveIsBusy(const char *xlog);
+extern bool XLogArchiveIsReady(const char *xlog);
+extern bool XLogArchiveIsReadyOrDone(const char *xlog);
+extern void XLogArchiveCleanup(const char *xlog);
+
+#endif			/* XLOG_ARCHIVE_H */
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index 860a996414..d683af377f 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -36,6 +36,7 @@
 
 #include "access/timeline.h"
 #include "access/xlog.h"
+#include "access/xlogarchive.h"
 #include "access/xlog_internal.h"
 #include "access/xlogdefs.h"
 #include "pgstat.h"
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8fe92962b0..03c81c0375 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -32,6 +32,7 @@
 #include 

Re: backup manifests

2020-03-29 Thread David Steele

On 3/29/20 9:07 PM, Andres Freund wrote:

On 2020-03-29 20:42:35 -0400, Robert Haas wrote:

What do you think of having the verification process also call pg_waldump to
validate the WAL CRCs (shown upthread)?  That looked helpful and simple.


I don't love calls to external binaries, but I think the thing that
really bothers me is that pg_waldump is practically bound to terminate
with an error, because the last WAL segment will end with a partial
record.


I don't think that's the case here. You should know the last required
record, which should make it possible to specify the precise end for
pg_waldump. If it errors out reading up to that point, we'd be in trouble.


Exactly. All WAL generated during the backup should read fine with 
pg_waldump or there is a problem.


--
-David
da...@pgmasters.net




Re: snapper vs. HEAD

2020-03-29 Thread Tom Lane
Andres Freund  writes:
> Is it visibly bad when looking at the .s of gistget.c "directly", or
> when disassembling the fiinal binary? Because I've seen linkers screw up
> on a larger scale than I'd have expected in the past.

Yes, the bogus asm that I showed was from gistget.s, not from
disassembling.

regards, tom lane




Re: backup manifests

2020-03-29 Thread Andres Freund
Hi,

On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
> > What do you think of having the verification process also call pg_waldump to
> > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
> 
> I don't love calls to external binaries, but I think the thing that
> really bothers me is that pg_waldump is practically bound to terminate
> with an error, because the last WAL segment will end with a partial
> record.

I don't think that's the case here. You should know the last required
record, which should make it possible to specify the precise end for
pg_waldump. If it errors out reading up to that point, we'd be in trouble.


> For the same reason, I think there's really no such thing as
> validating a single WAL file. I suppose you'd need to know the exact
> start and end locations for a minimal WAL replay and check that all
> records between those LSNs appear OK, ignoring any apparent problems
> after the minimum ending point, or at least ignoring any problems due
> to an incomplete record in the last file. We don't have a tool for
> that currently, and I don't think I can write one this week. Or at
> least, not a good one.

pg_waldump -s / -e?
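
i.e. an invocation along these lines (the LSNs here are illustrative,
not from this thread):

pg_waldump -p $BACKUP_DIR/pg_wal -t 1 -s 0/2000028 -e 0/3000100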


> > > + parse->pathname = palloc(raw_length + 1);
> >
> > I don't see this freed anywhere; is it?  (It's useful to make peak memory
> > consumption not grow in proportion to the number of files backed up.)
> 
> We need the hash table to remain populated for the whole run time of
> the tool, because we're essentially doing a full join of the actual
> directory contents against the manifest contents. That's a bit
> unfortunate but it doesn't seem simple to improve. I think the only
> people who are really going to suffer are people who have an enormous
> pile of empty or nearly-empty relations. People who have large
> databases for the normal reason - i.e. a reasonable number of tables
> that hold a lot of data - will have manifests of very manageable size.

Given that that's a pre-existing issue - at a significantly larger scale
imo - e.g. for pg_dump (even in the --schema-only case), and that there
are tons of backend side issues with lots of relations too, I think
that's fine.

You could of course implement something merge-join-like, and implement
the sorted input via a disk-based sort. But that's a lot of work (good
luck making tuplesort work in the frontend...). So I'd not go there
unless there's a lot of evidence this is a serious practical issue.

If we find this uses too much memory, I think we'd be better off
condensing pathnames into either fewer allocations, or a RelFileNode as
part of the struct (with a fallback to a string for other types of
files). But I'd also not go there for now.
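
A sketch of that condensed representation (all names here are
hypothetical):

typedef struct condensed_manifest_file
{
	bool		is_relfile;	/* did the pathname parse as a relation file? */
	union
	{
		RelFileNode rnode;	/* compact form for relation files */
		char	   *pathname;	/* fallback: config files, maps, etc. */
	}			u;
	int64		size;
	/* checksum type and payload would follow */
} condensed_manifest_file;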

Greetings,

Andres Freund




Re: backup manifests

2020-03-29 Thread David Steele

On 3/29/20 8:47 PM, Robert Haas wrote:

On Fri, Mar 27, 2020 at 4:02 PM David Steele  wrote:

I prefer to validate the size and checksum in the same pass, but I'm not
sure it's that big a deal.  If the backup is being corrupted under the
validate process that would also apply to files that had already been
validated.


I did it like this because I thought that in typical scenarios it
would be likely to produce useful results more quickly. For instance,
suppose that you forget to restore the tablespace directories, and
just get the main $PGDATA directory. Well, if you do it all in one
pass, you might spend a long time checksumming things before you
realize that some files are completely missing. I thought it would be
useful to complain about files that are extra or missing or the wrong
size FIRST, because that only requires us to stat() each file, and
only after that do the comparatively extensive checksumming step that
requires us to read the entire contents of each file. Granted, unless
you use --exit-on-error, you're going to get all the complaints
eventually anyway, but you might use that option, or you might hit ^C
when you start to see a slew of complaints popping out.


Yeah, that seems reasonable.

In our case backups are nearly always compressed and/or encrypted so 
even checking the original size is a bit of work. Getting the checksum 
at the same time seems like an obvious win.


Currently we don't have a separate validate command outside of restore 
but when we do we'll consider doing a pass to check for file presence 
(and size when possible) first. Thanks!



I wasn't worried about
concurrent modification of the backup because then you're super-hosed
no matter what.


Really, really, super-hosed.

Regards,
--
-David
da...@pgmasters.net




Re: backup manifests

2020-03-29 Thread Andres Freund
Hi,

On 2020-03-29 20:47:40 -0400, Robert Haas wrote:
> Maybe that was the wrong idea, but I thought people would like the
> idea of running cheaper checks first. I wasn't worried about
> concurrent modification of the backup because then you're super-hosed
> no matter what.

I do like that approach.

To be clear: I'm suggesting the additional crosscheck not because I'm
concerned with concurrent modifications, but because I've seen
filesystems where the per-inode metadata and the actual data / extent
tree differ, leading to EOF being reported while reading at a different
place than the size from stat() would indicate.
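
Concretely, the crosscheck could look roughly like this (a sketch; the
reporting function and variable names are assumptions, not the patch's
actual code):

/* after checksumming: bytes actually read must match the stat() size */
if (bytes_read != (uint64) statbuf.st_size)
	report_error("\"%s\": read %llu bytes, but stat() reported %llu",
				 pathname,
				 (unsigned long long) bytes_read,
				 (unsigned long long) statbuf.st_size);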

Greetings,

Andres Freund




Re: [PATCH] Redundant initialization

2020-03-29 Thread Kyotaro Horiguchi
Hello.

At Sat, 28 Mar 2020 19:04:00 -0300, Ranier Vilela  wrote 
in 
> Hi,
> This patch fixes some redundant initializations that are safe to remove.

> diff --git a/src/backend/access/gist/gistxlog.c 
> b/src/backend/access/gist/gistxlog.c
> index d3f3a7b803..ffaa2b1ab4 100644
> --- a/src/backend/access/gist/gistxlog.c
> +++ b/src/backend/access/gist/gistxlog.c
> @@ -396,7 +396,7 @@ gistRedoPageReuse(XLogReaderState *record)
>   if (InHotStandby)
>   {
>   FullTransactionId latestRemovedFullXid = 
> xlrec->latestRemovedFullXid;
> - FullTransactionId nextFullXid = ReadNextFullTransactionId();
> + FullTransactionId nextFullXid;

I'd prefer to preserve this and remove the latter.

> diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
> index 9d9e915979..795cf349eb 100644
> --- a/src/backend/catalog/heap.c
> +++ b/src/backend/catalog/heap.c
> @@ -3396,7 +3396,7 @@ List *
>  heap_truncate_find_FKs(List *relationIds)
>  {
>   List   *result = NIL;
> - List   *oids = list_copy(relationIds);
> + List   *oids;

This was just a waste of memory; the fix looks fine.

> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
> index c5b771c531..37fbeef841 100644
> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -730,9 +730,11 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, 
> BlockNumber blocknum,
>  BlockNumber
>  mdnblocks(SMgrRelation reln, ForkNumber forknum)
>  {
> - MdfdVec*v = mdopenfork(reln, forknum, EXTENSION_FAIL);
> + MdfdVec*v;
>   BlockNumber nblocks;
> - BlockNumber segno = 0;
> + BlockNumber segno;
> +
> +	mdopenfork(reln, forknum, EXTENSION_FAIL);
>  
>   /* mdopen has opened the first segment */
>   Assert(reln->md_num_open_segs[forknum] > 0);

It doesn't seem *to me* to be an issue.

> diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
> index 8bb00abb6b..7a6a2ecbe9 100644
> --- a/src/backend/utils/adt/json.c
> +++ b/src/backend/utils/adt/json.c
> @@ -990,7 +990,7 @@ catenate_stringinfo_string(StringInfo buffer, const char 
> *addon)
>  Datum
>  json_build_object(PG_FUNCTION_ARGS)
>  {
> - int nargs = PG_NARGS();
> + int nargs;

This part looks fine.

>   int i;
>   const char *sep = "";
>   StringInfo  result;
> @@ -998,6 +998,8 @@ json_build_object(PG_FUNCTION_ARGS)
>   bool   *nulls;
>   Oid*types;
>  
> +	PG_NARGS();
> +
>   /* fetch argument values to build the object */
>   nargs = extract_variadic_args(fcinfo, 0, false, &args, &nulls, &types);

PG_NARGS() doesn't have side effects, so there is no need to call it
independently.
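
(For reference, fmgr.h defines it as a plain read of the argument
count, so a bare call computes a value and discards it:)

#define PG_NARGS() (fcinfo->nargs)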


> diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
> index 9e24fec72d..fb0e833b2d 100644
> --- a/src/backend/utils/mmgr/mcxt.c
> +++ b/src/backend/utils/mmgr/mcxt.c
> @@ -475,7 +475,7 @@ MemoryContextMemAllocated(MemoryContext context, bool 
> recurse)
>  
>   if (recurse)
>   {
> - MemoryContext child = context->firstchild;
> + MemoryContext child;
>  
>   for (child = context->firstchild;
>child != NULL;

This looks fine.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: snapper vs. HEAD

2020-03-29 Thread Andres Freund
Hi,

On 2020-03-29 20:25:32 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > If you still have the environment it might make sense to check wether
> > it's related to one of the other options. But otherwise I wouldn't be
> > against the proposal.
> 
> I could, but it's mighty slow, so I don't especially want to try all 2^N
> combinations.  Do you have any specific cases in mind?

I'd be most suspicious of -fstack-protector --param=ssp-buffer-size=4
and -D_FORTIFY_SOURCE=2. The first two have direct codegen implications,
the latter can lead to quite different headers being included and adds a
lot of size tracking to the optimizer.


> (I guess we can exclude LDFLAGS, since the assembly output is visibly
> bad.)

Seems likely.

Is it visibly bad when looking at the .s of gistget.c "directly", or
when disassembling the fiinal binary? Because I've seen linkers screw up
on a larger scale than I'd have expected in the past.

Greetings,

Andres Freund




Re: backup manifests

2020-03-29 Thread David Steele

On 3/29/20 8:42 PM, Robert Haas wrote:

On Sat, Mar 28, 2020 at 11:40 PM Noah Misch  wrote:

I don't see this freed anywhere; is it?  (It's useful to make peak memory
consumption not grow in proportion to the number of files backed up.)


We need the hash table to remain populated for the whole run time of
the tool, because we're essentially doing a full join of the actual
directory contents against the manifest contents. That's a bit
unfortunate but it doesn't seem simple to improve. I think the only
people who are really going to suffer are people who have an enormous
pile of empty or nearly-empty relations. People who have large
databases for the normal reason - i.e. a reasonable number of tables
that hold a lot of data - will have manifests of very manageable size.


+1

--
-David
da...@pgmasters.net




Re: fix for BUG #3720: wrong results at using ltree

2020-03-29 Thread Tom Lane
I wrote:
> My main complaint about it remains the same, that it changes a
> disturbingly large number of existing regression-test results,
> suggesting that there's not a meeting of the minds about what
> this logic is supposed to do.  Maybe it's okay or maybe it's
> not, but who's going to decide?

Well ... *somebody's* got to decide, and since Oleg and Teodor aren't
stepping up, I took it on myself to study this more closely.

It seems to me that indeed the existing behavior is broken: given
what the documentation has to say, it's really hard to argue that
an lquery like "*.!foo.*" means something other than "match any
ltree that has at least one label that is not 'foo'".  But the
existing code produces

regression=# select 'bar.foo.baz'::ltree ~ '*.!foo.*'::lquery;
 ?column? 
--
 f
(1 row)

I agree that's just plain wrong, and so are all the regression
test cases that this patch is changing the results of.

However, I think there is a valid use-case that the existing
code is trying to solve: how can you say "match any ltree in
which no label is 'foo'"?  That is the effective behavior right
now of a pattern like this, and it seems useful.  So if we change
this, we ought to provide some other way to get that result.

What I propose we do about that is allow lquery quantifiers to
be attached to regular items as well as star items, so that the
need is met by saying this:

regression=# select 'bar.foo.baz'::ltree ~ '!foo{,}'::lquery;
 ?column? 
--
 f
(1 row)

regression=# select 'bar.fool.baz'::ltree ~ '!foo{,}'::lquery;
 ?column? 
--
 t
(1 row)

Also, once we do that, it's possible to treat star and non-star items
basically alike in checkCond, with checkLevel being the place that
accounts for them being different.  This results in logic that's far
simpler and much more nearly like the way that LIKE patterns are
implemented, which seems like a good thing to me.

Hence, attached are two revised patches that attack the problem
this way.  The first one is somewhat unrelated to the original
point --- it's trying to clean up the error messages in ltree_in
and lquery_in so that they are more consistent and agree with
the terminology used in the documentation.  (Notably, the term
"level" is used nowhere in the ltree docs, except in the legacy
function name nlevel().)  However its movement of the check for
high < low is needed to make that cover the case of a bogus non-star
quantifier in patch 0002.  These also depend on the cosmetic
patches I just committed, so you need HEAD as of today to get
them to apply.

regards, tom lane

diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index 1b31dbc..cd18516 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -464,7 +464,7 @@ SELECT nlevel(('1' || repeat('.1', 65534))::ltree);
 (1 row)
 
 SELECT nlevel(('1' || repeat('.1', 65535))::ltree);
-ERROR:  number of ltree levels (65536) exceeds the maximum allowed (65535)
+ERROR:  number of ltree labels (65536) exceeds the maximum allowed (65535)
 SELECT nlevel(('1' || repeat('.1', 65534))::ltree || '1');
 ERROR:  number of ltree levels (65536) exceeds the maximum allowed (65535)
 SELECT ('1' || repeat('.1', 65534))::lquery IS NULL;
@@ -474,7 +474,7 @@ SELECT ('1' || repeat('.1', 65534))::lquery IS NULL;
 (1 row)
 
 SELECT ('1' || repeat('.1', 65535))::lquery IS NULL;
-ERROR:  number of lquery levels (65536) exceeds the maximum allowed (65535)
+ERROR:  number of lquery items (65536) exceeds the maximum allowed (65535)
 SELECT '*{65535}'::lquery;
   lquery  
 ----------
@@ -485,7 +485,7 @@ SELECT '*{65536}'::lquery;
 ERROR:  lquery syntax error
 LINE 1: SELECT '*{65536}'::lquery;
^
-DETAIL:  Low limit (65536) exceeds the maximum allowed (65535).
+DETAIL:  Low limit (65536) exceeds the maximum allowed (65535), at character 2.
 SELECT '*{,65534}'::lquery;
   lquery   
 -----------
@@ -502,7 +502,12 @@ SELECT '*{,65536}'::lquery;
 ERROR:  lquery syntax error
 LINE 1: SELECT '*{,65536}'::lquery;
^
-DETAIL:  High limit (65536) exceeds the maximum allowed (65535).
+DETAIL:  High limit (65536) exceeds the maximum allowed (65535), at character 3.
+SELECT '*{4,3}'::lquery;
+ERROR:  lquery syntax error
+LINE 1: SELECT '*{4,3}'::lquery;
+   ^
+DETAIL:  Low limit (4) is greater than high limit (3), at character 4.
 SELECT '1.2'::ltree  < '2.2'::ltree;
  ?column? 
 --
diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index 2503d47..2136715 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -17,12 +17,6 @@ PG_FUNCTION_INFO_V1(lquery_in);
 PG_FUNCTION_INFO_V1(lquery_out);
 
 
-#define UNCHAR ereport(ERROR, \
-	   (errcode(ERRCODE_SYNTAX_ERROR), \
-		errmsg("syntax error at position %d", \
-		pos)));
-
-
 typedef struct
 {
 	char	   *start;
@@ -49,6 +43,11 @@ ltree_in(PG_FUNCTION_ARGS)
 	int			charlen;
 	int			pos = 0;
 
+#define UNCHAR 

Re: backup manifests

2020-03-29 Thread David Steele

On 3/29/20 8:33 PM, Robert Haas wrote:

On Fri, Mar 27, 2020 at 4:32 PM Andres Freund  wrote:


FWIW, I was thinking of backup_manifest.checksum potentially being
desirable for another reason: The need to embed the checksum inside the
document imo adds a fair bit of rigidity to the file format. See


Well, David Steele suggested this approach. I didn't particularly like
it, but nobody showed up to agree with me or propose anything
different, so here we are. I don't think it's the end of the world.


I prefer the embedded checksum even though it is a pain. It's a lot less 
likely to go missing.


--
-David
da...@pgmasters.net




Re: backup manifests

2020-03-29 Thread Robert Haas
On Fri, Mar 27, 2020 at 4:02 PM David Steele  wrote:
> I prefer to validate the size and checksum in the same pass, but I'm not
> sure it's that big a deal.  If the backup is being corrupted under the
> validate process that would also apply to files that had already been
> validated.

I did it like this because I thought that in typical scenarios it
would be likely to produce useful results more quickly. For instance,
suppose that you forget to restore the tablespace directories, and
just get the main $PGDATA directory. Well, if you do it all in one
pass, you might spend a long time checksumming things before you
realize that some files are completely missing. I thought it would be
useful to complain about files that are extra or missing or the wrong
size FIRST, because that only requires us to stat() each file, and
only after that do the comparatively extensive checksumming step that
requires us to read the entire contents of each file. Granted, unless
you use --exit-on-error, you're going to get all the complaints
eventually anyway, but you might use that option, or you might hit ^C
when you start to see a slew of complaints popping out.

Maybe that was the wrong idea, but I thought people would like the
idea of running cheaper checks first. I wasn't worried about
concurrent modification of the backup because then you're super-hosed
no matter what.
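
To make the ordering concrete, here is a tiny standalone sketch of the
cheap-pass-then-expensive-pass structure -- illustrative only, not
pg_validatebackup code; a plain byte sum stands in for a real checksum:

/* verify_sketch.c: stat() everything first, read everything second */
#include <stdio.h>
#include <sys/stat.h>

int
main(int argc, char **argv)
{
	struct stat st;

	/* Pass 1: cheap checks, one stat() per file */
	for (int i = 1; i < argc; i++)
		if (stat(argv[i], &st) != 0 || !S_ISREG(st.st_mode))
			fprintf(stderr, "missing or not a regular file: %s\n", argv[i]);

	/* Pass 2: expensive checks, read every byte of every file */
	for (int i = 1; i < argc; i++)
	{
		FILE	   *f = fopen(argv[i], "rb");
		unsigned long sum = 0;
		int			c;

		if (f == NULL)
			continue;			/* already complained in pass 1 */
		while ((c = getc(f)) != EOF)
			sum += (unsigned long) c;
		fclose(f);
		printf("%s: %lu\n", argv[i], sum);
	}
	return 0;
}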

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests

2020-03-29 Thread Robert Haas
On Sat, Mar 28, 2020 at 11:40 PM Noah Misch  wrote:
> Stephen Frost mentioned that a backup could pass validation even if
> pg_basebackup were killed after writing the base backup and before finishing
> the writing of pg_wal.  One might avoid that by simply writing the manifest to
> a temporary name and renaming it to the final name after populating pg_wal.

Huh, that's an idea. I'll have a look at the code and see what would
be involved.
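
For what it's worth, I'd guess it boils down to something like this
(illustrative sketch only; the file names and function name are mine):

#include <stdio.h>

/*
 * Write the manifest under a temporary name and rename() it into place
 * only once pg_wal is fully populated, so that an interrupted backup
 * leaves no manifest behind for the validator to trust.
 */
static int
publish_manifest(void)
{
	FILE	   *f = fopen("backup_manifest.tmp", "w");

	if (f == NULL)
		return -1;
	/* ... write manifest contents, fflush() and fsync() here ... */
	fclose(f);
	/* atomic publish: only now does backup_manifest exist */
	return rename("backup_manifest.tmp", "backup_manifest");
}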

> What do you think of having the verification process also call pg_waldump to
> validate the WAL CRCs (shown upthread)?  That looked helpful and simple.

I don't love calls to external binaries, but I think the thing that
really bothers me is that pg_waldump is practically bound to terminate
with an error, because the last WAL segment will end with a partial
record. For the same reason, I think there's really no such thing as
validating a single WAL file. I suppose you'd need to know the exact
start and end locations for a minimal WAL replay and check that all
records between those LSNs appear OK, ignoring any apparent problems
after the minimum ending point, or at least ignoring any problems due
to an incomplete record in the last file. We don't have a tool for
that currently, and I don't think I can write one this week. Or at
least, not a good one.
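
(The building blocks do exist today in pg_waldump's -s/-e switches,
e.g., with made-up LSNs:

pg_waldump -p backup/pg_wal -s 0/2000028 -e 0/3000000

but determining the right start and end LSNs automatically is exactly
the part we don't have.)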

> I think this functionality doesn't belong in its own program.  If you suspect
> pg_basebackup or pg_restore will eventually gain the ability to merge
> incremental backups into a recovery-ready base backup, I would put the
> functionality in that program.  Otherwise, I would put it in pg_checksums.
> For me, part of the friction here is that the program description indicates
> general verification, but the actual functionality merely checks hashes on a
> directory tree that happens to represent a PostgreSQL base backup.

Suraj's original patch made this part of pg_basebackup, but I didn't
really like that, because I wanted it to have its own set of options.
I still think all the options I've added are pretty useful ones, and I
can think of other things somebody might want to do. It feels very
uncomfortable to make pg_basebackup, or pg_checksums, take either
options from set A and do thing X, or options from set B and do thing
Y. But it feels clear that the name pg_validatebackup is not going
over very well with anyone. I think I should rename it to
pg_validatemanifest.

> > + parse->pathname = palloc(raw_length + 1);
>
> I don't see this freed anywhere; is it?  (It's useful to make peak memory
> consumption not grow in proportion to the number of files backed up.)

We need the hash table to remain populated for the whole run time of
the tool, because we're essentially doing a full join of the actual
directory contents against the manifest contents. That's a bit
unfortunate but it doesn't seem simple to improve. I think the only
people who are really going to suffer are people who have an enormous
pile of empty or nearly-empty relations. People who have large
databases for the normal reason - i.e. a reasonable number of tables
that hold a lot of data - will have manifests of very manageable size.
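
In relational terms the verification really is a full join over two
sides that both have to stay available; over hypothetical manifest(path,
size, checksum) and disk(path, size) tables it would read:

SELECT coalesce(m.path, d.path) AS path,
       CASE WHEN d.path IS NULL   THEN 'missing from backup'
            WHEN m.path IS NULL   THEN 'not in manifest'
            WHEN m.size <> d.size THEN 'size mismatch'
            ELSE 'checksum to verify' END AS status
FROM manifest m
FULL JOIN disk d USING (path);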

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests

2020-03-29 Thread Robert Haas
On Fri, Mar 27, 2020 at 4:32 PM Andres Freund  wrote:
> I don't like having a file format that's intended to be used by external
> tools too that's undocumented except for code that assembles it in a
> piecemeal fashion.  Do you mean in a follow-on patch this release, or
> later? I don't have a problem with the former.

This release. I'm happy to work on that as soon as this gets
committed, assuming it gets committed.

> I do found it to be circular. I think we mostly need a paragraph or two
> somewhere that explains on a higher level what the point of verifying
> base backups is and what is verified.

Fair enough.

> FWIW, I was thinking of backup_manifest.checksum potentially being
> desirable for another reason: The need to embed the checksum inside the
> document imo adds a fair bit of rigidity to the file format. See

Well, David Steele suggested this approach. I didn't particularly like
it, but nobody showed up to agree with me or propose anything
different, so here we are. I don't think it's the end of the world.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: snapper vs. HEAD

2020-03-29 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-28 23:50:32 -0400, Tom Lane wrote:
>> ... Curiously, its sibling
>> skate is *not* failing, despite being on the same machine and compiler.

> Hm. There's some difference in code-gen specific options.

Yeah, I confirmed that on my VM install too: using our typical codegen
options (just -O2 -g), the regression tests pass, matching skate's
results.  It fell over after I matched snapper's CFLAGS, CPPFLAGS,
and LDFLAGS.  I didn't try to break things down more finely as to
which option(s) trigger the bad code, since it looks like it's probably
some purely-internal compiler state ...

> What options were you using? Reproducing snapper as exactly as possible?

Yeah, see above.

> If you still have the environment it might make sense to check whether
> it's related to one of the other options. But otherwise I wouldn't be
> against the proposal.

I could, but it's mighty slow, so I don't especially want to try all 2^N
combinations.  Do you have any specific cases in mind?

(I guess we can exclude LDFLAGS, since the assembly output is visibly
bad.)

regards, tom lane




Re: DROP DATABASE doesn't force other backends to close FDs

2020-03-29 Thread Andres Freund
Hi,

On 2018-10-03 15:37:25 -0700, Andres Freund wrote:
> Joerg reported on IRC that after a DROP DATABASE the space of the
> dropped database wasn't available to the OS until he killed a session
> that was active from before the drop. I was kinda incredulous at first,
> the code looks sane...  Thomas figured out significant parts of this.
> [ context ]
> That unfortunately disregards that normal backends could have plenty
> files open for the to-be-dropped database, if there's any sort of
> shared_buffer pressure. Whenever a backend writes out a dirty victim
> page belonging to another database, it'll also open files therein.

Ping? As far as I can tell this is still an issue. And I think the
issue exists not just for entire databases, but also for tables.

I think this is likely relevant for complaints like
https://www.postgresql.org/message-id/20200329224913.GA11265%40hjp.at

Greetings,

Andres Freund




Re: snapper vs. HEAD

2020-03-29 Thread Andres Freund
Hi,

On 2020-03-28 23:50:32 -0400, Tom Lane wrote:
> Buildfarm member snapper has been crashing in the core regression tests
> since commit 17a28b0364 (well, there's a bit of a range of uncertainty
> there, but 17a28b0364 looks to be the only such commit that could have
> affected code in gistget.c where the crash is).  Curiously, its sibling
> skate is *not* failing, despite being on the same machine and compiler.

Hm. There's some difference in code-gen specific options.

snapper has:
'CFLAGS' => '-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security ',
'CPPFLAGS' => '-D_FORTIFY_SOURCE=2',
'LDFLAGS' => '-Wl,-z,relro -Wl,-z,now'
and specifies (among others)
  '--enable-thread-safety',
  '--with-gnu-ld',
whereas skate has --enable-cassert.

Not too hard to imagine that several of these could cause enough
code-gen differences so that one exhibits the bug, and the other
doesn't.


The different commandlines for gistget end up being:

snapper:
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -g -g -O2 -fstack-protector 
--param=ssp-buffer-size=4 -Wformat -Werror=format-security  
-I../../../../src/include  -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE 
-I/usr/include/libxml2  -I/usr/include/mit-krb5  -c -o gistget.o gistget.c
skate:
ccache gcc-4.7 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -g -O2 -I../../../../src/include  -D_GNU_SOURCE 
-I/usr/include/libxml2   -c -o gistget.o gistget.c


> I looked into this by dint of setting up a similar environment in a
> qemu VM.  I might not have reproduced things exactly, but I managed
> to get the same kind of crash at approximately the same place, and
> what it looks like to me is a compiler bug.

What options were you using? Reproducing snapper as exactly as possible?


> It's unclear how 17a28b0364 would have affected this, but there is
> an elog() call elsewhere in the same function, so maybe the new
> coding for that changed register assignments or some other
> phase-of-the-moon effect.

Yea, wouldn't be too surprising.


> I doubt that anyone's going to take much interest in fixing this
> old compiler version, so my recommendation is to back off the
> optimization level on snapper to -O1, and probably on skate as
> well because there's no obvious reason why the same compiler bug
> might not bite skate at some point.  I was able to get through
> the core regression tests on my qemu VM after recompiling
> gistget.c at -O1 (with other flags the same as snapper is using).

If you still have the environment it might make sense to check whether
it's related to one of the other options. But otherwise I wouldn't be
against the proposal.

Greetings,

Andres Freund




Re: Improving connection scalability: GetSnapshotData()

2020-03-29 Thread Tomas Vondra

On Sun, Mar 29, 2020 at 11:50:10AM -0700, Andres Freund wrote:

Hi,

On March 29, 2020 11:24:32 AM PDT, Alexander Korotkov wrote:

clearly a big win on majority
of workloads, I think we still need to investigate different workloads
on different hardware to ensure there is no regression.


Definitely. Which workloads are you thinking of? I can think of those
affected facets: snapshot speed, commit speed with writes, connection
establishment, prepared transaction speed. All in the small and large
connection count cases.

I did measurements on all of those but prepared xacts, fwiw. That
definitely needs to be measured, due to the locking changes around
procarray add/remove.

I don't think regressions besides perhaps 2pc are likely - there's
nothing really getting more expensive but procarray add/remove.



If I get some instructions on what tests to do, I can run a bunch of tests
on my machines (not the largest boxes, but at least something). I don't
have the bandwidth to come up with tests on my own, at the moment.
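
For the record, the workload described upthread amounts to something
like the following (the duration and database name are placeholders):

for c in 1 10 50 100 500 1000 5000; do
    pgbench -M prepared -c $c -j $c -S -n -T 60 bench
done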

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Potential (low likelihood) wraparound hazard in heap_abort_speculative()

2020-03-29 Thread Andres Freund
Hi,

On 2020-03-29 15:20:01 -0700, Peter Geoghegan wrote:
> On Sat, Mar 28, 2020 at 2:30 PM Andres Freund  wrote:
> > Unless somebody has a better idea for how to solve this in a
> > back-patchable way, I think it'd be best to replace RecentGlobalXmin
> > with RecentXmin. That'd be safe as far as I can see.
> 
> As far as I can tell, the worst consequence of this wraparound hazard
> is that we don't opportunistically prune at some later point where we
> probably ought to. Do you agree with that assessment?

Probably, yes.


> Since pd_prune_xid is documented as "a hint field" in bufpage.h, this
> bug cannot possibly lead to queries that give wrong answers. The
> performance issue also seems like it should not have much impact,
> since we only call heap_abort_speculative() in extreme cases where
> there is a lot of contention among concurrent upserting sessions.

Well, I think it could be fairly "persistent" in being set in some
cases. PageSetPrunable() and heap_prune_record_prunable() check that a
new prune xid is newer than the current one.

That said, I still think it's unlikely to be really problematic.



> Also, as you pointed out already, RecentGlobalXmin is probably not
> going to be any different to RecentXmin.

Huh, I think they very commonly are radically different? Where did I
point that out? RecentXmin is the xmin of the last snapshot
computed. Whereas RecentGlobalXmin basically is the oldest xmin of any
backend. That's a pretty large difference? Especially with longrunning
sessions, replication, logical decoding those can be different by
hundreds of millions of xids.

Where did I point out that they're not going to be very different?


> I am in favor of fixing the issue, and backpatching all the way. I
> just want to put the issue in perspective, and have my own
> understanding of things verified.

Cool.

Greetings,

Andres Freund




Re: Potential (low likelihood) wraparound hazard in heap_abort_speculative()

2020-03-29 Thread Peter Geoghegan
On Sat, Mar 28, 2020 at 2:30 PM Andres Freund  wrote:
> Unless somebody has a better idea for how to solve this in a
> back-patchable way, I think it'd be best to replace RecentGlobalXmin
> with RecentXmin. That'd be safe as far as I can see.

As far as I can tell, the worst consequence of this wraparound hazard
is that we don't opportunistically prune at some later point where we
probably ought to. Do you agree with that assessment?

Since pd_prune_xid is documented as "a hint field" in bufpage.h, this
bug cannot possibly lead to queries that give wrong answers. The
performance issue also seems like it should not have much impact,
since we only call heap_abort_speculative() in extreme cases where
there is a lot of contention among concurrent upserting sessions.
Also, as you pointed out already, RecentGlobalXmin is probably not
going to be any different to RecentXmin.

I am in favor of fixing the issue, and backpatching all the way. I
just want to put the issue in perspective, and have my own
understanding of things verified.

-- 
Peter Geoghegan




Re: GSoC Proposal

2020-03-29 Thread Chapman Flack
On 03/29/20 15:18, Kartik Ohri wrote:
> Hi! I have shared the proposal with him directly. But as I was made aware
> that the project's prospective mentors are not at much liberty to give
> feedback on the proposal due to mentor guidelines, I was hoping someone else

I'll add to that in case I could have misunderstood the mentor guidelines;
we have been in some communication about the parameters of the project,
but I was not sure how much involvement I should have in the preparation
of the proposal itself.

Is that in line with how other mentors for PostgreSQL have approached it?

Regards,
-Chap




Creating a database with a non-predefined collation

2020-03-29 Thread Shay Rojansky
Greetings hackers.

While working on first-class support for PG collations in the Entity
Framework Core ORM, I've come across an interesting problem: it doesn't
seem to be possible to create a database with a collation that isn't
predefined, and there doesn't seem to be a way to add to that list. I'm
specifically looking into creating a database with an ICU collation.

I've seen the discussion in
https://www.postgresql.org/message-id/flat/99faa8eb-9de2-8ec0-0a25-1ad1276167cc%402ndquadrant.com,
though to my understanding, that is about importing ICU rather than libc
predefined collations, and not adding an arbitrary collation to the list
from which databases can be created.

This seems particularly problematic as a database collation cannot be
altered once created, leading to an odd chicken-and-egg problem. My initial
expectation was for collations in the template database to be taken into
account, but that doesn't seem to be the case.
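
For example, something along these lines fails today (collation name
illustrative; the exact error text may differ by version and platform):

=# CREATE DATABASE app TEMPLATE template0 LC_COLLATE 'en-x-icu';
ERROR:  invalid locale name: "en-x-icu"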

Finally, just a word to say that better support for non-deterministic
collations would be greatly appreciated - specifically LIKE support (though
I realize that isn't trivial). At the moment their actual usefulness seems
somewhat limited because of this.

Thanks,

Shay


Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-29 Thread Justin Pryzby
On Sun, Mar 29, 2020 at 01:22:04PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote:
> >> After looking at the callers of pg_ls_dir_files, and noticing that
> >> it's already defined to ignore anything that's not a regular file,
> >> I think switching to lstat makes sense.
> 
> > Yea, only pg_ls_dir() shows special file types (and currently the others
> > even hide dirs).
> 
> > The essence of your patch is to ignore ENOENT, but you also changed to use
> > lstat(), which seems unrelated.  That means we'll now hide (non-broken)
> > symlinks.  Is that intentional/needed ?
> 
> Well, the following comment says "ignore anything but regular files",
> so I'm supposing that that is the behavior that we actually want here
> and failed to implement correctly.  There might be scope for
> additional directory-reading functions, but I'd think you'd want
> more information (such as the file type) returned from anything
> that doesn't act this way.

Maybe pg_stat_file() deserves similar attention ?  Right now, it'll fail on a
broken link.  If we changed it to lstat(), then it'd work, but it'd also show
metadata for the *link* rather than its target.

The patch proposed as v14-0001 here may be relevant:
https://www.postgresql.org/message-id/20200317190401.GY26184%40telsasoft.com
-indicating if it is a directory.  Typical usages include:
+indicating if it is a directory (or a symbolic link to a directory).
...

> In practice, since these directories shouldn't contain symlinks,
> it's likely moot.  The only place in PG data directories where
> we actually expect symlinks is pg_tablespace ... and that contains
> symlinks to directories, so that this function would ignore them
> anyway.

I wouldn't hesitate to make symlinks, at least in log.  It's surprising when
files are hidden, but I won't argue about the best behavior here.

I'm thinking of distributions or local configurations that use
/var/log/postgresql.  I didn't remember or didn't realize, but it looks like
debian's packages use logging_collector=off and then launch postmaster with
2> /var/log/postgres/...  It seems reasonable to do something like:
log/huge-querylog.csv => /zfs/compressed/...

-- 
Justin




GSoC Proposal

2020-03-29 Thread Kartik Ohri
Hi! I have shared the proposal with him directly. But as I was made aware
that the project's prospective mentors are not at much liberty to give
feedback on the proposal due to mentor guidelines, I was hoping someone else
could give some feedback (even some general feedback on submitting a
proposal to the PostgreSQL organization would be appreciated).

On Mon, Mar 30, 2020, 12:32 AM Stephen Frost  wrote:

> Greetings,
>
> * Kartik Ohri (kartikohr...@gmail.com) wrote:
> > Hi! Can I get some review on my GSoC proposal ?
> >
> >
> https://docs.google.com/document/d/1EiIHZjOjf6yWfGzKeHCbu8bJ6K1tCEcmPsD3i8lPXbg/edit?usp=sharing
>
> I recommend you chat with the mentor who is listed for the GSoC Project
> you're writing a proposal for directly.  Chip may see it here also, but
> best to make sure.
>
> Thanks!
>
> Stephen
>


Re: GSoC Proposal

2020-03-29 Thread Stephen Frost
Greetings,

* Kartik Ohri (kartikohr...@gmail.com) wrote:
> Hi! Can I get some review on my GSoC proposal ?
> 
> https://docs.google.com/document/d/1EiIHZjOjf6yWfGzKeHCbu8bJ6K1tCEcmPsD3i8lPXbg/edit?usp=sharing
> 

I recommend you chat with the mentor who is listed for the GSoC Project
you're writing a proposal for directly.  Chip may see it here also, but
best to make sure.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [GSoC 2020] applicant proposal v2, Denis Volkov

2020-03-29 Thread Stephen Frost
Greetings,

* Denis Volkov (volkov.denis@yandex.ru) wrote:
> I want to apply to GSoC and this is my proposal draft. Please give me a 
> feedback.

Great, thanks!  I'd suggest you reach out to the mentors listed for this
proposal directly also to make sure they see your interest, and to chat
with them regarding your proposal.

Thanks again,

Stephen


signature.asc
Description: PGP signature


Re: Improving connection scalability: GetSnapshotData()

2020-03-29 Thread Andres Freund
Hi, 

On March 29, 2020 11:24:32 AM PDT, Alexander Korotkov wrote:
> clearly a big win on majority
> of workloads, I think we still need to investigate different workloads
> on different hardware to ensure there is no regression.

Definitely. Which workloads are you thinking of? I can think of those affected 
facets: snapshot speed, commit speed with writes, connection establishment, 
prepared transaction speed. All in the small and large connection count cases.

I did measurements on all of those but prepared xacts, fwiw. That definitely 
needs to be measured, due to the locking changes around procarray add/remove.

I don't think regressions besides perhaps 2pc are likely - there's nothing 
really getting more expensive but procarray add/remove.


Regards,

Andres


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Improving connection scalability: GetSnapshotData()

2020-03-29 Thread Alexander Korotkov
On Sun, Mar 29, 2020 at 4:40 AM Peter Geoghegan  wrote:
> On Sun, Mar 1, 2020 at 12:36 AM Andres Freund  wrote:
> > The workload is a pgbench readonly, with pgbench -M prepared -c $conns
> > -j $conns -S -n for each client count.  This is on a machine with 2
> > Intel(R) Xeon(R) Platinum 8168, but virtualized.
> >
> > conns   tps master          tps pgxact-split
> >
> > 1          26842.492845       26524.194821
> > 10        246923.158682      249224.782661
> > 50        695956.539704      709833.746374
> > 100      1054727.043139     1903616.306028
> > 200       964795.282957     1949200.338012
> > 300       906029.377539     1927881.231478
> > 400       845696.690912     1911065.369776
> > 500       812295.222497     1926237.255856
> > 600       888030.104213     1903047.236273
> > 700       866896.532490     1886537.202142
> > 800       863407.341506     1883768.592610
> > 900       871386.608563     1874638.012128
> > 1000      887668.277133     1876402.391502
> > 1500      860051.361395     1815103.564241
> > 2000      890900.098657     1775435.271018
> > 3000      874184.980039     1653953.817997
> > 4000      845023.080703     1582582.316043
> > 5000      817100.195728     1512260.802371
> >
> > I think these are pretty nice results.
>
> This scalability improvement is clearly very significant. There is
> little question that this is a strategically important enhancement for
> the Postgres project in general. I hope that you will ultimately be
> able to commit the patchset before feature freeze.

+1, this is really very cool results.

Although this patchset is expected to be clearly a big win on the majority
of workloads, I think we still need to investigate different workloads
on different hardware to ensure there is no regression.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Can we get rid of GetLocaleInfoEx() yet?

2020-03-29 Thread Juan José Santamaría Flecha
On Sun, Mar 29, 2020 at 7:00 PM Tom Lane  wrote:

>
> In general, I think the problem is that we might be dealing with a
> Unix-style locale string, in which the encoding name might be quite
> a few other things besides "utf8".  But actually your patch works
> for that too, since what's going to happen next is we'll search the
> encoding_match_list[] for a match.  I do suggest being a bit more
> paranoid about what's a codepage number though, as attached.
> (Untested, since I lack a Windows environment, but it's pretty
> straightforward code.)
>

It works for the issue just fine, and more comments make a better patch,
so no objections from me.

Regards,

Juan José Santamaría Flecha


[GSoC 2020] applicant proposal v2, Denis Volkov

2020-03-29 Thread Denis Volkov
Hi,
I want to apply to GSoC and this is my proposal draft. Please give me a 
feedback.
---
I am a 4-year undergraduate CS student at Ural Federal University. This fall I 
met PostgreSQL and posted the first PR to WAL-G. I liked it so much that I 
decided to do the graduate work related to WAL-G - to implement backup(s) 
copying from one store to another (while WIP, I'm waiting for a review). I 
liked writing on golang, sorting out what works, and then I accidentally found 
out about GSoC and decided to give it a try.

I saw a list of suggested problems at 
https://wiki.postgresql.org/wiki/GSoC_2020#WAL-G_performance_improvements_.282020.29
 and I was very interested in WAL's history consistency check and page checksum 
verification features.

In mid-June, I will have a defense of a diploma thesis for a bachelor’s degree 
(this should not interfere with GSoC), at the end of June graduation (and this 
too) and in mid-August, I will take exams for admission to the magistracy (this
should not affect GSoC either).

I want to learn new things, write cool features and improve PostgreSQL.
---

Regards,
Denis Volkov




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-29 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote:
>> After looking at the callers of pg_ls_dir_files, and noticing that
>> it's already defined to ignore anything that's not a regular file,
>> I think switching to lstat makes sense.

> Yea, only pg_ls_dir() shows special file types (and currently the others even
> hide dirs).

> The essence of your patch is to ignore ENOENT, but you also changed to use
> lstat(), which seems unrelated.  That means we'll now hide (non-broken)
> symlinks.  Is that intentional/needed ?

Well, the following comment says "ignore anything but regular files",
so I'm supposing that that is the behavior that we actually want here
and failed to implement correctly.  There might be scope for
additional directory-reading functions, but I'd think you'd want
more information (such as the file type) returned from anything
that doesn't act this way.

In practice, since these directories shouldn't contain symlinks,
it's likely moot.  The only place in PG data directories where
we actually expect symlinks is pg_tablespace ... and that contains
symlinks to directories, so that this function would ignore them
anyway.

regards, tom lane




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-29 Thread Justin Pryzby
On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote:
> I wrote:
> > Justin Pryzby  writes:
> >> Maybe we should lstat() the file to determine if it's a dangling link; if
> >> lstat() fails, then skip it.  Currently, we use stat(), which shows
> >> metadata of a link's *target*.  Maybe we'd change that.
> 
> > Hm, good point that ENOENT could refer to a symlink's target.  Still,
> > I'm not sure it's worth going out of our way to disambiguate that,
> > given that these directories aren't really supposed to contain symlinks.
> > (And on the third hand, if they aren't supposed to, then maybe these
> > functions needn't look through any symlinks?  In which case just
> > substituting lstat for stat would resolve the ambiguity.)
> 
> After looking at the callers of pg_ls_dir_files, and noticing that
> it's already defined to ignore anything that's not a regular file,
> I think switching to lstat makes sense.

Yea, only pg_ls_dir() shows special file types (and currently the others even
hide dirs).

The essence of your patch is to ignore ENOENT, but you also changed to use
lstat(), which seems unrelated.  That means we'll now hide (non-broken)
symlinks.  Is that intentional/needed ?  I guess maybe you're trying to fix the
bug (?) that symlinks aren't skipped?  If so, I guess it should be a separate
commit, or the commit message should say so.  I think the doc update is already
handled by: 8b6d94cf6c8319bfd6bebf8b863a5db586c19c3b (we didn't use to say we
skipped specials, and now we say we do, and we'll follow through RSN and
actually do it, too).

> diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
> index 01185f2..8429a12 100644
> --- a/src/backend/utils/adt/genfile.c
> +++ b/src/backend/utils/adt/genfile.c
> @@ -596,10 +596,15 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
>  
>  		/* Get the file info */
>  		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
> -		if (stat(path, &attrib) < 0)
> +		if (lstat(path, &attrib) < 0)
> +		{
> +			/* Ignore concurrently-deleted files, else complain */
> +			if (errno == ENOENT)
> +				continue;
>  			ereport(ERROR,
>  					(errcode_for_file_access(),
>  					 errmsg("could not stat file \"%s\": %m", path)));
> +		}
>  
>  		/* Ignore anything but regular files */
>  		if (!S_ISREG(attrib.st_mode))





Re: Can we get rid of GetLocaleInfoEx() yet?

2020-03-29 Thread Tom Lane
Juan José Santamaría Flecha writes:
> On Sun, Mar 29, 2020 at 3:29 AM Tom Lane  wrote:
>> The reason for the hack, per the comments, is that VS2015
>> omits a codepage field from the result of _create_locale();
>> and some optimism is expressed therein that Microsoft might
>> undo that oversight in future.  Has this been fixed in more
>> recent VS versions?  If not, can we find another, more robust
>> way to do it?

> While working on another issue I have seen this issue reproduce in VS2019.
> So no, it has not been fixed.

Oh well, I figured that was too optimistic :-(

> Please find attached a patch that provides a better detection of the "utf8"
> cases.

In general, I think the problem is that we might be dealing with a
Unix-style locale string, in which the encoding name might be quite
a few other things besides "utf8".  But actually your patch works
for that too, since what's going to happen next is we'll search the
encoding_match_list[] for a match.  I do suggest being a bit more
paranoid about what's a codepage number though, as attached.
(Untested, since I lack a Windows environment, but it's pretty
straightforward code.)

regards, tom lane

diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index c9c680f..9e3c6db 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -239,25 +239,44 @@ win32_langinfo(const char *ctype)
 	{
 		r = malloc(16);			/* excess */
 		if (r != NULL)
-			sprintf(r, "CP%u", cp);
+		{
+			/*
+			 * If the return value is CP_ACP that means no ANSI code page is
+			 * available, so only Unicode can be used for the locale.
+			 */
+			if (cp == CP_ACP)
+strcpy(r, "utf8");
+			else
+sprintf(r, "CP%u", cp);
+		}
 	}
 	else
 #endif
 	{
 		/*
-		 * Locale format on Win32 is <Language>_<Country>.<CodePage>. For
-		 * example, English_United States.1252.
+		 * Locale format on Win32 is <Language>_<Country>.<CodePage>.  For
+		 * example, English_United States.1252.  If we see digits after the
+		 * last dot, assume it's a codepage number.  Otherwise, we might be
+		 * dealing with a Unix-style locale string; Windows' setlocale() will
+		 * take those even though GetLocaleInfoEx() won't, so we end up here.
+		 * In that case, just return what's after the last dot and hope we can
+		 * find it in our table.
 		 */
 		codepage = strrchr(ctype, '.');
 		if (codepage != NULL)
 		{
-			int			ln;
+			size_t		ln;
 
 			codepage++;
 			ln = strlen(codepage);
 			r = malloc(ln + 3);
 			if (r != NULL)
-sprintf(r, "CP%s", codepage);
+			{
+if (strspn(codepage, "0123456789") == ln)
+	sprintf(r, "CP%s", codepage);
+else
+	strcpy(r, codepage);
+			}
 		}
 
 	}


Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-29 Thread Tom Lane
I wrote:
> Justin Pryzby  writes:
>> Maybe we should lstat() the file to determine if it's a dangling link; if
>> lstat() fails, then skip it.  Currently, we use stat(), which shows
>> metadata of a link's *target*.  Maybe we'd change that.

> Hm, good point that ENOENT could refer to a symlink's target.  Still,
> I'm not sure it's worth going out of our way to disambiguate that,
> given that these directories aren't really supposed to contain symlinks.
> (And on the third hand, if they aren't supposed to, then maybe these
> functions needn't look through any symlinks?  In which case just
> substituting lstat for stat would resolve the ambiguity.)

After looking at the callers of pg_ls_dir_files, and noticing that
it's already defined to ignore anything that's not a regular file,
I think switching to lstat makes sense.

I also grepped the other uses of ReadDir[Extended], and didn't see
any other ones that seemed desperately in need of changing.

So the attached seems like a sufficient fix.

regards, tom lane

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 01185f2..8429a12 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -596,10 +596,15 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 
 		/* Get the file info */
 		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
-		if (stat(path, &attrib) < 0)
+		if (lstat(path, &attrib) < 0)
+		{
+			/* Ignore concurrently-deleted files, else complain */
+			if (errno == ENOENT)
+				continue;
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not stat file \"%s\": %m", path)));
+		}
 
 		/* Ignore anything but regular files */
 		if (!S_ISREG(attrib.st_mode))


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-03-29 Thread Tomas Vondra

On Sun, Mar 29, 2020 at 11:19:21AM +0530, Amit Kapila wrote:

On Sun, Mar 29, 2020 at 6:29 AM Tomas Vondra
 wrote:


On Sat, Mar 28, 2020 at 03:29:34PM +0530, Amit Kapila wrote:
>On Sat, Mar 28, 2020 at 2:19 PM Dilip Kumar  wrote:
>
>How about if instead of writing an XLOG_XACT_ASSIGNMENT WAL, we set a
>flag in TransactionStateData and then log that as special information
>whenever we write next WAL record for a new subtransaction?  Then
>during recovery, we can only call ProcArrayApplyXidAssignment when we
>find that special flag is set in a WAL record.  One idea could be to
>use a flag bit in XLogRecord.xl_info.  If that is feasible then the
>solution can work as it is now, without any overhead or change in the
>way we maintain KnownAssignedXids.
>

Ummm, how is that different from what the patch is doing now? I mean, we
only write the top-level XID for the first WAL record in each subxact,
right? Or what would be the difference with your approach?



We have to do what the patch is currently doing and additionally, we
will set this flag after PGPROC_MAX_CACHED_SUBXIDS which would allow
us to call ProcArrayApplyXidAssignment during WAL replay only after
PGPROC_MAX_CACHED_SUBXIDS number of subxacts.  It will help us in
clearing the KnownAssignedXids at the same time as we do now, so no
additional performance overhead.



Hmmm. So we'd still log assignment twice? Or would we keep just the
immediate assignments (embedded into xlog records), and cache the
subxids on the replica somehow?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-29 Thread Tomas Vondra

On Sun, Mar 29, 2020 at 10:22:25AM +0100, Dean Rasheed wrote:

On Sat, 28 Mar 2020 at 13:18, Dean Rasheed  wrote:


OK, I've pushed that with your recommendation for that function name.



Does this now complete everything that you wanted to do for functional
dependency stats for PG13? Re-reading the thread, I couldn't see
anything else that needed looking at. If that's the case, the CF entry
can be closed.



Yes. There were two improvements proposed, we've committed one of them
(the IN/ANY operator handling) and the other (containment) needs more
discussion. So I think it's OK to mark this either as committed or maybe
returned with feedback.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT

2020-03-29 Thread Matheus de Oliveira
Hi All,

I've taken some time today to rebase the patch onto master. It's attached.

I'm still not sure if the chosen path is the best way. But I'd be glad to
follow any directions we all see fit.

For now, this patch applies two methods (examples below):
1. Changes the full constraint definition (which keeps compatibility with
the current ALTER CONSTRAINT):
ALTER CONSTRAINT constraint_name [ ON DELETE action ] [ ON UPDATE action ] [ deferrability ]
2. Changes only the subset explicitly mentioned in the command (a new way;
I've chosen to just add SET in the middle, similar to `ALTER COLUMN ... SET
{DEFAULT | NOT NULL}`):
ALTER CONSTRAINT constraint_name SET [ ON DELETE action ] [ ON UPDATE action ] [ deferrability ]
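
For example (table and constraint names made up; syntax as proposed in
this patch):

-- method 1: full redefinition, unmentioned options reset to defaults
ALTER TABLE orders ALTER CONSTRAINT orders_customer_fk
  ON DELETE CASCADE ON UPDATE RESTRICT DEFERRABLE;

-- method 2: with SET, only the mentioned options change
ALTER TABLE orders ALTER CONSTRAINT orders_customer_fk
  SET ON DELETE CASCADE;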

I'm OK with changing the approach, we just need to choose the color :D

I believe this is a small change in source code, but with huge impact for
users with big tables. Would be great if it could go in PG 13.

Best regards,
-- 
Matheus de Oliveira
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index e486196477..6a51014b6f 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -56,7 +56,9 @@ ALTER TABLE [ IF EXISTS ] name
 ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
 ADD table_constraint [ NOT VALID ]
 ADD table_constraint_using_index
-ALTER CONSTRAINT constraint_name [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+ALTER CONSTRAINT constraint_name
+  [SET] [ ON DELETE action ] [ ON UPDATE action ]
+  [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 VALIDATE CONSTRAINT constraint_name
 DROP CONSTRAINT [ IF EXISTS ]  constraint_name [ RESTRICT | CASCADE ]
 DISABLE TRIGGER [ trigger_name | ALL | USER ]
@@ -488,6 +490,14 @@ WITH ( MODULUS numeric_literal, REM
   This form alters the attributes of a constraint that was previously
   created. Currently only foreign key constraints may be altered.
  
+ 
+  If the SET keyword is omitted, the full constraint
+  definition is changed, meaning that every option mentioned is set
+  accordingly and unmentioned options are set as default built-in values,
+  just like ADD CONSTRAINT would do, see definition of
+  default values on . With
+  SET keyword, only mentioned attributes are changed.
+ 
 

 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8e35c5bd1a..70fdea680e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9509,8 +9509,43 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
  errmsg("constraint \"%s\" of relation \"%s\" is not a foreign key constraint",
 		cmdcon->conname, RelationGetRelationName(rel))));
 
+	/*
+	 * Check for FKCONSTR_ACTION_UNKNOWN and, if found, replace it with the
+	 * current action. We could handle FKCONSTR_ACTION_UNKNOWN below, but
+	 * since we already have to handle the case of changing to the same
+	 * action, it seems simpler to simply replace FKCONSTR_ACTION_UNKNOWN
+	 * with the current action here.
+	 */
+	if (cmdcon->fk_del_action == FKCONSTR_ACTION_UNKNOWN)
+		cmdcon->fk_del_action = currcon->confdeltype;
+
+	if (cmdcon->fk_upd_action == FKCONSTR_ACTION_UNKNOWN)
+		cmdcon->fk_upd_action = currcon->confupdtype;
+
+	/*
+	 * Do the same for the deferrability attributes. But consider that if
+	 * only the initdeferred attribute is changed, and to true, we force
+	 * deferrable to be true as well. On the other hand, if only the
+	 * deferrable attribute is changed, and to false, we force initdeferred
+	 * to be false as well.
+	 */
+	if (!cmdcon->was_deferrable_set)
+		cmdcon->deferrable = cmdcon->initdeferred ? true : currcon->condeferrable;
+
+	if (!cmdcon->was_initdeferred_set)
+		cmdcon->initdeferred = !cmdcon->deferrable ? false : currcon->condeferred;
+
+	/*
+	 * This is a safety check only; it should never happen here.
+	 */
+	if (cmdcon->initdeferred && !cmdcon->deferrable)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("constraint declared INITIALLY DEFERRED must be DEFERRABLE")));
+
 	if (currcon->condeferrable != cmdcon->deferrable ||
-		currcon->condeferred != cmdcon->initdeferred)
+		currcon->condeferred != cmdcon->initdeferred ||
+		currcon->confdeltype != cmdcon->fk_del_action ||
+		currcon->confupdtype != cmdcon->fk_upd_action)
 	{
 		HeapTuple	copyTuple;
 		HeapTuple	tgtuple;
@@ -9528,6 +9563,8 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 		copy_con = (Form_pg_constraint) GETSTRUCT(copyTuple);
 		copy_con->condeferrable = cmdcon->deferrable;
 		copy_con->condeferred = cmdcon->initdeferred;
+		copy_con->confdeltype = cmdcon->fk_del_action;
+		copy_con->confupdtype = cmdcon->fk_upd_action;
		CatalogTupleUpdate(conrel, &copyTuple->t_self, copyTuple);
 
 		InvokeObjectPostAlterHook(ConstraintRelationId,
@@ -9564,23 +9601,106 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 otherrelids = list_append_unique_oid(otherrelids,
 	 tgform->tgrelid);
 
-			/*
-			 * Update deferrability of RI_FKey_noaction_del,
-			 * 

Re: color by default

2020-03-29 Thread Juan José Santamaría Flecha
On Sun, Mar 29, 2020 at 11:56 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-03-24 15:34, Juan José Santamaría Flecha wrote:
> > I think there is also some value in adding the functionality proposed in
> > terminal_supports_color().
>
> What do you want to do with that functionality?


Add it to the tests done when PG_COLOR is "auto".

Regards


Re: WAL usage calculation patch

2020-03-29 Thread Julien Rouhaud
Hi Amit,

Sorry I just noticed your mail.

On Sun, Mar 29, 2020 at 05:12:16PM +0530, Amit Kapila wrote:
> On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud  wrote:
> >
> > I'm not sure that I get your point.  I'm assuming that you meant
> > parallel-read-only queries, but surely buffer usage infrastructure for
> > parallel query relies on the same approach as non-parallel one (each node
> > computes the process-local pgBufferUsage diff) and sums all of that at the
> > end of the parallel query execution.  I also don't see how whether the query is
> > read-only or not is relevant here as far as instrumentation is concerned,
> > especially since read-only query can definitely do writes and increase the
> > count of dirtied buffers, like a write query would.  For instance a hint
> > bit change can be done in a parallel query AFAIK, and this can generate WAL
> > records in wal_log_hints is enabled, so that's probably one way to test it.
> >
> 
> Yeah, that way we can test it.  Can you try that?
> 
> > I now think that not adding support for WAL buffers in EXPLAIN output in the
> > initial patch scope was a mistake, as this is probably the best way to test
> > the WAL counters for parallel queries.  This shouldn't be hard to add
> > though, and I can work on it quickly if there's still a chance to get this
> > feature included in pg13.
> >
> 
> I am not sure we will add it in Explain or not (maybe we need inputs
> from others in this regard), but if it helps in testing this part of
> the patch, then it is a good idea to write a patch for it.  You might
> want to keep it separate from the main patch as we might not commit
> it.

As I just wrote in [1] that's exactly what I did.  Using parallel query and
concurrent update on a table I could see that WAL usage for parallel query
seems to be working as one could expect.

> Sure, I am fine with that but I am not sure if it is a good idea to
> commit this patch without having a way to compute WAL utilization for
> those commands.

I'm generally fine with waiting for a fix for the existing issue to be
committed.  But as the feature freeze is approaching, I hope that it won't mean
postponing this feature to v14 because a related 2yo bug has just been
discovered, as it would seem a bit unfair.




Re: WAL usage calculation patch

2020-03-29 Thread Julien Rouhaud
On Mon, Mar 23, 2020 at 11:24:50PM +0900, Fujii Masao wrote:
> 
> > Here are the comments for 0001 patch.
> > 
> > +    /*
> > + * Report a full page image constructed for the WAL record
> > + */
> > +    pgWalUsage.wal_fp_records++;
> > 
> > Isn't it better to use "fpw" or "fpi" for the variable name rather than
> > "fp" here? In other places, "fpw" and "fpi" are used for full page
> > writes/image.

Agreed, I went with fpw.

> > ISTM that this counter could be incorrect if XLogInsertRecord() determines
> > to calculate again whether FPI is necessary or not. No? IOW, this issue
> > could happen if XLogInsert() calls XLogRecordAssemble() multiple times in
> > its do-while loop. Isn't this problematic?

Yes probably.  I also see while adding support for EXPLAIN/auto_explain that
the previous approach was incrementing both records and fpw_records, while it
should be only one of those for each record.  I fixed this using the approach I
previously mentioned in [1], which seems to work just fine.

> > +    long    wal_bytes;    /* size of wal records produced */
> > 
> > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable
> > rather than long?

Yes indeed.  I switched to uint64, and modified everything accordingly (and
changed pgss to output numeric as there's no other way to handle unsigned int8)

> > +    shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space);
> > 
> > bufusage_space should be walusage_space here?

Good catch, fixed.

> > /*
> >   * Finish parallel execution.  We wait for parallel workers to finish, and
> >   * accumulate their buffer usage.
> >   */
> > 
> > There are some comments mentioning buffer usage, in execParallel.c.
> > For example, the top comment for ExecParallelFinish(), as the above.
> > These should be updated.

I went through the whole file and quickly checked other places, and I think I
fixed all required comments.

> Here are the comments for 0002 patch.
> 
> +OUT wal_write_bytes int8,
> +OUT wal_write_records int8,
> +OUT wal_write_fp_records int8
> 
> Isn't "write" part in the column names confusing because it's WAL
> *generated* (not written) by the statement?

Agreed, I simply dropped the "_write" part everywhere.

> +RETURNS SETOF record
> +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
> +LANGUAGE C STRICT VOLATILE;
> 
> PARALLEL SAFE should be specified?

Indeed, fixed.

> +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
> 
> ISTM it's good timing to have also pg_stat_statements--1.8.sql since
> the definition of pg_stat_statements() is changed. Thought?

As mentioned in the other pgss thread, I think the general agreement is to
never provide a full script anymore, so I didn't change that.

> +-- CHECKPOINT before WAL tests to ensure test stability
> +CHECKPOINT;
> 
> Is this true? I thought you added this because the number of FPI
> should be larger than zero in the subsequent test. No? But there
> seems no such test. I'm not excited about adding the test checking
> the number of FPI because it looks fragile, though...

It should ensure an FPW for each newly touched block, but yes, that's quite fragile.

Since I fixed the record / FPW record counters, I saw that this was actually
already broken as there was a mix of FPW and non-FPW, so I dropped the
checkpoint and just tested (wal_record + wal_fpw_record) instead.
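
So the test assertion now has roughly this shape (column names as in
this version of the patch; they may still change):

SELECT query, calls, wal_bytes > 0 AS has_wal_bytes,
       wal_records + wal_fpw_records > 0 AS has_wal_records
  FROM pg_stat_statements
 ORDER BY query COLLATE "C";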

> +UPDATE pgss_test SET b = '333' WHERE a = 3 \;
> +UPDATE pgss_test SET b = '444' WHERE a = 4 ;
> 
> Could you tell me why several queries need to be run to test
> the WAL usage? Isn't running a few queries enough for the test purpose?

As far as I can see it's used to test multiple scenarios (single command /
multiple commands in or outside an explicit transaction).  It shouldn't add a
lot of overhead, and since some commands are issued with "\;" it's also testing
proper query string isolation when a multi-command query string is provided,
which doesn't seem like a bad idea.  I didn't change that but I'm not opposed
to removing some of the updates if needed.

Also, to answer Amit Kapila's comments about WAL records and parallel query, I
added support for both EXPLAIN and auto_explain (tab completion and
documentation are also updated), and using a simple table with an index, with
forced parallelism and no leader participation and concurrent update on the
same table, I could test that WAL usage is working as expected:

rjuju=# explain (analyze, wal, verbose) select * from t1;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..8805.05 rows=100010 width=14) (actual time=8.695..47.592 rows=100010 loops=1)
   Output: id, val
   Workers Planned: 2
   Workers Launched: 2
   WAL: records=204 bytes=86198
   ->  Parallel Seq Scan on public.t1  (cost=0.00..8805.05 rows=50005 width=14) (actual

Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-29 Thread Masahiko Sawada
On Sun, 29 Mar 2020 at 20:15, Amit Kapila  wrote:
>
> On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud  wrote:
> >
> > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
> >  wrote:
> > >
> > > I've run vacuum with/without parallel workers on the table having 5
> > > indexes. The vacuum reads all blocks of table and indexes.
> > >
> > > * VACUUM command with no parallel workers
> > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > query ~ 'vacuum';
> > >
> > >   total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> > > --------------+-----------------+------------------+-----------------+---------------------+---------------------
> > >  19857.217207 |           45238 |           226944 |          272182 |              225943 |              225894
> > > (1 row)
> > >
> > > * VACUUM command with 4 parallel workers
> > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > query ~ 'vacuum';
> > >
> > >  total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> > > -------------+-----------------+------------------+-----------------+---------------------+---------------------
> > >  6932.117365 |           45205 |            73079 |          118284 |               72403 |               72365
> > > (1 row)
> > >
> > > The total number of blocks of table and indexes are about 182243
> > > blocks. As Julien reported, obviously the total number of read blocks
> > > during parallel vacuum is much less than single process vacuum's
> > > result.
> > >
> > > Parallel create index has the same issue but it doesn't exist in
> > > parallel queries for SELECTs.
> > >
> > > I think we need to change parallel maintenance commands so that they
> > > report buffer usage like what ParallelQueryMain() does; prepare to
> > > track buffer usage during query execution by
> > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> > > after parallel maintenance command. To report buffer usage of parallel
> > > maintenance command correctly, I'm thinking that we can (1) change
> > > parallel create index and parallel vacuum so that they prepare
> > > gathering buffer usage, or (2) have a common entry point for parallel
> > > maintenance commands that is responsible for gathering buffer usage
> > > and calling the entry functions for individual maintenance command.
> > > I'll investigate it more in depth.
> >
> > As I just mentioned, (2) seems like a better design as it's quite
> > likely that the number of parallel-aware utilities will probably
> > continue to increase.  One problem also is that parallel CREATE INDEX
> > has been introduced in pg11, so (2) probably won't be backpatchable
> > (and (1) seems problematic too).
> >
>
> I am not sure if we can decide at this stage whether it is
> back-patchable or not.  Let's first see the patch and if it turns out
> to be complex, then we can try to do some straight-forward fix for
> back-branches.

Agreed.

> In general, I don't see why the fix here should be
> complex?

Yeah, particularly the approach (1) will not be complex. I'll write a
patch tomorrow.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WAL usage calculation patch

2020-03-29 Thread Amit Kapila
On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud  wrote:
>
> I'm not sure that I get your point.  I'm assuming that you meant
> parallel-read-only queries, but surely buffer usage infrastructure for
> parallel query relies on the same approach as non-parallel one (each node
> computes the process-local pgBufferUsage diff) and sums all of that at the end
> of the parallel query execution.  I also don't see how whether the query is
> read-only or not is relevant here as far as instrumentation is concerned,
> especially since read-only query can definitely do writes and increase the
> count of dirtied buffers, like a write query would.  For instance a hint
> bit change can be done in a parallel query AFAIK, and this can generate WAL
> records if wal_log_hints is enabled, so that's probably one way to test it.
>

Yeah, that way we can test it.  Can you try that?

> I now think that not adding support for WAL usage in EXPLAIN output in the
> initial patch scope was a mistake, as this is probably the best way to test
> the WAL counters for parallel queries.  This shouldn't be hard to add
> though, and I can work on it quickly if there's still a chance to get this
> feature included in pg13.
>

I am not sure whether we will add it to EXPLAIN (maybe we need inputs
from others in this regard), but if it helps in testing this part of
the patch, then it is a good idea to write a patch for it.  You might
want to keep it separate from the main patch, as we might not commit
it.

> > I think for now we should remove
> > that part of changes and rather think how to get that for parallel
> > operations that can write WAL.  For ex. we might need to do something
> > similar to what this patch has done in begin_parallel_vacuum and
> > end_parallel_vacuum.  Would you like to attempt that?
>
> Do you mean removing WAL usage instrumentation from parallel query
> infrastructure?
>

Yes, I meant that, but now I realize we need those, and your proposed
way of testing can help us validate those changes.

> For parallel utilities that can do writes it's probably better to keep the
> discussion in the other part of the thread.
>

Sure, I am fine with that but I am not sure if it is a good idea to
commit this patch without having a way to compute WAL utilization for
those commands.

>   I tried to think a little bit
> about that, but for now I don't have a better idea than adding something
> similar to instrumentation for utility commands to have a general
> infrastructure, as building a workaround for a specific utility looks
> like the wrong approach.
>

I don't know what exactly you have in mind as I don't see why it
should be too complex.  Let's wait for a patch from Sawada-San on
buffer usage stuff and in the meantime, we can work on other parts of
this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-29 Thread Amit Kapila
On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud  wrote:
>
> On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
>  wrote:
> >
> > I've run vacuum with/without parallel workers on the table having 5
> > indexes. The vacuum reads all blocks of table and indexes.
> >
> > * VACUUM command with no parallel workers
> > =# select total_time, shared_blks_hit, shared_blks_read,
> > shared_blks_hit + shared_blks_read as total_read_blks,
> > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > query ~ 'vacuum';
> >
> >   total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> > shared_blks_dirtied | shared_blks_written
> > --+-+--+-+-+-
> >  19857.217207 |   45238 |   226944 |  272182 |
> >  225943 |  225894
> > (1 row)
> >
> > * VACUUM command with 4 parallel workers
> > =# select total_time, shared_blks_hit, shared_blks_read,
> > shared_blks_hit + shared_blks_read as total_read_blks,
> > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > query ~ 'vacuum';
> >
> >  total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> > shared_blks_dirtied | shared_blks_written
> > -+-+--+-+-+-
> >  6932.117365 |   45205 |73079 |  118284 |
> >  72403 |   72365
> > (1 row)
> >
> > The total number of blocks of table and indexes are about 182243
> > blocks. As Julien reported, obviously the total number of read blocks
> > during parallel vacuum is much less than single process vacuum's
> > result.
> >
> > Parallel create index has the same issue but it doesn't exist in
> > parallel queries for SELECTs.
> >
> > I think we need to change parallel maintenance commands so that they
> > report buffer usage like what ParallelQueryMain() does; prepare to
> > track buffer usage during query execution by
> > InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> > after parallel maintenance command. To report buffer usage of parallel
> > maintenance command correctly, I'm thinking that we can (1) change
> > parallel create index and parallel vacuum so that they prepare
> > gathering buffer usage, or (2) have a common entry point for parallel
> > maintenance commands that is responsible for gathering buffer usage
> > and calling the entry functions for individual maintenance command.
> > I'll investigate it more in depth.
>
> As I just mentioned, (2) seems like a better design as it's quite
> likely that the number of parallel-aware utilities will continue to
> increase.  One problem also is that parallel CREATE INDEX was
> introduced in pg11, so (2) probably won't be backpatchable (and (1)
> seems problematic too).
>

I am not sure if we can decide at this stage whether it is
back-patchable or not.  Let's first see the patch and if it turns out
to be complex, then we can try to do some straight-forward fix for
back-branches.  In general, I don't see why the fix here should be
complex?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: color by default

2020-03-29 Thread Peter Eisentraut

On 2020-03-24 15:34, Juan José Santamaría Flecha wrote:
I think there is also some value in adding the functionality proposed in 
terminal_supports_color().


What do you want to do with that functionality?

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: color by default

2020-03-29 Thread Peter Eisentraut

On 2020-03-26 07:36, Michael Paquier wrote:

I am not sure that deleting all the mentions would be a good idea, as
we'd lose track of which tool supports coloring or not, and that could
confuse users.  What about switching the existing paragraph to a
simple sentence with a link to the new appendix you are adding?  Say:
"pg_foo supports colorized output".


I didn't do this because it would create additional complications in the 
man pages.  But there is now an index entry, so it's possible to find 
more information.



+  
+   The actual colors to be used are configured using the environment variable
+   PG_COLORS
+   (note plural).  The value is a colon-separated list of
+   key=value
+   pairs.  The keys specify what the color is to be used for.  The values are
+   SGR (Select Graphic Rendition) specifications, which are interpreted by the
+   terminal.
+  


A reference to SGR to understand better what's the list of values
supported would be nice?


I'm not sure how to do that.  The full list of possible values is huge, 
and exactly what is supported depends on the terminal.



+  
+   The default value is error=01;31:warning=01;35:locus=01.
+  


Could it be possible to have more details about what those three
fields map to?


I have added information about that and explained the example values.  I 
think if you search for "Select Graphic Rendition" and look for the 
example values, you can make sense of this.


Committed with those changes.  This closes the commit fest item.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-03-29 Thread Jürgen Purtz

On 27.03.20 21:12, Justin Pryzby wrote:

On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:

+Archiver

Can you change that to archiver process?

I prefer the short term without the addition of 'process' - concerning
'Archiver' as well as the other cases. But I'm not a native English
speaker.

I didn't like it due to lack of context.

What about "wal archiver" ?

It occurred to me when I read this.
https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com

"WAL archiver" is ok for me. In the current documentation we have 2 
places with "WAL archiver" and 4 with "archiver"-only 
(high-availability.sgml, monitoring.sgml).


"backend process" is an exception to the other terms because the 
standalone term "backend" is sensibly used in diverse situations.


Kind regards, Jürgen






Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-29 Thread Dean Rasheed
On Sat, 28 Mar 2020 at 13:18, Dean Rasheed  wrote:
>
> OK, I've pushed that with your recommendation for that function name.
>

Does this now complete everything that you wanted to do for functional
dependency stats for PG13? Re-reading the thread, I couldn't see
anything else that needed looking at. If that's the case, the CF entry
can be closed.

Regards,
Dean




Re: Can we get rid of GetLocaleInfoEx() yet?

2020-03-29 Thread Juan José Santamaría Flecha
On Sun, Mar 29, 2020 at 3:29 AM Tom Lane  wrote:

>
> The reason for the hack, per the comments, is that VS2015
> omits a codepage field from the result of _create_locale();
> and some optimism is expressed therein that Microsoft might
> undo that oversight in future.  Has this been fixed in more
> recent VS versions?  If not, can we find another, more robust
> way to do it?
>

While working on another issue, I have seen this issue reproduced in VS2019.
So no, it has not been fixed.

Please find attached a patch that provides better detection of the "utf8"
cases.

Regards,

Juan José Santamaría Flecha


0001_fix_win32_langinfo_uft8_detection.patch
Description: Binary data


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-29 Thread Julien Rouhaud
On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
 wrote:
>
> On Sun, 29 Mar 2020 at 15:19, Masahiko Sawada
>  wrote:
> >
> > On Sun, 29 Mar 2020 at 14:23, Amit Kapila  wrote:
> > >
> > > On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud  wrote:
> > > >
> > > > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > > > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > > > > >
> > > > > > I see some basic problems with the patch.  The way it tries to 
> > > > > > compute
> > > > > > WAL usage for parallel stuff doesn't seem right to me.  Can you 
> > > > > > share
> > > > > > or point me to any test done where we have computed WAL for parallel
> > > > > > operations like Parallel Vacuum or Parallel Create Index?
> > > > >
> > > > > Ah, that's indeed a good point and AFAICT WAL records from parallel 
> > > > > utility
> > > > > workers won't be accounted for.  That being said, I think that an 
> > > > > argument
> > > > > could be made that proper infrastructure should have been added in 
> > > > > the original
> > > > > parallel utility patches, as pg_stat_statement is already broken wrt. 
> > > > > buffer
> > > > > usage in parallel utility, unless I'm missing something.
> > > >
> > > > Just to be sure I did a quick test with pg_stat_statements behavior 
> > > > using
> > > > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly 
> > > > buffer usage
> > > > doesn't reflect parallel workers' activity.
> > > >
> > >
> > > Sawada-San, would you like to investigate this? If not, I will look into
> > > this next week.
> >
> > Sure, I'll investigate this issue today.

Thanks for looking at it!

> I've run vacuum with/without parallel workers on the table having 5
> indexes. The vacuum reads all blocks of table and indexes.
>
> * VACUUM command with no parallel workers
> =# select total_time, shared_blks_hit, shared_blks_read,
> shared_blks_hit + shared_blks_read as total_read_blks,
> shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> query ~ 'vacuum';
>
>   total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> shared_blks_dirtied | shared_blks_written
> --+-+--+-+-+-
>  19857.217207 |   45238 |   226944 |  272182 |
>  225943 |  225894
> (1 row)
>
> * VACUUM command with 4 parallel workers
> =# select total_time, shared_blks_hit, shared_blks_read,
> shared_blks_hit + shared_blks_read as total_read_blks,
> shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> query ~ 'vacuum';
>
>  total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> shared_blks_dirtied | shared_blks_written
> -+-+--+-+-+-
>  6932.117365 |   45205 |73079 |  118284 |
>  72403 |   72365
> (1 row)
>
> The total number of blocks of table and indexes are about 182243
> blocks. As Julien reported, obviously the total number of read blocks
> during parallel vacuum is much less than single process vacuum's
> result.
>
> Parallel create index has the same issue but it doesn't exist in
> parallel queries for SELECTs.
>
> I think we need to change parallel maintenance commands so that they
> report buffer usage like what ParallelQueryMain() does; prepare to
> track buffer usage during query execution by
> InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> after parallel maintenance command. To report buffer usage of parallel
> maintenance command correctly, I'm thinking that we can (1) change
> parallel create index and parallel vacuum so that they prepare
> gathering buffer usage, or (2) have a common entry point for parallel
> maintenance commands that is responsible for gathering buffer usage
> and calling the entry functions for individual maintenance command.
> I'll investigate it more in depth.

As I just mentioned, (2) seems like a better design as it's quite
likely that the number of parallel-aware utilities will continue to
increase.  One problem also is that parallel CREATE INDEX was
introduced in pg11, so (2) probably won't be backpatchable (and (1)
seems problematic too).




Re: WAL usage calculation patch

2020-03-29 Thread Julien Rouhaud
On Sun, Mar 29, 2020 at 11:03:50AM +0530, Amit Kapila wrote:
> On Sat, Mar 28, 2020 at 7:08 PM Julien Rouhaud  wrote:
> >
> > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > >
> > > Basically,
> > > I don't know changes done in ExecInitParallelPlan and friends allow us
> > > to compute WAL for parallel operations.  Those will primarily cover
> > > parallel queries that won't write WAL.  How you have tested those
> > > changes?
> >
> > I didn't test those, and I'm not even sure how to properly and reliably
> > test that.  Do you have any advice on how to achieve that?
> >
> > However the patch is mimicking the buffer instrumentation that already
> > exists, and the approach also looks correct to me.  Do you have a reason
> > to believe that the approach that works for buffer usage wouldn't work
> > for WAL records?  (I of course agree that this should be tested anyway)
> >
> 
> The buffer usage infrastructure is for read-only queries (for ex. for
> stats like blks_hit, blks_read).  As far as I can think, there is no
> easy way to test the WAL usage via that API.  It might or might not be
> required in the future depending on whether we decide to use the same
> infrastructure for parallel writes.

I'm not sure that I get your point.  I'm assuming that you meant
parallel-read-only queries, but surely buffer usage infrastructure for
parallel query relies on the same approach as non-parallel one (each node
computes the process-local pgBufferUsage diff) and sums all of that at the end
of the parallel query execution.  I also don't see how whether the query is
read-only or not is relevant here as far as instrumentation is concerned,
especially since read-only query can definitely do writes and increase the
count of dirtied buffers, like a write query would.  For instance a hint
bit change can be done in a parallel query AFAIK, and this can generate WAL
records if wal_log_hints is enabled, so that's probably one way to test it.
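
For reference, a minimal sketch of that kind of test (the table definition
and GUC values are assumptions, and wal_log_hints can only be set at server
start):

-- Assumes a server started with wal_log_hints = on (or data checksums
-- enabled), so hint-bit-only page dirtying is WAL-logged after a checkpoint.
CREATE TABLE hint_test AS
  SELECT i AS id FROM generate_series(1, 100000) i;
CHECKPOINT;   -- ensure the next hint-bit change triggers a full-page image

SET parallel_setup_cost = 0;             -- make a parallel plan attractive
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;

-- The first scan after the loading transaction commits sets hint bits; with
-- the settings above it runs in parallel workers, which therefore generate
-- WAL records.
SELECT count(*) FROM hint_test;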

I now think that not adding support for WAL usage in EXPLAIN output in the
initial patch scope was a mistake, as this is probably the best way to test the
WAL counters for parallel queries.  This shouldn't be hard to add though, and I
can work on it quickly if there's still a chance to get this feature included
in pg13.

> I think for now we should remove
> that part of changes and rather think how to get that for parallel
> operations that can write WAL.  For ex. we might need to do something
> similar to what this patch has done in begin_parallel_vacuum and
> end_parallel_vacuum.  Would you like to attempt that?

Do you mean removing WAL usage instrumentation from parallel query
infrastructure?

For parallel utilities that can do writes it's probably better to keep the
discussion in the other part of the thread.  I tried to think a little bit
about that, but for now I don't have a better idea than adding something
similar to instrumentation for utility commands to have a general
infrastructure, as building a workaround for a specific utility looks like
the wrong approach.  But this would require quite important changes in
utility handling, which is maybe not a good idea a couple of weeks before
the feature freeze, and that is definitely not backpatchable, so it won't
fix the issue for parallel index build that has existed since pg11.




Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-29 Thread Masahiko Sawada
On Sun, 29 Mar 2020 at 15:19, Masahiko Sawada
 wrote:
>
> On Sun, 29 Mar 2020 at 14:23, Amit Kapila  wrote:
> >
> > On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud  wrote:
> > >
> > > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > > > >
> > > > > I see some basic problems with the patch.  The way it tries to compute
> > > > > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > > > > or point me to any test done where we have computed WAL for parallel
> > > > > operations like Parallel Vacuum or Parallel Create Index?
> > > >
> > > > Ah, that's indeed a good point and AFAICT WAL records from parallel 
> > > > utility
> > > > workers won't be accounted for.  That being said, I think that an 
> > > > argument
> > > > could be made that proper infrastructure should have been added in the 
> > > > original
> > > > parallel utility patches, as pg_stat_statement is already broken wrt. 
> > > > buffer
> > > > usage in parallel utility, unless I'm missing something.
> > >
> > > Just to be sure I did a quick test with pg_stat_statements behavior using
> > > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer 
> > > usage
> > > doesn't reflect parallel workers' activity.
> > >
> >
> > Sawada-San, would you like to investigate this? If not, I will look into
> > this next week.
>
> Sure, I'll investigate this issue today.
>

I've run vacuum with/without parallel workers on a table with 5
indexes. The vacuum reads all blocks of the table and indexes.
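
(For reference, the setup was along these lines; the table and index
definitions here are assumptions, not the exact ones used:)

CREATE TABLE vac_test (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE INDEX ON vac_test (c1);
CREATE INDEX ON vac_test (c2);
CREATE INDEX ON vac_test (c3);
CREATE INDEX ON vac_test (c4);
CREATE INDEX ON vac_test (c5);
INSERT INTO vac_test SELECT i, i, i, i, i FROM generate_series(1, 1000000) i;
DELETE FROM vac_test;           -- leave dead tuples on every block

VACUUM (PARALLEL 0) vac_test;   -- single-process vacuum
-- (re-populate the table the same way between runs)
VACUUM (PARALLEL 4) vac_test;   -- vacuum with 4 parallel workers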

* VACUUM command with no parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';

  total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
shared_blks_dirtied | shared_blks_written
--+-+--+-+-+-
 19857.217207 |   45238 |   226944 |  272182 |
 225943 |  225894
(1 row)

* VACUUM command with 4 parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';

 total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
shared_blks_dirtied | shared_blks_written
-+-+--+-+-+-
 6932.117365 |   45205 |73079 |  118284 |
 72403 |   72365
(1 row)

The total number of blocks of the table and indexes is about 182243
blocks. As Julien reported, the total number of read blocks during
parallel vacuum is obviously much less than the single-process vacuum's
result.
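
(That expected block count can be cross-checked against the catalogs, e.g.
with the relation names from the sketch above, which are assumptions:)

SELECT (pg_relation_size('vac_test')
        + sum(pg_relation_size(indexrelid)))::bigint
       / current_setting('block_size')::int AS total_blks
FROM pg_index
WHERE indrelid = 'vac_test'::regclass;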

Parallel CREATE INDEX has the same issue, but it doesn't exist in
parallel queries for SELECTs.

I think we need to change parallel maintenance commands so that they
report buffer usage like ParallelQueryMain() does: prepare to track
buffer usage during query execution with InstrStartParallelQuery(), and
report it with InstrEndParallelQuery() after the parallel maintenance
command. To report buffer usage of parallel maintenance commands
correctly, I'm thinking that we can (1) change parallel CREATE INDEX and
parallel vacuum so that they prepare gathering buffer usage themselves,
or (2) have a common entry point for parallel maintenance commands that
is responsible for gathering buffer usage and calling the entry function
of the individual maintenance command. I'll investigate it more in
depth.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

2020-03-29 Thread Andy Fan
> Just updated the patch with some test case changes:
> 1. Add an "ANALYZE" command before running the EXPLAIN.
> 2. ORDER BY with explicit collate settings.

Thanks Rushabh for pointing this out, or else I'd have spent much more
time figuring out why I get a different order on Windows.
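
For reference, the stabilization in the two points above amounts to
something like this (table and column names are hypothetical):

ANALYZE t;    -- stable row estimates, hence stable plans in EXPLAIN output

SELECT * FROM t ORDER BY val COLLATE "C";   -- ordering independent of the
                                            -- platform's default collation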

> 3. As for postgres_fdw.sql, I just copied results.out to expected.out,
> which should be correct based on the result.  However, I added my
> comment around that.

The issue doesn't exist at all.  The confusion was introduced by a
misunderstanding of the test case (I treated "count(xx) FILTER (xxx)" as
a window function rather than an aggregate function), so I just fixed it
cleanly.

Some other changes made in the new patch:
1. Fixed a bug in UniqueKey calculation for OUTER joins.
2. Fixed some typos in comments.
3. Renamed the field "grantee" to "guarantee".

Best Regards
Andy Fan


v3-0001-Maintain-UniqueKey-at-each-RelOptInfo-this-inform.patch
Description: Binary data


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-03-29 Thread Masahiko Sawada
On Sun, 29 Mar 2020 at 14:23, Amit Kapila  wrote:
>
> On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud  wrote:
> >
> > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > > >
> > > > I see some basic problems with the patch.  The way it tries to compute
> > > > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > > > or point me to any test done where we have computed WAL for parallel
> > > > operations like Parallel Vacuum or Parallel Create Index?
> > >
> > > Ah, that's indeed a good point and AFAICT WAL records from parallel 
> > > utility
> > > workers won't be accounted for.  That being said, I think that an argument
> > > could be made that proper infrastructure should have been added in the 
> > > original
> > > parallel utility patches, as pg_stat_statement is already broken wrt. 
> > > buffer
> > > usage in parallel utility, unless I'm missing something.
> >
> > Just to be sure I did a quick test with pg_stat_statements behavior using
> > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer 
> > usage
> > doesn't reflect parallel workers' activity.
> >
>
> Sawada-San, would you like to investigate this? If not, I will look into
> this next week.

Sure, I'll investigate this issue today.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-29 Thread Julien Rouhaud
On Fri, Mar 27, 2020 at 03:42:50PM +0100, Julien Rouhaud wrote:
> On Fri, Mar 27, 2020 at 2:01 PM Fujii Masao  
> wrote:
> >
> 
> > So what I'd like to say is that the information that users are interested
> > in would vary on each situation and case. At least for me it seems
> > enough for pgss to report only the basic information. Then users
> > can calculate to get the numbers (like total_time) they're interested in,
> > from those basic information.
> >
> > But of course, I'd like to hear more opinions about this...
> 
> +1
> 
> Unless someone chimes in by tomorrow, I'll just drop the sum as it
> seems less controversial and not a blocker in userland if users are
> interested.

Done in attached v11, which also addresses the s/querytext/query_text/
discrepancy noted previously.

> > >
> > > I also exported BufferUsageAccumDiff as mentioned previously, as it seems
> > > cleaner and will avoid future useless code churn, and ran pgindent.
> >
> > Many thanks!! I'm thinking to commit this part separately.
> > So I made that patch based on your patch. Attached.
> 
> Thanks! It looks good to me.

I also kept that part in a distinct commit for convenience.
From 0d9bf628dac15ce9e790d3a040ef1123c7c6e7d9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 28 Mar 2019 13:33:23 +0100
Subject: [PATCH v11 1/3] Pass query string to the planner

---
 src/backend/commands/copy.c  |  3 ++-
 src/backend/commands/createas.c  |  3 ++-
 src/backend/commands/explain.c   |  2 +-
 src/backend/commands/extension.c |  2 +-
 src/backend/commands/matview.c   |  2 +-
 src/backend/commands/portalcmds.c|  2 +-
 src/backend/executor/functions.c |  1 +
 src/backend/optimizer/plan/planner.c | 10 ++
 src/backend/tcop/postgres.c  | 13 -
 src/backend/utils/cache/plancache.c  |  3 ++-
 src/include/optimizer/optimizer.h|  3 ++-
 src/include/optimizer/planner.h  |  4 +++-
 src/include/tcop/tcopprot.h  |  6 --
 13 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fbde9f88e7..efb1e0d03e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1580,7 +1580,8 @@ BeginCopy(ParseState *pstate,
 		}
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, NULL);
 
 		/*
 		 * With row level security and a user using "COPY relation TO", we
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3a5676fb39..9febdc5165 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,7 +329,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
 		Assert(query->commandType == CMD_SELECT);
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, params);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, params);
 
 		/*
 		 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ff2f45cfb2..ee0e638f33 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -376,7 +376,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 		INSTR_TIME_SET_CURRENT(planstart);
 
 		/* plan the query */
-		plan = pg_plan_query(query, cursorOptions, params);
+		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 00cf4ef268..38cbea385a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -751,7 +751,7 @@ execute_sql_string(const char *sql)
 		   NULL,
 		   0,
 		   NULL);
-		stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, NULL);
+		stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
 
 		foreach(lc2, stmt_list)
 		{
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c3954f3e24..e5a5eef102 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -391,7 +391,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	CHECK_FOR_INTERRUPTS();
 
 	/* Plan the query which will generate data for the refresh. */
-	plan = pg_plan_query(query, 0, NULL);
+	plan = pg_plan_query(query, queryString, 0, NULL);
 
 	/*
 	 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 40be5069fe..6a2c233615 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -90,7 +90,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
 		elog(ERROR, "non-SELECT 

Re: error context for vacuum to include block number

2020-03-29 Thread Amit Kapila
On Sun, Mar 29, 2020 at 9:04 AM Masahiko Sawada
 wrote:
>
> On Sat, 28 Mar 2020 at 13:23, Amit Kapila  wrote:
> >
> >
> > Please find attached the updated patch with all the changes discussed.
> > Let me know if I have missed anything?
> >
>
> Thank you for updating the patch! Looks good to me.
>

Okay, I will push this tomorrow.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com