Re: In-placre persistance change of a relation

2020-11-12 Thread Kyotaro Horiguchi
At Fri, 13 Nov 2020 06:43:13 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> Hi Horiguchi-san,
> 
> 
> Thank you for making a patch so quickly.  I've started looking at it.
> 
> What makes you think this is a PoC?  Documentation and test cases?  If 
> there's something you think that doesn't work or are concerned about, can you 
> share it?

The latest version is heavily revised and has gained a lot of comments,
so it might have exited the PoC state.  The necessity of documentation
is doubtful since this patch doesn't change user-facing behavior other
than speed.  Some tests are required, especially from the recovery and
replication perspective, but I haven't been able to write them yet.
(One of the tests needs to cause a crash while a transaction is
running.)

> Do you know the reason why data copy was done before?  And, it may be odd for 
> me to ask this, but I think I saw someone referred to the past discussion 
> that eliminating data copy is difficult due to some processing at commit.  I 
> can't find it.

My guess is that it was done that way just because it is simpler
considering rollback and code sharing, and maybe no one has complained
that SET LOGGED/UNLOGGED takes longer than required/expected.

The current implementation is simple.  It is enough to just discard
the old or the new relfilenode depending on whether the current
transaction ends with commit or abort.  Tweaking a relfilenode that is
in use introduces some skew in some places.  I used the pendingDelete
mechanism in a somewhat more complex way, violated an abstraction (I
think calling AM routines from storage.c is not good), and even
introduced a new fork kind only to mark an init fork as "not committed
yet".  There might be a better way, but I haven't found it.

(The patch scans all shared buffer blocks for each relation).

> (1)
> @@ -168,6 +168,8 @@ extern PGDLLIMPORT int32 *LocalRefCount;
>   */
>  #define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))
>  
> +struct SmgrRelationData;
> 
> This declaration is already in the file:
> 
> /* forward declared, to avoid having to expose buf_internals.h here */
> struct WritebackContext;
> 
> /* forward declared, to avoid including smgr.h here */
> struct SMgrRelationData;

Hmmm. Nice catch. Will fix it in the next version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Skip ExecCheckRTPerms in CTAS with no data

2020-11-12 Thread Bharath Rupireddy
Thanks for the comments.

On Fri, Nov 13, 2020 at 9:19 AM Michael Paquier  wrote:
>
> Let's move any tests related to matviews to matviews.sql.  It does not
> seem consistent to me to have those tests in a test path reserved to
> CTAS, though I agree that there is some overlap and that setting up
> the permissions requires a bit of duplication.
>

Done.

>
> "permission", say (comment applies as well to the CTAS page):
> CREATE MATERIALIZED VIEW requires CREATE privileges on the schema used
> for the materialized view.  If using WITH DATA, the default, INSERT
> privileges are also required.
>

Done.

>
> +* Check INSERT permission on the constructed table. Skip this check if
> +* WITH NO DATA is specified as we do not actually insert the tuples, we
> +* just create the table. The insert permissions will be checked anyways
> +* while inserting tuples into the table.
> I would also use "privilege" here.  A nit.
>

Done.

> myState->reladdr = intoRelationAddr;
> -   myState->output_cid = GetCurrentCommandId(true);
> myState->ti_options = TABLE_INSERT_SKIP_FSM;
> -   myState->bistate = GetBulkInsertState();
> +   myState->output_cid = GetCurrentCommandId(true);
> The changes related to the bulk-insert state data look fine per se.
> One nit: I would set bistate to NULL for the data-skip case here.
>

It's not required to set bistate to NULL: since myState is allocated
with palloc0 in CreateIntoRelDestReceiver, it will be NULL anyway.
  if (!into->skipData)
myState->bistate = GetBulkInsertState();

Attaching v4 patch. Please review it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v4-0001-Skip-Insert-Perm-Check-Bulk-Insert-State-alloc-in.patch
Description: Binary data


Re: [PATCH] remove deprecated v8.2 containment operators

2020-11-12 Thread Peter Eisentraut

On 2020-11-12 23:28, Tom Lane wrote:

I'm on board with pulling these now --- 8.2 to v14 is plenty of
deprecation notice.  However, the patch seems incomplete in that
the code support for these is still there -- look for
RTOldContainedByStrategyNumber and RTOldContainsStrategyNumber.
Admittedly, there's not much to be removed except some case labels,
but it still seems like we oughta do that to avoid future confusion.


Yeah, the stuff in gistproc.c should be removed now.  But I wonder what 
the mentions in brin_inclusion.c are and whether or how they should be 
removed.






RE: In-placre persistance change of a relation

2020-11-12 Thread osumi.takami...@fujitsu.com
Hello, Tsunakawa-San


> Do you know the reason why data copy was done before?  And, it may be
> odd for me to ask this, but I think I saw someone referred to the past
> discussion that eliminating data copy is difficult due to some processing at
> commit.  I can't find it.
I can share two sources from the hackers threads on why eliminating
the data copy is difficult.

Tom's remark and the context to copy relation's data.
https://www.postgresql.org/message-id/flat/31724.1394163360%40sss.pgh.pa.us#31724.1394163...@sss.pgh.pa.us

Amit-San quoted this thread and mentioned that point in another thread.
https://www.postgresql.org/message-id/CAA4eK1%2BHDqS%2B1fhs5Jf9o4ZujQT%3DXBZ6sU0kOuEh2hqQAC%2Bt%3Dw%40mail.gmail.com

Best,
Takamichi Osumi




Re: POC: Cleaning up orphaned files using undo logs

2020-11-12 Thread Amit Kapila
On Thu, Nov 12, 2020 at 2:45 PM Antonin Houska  wrote:
>
>
> No background undo
> --
>
> Reduced complexity of the patch seems to be the priority at the moment. Amit
> suggested that cleanup of an orphaned relation file is simple enough to be
> done on foreground and I agree.
>

Yeah, I think we should try and see if we can make it work, but I
noticed that there are a few places like AbortOutOfAnyTransaction where
we have the assumption that undo will be executed in the background.
We need to deal with that.

> "undo worker" is still there, but it only processes undo requests after server
> restart because relation data can only be changed in a transaction - it seems
> cleaner to launch a background worker for this than to hack the startup
> process.
>

But I see there are still multiple undo workers getting launched, and
I am not sure that works correctly, because a particular undo worker is
connected to a database and then starts processing all the pending
undo.

> Since the concept of undo requests is closely related to the undo worker, I
> removed undorequest.c too. The new (much simpler) undo worker gets the
> information on incomplete / aborted transactions from the undo log as
> mentioned above.
>
> SMGR enhancement
> 
>
> I used the 0001 patch from [3] rather than [4], although it's more invasive
> because I noticed somewhere in the discussion that there should be no reserved
> database OID for the undo log. (InvalidOid cannot be used because it's already
> in use for shared catalogs.)
>
> Components added
> 
>
> pg_undo_dump utility and test framework for undoread.c. BTW, undoread.c seems
> to need some refactoring.
>
>
> Following are a few areas which are not implemented yet because more
> discussion is needed there:
>
> Discarding
> --
>
> There's no discard worker for the URS infrastructure yet. I thought about
> discarding the undo log during checkpoint, but checkpoint should probably do
> more straightforward tasks than the calculation of a new discard pointer for
> each undo log, so a background worker is needed. A few notes on that:
>
>   * until the zheap AM gets added, only the transaction that creates the undo
> records needs to access them. This assumption should make the discarding
> algorithm a bit simpler. Note that with zheap, the other transactions need
> to look for old versions of tuples, so the concept of oldestXidHavingUndo
> variable is needed there.
>
>   * it's rather simple to pass pointer the URS pointer to the discard worker
> when transaction either committed or the undo has been executed.
>

Why can't we have a separate discard worker which keeps scanning the
undo records and discards accordingly?  Putting the onus on the
foreground process might be tricky because, say, the discard worker is
not up to speed and we run out of space to pass such information for
each commit/abort request.

>
> Do not execute the same undo record multiple times
> --
>
> Although I've noticed in the zheap code that it checks whether particular undo
> action was already undone, I think this functionality fits better in the URS
> layer.
>

If you want to track this at the undo-record level, then won't it lead
to performance overhead and probably additional WAL overhead,
considering this action needs to be WAL-logged?  I think recording it
at the page level might be a better idea.

>
> I can spend more time on this project, but need a hint which part I should
> focus on.
>

I can easily imagine that this needs a lot of work and I can try to
help with this as much as possible from my side. I feel at this stage
we should try to focus on undo-related work (to start with you can
look at finishing the undoprocessing work for which I have shared some
thoughts) and then probably at some point in time we need to rebase
zheap over this.

-- 
With Regards,
Amit Kapila.




RE: In-placre persistance change of a relation

2020-11-12 Thread tsunakawa.ta...@fujitsu.com
Hi Horiguchi-san,


Thank you for making a patch so quickly.  I've started looking at it.

What makes you think this is a PoC?  Documentation and test cases?  If there's 
something you think that doesn't work or are concerned about, can you share it?

Do you know the reason why data copy was done before?  And, it may be odd for 
me to ask this, but I think I saw someone referred to the past discussion that 
eliminating data copy is difficult due to some processing at commit.  I can't 
find it.



(1)
@@ -168,6 +168,8 @@ extern PGDLLIMPORT int32 *LocalRefCount;
  */
 #define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))
 
+struct SmgrRelationData;

This declaration is already in the file:

/* forward declared, to avoid having to expose buf_internals.h here */
struct WritebackContext;

/* forward declared, to avoid including smgr.h here */
struct SMgrRelationData;


Regards
Takayuki Tsunakawa





Re: Strange behavior with polygon and NaN

2020-11-12 Thread Kyotaro Horiguchi
At Fri, 13 Nov 2020 15:35:58 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Thank you for the review, Georgios and Tom.
> 
> At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane  wrote in 
> > I spent some time looking this over, and have a few thoughts:
> > 
> > 1. I think it's useful to split the test changes into two patches,
> > as I've done below: first, just add the additional row in point_tbl
> > and let the fallout from that happen, and then in the second patch
> > make the code changes.  This way, it's much clearer what the actual
> > behavioral changes are.  Some of them don't look right, either.
> > For instance, in the very first hunk in geometry.out, we have
> > this:
> > 
> > - (Infinity,1e+300) | {1,0,5}   |      NaN |      NaN
> > + (Infinity,1e+300) | {1,0,5}   | Infinity | Infinity
> > 
> > which seems right, and also this:
> 
> For example, ('Infinity', 1e300) <-> {1,0,5}, that is:
> 
>line "x = -5" <-> point(1e300, Inf)
> 
> So sqrt((1e300 - 5)^2 + Inf^2) = Inf, which looks right.

??! Correction:

 It's sqrt((1e300 - 5)^2 + 0^2) = Inf, which looks right.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Strange behavior with polygon and NaN

2020-11-12 Thread Kyotaro Horiguchi
Thank you for the review, Georgios and Tom.

At Tue, 10 Nov 2020 14:30:08 -0500, Tom Lane  wrote in 
> I spent some time looking this over, and have a few thoughts:
> 
> 1. I think it's useful to split the test changes into two patches,
> as I've done below: first, just add the additional row in point_tbl
> and let the fallout from that happen, and then in the second patch
> make the code changes.  This way, it's much clearer what the actual
> behavioral changes are.  Some of them don't look right, either.
> For instance, in the very first hunk in geometry.out, we have
> this:
> 
> - (Infinity,1e+300) | {1,0,5}   |      NaN |      NaN
> + (Infinity,1e+300) | {1,0,5}   | Infinity | Infinity
> 
> which seems right, and also this:

For example, ('Infinity', 1e300) <-> {1,0,5}, that is:

   line "x = -5" <-> point(1e300, Inf)

So sqrt((1e300 - 5)^2 + Inf^2) = Inf, which looks right.


> - (1e+300,Infinity) | {1,-1,0}                              | Infinity | Infinity
> - (1e+300,Infinity) | {-0.4,-1,-6}                          | Infinity | Infinity
> - (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} | Infinity | Infinity
> + (1e+300,Infinity) | {1,-1,0}                              |      NaN |      NaN
> + (1e+300,Infinity) | {-0.4,-1,-6}                          |      NaN |      NaN
> + (1e+300,Infinity) | {-0.000184615384615,-1,15.3846153846} |      NaN |      NaN
> 
> which does not.  Why aren't these distances infinite as well?
> 
> For instance, {1,-1,0} is the line "x = y".  We could argue about
> whether it'd be sensible to return zero for the distance between that
> and the point (inf,inf), but surely any point with one inf and one
> finite coordinate must be an infinite distance away from that line.
> There's nothing ill-defined about that situation.

Mmm... (swinging my arms to mimic lines..)
dist(x = y, (1e300, Inf)) looks indeterminate to me..

The calculation is performed in the following steps.

1. construct the perpendicular line for the line.
 perpine(1e300, 'Infinity') => {-1, -1, Inf}

2. calculate the cross point.
 crosspoint({-1, -1, Inf}, {1,-1,0}) => (Inf, NaN)

3. calculate the distance from the crosspoint to the point.
 point_dt((Inf, NaN), (1e300, 'Infinity'))
   = HYPOT(Inf - 1e300, NaN - Inf);
   = HYPOT(Inf, NaN);

4. HYPOT's behavior is changed by the patch:

   Before: HYPOT(Inf, NaN) = Inf
   After : HYPOT(Inf, NaN) = NaN   - Result A


So if we want to "fix" that, we should fix some or all of steps 1-3.

1. seems to allow no result other than the above.

2. crosspoint (x = - y + Inf, x = y)  could be (Inf, Inf)?

3.point_dt((Inf, Inf), (1e300, Inf))
   = HYPOT(Inf - 1e300, Inf - Inf)
   = HYPOT(Inf, -NaN)
   = NaN.   - Result B

  I'm not sure why Inf - Inf is negative, but |Inf-Inf| = NaN is
  reasonable.

That is, we don't get a "reasonable" result this way.


The formula for the distance((x0,y0) - (ax + by + c = 0)) is

  |ax0 + by0 + c|/sqrt(a^2 + b^2)

 where a = -1, b = -1, c = Inf, x0 = 1e300, y0 = Inf,

abs(-1 * 1e300 + -1 * Inf + Inf) / sqrt(1 + 1)
  = abs(-1e300 -Inf + Inf) / C
  = NaN.   - Result C

All of Results A through C are NaN.  So NaN looks to be the right
result after all.

By the way, that formula is far simpler than what we are doing
now.  Is there any reason to take the above steps for the calculation?
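
As a sanity check of the arithmetic above, a tiny standalone C program
(not part of the patch, just reproducing Result C with plain IEEE-754
doubles) also ends up with NaN:

  #include <math.h>
  #include <stdio.h>

  int
  main(void)
  {
      /* line {-1, -1, Inf}, i.e. -x - y + Inf = 0, and point (1e300, Inf) */
      double a = -1.0, b = -1.0, c = INFINITY;
      double x0 = 1e300, y0 = INFINITY;

      /* -1e300 + (-Inf) + Inf gives NaN, so the whole expression is NaN */
      double d = fabs(a * x0 + b * y0 + c) / sqrt(a * a + b * b);

      printf("%g\n", d);    /* prints "nan" */
      return 0;
  }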


> 2. Rather than coding around undesirable behavior of float8_min,
> it seems like it's better to add a primitive to float.h that
> does what you want, ie "NaN if either input is NaN, else the
> smaller input".  This is more readable, and possibly more efficient
> (depending on whether the compiler is smart enough to optimize
> away redundant isnan checks).  I did that in the attached.

Sounds reasonable.  I also found that I forgot to do the same thing for
the y coordinate.
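
For reference, such a primitive might look like the following minimal
sketch (the function name here is made up for illustration, not taken
from the actual patch; get_float8_nan() is the existing helper from
utils/float.h):

  /* return NaN if either input is NaN, else the smaller input */
  static inline float8
  float8_min_nan(const float8 val1, const float8 val2)
  {
      if (isnan(val1) || isnan(val2))
          return get_float8_nan();
      return (val1 < val2) ? val1 : val2;
  }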

> 3. Looking for other calls of float8_min, I wonder why you did not
> touch the bounding-box calculations in box_interpt_lseg() or
> boxes_bound_box().

While doing that, I didn't make changes just from looking at the code
locally, since I thought that might look like overdoing it.  For
example, in box_interpt_lseg, even if the bounding-box check overlooked
NaNs, I thought that the subsequent calculations would reflect any
involved NaNs in the result.  (But I'm not confident that that is
perfect, though..)

> 4. The line changes feel a bit weird, like there's no clear idea
> of what a "valid" or "invalid" line is.  For instance the first
> hunk in line_construct():
> 
> + /* Avoid creating a valid line from an invalid point */
> + if (unlikely(isnan(pt->y)))
> + result->C = get_float8_nan();
> 
> Why's it appropriate to set C and only C to NaN?

Not limited to here, 

RE: pgbench: option delaying queries till connections establishment?

2020-11-12 Thread kuroda.hay...@fujitsu.com
Dear Fabien, 

> and this will wait till its time comes. In the mean time, I think that you 
> should put the patch status as you see fit, independently of the other 
> patch: it seems unlikely that they would be committed together, and I'll 
> have to merge the remaining one anyway.

OK. I found the related thread[1], and I understand you will submit
another patch on that thread.

The PostgreSQL Patch Tester says all regression tests pass, and
I have changed the status to "Ready for Committer."

[1]: https://commitfest.postgresql.org/31/2827/

Thank you for discussing with me.

Hayato Kuroda
FUJITSU LIMITED

-Original Message-
From: Fabien COELHO  
Sent: Wednesday, November 11, 2020 9:24 PM
To: Kuroda, Hayato/黒田 隼人 
Cc: Andres Freund ; Daniel Gustafsson ; 
pgsql-hack...@postgresql.org
Subject: RE: pgbench: option delaying queries till connections establishment?


Hello,

>> I can remove the line, but I strongly believe that reporting performance
>> figures if some client connection failed thus the bench could not run as
>> prescribed is a bad behavior. The good news is that it is probably quite
>> unlikely. So I'd prefer to keep it and possibly submit a patch to change
>> the behavior.
>
> I agree such a situation is very bad, and I understood you have a plan to
> submit patches for fix it. If so leaving lines as a TODO is OK.

Thanks.

>> Should be this one: https://commitfest.postgresql.org/30/2624/
>
> This discussion is still on-going, but I can see that the starting time
> may be delayed for looking up all pgbench-variables.

Yep, that's it.

> (I think the status of this thread might be wrong. it should be
> 'Needs review,' but now 'Waiting on Author.')

I changed it to "Needs review".

> This patch is mostly good and can change a review status soon,
> however, I think it should wait that related patch.

Hmmm.

> Please discuss how to fix it with Tom,

I would not have the presumption to pressure Tom's agenda in any way!

> and this will commit soon.

and this will wait till its time comes. In the mean time, I think that you 
should put the patch status as you see fit, independently of the other 
patch: it seems unlikely that they would be committed together, and I'll 
have to merge the remaining one anyway.

-- 
Fabien.




Re: Add docs stub for recovery.conf

2020-11-12 Thread Craig Ringer
On Fri, Nov 13, 2020 at 11:50 AM Bruce Momjian  wrote:


> > So you are saying you don't think you are getting sufficient thought
> > into your proposal, and getting just a reflex?  Just because we don't
> > agree with you doesn't mean we didn't think about it.  In fact, we have
> > thought about it a lot, which is evident from the URL I sent you
> > already.
>

I am mostly trying to say that I don't think the issues I raised were
actually addressed in the proposed alternatives. I put in a fair bit of
effort to clearly set out the problem that this is meant to solve, and was
frustrated to perceive the response as "yeah, nah, let's just do this other
thing that only addresses one part of the original issue." It wasn't clear
why my proposal appeared to be rejected. Perhaps I didn't fully grasp
the context of the linked discussion.

Please review the docs on standbys with a "new user" hat on. It's confusing
(though the new front-matter and definitions in the HA chapter help) even
without upgrade considerations. See how long it takes you to determine the
answer to the question "what exactly puts a server into 'standby mode'"?

This proposal was intended to address one part of that, stemming directly
from my own challenges with the docs when I as an experienced PostgreSQL
user and contributor went to adapt some tests to Pg 12 and 13. I knew we'd
removed recovery.conf, but for the life of me I couldn't remember how to
put the server in standby mode in 12 or 13 at the time (I've been working
with 11 too much lately)... and it took ages to actually find that in the
docs.

I can be pretty dense sometimes, but if it sent me for a loop it's going to
confuse others a great deal. Two things that would've helped me would've
been  some cross references to the old configuration terms, and a
non-vanishing documentation URL for newer versions. Hence this proposal.

What would be interesting, I think you were suggesting this, is a
> separate doc chapter that had a list of all the renames, what version
> they were renamed in, and a link from there to the new name in the docs.
>

Right, that is exactly what I am suggesting we add, as an appendix so it's
way out of the way of the main flow of the docs. Per the original patch and
the illustrative screenshots. I called it something like "removed and
renamed features and settings" or something in the proposed patch.
Alternatives would be welcomed, I don't like the name much.

This could be easily created by reading the old release notes.  Anyone
> looking for old names would automatically be sent to that page in the
> docs.  This would give us a definitive list, and make the list out of
> the main flow of the docs.
>

Exactly. Plus a few s where appropriate. That's pretty much all
I'm after.


Re: POC: Cleaning up orphaned files using undo logs

2020-11-12 Thread Thomas Munro
On Thu, Nov 12, 2020 at 10:15 PM Antonin Houska  wrote:
> Thomas Munro  wrote:
> > Thanks.  We decided to redesign a couple of aspects of the undo
> > storage and record layers that this patch was intended to demonstrate,
> > and work on that is underway.  More on that soon.
>
> As my boss expressed in his recent blog post, we'd like to contribute to the
> zheap development, and a couple of developers from other companies are
> interested in this as well. Amit Kapila suggested that the "cleanup of
> orphaned files" feature is a good start point in getting the code into PG
> core, so I've spent some time on it and tried to rebase the patch set.

Hi Antonin,

I saw that -- great news! -- and have been meaning to write for a
while.  I think I am nearly ready to talk about it again.  I agree
100% that it's worth trying to do something much simpler than a new
access manager, and this was the simplest useful feature solving a
real-world-problem-that-people-actually-have we could come up with
(based on an idea from Robert).  I think it needs a convincing
explanation for why there is no scenario where the relfilenode is
recycled for a new unlucky table before the rollback is executed,
which might depend on details that you might be working on/changing
(scenarios where you execute undo twice because you forgot you already
did it).

> In fact what I did is not mere rebasing against the current master branch -
> I've also (besides various bug fixes) done some design changes.
>
> Incorporated the new Undo Record Set (URS) infrastructure
> -
>
> This is also pointed out in [0].
>
> I started from [1] and tried to implement some missing parts (e.g. proper
> closing of the URSs after crash), introduced UNDO_DEBUG preprocessor macro
> which makes the undo log segments very small and fixed some bugs that the
> small segments exposed.

Cool!  Getting up to speed on all these made up concepts like URS, and
getting all these pieces assembled and rebased and up and running is
already quite something, let alone adding missing parts and debugging.

> The most significant change I've done was removal of the undo requests from
> checkpoint. I could not find any particular bug / race conditions related to
> including the requests into the checkpoint, but I concluded that it's easier
> to think about consistency and checkpoint timings if we scan the undo log on
> restart (after recovery has finished) and create the requests from scratch.

Interesting.  I guess that would be closer to textbook three-phase ARIES.

> [2] shows where I ended up before I started to rebase this patchset.
>
> No background undo
> --
>
> Reduced complexity of the patch seems to be the priority at the moment. Amit
> suggested that cleanup of an orphaned relation file is simple enough to be
> done on foreground and I agree.
>
> "undo worker" is still there, but it only processes undo requests after server
> restart because relation data can only be changed in a transaction - it seems
> cleaner to launch a background worker for this than to hack the startup
> process.

I suppose the simplest useful system would be one that does the work at
startup before allowing connections, and also in regular backends, and
panics if a backend ever exits while it has pending undo (panic =
"goto crash recovery").  Then you don't have to deal with undo workers
running at the same time as regular sessions which might run into
trouble reacquiring locks (for an AM I mean), or due to OIDs being
recycled with multiple checkpoints, or undo work that gets deferred
until the next restart of the server.

> Since the concept of undo requests is closely related to the undo worker, I
> removed undorequest.c too. The new (much simpler) undo worker gets the
> information on incomplete / aborted transactions from the undo log as
> mentioned above.
>
> SMGR enhancement
> 
>
> I used the 0001 patch from [3] rather than [4], although it's more invasive
> because I noticed somewhere in the discussion that there should be no reserved
> database OID for the undo log. (InvalidOid cannot be used because it's already
> in use for shared catalogs.)

I gave up thinking about the colour of the BufferTag shed and went
back to magic database 9, mainly because there seemed to be more
pressing matters.  I don't even think it's that crazy to store this
type of system-wide data in pseudo databases, and I know of other
systems that do similar sorts of things without blinking...

> Following are a few areas which are not implemented yet because more
> discussion is needed there:

Hmm.  I'm thinking about these questions.




Re: In-placre persistance change of a relation

2020-11-12 Thread Kyotaro Horiguchi
Hello.  Before posting the next version, I'd like to explain what this
patch is.

1. The Issue

Bulk data loading is a long-running, I/O-consuming task.  Many DBAs
want that task to be faster, even at the cost of an increased risk of
data loss.  wal_level=minimal is one answer to such a request:
data loading into a table that was created in the current transaction
omits WAL-logging, and the table is synced at commit.

However, that optimization doesn't help when the data is loaded into
existing tables.  There are quite a few cases where data is loaded into
tables that already contain a lot of data, and those cases get no
benefit from the optimization.

Another possible solution for bulk data loading is UNLOGGED tables.
But when we switch a table between LOGGED and UNLOGGED, all the table
content is copied to a newly created heap, which is costly.


2. Proposed Solutions.

There are two proposed solutions are discussed on this mailing
list. One is wal_level = none (*1), which omits WAL-logging almost at
all. Another is extending the existing optimization to the ALTER TABLE
SET LOGGED/UNLOGGED cases, which is to be discussed in this new
thread.


3. In-place Persistence Change

The attached is a PoC patch of the latter solution.  When we want to
change table persistence in place, we basically need to do the
following steps.

(the table is exclusively locked)

(1) Flip BM_PERMANENT flag of all shared buffer blocks for the heap.

(2) Create or delete the init fork for the existing heap.

(3) Flush all buffers of the relation to file system.

(4) Sync heap files.

(5) Make catalog changes.


4. Transactionality

Steps 1, 2 and 5 above need to be abortable.  Step 5 is rolled back by
existing infrastructure, and rollback of steps 1 and 2 is achieved by
piggybacking on the pendingDeletes mechanism.


5. Replication

Furthermore, these changes ought to be replicable to standbys.  Catalog
changes are replicated as usual.

On-the-fly creation of the init fork can make a mess of recovery.  Even
though the fork is removed at abort, if the server crashes before
transaction end, the file is left behind and corrupts the database in
the next recovery.  I looked for a way to create the init fork in
smgrPendingDelete, but that needs the relcache, which is no longer
available that late in commit.  Finally, I introduced a fifth fork
kind, "INITTMP" (_itmp), only to signal that the init file is not
committed yet.  I don't like that approach, but it seems to work fine...


6. SQL Command

The second file in the patchset adds syntax that changes the
persistence of all tables in a tablespace.

ALTER TABLE ALL IN TABLESPACE  SET LOGGED/UNLOGGED [ NOWAIT ];


7. Testing

I tried to write a TAP test for this, but IPC::Run::harness (or
interactive_psql) doesn't seem to work for me.  I'm not sure what
exactly is happening, but pty redirection doesn't work.

  $in = "ls\n"; $out = ""; run ["/usr/bin/bash"], \$in, \$out; print $out;

works but

  $in = "ls\n"; $out = ""; run ["/usr/bin/bash"], 'pty>', 
\$out; print $out;

doesn't respond.


The patch is attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 05d1971d0f4f0f42899f5d6857892128487eeb40 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v3 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it runs heap rewrite which causes large amount of
file I/O.  This patch makes the command run without heap rewrite.
Addition to that, SET LOGGED while wal_level > minimal emits WAL using
XLOG_FPI instead of massive number of HEAP_INSERT's, which should be
smaller.
---
 src/backend/access/rmgrdesc/smgrdesc.c |  23 ++
 src/backend/catalog/storage.c  | 355 +++--
 src/backend/commands/tablecmds.c   | 217 ---
 src/backend/storage/buffer/bufmgr.c|  88 ++
 src/backend/storage/file/reinit.c  | 206 --
 src/backend/storage/smgr/smgr.c|   6 +
 src/common/relpath.c   |   3 +-
 src/include/catalog/storage.h  |   2 +
 src/include/catalog/storage_xlog.h |  16 ++
 src/include/common/relpath.h   |   5 +-
 src/include/storage/bufmgr.h   |   4 +
 src/include/storage/smgr.h |   1 +
 12 files changed, 784 insertions(+), 142 deletions(-)

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index a7c0cb1bc3..097dacfee6 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,23 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = 

Re: Add docs stub for recovery.conf

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 10:41:49PM -0500, Bruce Momjian wrote:
> On Fri, Nov 13, 2020 at 11:37:16AM +0800, Craig Ringer wrote:
> > I have a draft patch that adds them and various related index 
> > cross-referencing
> > in my tree to submit after the recovery.conf docs patch. Let me know if you
> > think that might be worthwhile, 'cos I won't invest time in it if it's 
> > going to
> > get reflexively blocked too.
> 
> So you are saying you don't think you are getting sufficient thought
> into your proposal, and getting just a reflex?  Just because we don't
> agree with you doesn't mean we didn't think about it.  In fact, we have
> thought about it a lot, which is evident from the URL I sent you
> already.

What would be interesting, I think you were suggesting this, is a
separate doc chapter that had a list of all the renames, what version
they were renamed in, and a link from there to the new name in the docs.
This could be easily created by reading the old release notes.  Anyone
looking for old names would automatically be sent to that page in the
docs.  This would give us a definitive list, and make the list out of
the main flow of the docs.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Skip ExecCheckRTPerms in CTAS with no data

2020-11-12 Thread Michael Paquier
On Thu, Nov 12, 2020 at 04:17:43PM +0530, Bharath Rupireddy wrote:
> On HEAD/master, the behaviour is as follows: a) for plain CTAS WITH NO
> DATA, ExecCheckRTPerms() will not be called. b) for explain analyze
> CTAS WITH NO DATA, ExecCheckRTPerms() will be called. So, we need a).
> This is what exactly this patch does i.e. ExecCheckRTPerms() will not
> be called for both cases.
> 
> Attaching V3 patch, please review it.

+CREATE MATERIALIZED VIEW selinto_schema.mv_nodata4 (a) AS
+   SELECT oid FROM pg_class WHERE relname like '%c%' WITH DATA;
-- Error
+ERROR:  permission denied for materialized view mv_nodata4
Let's move any tests related to matviews to matviews.sql.  It does not
seem consistent to me to have those tests in a test path reserved to
CTAS, though I agree that there is some overlap and that setting up
the permissions requires a bit of duplication.

+   refreshed later using REFRESH MATERIALIZED VIEW. Insert
+   permission is checked on the materialized view before populating the data
+   (unless WITH NO DATA is specified).
Let's document that in a new paragraph, using "privilege" instead of
"permission", say (comment applies as well to the CTAS page):
CREATE MATERIALIZED VIEW requires CREATE privileges on the schema used
for the materialized view.  If using WITH DATA, the default, INSERT
privileges are also required.

+* Check INSERT permission on the constructed table. Skip this check if
+* WITH NO DATA is specified as we do not actually insert the tuples, we
+* just create the table. The insert permissions will be checked anyways
+* while inserting tuples into the table.
I would also use "privilege" here.  A nit.

myState->reladdr = intoRelationAddr;
-   myState->output_cid = GetCurrentCommandId(true);
myState->ti_options = TABLE_INSERT_SKIP_FSM;
-   myState->bistate = GetBulkInsertState();
+   myState->output_cid = GetCurrentCommandId(true);
The changes related to the bulk-insert state data look fine per se.
One nit: I would set bistate to NULL for the data-skip case here.
--
Michael


signature.asc
Description: PGP signature


Re: Add docs stub for recovery.conf

2020-11-12 Thread Isaac Morland
On Thu, 12 Nov 2020 at 22:40, Bruce Momjian  wrote:

Because at a certain point the number of _old_ names in the docs
> obscures exactly how to operate the current software.  We have tried
> keeping stuff around, and we are very bad at removing stuff.
>

This is a good point, but does not attempt to explain why pages should
disappear entirely from /docs/current/. As I said in my previous comment,
there are lots of ways of doing this right. For example, we could have
pages that would disappear instead be replaced by a short page that
explains the page is removed and points to the current documentation of the
equivalent or replacement features; these hypothetical "useful 404" (or,
more correctly, "useful 410") pages don't even necessarily have to be
listed in the table of contents. In fact, serving them with a 410 HTTP
status code would be a reasonable thing to do.


Re: Add docs stub for recovery.conf

2020-11-12 Thread Bruce Momjian
On Fri, Nov 13, 2020 at 11:37:16AM +0800, Craig Ringer wrote:
> I have a draft patch that adds them and various related index 
> cross-referencing
> in my tree to submit after the recovery.conf docs patch. Let me know if you
> think that might be worthwhile, 'cos I won't invest time in it if it's going 
> to
> get reflexively blocked too.

So you are saying you don't think you are getting sufficient thought
into your proposal, and getting just a reflex?  Just because we don't
agree with you doesn't mean we didn't think about it.  In fact, we have
thought about it a lot, which is evident from the URL I sent you
already.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add docs stub for recovery.conf

2020-11-12 Thread Isaac Morland
On Thu, 12 Nov 2020 at 22:31, Craig Ringer 
wrote:


> I maintain that simply vanishing terms from the docs without any sort of
> explanation is a user-hostile action that we should fix and stop doing. If
> we had something in the docs and we remove it, it's not unduly burdensome
> to have some index entries that point to the replacement/renamed terms, and
> a short appendix entry explaining what happened.
>

This sounds very reasonable to me. I would add that while I am by no means
an expert in Postgres, although I do know a few things, I will state that
it is my professional opinion as a Web person that pages should not simply
disappear from formal documentation without some sort of indication of what
happened. There are lots of ways to accomplish an indication but for
https://www.postgresql.org/docs/current/recovery-config.html or other pages
to just disappear is definitely wrong.


Re: Add docs stub for recovery.conf

2020-11-12 Thread Bruce Momjian
On Fri, Nov 13, 2020 at 11:31:24AM +0800, Craig Ringer wrote:
> Can anyone tell me why the solution I proposed is not acceptable, and why we
> have to invent a different one instead? The website  redirect is good and all,
> but doesn't really solve the problem, and I still don't know what's wrong with
> just fixing the docs...

Because at a certain point the number of _old_ names in the docs
obscures exactly how to operate the current software.  We have tried
keeping stuff around, and we are very bad at removing stuff.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add docs stub for recovery.conf

2020-11-12 Thread Craig Ringer
On Fri, Nov 13, 2020 at 11:31 AM Craig Ringer 
wrote:

>
> Can anyone tell me why the solution I proposed is not acceptable, and why
> we have to invent a different one instead? The website  redirect is good
> and all, but doesn't really solve the problem, and I still don't know
> what's wrong with just fixing the docs...
>

Also, while I'm at it, note that a search on common search engines for
"postgres standby" takes you to (an old version of) the hot standby docs.
Follow the link to the current docs. Then try to work out from there what
exactly makes a server in "archive recovery" or "standby mode".

https://www.postgresql.org/docs/current/hot-standby.html

We need some  terms on "archive recovery" and "standby mode"
there, and probably other places.

I have a draft patch that adds them and various related index
cross-referencing in my tree to submit after the recovery.conf docs patch.
Let me know if you think that might be worthwhile, 'cos I won't invest time
in it if it's going to get reflexively blocked too.


Re: Add statistics to pg_stat_wal view for wal related parameter tuning

2020-11-12 Thread lchch1...@sina.cn

>Now, pg_stat_wal supports reset all informantion in WalStats
>using pg_stat_reset_shared('wal') function.
>Isn't it enough?
Yes, that's OK; sorry, I missed this information.


>> 3. I do not think it's a correct describe in document for
>> 'wal_buffers_full'.
 
>Where should I rewrite the description? If my understanding is not
>correct, please let me know.
Sorry, I did not describe it clearly: I could not understand the meaning
of this column after reading the description in the document.
Now I have read the WAL-write source code and found that the place where
'wal_buffers_full' is incremented is where a backend has to wait for a
WAL buffer that is still occupied by another WAL page, so that the
backend flushes the old page out of the WAL buffer (once it can).
So I think the current description in the document somewhat misses the
point; we could describe it as something like:
'Total number of times WAL data was written to disk because a backend
needed a WAL buffer that was still occupied by an older WAL page.'

Sorry if my understanding is wrong.





Re: Add docs stub for recovery.conf

2020-11-12 Thread Craig Ringer
On Thu, Nov 12, 2020 at 11:25 PM Stephen Frost  wrote:

>
> > Now, the pgweb feature that Jonathan wrote recently might actually be
> > exactly what we need to fix that, and to address the issue with
> > recovery config documentation that Craig raises.
>
> After chatting with Jonathan about this for a bit and testing it out in
> our test environment, I've gone ahead and added an entry for
> pgxlogdump.html to redirect to pgwaldump.html, and that seems to be
> working well.
>

Thanks.

With that then- Craig, can you look at how the pgxlogdump -> pgwaldump
> pages work and see if using that would address the concerns you've
> raised here..?
>

> Though we need to decide which page 'recovery-config' should go to in
> newer versions.
>

Since we basically vanished all evidence of the old configuration, I don't
think there is a suitable place.

I maintain that simply vanishing terms from the docs without any sort of
explanation is a user-hostile action that we should fix and stop doing. If
we had something in the docs and we remove it, it's not unduly burdensome
to have some index entries that point to the replacement/renamed terms, and
a short appendix entry explaining what happened.

If that is for some reason unacceptable (and I don't see anyone giving any
actual reason why) the closest I can come up with is probably redirecting
to
https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION
. But that needs to be fixed to actually explicitly state what makes a
standby server into a standby server (per my patch), since right now it
just kind of assumes you know about standby.signal .

But... fiddling with the website addresses none of my other concerns. In
particular, it doesn't help a user understand that "standby_mode" is gone
and to look for "standby.signal" instead. It doesn't provide any "see also"
pointers for old terms to point to new terms in the index. Website
redirects won't help users with local copies of the docs or manpages who
are wondering what the heck happened to recovery.conf and standby_mode
either.

So I still think this needs a docs patch. Redirects on the website are not
sufficient. If you don't like how I spelled it, consider calling it
"important incompatible changes" or something.

The release notes are IMO not sufficient for this because (a) they don't
appear in the index; (b) you have to know something has been
removed/changed before you know to look in the relnotes for it; (c) you
have to find the relnotes for the correct release to find the info you
want. An appendix covering important renamings, removals and other
incompatible changes would address all those points *and* fix the web
links, man page names, etc.

Can anyone tell me why the solution I proposed is not acceptable, and why
we have to invent a different one instead? The website  redirect is good
and all, but doesn't really solve the problem, and I still don't know
what's wrong with just fixing the docs...


Re: public schema default ACL

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 06:36:39PM -0800, Noah Misch wrote:
> On Mon, Nov 09, 2020 at 02:56:53PM -0500, Bruce Momjian wrote:
> > On Mon, Nov  2, 2020 at 11:05:15PM -0800, Noah Misch wrote:
> > > My plan is for the default to become:
> > > 
> > >   GRANT USAGE ON SCHEMA public TO PUBLIC;
> > >   ALTER SCHEMA public OWNER TO DATABASE_OWNER;  -- new syntax
> > 
> > Seems it would be better to create a predefined role that owns the
> > public schema, or at least has create permission for the public schema
> > --- that way, when you are creating a role, you can decide if the role
> > should have creation permissions in the public schema, rather than
> > having people always using the database owner for this purpose.
> 
> Defaulting to a specific predefined role empowers the role's members in all
> databases simultaneously.  Folks who want it like that can create a role and
> issue "ALTER SCHEMA public OWNER TO that_role" in template1.  What's the
> better default?  I think that depends on whether you regard this schema as a
> per-database phenomenon or a per-cluster phenomenon.

Ah, I see your point.  I was just thinking we don't want everyone
logging in as the db user, or given superuser permissions, so having a
non-login role would help, but we can just document how to do it.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2

2020-11-12 Thread Michael Paquier
On Thu, Nov 05, 2020 at 03:41:23PM +0900, Michael Paquier wrote:
> This conflicted on HEAD with pgcrypto.  Please find attached a rebased
> set.

I got to think more about this stuff and attached is a new patch set
that redesigns the generic interface used for the crypto hash
functions, in order to use the same entry point at the end for SHA2,
SHA1, MD5 or even HMAC.  This is part of 0001:
- Introduction of a single file called cryptohash[_openssl].c, which
includes five functions to create, initialize, update, finalize and
free a crypto hash context.  The attached does the work for SHA2.
- The fallback implementations are in their own file in src/common/,
and get included in cryptohash.c.  cryptohash_openssl.c is much more
simple as it needs to use EVP for everything.
- Adding a new crypto function in the set is simple once this is done,
as a type needs to be added with the correct options plugged in.
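
To illustrate how the five entry points fit together from a caller's
perspective, here is a rough sketch of computing a SHA-256 digest
(inferred from the description above; the exact signatures should be
checked against the attached patch):

  #include "common/cryptohash.h"

  /* compute a SHA-256 digest of data[0..len), returning false on failure */
  static bool
  compute_sha256(const uint8 *data, size_t len, uint8 *digest)
  {
      pg_cryptohash_ctx *ctx = pg_cryptohash_create(PG_SHA256);

      if (ctx == NULL)
          return false;
      if (pg_cryptohash_init(ctx) < 0 ||
          pg_cryptohash_update(ctx, data, len) < 0 ||
          pg_cryptohash_final(ctx, digest) < 0)
      {
          pg_cryptohash_free(ctx);
          return false;
      }
      pg_cryptohash_free(ctx);
      return true;
  }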

0002 and 0003 don't have any changes.  I think that we could also
rename the existing cryptohashes.c to cryptohashfuncs.c to be more 
consistent, but I have left that out for now.
--
Michael
From 7175bb4610114c4d04c43c35e176d4c67241ebdd Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 13 Nov 2020 11:47:25 +0900
Subject: [PATCH v4 1/3] Rework SHA2 and crypto hash APIs

This will make easier a switch to EVP for the OpenSSL SHA2 layer.  Note
that the layer introduced here is generalized for the purpose of a
future integration with HMAC, MD5, and even more.
---
 src/include/common/checksum_helper.h  |  13 +-
 src/include/common/cryptohash.h   |  40 +
 src/include/common/scram-common.h |  17 +-
 src/include/common/sha2.h |  51 +-
 src/include/replication/backup_manifest.h |   3 +-
 src/backend/libpq/auth-scram.c|  94 +++
 src/backend/replication/backup_manifest.c |  25 ++-
 src/backend/replication/basebackup.c  |  24 ++-
 src/backend/utils/adt/cryptohashes.c  |  53 --
 src/common/Makefile   |   7 +-
 src/common/checksum_helper.c  |  79 +++--
 src/common/cryptohash.c   | 191 +
 src/common/cryptohash_openssl.c   | 196 ++
 src/common/scram-common.c | 165 --
 src/common/sha2.c |  63 +--
 src/common/sha2_openssl.c | 102 ---
 src/bin/pg_verifybackup/parse_manifest.c  |  15 +-
 src/bin/pg_verifybackup/pg_verifybackup.c |  24 ++-
 src/interfaces/libpq/fe-auth-scram.c  | 114 +++--
 contrib/pgcrypto/internal-sha2.c  | 188 +
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 src/tools/pgindent/typedefs.list  |   1 +
 22 files changed, 951 insertions(+), 518 deletions(-)
 create mode 100644 src/include/common/cryptohash.h
 create mode 100644 src/common/cryptohash.c
 create mode 100644 src/common/cryptohash_openssl.c
 delete mode 100644 src/common/sha2_openssl.c

diff --git a/src/include/common/checksum_helper.h b/src/include/common/checksum_helper.h
index 48b0745dad..b07a34e7e4 100644
--- a/src/include/common/checksum_helper.h
+++ b/src/include/common/checksum_helper.h
@@ -14,6 +14,7 @@
 #ifndef CHECKSUM_HELPER_H
 #define CHECKSUM_HELPER_H
 
+#include "common/cryptohash.h"
 #include "common/sha2.h"
 #include "port/pg_crc32c.h"
 
@@ -41,10 +42,10 @@ typedef enum pg_checksum_type
 typedef union pg_checksum_raw_context
 {
 	pg_crc32c	c_crc32c;
-	pg_sha224_ctx c_sha224;
-	pg_sha256_ctx c_sha256;
-	pg_sha384_ctx c_sha384;
-	pg_sha512_ctx c_sha512;
+	pg_cryptohash_ctx *c_sha224;
+	pg_cryptohash_ctx *c_sha256;
+	pg_cryptohash_ctx *c_sha384;
+	pg_cryptohash_ctx *c_sha512;
 } pg_checksum_raw_context;
 
 /*
@@ -66,8 +67,8 @@ typedef struct pg_checksum_context
 extern bool pg_checksum_parse_type(char *name, pg_checksum_type *);
 extern char *pg_checksum_type_name(pg_checksum_type);
 
-extern void pg_checksum_init(pg_checksum_context *, pg_checksum_type);
-extern void pg_checksum_update(pg_checksum_context *, const uint8 *input,
+extern int	pg_checksum_init(pg_checksum_context *, pg_checksum_type);
+extern int	pg_checksum_update(pg_checksum_context *, const uint8 *input,
 			   size_t len);
 extern int	pg_checksum_final(pg_checksum_context *, uint8 *output);
 
diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h
new file mode 100644
index 00..775c975451
--- /dev/null
+++ b/src/include/common/cryptohash.h
@@ -0,0 +1,40 @@
+/*-
+ *
+ * cryptohash.h
+ *	  Generic headers for cryptographic hash functions.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		  src/include/common/cryptohash.h
+ *
+ *-
+ */
+
+#ifndef 

Re: public schema default ACL

2020-11-12 Thread Noah Misch
On Mon, Nov 09, 2020 at 02:56:53PM -0500, Bruce Momjian wrote:
> On Mon, Nov  2, 2020 at 11:05:15PM -0800, Noah Misch wrote:
> > My plan is for the default to become:
> > 
> >   GRANT USAGE ON SCHEMA public TO PUBLIC;
> >   ALTER SCHEMA public OWNER TO DATABASE_OWNER;  -- new syntax
> 
> Seems it would be better to create a predefined role that owns the
> public schema, or at least has create permission for the public schema
> --- that way, when you are creating a role, you can decide if the role
> should have creation permissions in the public schema, rather than
> having people always using the database owner for this purpose.

Defaulting to a specific predefined role empowers the role's members in all
databases simultaneously.  Folks who want it like that can create a role and
issue "ALTER SCHEMA public OWNER TO that_role" in template1.  What's the
better default?  I think that depends on whether you regard this schema as a
per-database phenomenon or a per-cluster phenomenon.




Use extended statistics to estimate (Var op Var) clauses

2020-11-12 Thread Tomas Vondra
Hi,

Attached is a patch to allow estimation of (Var op Var) clauses using
extended statistics. Currently we only use extended stats to estimate
(Var op Const) clauses, which is sufficient for most cases, but it's not
very hard to support this second type of clauses.

This is not an entirely new patch - I've originally included it in the
patch series in [1] but it's probably better to discuss it separately,
so that it does not get buried in that discussion.

[1]
https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development

To illustrate the purpose of this patch, consider this:

db=# create table t (a int, b int);
CREATE TABLE

db=# insert into t select mod(i,10), mod(i,10)+1
   from generate_series(1,10) s(i);
INSERT 0 10

db=# analyze t;
ANALYZE

db=# explain select * from t where a < b;
   QUERY PLAN

 Seq Scan on t  (cost=0.00..1693.00 rows=3 width=8)
   Filter: (a < b)
(2 rows)

db=# explain select * from t where a > b;
   QUERY PLAN

 Seq Scan on t  (cost=0.00..1693.00 rows=3 width=8)
   Filter: (a > b)
(2 rows)

db=# create statistics s (mcv) on a,b from t;
CREATE STATISTICS

db=# analyze t;
ANALYZE

db=# explain select * from t where a < b;
   QUERY PLAN
-
 Seq Scan on t  (cost=0.00..1693.00 rows=10 width=8)
   Filter: (a < b)
(2 rows)

db=# explain select * from t where a > b;
 QUERY PLAN

 Seq Scan on t  (cost=0.00..1693.00 rows=1 width=8)
   Filter: (a > b)
(2 rows)


I'm not entirely convinced this patch (on its own) is very useful, for
a couple of reasons:

(a) Clauses of this form are not particularly common, at least compared
to the Var op Const clauses. (I don't recall slow-query reports from any
of our mailing lists that might be attributed to such clauses.)

(b) For known cases of such queries (e.g. several TPC-H queries do use
clauses like "l_commitdate < l_receiptdate" etc.) this is somewhat
hindered by extended statistics only supporting MCV lists, which may not
work particularly well for high-cardinality columns like dates etc.

But despite that it seems like a useful feature / building block, and
those limitations may be addressed in some other way (e.g. we may add
multi-dimensional histograms to address the second one).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 439d39db128b9cbc06063a862283d663510d0fcc Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 13 Nov 2020 01:46:50 +0100
Subject: [PATCH] Support estimation of clauses of the form Var op Var

Until now, extended statistics were used only to estimate clauses of the
form (Var op Const). This patch extends the estimation to also handle
clauses (Var op Var).
---
 src/backend/statistics/extended_stats.c   | 67 +
 src/backend/statistics/mcv.c  | 71 +-
 .../statistics/extended_stats_internal.h  |  2 +-
 src/test/regress/expected/stats_ext.out   | 96 +++
 src/test/regress/sql/stats_ext.sql| 26 +
 5 files changed, 243 insertions(+), 19 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 36326927c6..1a6e7a3fc0 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -987,14 +987,18 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 	{
 		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
-		Var		   *var;
+		Var		   *var,
+   *var2;
 
 		/* Only expressions with two arguments are considered compatible. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-		/* Check if the expression has the right shape (one Var, one Const) */
-		if (!examine_clause_args(expr->args, &var, NULL, NULL))
+		/*
+		 * Check if the expression has the right shape (one Var and one Const,
+		 * or two Vars).
+		 */
+		if (!examine_clause_args(expr->args, &var, &var2, NULL, NULL))
 			return false;
 
 		/*
@@ -1034,7 +1038,20 @@ statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
 			!get_func_leakproof(get_opcode(expr->opno)))
 			return false;
 
-		return statext_is_compatible_clause_internal(root, (Node *) var,
+		/*
+		 * Check compatibility of the first Var - we get this one for both
+		 * types of supported expressions (Var op Const) and (Var op Var).
+		 */
+		if (!statext_is_compatible_clause_internal(root, (Node *) var,
+   relid, attnums))
+			return false;
+
+		/* For (Var op Const) we don't get the second Var, and 

RE: Detecting File Damage & Inconsistencies

2020-11-12 Thread tsunakawa.ta...@fujitsu.com
From: Simon Riggs 
> If a rogue user/process is suspected, this would allow you to identify
> more easily the changes made by specific sessions/users.

Isn't that kind of auditing a job of pgAudit or log_statement = mod?  Or, does 
"more easily" mean that you find pgAudit complex to use and/or log_statement's 
overhead is big?


Regards
Takayuki Tsunakawa



Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-12 Thread Michael Paquier
On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:
> Interesting enough, similar discussion happened about vaccumFlags before
> with the same conclusion that theoretically it's fine to update without
> holding the lock, but this assumption could change one day and it's
> better to avoid such risks. Having said that I believe it makes sense to
> continue with locking. Are there any other opinions? I'll try to
> benchmark it in the meantime.

Thanks for planning some benchmarking for this specific patch.  I have
to admit that the possibility of switching vacuumFlags to use atomics
is very appealing in the long term, with or without considering this
patch, even if we had better be sure that this patch has no actual
effect on concurrency first if atomics are not used in worst-case
scenarios.
--
Michael


signature.asc
Description: PGP signature


Re: Optimising latch signals

2020-11-12 Thread Thomas Munro
On Sun, Aug 9, 2020 at 11:48 PM Thomas Munro  wrote:
> Here are some more experimental patches to reduce system calls.
>
> 0001 skips sending signals when the recipient definitely isn't
> waiting, using a new flag-and-memory-barrier dance.  This seems to
> skip around 12% of the kill() calls for "make check", and probably
> helps with some replication configurations that do a lot of
> signalling.  Patch by Andres (with a small memory barrier adjustment
> by me).
>
> 0002 gets rid of the latch self-pipe on Linux systems.
>
> 0003 does the same on *BSD/macOS systems.

Here's a rebase over the recent signal handler/mask reorganisation.

Some thoughts, on looking at this again after a while:

1.  It's a bit clunky that pqinitmask() takes a new argument to say
whether SIGURG should be blocked; that's because the knowledge of
which latch implementation we're using is private to latch.c, and only
the epoll version needs to block it.  I wonder how to make that
tidier.
2.  It's a bit weird to have UnBlockSig (SIGURG remains blocked for
epoll builds) and UnBlockAllSig (SIGURG is also unblocked).  Maybe the
naming is confusing.
3.  Maybe it's strange to continue to use overloaded SIGUSR1 on
non-epoll, non-kqueue systems; perhaps we should use SIGURG
everywhere.
4.  As a nano-optimisation, SetLatch() on a latch the current process
owns might as well use raise(SIGURG) rather than kill().  This is
necessary to close races when SetLatch(MyLatch) runs in a signal
handler.  In other words, although this patch uses signal blocking to
close the race when other processes call SetLatch() and send us
SIGURG, there's still a race if, say, SIGINT is sent to the
checkpointer and it sets its own latch from its SIGINT handler
function; in the user context it may be in WaitEventSetWait() having
just seen latch->is_set == false, and now be about to enter
epoll_pwait()/kevent() after the signal handler returns, so we need to
give it a reason not to go to sleep.
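
To make point 4 concrete, the idea is roughly this (a sketch, not the actual
latch.c code, which has more bookkeeping around it):

    /* SetLatch() on a latch owned by the current process */
    latch->is_set = true;
    pg_memory_barrier();        /* publish is_set before waking the waiter */
    if (latch->owner_pid == MyProcPid)
        raise(SIGURG);          /* safe even when called from a signal handler */
    else
        kill(latch->owner_pid, SIGURG);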

By way of motivation for removing the self-pipe, and where possible
also the signal handler, here is a trace of the WAL writer handling
three requests to write data, on a FreeBSD system, with the patch:

kevent(9,0x0,0,{ SIGURG,... },1,{ 0.2 }) = 1 (0x1)
pwrite(4,"\b\M-Q\^E\0\^A\0\0\0\0\0\M-/\^\"...,8192,0xaf) = 8192 (0x2000)
kevent(9,0x0,0,{ SIGURG,... },1,{ 0.2 }) = 1 (0x1)
pwrite(4,"\b\M-Q\^E\0\^A\0\0\0\0 \M-/\^\\0"...,8192,0xaf2000) = 8192 (0x2000)
kevent(9,0x0,0,{ SIGURG,... },1,{ 0.2 }) = 1 (0x1)
pwrite(4,"\b\M-Q\^E\0\^A\0\0\0\0`\M-/\^\\0"...,8192,0xaf6000) = 8192 (0x2000)

Here is the same thing on unpatched master:

kevent(11,0x0,0,0x801c195b0,1,{ 0.2 })   ERR#4 'Interrupted system call'
SIGNAL 30 (SIGUSR1) code=SI_USER pid=66575 uid=1001
sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0) = 0 (0x0)
write(10,"\0",1) = 1 (0x1)
sigreturn(0x7fffc880)EJUSTRETURN
pwrite(4,"\b\M-Q\^E\0\^A\0\0\0\0`\M-w)\0\0"...,8192,0xf76000) = 8192 (0x2000)
kevent(11,0x0,0,{ 9,EVFILT_READ,0x0,0,0x1,0x801c19580 },1,{
0.2 }) = 1 (0x1)
read(9,"\0",16)  = 1 (0x1)
kevent(11,0x0,0,0x801c195b0,1,{ 0.2 })   ERR#4 'Interrupted system call'
SIGNAL 30 (SIGUSR1) code=SI_USER pid=66575 uid=1001
sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0) = 0 (0x0)
write(10,"\0",1) = 1 (0x1)
sigreturn(0x7fffc880)EJUSTRETURN
pwrite(4,"\b\M-Q\^E\0\^A\0\0\0\0 \M-y)\0\0"...,8192,0xf92000) = 8192 (0x2000)
kevent(11,0x0,0,{ 9,EVFILT_READ,0x0,0,0x1,0x801c19580 },1,{
0.2 }) = 1 (0x1)
SIGNAL 30 (SIGUSR1) code=SI_USER pid=66575 uid=1001
sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0) = 0 (0x0)
write(10,"\0",1) = 1 (0x1)
sigreturn(0x7fffc880)EJUSTRETURN
read(9,"\0\0",16)= 2 (0x2)
kevent(11,0x0,0,0x801c195b0,1,{ 0.2 })   ERR#4 'Interrupted system call'
SIGNAL 30 (SIGUSR1) code=SI_USER pid=66575 uid=1001
sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0) = 0 (0x0)
write(10,"\0",1) = 1 (0x1)
sigreturn(0x7fffc880)EJUSTRETURN
pwrite(4,"\b\M-Q\^E\0\^A\0\0\0\0\0\M-z)\0"...,8192,0xfa) = 8192 (0x2000)
kevent(11,0x0,0,{ 9,EVFILT_READ,0x0,0,0x1,0x801c19580 },1,{
0.2 }) = 1 (0x1)
read(9,"\0",16)  = 1 (0x1)

The improvement isn't quite as good on Linux, because as far as I can
tell you still have to have an (empty) signal handler installed and it
runs (can we find a way to avoid that?), but you still get to skip all
the pipe manipulation.
From 8dd1ae41ac3f2c8b378dfad64bbfe94eb41d39ab Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 8 Aug 2020 15:08:09 +1200
Subject: [PATCH v2 1/4] Optimize latches to send fewer signals.

Don't send signals to processes that aren't currently sleeping.

Author: Andres Freund 
Discussion: 

Re: recovery_target immediate timestamp

2020-11-12 Thread Euler Taveira
On Wed, 11 Nov 2020 at 22:40, Fujii Masao 
wrote:

>
> On 2020/11/12 6:00, Euler Taveira wrote:
>
> > The first patch adds a new message that prints the latest completed
> checkpoint
> > when the consistent state is reached.
>
> I'm not sure how useful this information is in practice.
>
Fujii, thanks for reviewing it. It provides the same information as the "last
completed transaction was" message.


> > It also exposes the checkpoint timestamp
> > in debug messages.
>
> ereport(DEBUG1,
> (errmsg("checkpoint record is at
> %X/%X",
> (uint32)
> (checkPointLoc >> 32), (uint32) checkPointLoc)));
> +   ereport(DEBUG1,
> +   (errmsg("checkpoint time is %s",
> str_time(checkPoint.time))));
>
> The above first debug message displays the LSN of the checkpoint record.
> OTOH, the second message displays the time when the checkpoint started
> (not the time when checkpoint record was written at the end of checkpoint).
> So isn't it confusing to display those inconsistent information together?
>
Indeed the checkpoint timestamp is from before it determines the REDO LSN. Are
you saying that this checkpoint timestamp is inconsistent because it is not
near the point it saves the RedoRecPtr? If so, let's move checkPoint.time a few
lines below.

/*
 * Here we update the shared RedoRecPtr for future XLogInsert calls;
this
 * must be done while holding all the insertion locks.
 *
 * Note: if we fail to complete the checkpoint, RedoRecPtr will be left
 * pointing past where it really needs to point.  This is okay; the only
 * consequence is that XLogInsert might back up whole buffers that it
 * didn't really need to.  We can't postpone advancing RedoRecPtr
because
 * XLogInserts that happen while we are dumping buffers must assume that
 * their buffer changes are not included in the checkpoint.
 */
RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
checkPoint.time = (pg_time_t) time(NULL);

I realized that I was using the wrong variable in one of the debug messages.


> > The second patch provides the checkpoint timestamp in the pg_waldump
> output and
> > also when you enable wal_debug parameter. The pg_waldump output looks
> like
>
> +1
>
> +#ifdef FRONTEND
> +   strftime(checkpointstr, sizeof(checkpointstr), "%c",
> localtime(_tmp));
> +#else
> +   pg_strftime(checkpointstr, sizeof(checkpointstr), "%c",
> pg_localtime(_tmp, log_timezone));
> +#endif
>
> You can simplify the code by using timestamptz_to_str() here instead, like
> xact_desc_commit() does.
>
I had the same idea until I realized that checkPoint.time is pg_time_t and not
TimestampTz. [digging the code a bit...] I figured out there is a function that
converts from pg_time_t to TimestampTz: time_t_to_timestamptz(). I removed that
ugly code but had to duplicate this function into compat.c. I don't have a
strong preference but I'm attaching a new patch.
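
For reference, the conversion itself is tiny -- essentially the following,
which is what gets duplicated into compat.c:

    TimestampTz
    time_t_to_timestamptz(pg_time_t tm)
    {
        TimestampTz result;

        result = (TimestampTz) tm -
            ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
        result *= USECS_PER_SEC;

        return result;
    }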

In the end, I asked myself whether it is worth changing this type from
pg_time_t to TimestampTz.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From df8f4d295ebc205d46e77c5037c399afda17c546 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Wed, 11 Nov 2020 11:24:25 -0300
Subject: [PATCH 1/2] Report latest completed checkpoint timestamp

A new detail message prints the latest completed checkpoint when
reaching the consistent state. This information is useful when you use
recovery_target = immediate because it is otherwise not clear at what
timestamp recovery stops applying WAL.

It also adds new debug messages that report the checkpoint timestamp,
which might be useful for debugging purposes.
---
 src/backend/access/transam/xlog.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa63f37615..2ff83c49e2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6524,6 +6524,8 @@ StartupXLOG(void)
 			ereport(DEBUG1,
 	(errmsg("checkpoint record is at %X/%X",
 			(uint32) (checkPointLoc >> 32), (uint32) checkPointLoc)));
+			ereport(DEBUG1,
+	(errmsg("checkpoint time is %s", str_time(checkPoint.time;
 			InRecovery = true;	/* force recovery even if SHUTDOWNED */
 
 			/*
@@ -6657,6 +6659,8 @@ StartupXLOG(void)
 			ereport(DEBUG1,
 	(errmsg("checkpoint record is at %X/%X",
 			(uint32) (checkPointLoc >> 32), (uint32) checkPointLoc)));
+			ereport(DEBUG1,
+	(errmsg("checkpoint time is %s", str_time(ControlFile->checkPointCopy.time;
 		}
 		else
 		{
@@ -8033,7 +8037,9 @@ CheckRecoveryConsistency(void)
 		ereport(LOG,
 (errmsg("consistent recovery state reached at %X/%X",
 		(uint32) (lastReplayedEndRecPtr >> 32),
-		(uint32) 

Re: Deleting older versions in unique indexes to avoid page splits

2020-11-12 Thread Peter Geoghegan
On Thu, Nov 12, 2020 at 3:00 PM Peter Geoghegan  wrote:
> Attached is v8, which has the enhancements for low cardinality data
> that I mentioned earlier today. It also simplifies the logic for
> dealing with posting lists that we need to delete some TIDs from.
> These posting list simplifications also make the code a bit more
> efficient, which might be noticeable during benchmarking.

One more thing: I repeated a pgbench test that was similar to my
earlier low cardinality tests -- same indexes (fiver, tenner, score,
aid_pkey_include_abalance). And same queries. But longer runs: 4 hours
each. Plus a larger DB: scale 2,500. Plus a rate-limit of 5000 TPS.
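
For anyone who wants to reproduce something comparable, the 32-client pair
boils down to roughly the following invocations (the script file name is made
up here; the knobs match the output shown below):

    pgbench -i -s 2500 test
    pgbench -M prepared -R 5000 -c 32 -j 8 -T 14400 -f lowcard_queries.sql test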

Here is the high level report, with 4 runs -- one pair with 16
clients, another pair with 32 clients:

2020-11-11 19:03:26 -0800 - Start of initial data load for run
"patch.r1c16" (DB is also used by later runs)
2020-11-11 19:18:16 -0800 - End of initial data load for run "patch.r1c16"
2020-11-11 19:18:16 -0800 - Start of pgbench run "patch.r1c16"
2020-11-11 23:18:43 -0800 - End of pgbench run "patch.r1c16":
patch.r1c16.bench.out: "tps = 4999.16 (including connections
establishing)" "latency average = 3.355 ms" "latency stddev = 58.455
ms"
2020-11-11 23:19:12 -0800 - Start of initial data load for run
"master.r1c16" (DB is also used by later runs)
2020-11-11 23:34:33 -0800 - End of initial data load for run "master.r1c16"
2020-11-11 23:34:33 -0800 - Start of pgbench run "master.r1c16"
2020-11-12 03:35:01 -0800 - End of pgbench run "master.r1c16":
master.r1c16.bench.out: "tps = 5000.061623 (including connections
establishing)" "latency average = 8.591 ms" "latency stddev = 64.851
ms"
2020-11-12 03:35:41 -0800 - Start of pgbench run "patch.r1c32"
2020-11-12 07:36:10 -0800 - End of pgbench run "patch.r1c32":
patch.r1c32.bench.out: "tps = 5000.141420 (including connections
establishing)" "latency average = 1.253 ms" "latency stddev = 9.935
ms"
2020-11-12 07:36:40 -0800 - Start of pgbench run "master.r1c32"
2020-11-12 11:37:19 -0800 - End of pgbench run "master.r1c32":
master.r1c32.bench.out: "tps = 5000.542942 (including connections
establishing)" "latency average = 3.069 ms" "latency stddev = 24.640
ms"
2020-11-12 11:38:18 -0800 - Start of pgbench run "patch.r2c16"

We see a very significant latency advantage for the patch here. Here
is the breakdown on query latency from the final patch run,
patch.r1c32:

scaling factor: 2500
query mode: prepared
number of clients: 32
number of threads: 8
duration: 14400 s
number of transactions actually processed: 72002280
latency average = 1.253 ms
latency stddev = 9.935 ms
rate limit schedule lag: avg 0.406 (max 694.645) ms
tps = 5000.141420 (including connections establishing)
tps = 5000.142503 (excluding connections establishing)
statement latencies in milliseconds:
 0.002  \set aid1 random_gaussian(1, 10 * :scale, 4.0)
 0.001  \set aid2 random_gaussian(1, 10 * :scale, 4.5)
 0.001  \set aid3 random_gaussian(1, 10 * :scale, 4.2)
 0.001  \set bid random(1, 1 * :scale)
 0.001  \set tid random(1, 10 * :scale)
 0.001  \set delta random(-5000, 5000)
 0.063  BEGIN;
 0.361  UPDATE pgbench_accounts SET abalance = abalance +
:delta WHERE aid = :aid1;
 0.171  SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
 0.172  SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
 0.074  END;

Here is the equivalent for master:

scaling factor: 2500
query mode: prepared
number of clients: 32
number of threads: 8
duration: 14400 s
number of transactions actually processed: 72008125
latency average = 3.069 ms
latency stddev = 24.640 ms
rate limit schedule lag: avg 1.695 (max 1097.628) ms
tps = 5000.542942 (including connections establishing)
tps = 5000.544213 (excluding connections establishing)
statement latencies in milliseconds:
 0.002  \set aid1 random_gaussian(1, 10 * :scale, 4.0)
 0.001  \set aid2 random_gaussian(1, 10 * :scale, 4.5)
 0.001  \set aid3 random_gaussian(1, 10 * :scale, 4.2)
 0.001  \set bid random(1, 1 * :scale)
 0.001  \set tid random(1, 10 * :scale)
 0.001  \set delta random(-5000, 5000)
 0.078  BEGIN;
 0.560  UPDATE pgbench_accounts SET abalance = abalance +
:delta WHERE aid = :aid1;
 0.320  SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
 0.308  SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
 0.102  END;

So even the UPDATE is much faster here.

This is also something we see with pg_statio_tables, which looked like
this by the end for patch:

-[ RECORD 1 ]---+-
schemaname  | public
relname | pgbench_accounts
heap_blks_read  | 117,384,599
heap_blks_hit   | 1,051,175,835
idx_blks_read   | 24,761,513
idx_blks_hit| 4,024,776,723

For the patch:

-[ RECORD 1 ]---+-
schemaname  | public
relname | pgbench_accounts
heap_blks_read  | 

Re: Add important info about ANALYZE after create Functional Index

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 03:11:43PM -0600, Justin Pryzby wrote:
> I guess it should say "The system regularly ..."
> 
> Also, the last sentence begins "For new expression indexes" and ends with
> "about new expression indexes", which I guess could instead say "about the
> expressions".

How is this followup patch?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 844f00904b..29dee5689e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -746,13 +746,13 @@ Indexes:
   
 
   
-   The regularly system collects statistics on all of a table's
+   The system regularly collects statistics on all of a table's
columns.  Newly-created non-expression indexes can immediately
use these statistics to determine an index's usefulness.
For new expression indexes, it is necessary to run ANALYZE or wait for
the autovacuum daemon to analyze
-   the table to generate statistics about new expression indexes.
+   the table to generate statistics for these indexes.
   
 
   


Re: Implement for window functions

2020-11-12 Thread Vik Fearing
On 11/12/20 11:35 PM, Krasiyan Andreev wrote:
> Hi, it looks like Vik Fearing's patch does not apply anymore, because there
> are many conflicts with recent changes, fixed patch attached.

Thanks!  I've been away from this list for a while and I have some
catching up to do.
-- 
Vik Fearing




Re: Zedstore - compressed in-core columnar storage

2020-11-12 Thread Tomas Vondra
Hi,

Thanks for the updated patch. It's quite a massive amount of code - I
don't think we have had many 2MB patches in the past, so this is by no
means a full review.

1) the psql_1.out is missing a bit of expected output (due to 098fb0079)

2) I'm getting crashes in intarray contrib, due to hitting this error in
lwlock.c (backtrace attached):

/* Ensure we will have room to remember the lock */
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
elog(ERROR, "too many LWLocks taken");

I haven't investigated this too much, but it's a regular build with
asserts and TAP tests, so it should be simple to reproduce using "make
check-world", I guess.


3) I did a very simple benchmark, loading TPC-H data (about 75GB),
followed by pg_dump, and the durations (in seconds) look like this:

            master   zedstore/pglz   zedstore/lz4
   ------------------------------------------------
    copy      1855           68092           2131
    dump       751             905            811

And the size of the lineitem table (as shown by \d+) is:

  master: 64GB
  zedstore/pglz: 51GB
  zedstore/lz4: 20GB

It's mostly expected lz4 beats pglz in performance and compression
ratio, but this seems a bit too extreme I guess. Per past benchmarks
(e.g. [1] and [2]) the difference in compression/decompression time
should be maybe 1-2x or something like that, not 35x like here.

[1]
https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de

[2]
https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de

Furthermore, the pglz compression is not consuming the most CPU, at
least that's what perf says:

24.82%  postgres  [.] encode_chunk_varlen
20.49%  postgres  [.] decode_chunk
13.01%  postgres  [.] merge_attstream_guts.isra.0
12.68%  libc-2.32.so  [.] __memmove_avx_unaligned_erms
 8.72%  postgres  [.] encode_chunk_fixed
 6.16%  postgres  [.] pglz_compress
 4.36%  postgres  [.] decode_attstream_cont
 2.27%  postgres  [.] 0x000baff0
 1.84%  postgres  [.] AllocSetAlloc
 0.79%  postgres  [.] append_attstream
 0.70%  postgres  [.] palloc

So I wonder if this is a sign of a deeper issue - maybe the lower
compression ratio (for pglz) triggers some sort of feedback loop in
zedstore, or something like that? Not sure, but this seems strange.


4) I looked at some of the code, like merge_attstream etc. and I wonder
if this might be related to some of the FIXME comments. For example this
bit in merge_attstream seems interesting:

* FIXME: we don't actually pay attention to the compression anymore.
* We never repack.
* FIXME: this is backwords, the normal fast path is if (firsttid1 >
lasttid2)

But I suppose that should affect both pglz and lz4, and I'm not sure how
up to date those comments actually are.

BTW the comments in general need updating and tidying up, to make
reviews easier. For example the merge_attstream comment references
attstream1 and attstream2, but those are not the current parameters of
the function.


5) IMHO there should be a #define specifying the maximum number of items
per chunk (60). Currently there are literal constants used in various
places, sometimes 60, sometimes 59 etc. which makes it harder to
understand the code. FWIW 60 seems a bit low, but maybe it's OK.


6) I do think ZSAttStream should track which compression is used by the
stream, for two main reasons. Firstly, there's another patch to support
"custom compression" methods, which (also) allows multiple compression
methods per column. It'd be a bit strange to support that for varlena
columns in heap table, and not here, I guess. Secondly, I think one of
the interesting columnstore features down the road will be execution on
compressed data, which however requires compression method designed for
that purpose, and it's often datatype-specific (delta encoding, ...).

I don't think we need to go as far as supporting "custom" compression
methods here, but I think we should allow different built-in compression
methods for different attstreams.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Core was generated by `postgres: user contrib_regression [local] CREATE INDEX   
 '.
Program terminated with signal SIGABRT, Aborted.
#0  0x74f9452239e5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.31-4.fc32.x86_64
(gdb) bt
#0  0x74f9452239e5 in raise () from /lib64/libc.so.6
#1  0x74f94520c895 in abort () from /lib64/libc.so.6
#2  0x00936246 in errfinish (filename=, 
lineno=, funcname=0xab4660 <__func__.5> "LWLockAcquire") at 
elog.c:592
#3  0x007fa895 in LWLockAcquire (lock=0x74f93bcb0400, 
mode=LW_EXCLUSIVE) at lwlock.c:1240
#4  0x0053c850 in AdvanceXLInsertBuffer (upto=28450816, 

Re: Implement for window functions

2020-11-12 Thread Krasiyan Andreev
Hi, it looks like Vik Fearing's patch does not apply anymore, because there
are many conflicts with recent changes, fixed patch attached.
I am interested in reviewing and testing it for the next commitfest, if
its design and implementation are found to be acceptable.
Additionally, if it is also acceptable, I can add support for handling
negative indexes for nth_value(), to be able to reverse the order from
first/from last for the window frame.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2783985b55..44c8d006d0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19735,6 +19735,14 @@ SELECT count(*) FROM sometable;
about frame specifications.
   
 
+  
+   The functions lead, lag,
+   first_value, last_value, and
+   nth_value accept a null treatment option which is
+   RESPECT NULLS or IGNORE NULLS.
+   If this option is not specified, the default is RESPECT NULLS.
+  
+
   
When an aggregate function is used as a window function, it aggregates
over the rows within the current row's window frame.
@@ -19748,14 +19756,9 @@ SELECT count(*) FROM sometable;
 
   

-The SQL standard defines a RESPECT NULLS or
-IGNORE NULLS option for lead, lag,
-first_value, last_value, and
-nth_value.  This is not implemented in
-PostgreSQL: the behavior is always the
-same as the standard's default, namely RESPECT NULLS.
-Likewise, the standard's FROM FIRST or FROM LAST
-option for nth_value is not implemented: only the
+The SQL standard defines a FROM FIRST or FROM LAST
+option for nth_value.  This is not implemented in
+PostgreSQL: only the
 default FROM FIRST behavior is supported.  (You can achieve
 the result of FROM LAST by reversing the ORDER BY
 ordering.)
diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml
index 3c1eaea651..31e08c26b4 100644
--- a/doc/src/sgml/ref/create_function.sgml
+++ b/doc/src/sgml/ref/create_function.sgml
@@ -28,6 +28,7 @@ CREATE [ OR REPLACE ] FUNCTION
   { LANGUAGE lang_name
 | TRANSFORM { FOR TYPE type_name } [, ... ]
 | WINDOW
+| TREAT NULLS
 | IMMUTABLE | STABLE | VOLATILE | [ NOT ] LEAKPROOF
 | CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
 | [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
@@ -293,6 +294,17 @@ CREATE [ OR REPLACE ] FUNCTION
  
 
 
+
+ TREAT NULLS
+
+ 
+  TREAT NULLS indicates that the function is able
+  to handle the RESPECT NULLS and IGNORE
+  NULLS options.  Only window functions may specify this.
+  
+ 
+
+
 
  IMMUTABLE
  STABLE
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3fdd87823e..685454c1ec 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1770,6 +1770,8 @@ FROM generate_series(1,10) AS s(i);
 The syntax of a window function call is one of the following:
 
 
+function_name (expression , expression ... ) [ RESPECT NULLS | IGNORE NULLS ] OVER window_name
+function_name (expression , expression ... ) [ RESPECT NULLS | IGNORE NULLS ] OVER ( window_definition )
 function_name (expression , expression ... ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
 function_name (expression , expression ... ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
 function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
@@ -1783,6 +1785,18 @@ FROM generate_series(1,10) AS s(i);
 [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ frame_clause ]
 
+   
+
+   
+
+ The versions with RESPECT NULLS or IGNORE
+ NULLS only apply to true window functions, whereas the versions
+ with FILTER only apply to aggregate functions used as
+ window functions.
+
+   
+
+   
 The optional frame_clause
 can be one of
 
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 7664bb6285..fea8778ec8 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -627,6 +627,7 @@ AggregateCreate(const char *aggName,
 	 * definable for agg) */
 			 false, /* isLeakProof */
 			 false, /* isStrict (not needed for agg) */
+			 false, /* null_treatment (not needed for agg) */
 			 PROVOLATILE_IMMUTABLE, /* volatility (not needed
 	 * for agg) */
 			 proparallel,
@@ -848,7 +849,7 @@ lookup_agg_function(List *fnName,
 			   nargs, input_types, false, false,
 			   , rettype, ,
 			   , ,
-			   _oid_array, NULL);
+			   _oid_array, NULL, NULL);
 
 	/* only valid case is a normal function not returning a set */
 	if (fdresult != FUNCDETAIL_NORMAL || !OidIsValid(fnOid))
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 1dd9ecc063..2783b0d630 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -80,6 +80,7 @@ ProcedureCreate(const 

Re: [PATCH] remove deprecated v8.2 containment operators

2020-11-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-10-27 04:25, Justin Pryzby wrote:
>> They have been deprecated for a Long Time, so finally remove them for v14.
>> Four fewer exclamation marks makes the documentation less exciting, which is 
>> a
>> good thing.

> I have committed the parts that remove the built-in geometry operators 
> and the related regression test changes.

I'm on board with pulling these now --- 8.2 to v14 is plenty of
deprecation notice.  However, the patch seems incomplete in that
the code support for these is still there -- look for
RTOldContainedByStrategyNumber and RTOldContainsStrategyNumber.
Admittedly, there's not much to be removed except some case labels,
but it still seems like we oughta do that to avoid future confusion.

> The changes to the contrib modules appear to be incomplete in some ways. 
>   In cube, hstore, and seg, there are no changes to the extension 
> scripts to remove the operators.  All you're doing is changing the C 
> code to no longer recognize the strategy, but that doesn't explain what 
> will happen if the operator is still used.  In intarray, by contrast, 
> you're editing an existing extension script, but that should be done by 
> an upgrade script instead.

In the contrib modules, I'm afraid what you gotta do is remove the
SQL operator definitions but leave the opclass code support in place.
That's because there's no guarantee that users will update the extension's
SQL version immediately, so a v14 build of the .so might still be used
with the old SQL definitions.  It's not clear how much window we need to
give for people to do that update, but I don't think "zero" is an
acceptable answer.

(The core code doesn't have to concern itself with such scenarios,
since we require the initial catalog contents to match the backend
major version.  Hence it is okay to remove the code support now in
the in-core opclasses.)

regards, tom lane




Re: Bogus documentation for bogus geometric operators

2020-11-12 Thread Tom Lane
Pavel Borisov  writes:
> [ v4-0001-Deprecate-and-replace-and-operators-for-points.patch ]

I made a cursory pass over this, and two things stood out to me:

1. The patch removes <^ and >^ from func.sgml, which is fine, but
shouldn't there be an addition for the new operators?  (I think
probably this need only be an addition of "point" as a possible
input type for <<| and |>>.)  Actually, as long we're not completely
removing these operators, I'm not sure it's okay to make them completely
undocumented.  Maybe instead of removing, change the text to be
"Deprecated, use the equivalent XXX operator instead."  Or we could
add a footnote similar to what was there for a previous renaming:

Note

Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called ~ and @. These names are still available, but
are deprecated and will eventually be removed.

2. I'm a bit queasy about removing these operators from the opclasses.
I'm not sure anyone will thank us for "the operator is still there, so
your query is still accepted, but it runs 1000X slower than before".
It seems like more plausible answers are either "nuke the operators
entirely, force people to change their queries now" or else "support
both old and new names in the opclasses for awhile to come".  In
previous operator renamings we've generally followed the second path,
so that's what I'm inclined to think should happen here.

regards, tom lane




Re: Deleting older versions in unique indexes to avoid page splits

2020-11-12 Thread Peter Geoghegan
On Wed, Nov 11, 2020 at 6:17 AM Victor Yegorov  wrote:
> I've looked at the latest (v7) patchset.
> I've decided to use a quite common (in my practice) setup with an indexed 
> mtime column over scale 1000 set:

Thanks for testing!

> We can see that:
> - unused index is not suffering from not-HOT updates at all, which is the 
> point of the patch
> - we have ordinary queries performing on the same level as on master
> - we have 5,2% slowdown in UPDATE speed

I think that I made a mistake with v7: I changed the way that we
detect low cardinality data during bottom-up deletion, which made us
do extra/early deduplication in more cases than we really should. I
suspect that this partially explains the slowdown in UPDATE latency
that you reported. I will fix this in v8.

I don't think that the solution is to go back to the v6 behavior in
this area, though. I now believe that this whole "proactive
deduplication for low cardinality data" thing only made sense as a way
of compensating for deficiencies in earlier versions of the patch.
Deficiencies that I've since fixed anyway. The best solution now is to
simplify. We can have generic criteria for "should we dedup the page
early after bottom-up deletion finishes without freeing up very much
space?". This seemed to work well during my latest testing. Probably
because heapam.c is now smart about the requirements from nbtree, as
well as the cost of accessing heap pages.

> I'm not sure if this should be counted as regression, though, as graphs go on 
> par pretty much.
> Still, I would like to understand why this combination of indexes and queries 
> slows down UPDATEs.

Another thing that I'll probably add to v8: Prefetching. This is
probably necessary just so I can have parity with the existing
heapam.c function that the new code is based on,
heap_compute_xid_horizon_for_tuples(). That will probably help here,
too.

> During compilation I got one warning for make -C contrib:

Oops.

> I agree with the rename to "bottom-up index deletion", using "vacuuming" 
> generally makes users think
> that functionality is used only during VACUUM (misleading).

Yeah. That's kind of a problem already, because sometimes we use the
word VACUUM when talking about the long established LP_DEAD deletion
stuff. But I see that as a problem to be fixed. Actually, I would like
to fix it very soon.

> I haven't looked at the code yet.

It would be helpful if you could take a look at the nbtree patch --
particularly the changes related to deprecating the page-level
BTP_HAS_GARBAGE flag. I would like to break those parts out into a
separate patch, and commit it in the next week or two. It's just
refactoring, really. (This commit can also make nbtree only use the
word VACUUM for things that strictly involve VACUUM. For example,
it'll rename _bt_vacuum_one_page() to _bt_delete_or_dedup_one_page().)

We almost don't care about the flag already, so there is almost no
behavioral change from deprecating BTP_HAS_GARBAGE in this way.

Indexes that use deduplication already don't rely on BTP_HAS_GARBAGE
being set ever since deduplication was added to Postgres 13 (the
deduplication code doesn't want to deal with LP_DEAD bits, and cannot
trust that no LP_DEAD bits can be set just because BTP_HAS_GARBAGE
isn't set in the special area). Trusting the BTP_HAS_GARBAGE flag can
cause us to miss out on deleting items with their LP_DEAD bits set --
we're better off "assuming that BTP_HAS_GARBAGE is always set", and
finding out if there really are LP_DEAD bits set for ourselves each
time.

Missing LP_DEAD bits like this can happen when VACUUM unsets the
page-level flag without actually deleting the items at the same time,
which is expected when the items became garbage (and actually had
their LP_DEAD bits set) after VACUUM began, but before VACUUM reached
the leaf pages. That's really wasteful, and doesn't actually have any
upside -- we're scanning all of the line pointers only when we're
about to split (or dedup) the same page anyway, so the extra work is
practically free.

-- 
Peter Geoghegan




Re: Proposition for autoname columns

2020-11-12 Thread Eugen Konkov

> On 2020-Nov-12, Tom Lane wrote:

>> On the whole, I'm on the side of the people who don't want to change this.
>> The implementation cost seems likely to greatly outweigh the value, plus
>> it feels more like a wart than a feature.

> I think if Eugen wants to spend some time with it and see how it could
> be implemented, then sent a patch for consideration, then we could make
> a better informed decision.  My own opinion is that it's not worth the
> trouble, but I'd rather us not stand in his way if he wants to try
> (With disclaimer that we might end up not liking the patch, of course).

Sorry, I am not a C/C++ programmer and cannot imagine how to start on a patch.
I do not know the internals of PG. The only useful thing from me is just the
idea to make the world better.

I suppose initially there was only ?column?, and later names were implemented
for count, sum, etc.
But it would be cool if PG went a step further and named sum( a ) as sum_a
instead of just sum.

The purpose of this proposition is not correct name generation; the purpose is
to get more distinct default names:
?column?, ?column?, ?column?, ?column?, ?column?, ?column?, ?column?, 

?count?, ?count?, ?count?, ?sum?, ?sum?, ?sum?, ?sum?

?count_a?, ?count_b?, ?count_c?, ?sum_a?, ?sum_b?, ?sum_c?, ?sum_d?

Notice that the latest is more robust than the first ;-)

I suppose we just ignore complex cases and leave them as they currently are.
We could try some very small step in the direction of improving default names
and see feedback from many users on how useful it is or not. Then we can decide
whether or not it is worth implementing a whole system for default name
generation.

Unfortunately I am no judge of at which level this should occur: parser,
analyzer, or so.
I just do not understand those things =(

Thank you.

-- 
Best regards,
Eugen Konkov





Re: Allow matching whole DN from a client certificate

2020-11-12 Thread Andrew Dunstan


On 11/12/20 8:37 AM, Daniel Gustafsson wrote:
>> On 11 Nov 2020, at 21:44, Andrew Dunstan  wrote:
>> If people like this idea I'll add tests and docco and add it to the next CF.
> Sounds like a good idea, please do.
>
> Can this case really happen in non-ancient OpenSSL version?
> + if (!x509name)


Probably not. I'll get rid of that.


> Doesn't this return path need a pfree(peer_cn)?
> + bio = BIO_new(BIO_s_mem());
> + if (!bio)
> + {
> + return -1;
> + }
>

Yeah, I'll make another pass over the cleanups.
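
Probably something along these lines for that return path (untested sketch):

    bio = BIO_new(BIO_s_mem());
    if (!bio)
    {
        pfree(peer_cn);
        return -1;
    }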


Thanks for reviewing.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add important info about ANALYZE after create Functional Index

2020-11-12 Thread Justin Pryzby
On Mon, Nov 09, 2020 at 06:27:20PM -0500, Bruce Momjian wrote:
> On Tue, Oct 27, 2020 at 12:12:00AM -0700, Nikolay Samokhvalov wrote:
> > On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello 
> >  wrote:
> > 
> > Would be nice if add some information about it into our docs but not 
> > sure
> > where. I'm thinking about:
> > - doc/src/sgml/ref/create_index.sgml
> > - doc/src/sgml/maintenance.sgml (routine-reindex)
> > 
> > 
> > Attaching the patches for the docs, one for 11 and older, and another for 
> > 12+
> > (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).
> > 
> > I still think that automating is the right thing to do but of course, it's a
> > much bigger topic that a quick fix dor the docs.
> 
> I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
> updated to mention the need to run ANALYZE or wait for autovacuum before
> expression indexes can be fully used by the optimizer.  Instead of
> putting this mention in the maintenance section, I thought the CREATE
> INDEX page make more sense, since it is more of a usability issue,
> rather than "why use expression indexes".  Patch attached, which I plan
> to apply to all supported branches.

The commited patch actually says:

--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -745,6 +745,16 @@ Indexes:
sort high, in queries that depend on indexes to avoid sorting steps.
   
 
+  
+   The regularly system collects statistics on all of a table's
+   columns.  Newly-created non-expression indexes can immediately
+   use these statistics to determine an index's usefulness.
+   For new expression indexes, it is necessary to run ANALYZE or wait for
+   the autovacuum daemon to analyze
+   the table to generate statistics about new expression indexes.
+  
+

I guess it should say "The system regularly ..."

Also, the last sentence begins "For new expression indexes" and ends with
"about new expression indexes", which I guess could instead say "about the
expressions".

> diff --git a/doc/src/sgml/ref/create_index.sgml 
> b/doc/src/sgml/ref/create_index.sgml
> new file mode 100644
> index 749db28..48c42db
> *** a/doc/src/sgml/ref/create_index.sgml
> --- b/doc/src/sgml/ref/create_index.sgml
> *** Indexes:
> *** 746,751 
> --- 746,761 
> 
>   
> 
> +The system collects statistics on all of a table's columns.
> +Newly-created non-expression indexes can immediately
> +use these statistics to determine an index's usefulness.
> +For new expression indexes, it is necessary to run ANALYZE or wait for
> +the autovacuum daemon to analyze
> +the table to generate statistics about new expression indexes.
> +   
> + 
> +   
>  For most index methods, the speed of creating an index is
>  dependent on the setting of .
>  Larger values will reduce the time needed for index creation, so long


-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581




Re: avoid bitmapOR-ing indexes with scan condition inconsistent with partition constraint

2020-11-12 Thread Tom Lane
I started looking through this patch.  I really quite dislike solving
this via a kluge in indxpath.c.  There are multiple disadvantages
to that:

* It only helps for the very specific problem of redundant bitmap
index scans, whereas the problem of applying redundant qual checks
in partition scans seems pretty general.

* It's not unlikely that this will end up trying to make the same
proof multiple times (and the lack of any way to turn that off,
through constraint_exclusion or some other knob, isn't too cool).

* It does nothing to fix rowcount estimates in the light of the
knowledge that some of the restriction clauses are no-ops.  Now,
if we have up-to-date stats we'll probably manage to come out with
an appropriate 0 or 1 selectivity anyway, but we might not have those.
In any case, spending significant effort to estimate a selectivity
when some other part of the code has taken the trouble to *prove* the
clause true or false seems very undesirable.

* I'm not even convinced that the logic is correct, specifically that
it's okay to just "continue" if we refute the OR clause.  That seems
likely to break generate_bitmap_or_paths' surrounding loop logic about
"We must be able to match at least one index to each of the arms of
the OR".  At least, if that still works it requires more than zero
commentary about why.


So I like much better the idea of Konstantin's old patch, that we modify
the rel's baserestrictinfo list by removing quals that we can prove
true.  We could extend that to solve the bitmapscan problem by removing
OR arms that we can prove false.  So I started to review that patch more
carefully, and after awhile realized that it has a really fundamental
problem: it is trying to use CHECK predicates to prove WHERE clauses.
But we don't know that CHECK predicates are true, only that they are
not-false, and there is no proof mode in predtest.c that will allow
proving some clauses true based only on other ones being not-false.

We can salvage something by restricting the input quals to be only
partition quals, since those are built to be guaranteed-true-or-false;
we can assume they don't yield NULL.  There's a hole in that for
hashing, as I noted elsewhere, but we'll fail to prove anything anyway
from a satisfies_hash_partition() qual.  (In principle we could also use
attnotnull quals, which also have that property.  But I'm dubious that
that will help often enough to be worth the extra cycles for predtest.c
to process them.)
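
To make the target case concrete, here is a toy example of the shape I mean
(table and values invented purely for illustration):

    CREATE TABLE pt (a int, b text) PARTITION BY RANGE (a);
    CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) TO (1000);
    CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (1000) TO (2000);
    CREATE INDEX ON pt (a);

    EXPLAIN SELECT * FROM pt WHERE a = 5 OR a = 1500;

Each partition's scan can end up with a BitmapOr over both arms, even though
one arm contradicts that partition's bound; refuting that arm against the
partition constraint and dropping it is the case of interest here.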

So after a bit of coding I had the attached.  This follows Konstantin's
original patch in letting relation_excluded_by_constraints() change
the baserestrictinfo list.  I read the comments in the older thread
about people not liking that, and I can see the point.  But I'm not
convinced that the later iterations of the patch were an improvement,
because (a) the call locations for
remove_restrictions_implied_by_constraints() seemed pretty random, and
(b) it seems necessary to have relation_excluded_by_constraints() and
remove_restrictions_implied_by_constraints() pretty much in bed with
each other if we don't want to duplicate constraint-fetching work.
Note the comment on get_relation_constraints() that it's called at
most once per relation; that's not something I particularly desire
to give up, because a relcache open isn't terribly cheap.  Also
(c) I think it's important that there be a way to suppress this
overhead when it's not useful.  In the patch as attached, turning off
constraint_exclusion does that since relation_excluded_by_constraints()
falls out before getting to the new code.  If we make
remove_restrictions_implied_by_constraints() independent then it
will need some possibly-quite-duplicative logic to check
constraint_exclusion.  (Of course, if we'd rather condition this
on some other GUC then that argument falls down.  But I think we
need something.)  So, I'm not dead set on this code structure, but
I haven't seen one I like better.

Anyway, this seems to work, and if the regression test changes are
any guide then it may fire often enough in the real world to be useful.
Nonetheless, I'm concerned about performance, because predtest.c is a
pretty expensive thing and there will be a lot of cases where the work
is useless.  I did a quick check using pgbench's option to partition
the tables, and observed that the -S (select only) test case seemed to
get about 2.5% slower with the patch than without.  That's not far
outside the noise floor, so maybe it's not real, but if it is real then
it seems pretty disastrous.  Perhaps we could avoid that problem by
removing the "predicate_implied_by" cases and only trying the
"predicate_refuted_by" case, so that no significant time is added
unless you've got an OR restriction clause on a partitioned table.
That seems like it'd lose a lot of the benefit though :-(.

So I'm not sure where to go from here.  Thoughts?  Anyone else
care to run some performance tests?

regards, tom lane

diff --git 

Re: Add important info about ANALYZE after create Functional Index

2020-11-12 Thread Bruce Momjian
On Mon, Nov  9, 2020 at 08:35:46PM -0300, Fabrízio de Royes Mello wrote:
> 
> 
> On Mon, 9 Nov 2020 at 20:27 Bruce Momjian  wrote:
> 
> 
> I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
> updated to mention the need to run ANALYZE or wait for autovacuum before
> expression indexes can be fully used by the optimizer.  Instead of
> putting this mention in the maintenance section, I thought the CREATE
> INDEX page make more sense, since it is more of a usability issue,
> rather than "why use expression indexes".  Patch attached, which I plan
> to apply to all supported branches.
>    
> 
> 
> Did a quick review and totally agree... thanks a lot Bruce to help us don’t
> miss it.

Patch applied to all branches.  Thanks for the review.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Aw: Re: Minor documentation error regarding streaming replication protocol

2020-11-12 Thread Bruce Momjian
On Thu, Oct 15, 2020 at 12:43:22PM -0400, Bruce Momjian wrote:
> On Thu, Oct 15, 2020 at 12:01:21PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > We would want the timeline history file contents label changed from
> > > BYTEA to TEXT in the docs changed for all supported versions, add a C
> > > comment to all backbranches that BYTEA is the same as TEXT protocol
> > > fields, and change the C code to return TEXT IN PG 14.  Is that what
> > > people want?
> > 
> > I still think there is no need to touch back branches.  What you
> > propose here is more likely to confuse people than help them.
> > Having the documentation disagree with the code about how the
> > field is labeled is not good either.
> 
> Understood.
> 
> > Furthermore, it absolutely does not make sense to say (or imply)
> > that the unknown-encoding business applies to all text fields.
> > There are a very small number of fields where we should say that.
> 
> Yes, I am seeing now that even IDENTIFY_SYSTEM above it properly does
> encoding.  I guess TIMELINE_HISTORY works this way because it is pulling
> from the file system, not from system tables.  I ended up with just a
> new doc sentence and C comment in back branches, and a relabeling of the
> timeline history 'content' field as TEXT in the C code and docs,
> attached.

I have applied the doc-only patch to back branches, and the data type
change to master;  the master change should cause no behavioral change.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposition for autoname columns

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 04:30:15PM -0300, Álvaro Herrera wrote:
> On 2020-Nov-12, Tom Lane wrote:
> 
> > On the whole, I'm on the side of the people who don't want to change this.
> > The implementation cost seems likely to greatly outweigh the value, plus
> > it feels more like a wart than a feature.
> 
> I think if Eugen wants to spend some time with it and see how it could
> be implemented, then sent a patch for consideration, then we could make
> a better informed decision.  My own opinion is that it's not worth the
> trouble, but I'd rather us not stand in his way if he wants to try
> (With disclaimer that we might end up not liking the patch, of course).

I think he would be better outlining how he wants it to behave before
even working on a patch;  from our TODO list:

Desirability -> Design -> Implement -> Test -> Review -> Commit

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposition for autoname columns

2020-11-12 Thread Alvaro Herrera
On 2020-Nov-12, Tom Lane wrote:

> On the whole, I'm on the side of the people who don't want to change this.
> The implementation cost seems likely to greatly outweigh the value, plus
> it feels more like a wart than a feature.

I think if Eugen wants to spend some time with it and see how it could
be implemented, then sent a patch for consideration, then we could make
a better informed decision.  My own opinion is that it's not worth the
trouble, but I'd rather us not stand in his way if he wants to try
(With disclaimer that we might end up not liking the patch, of course).




Re: Proposition for autoname columns

2020-11-12 Thread David G. Johnston
On Thursday, November 12, 2020, Bruce Momjian  wrote:

> On Thu, Nov 12, 2020 at 01:52:11PM -0500, Tom Lane wrote:
> > On the whole, I'm on the side of the people who don't want to change
> this.
> > The implementation cost seems likely to greatly outweigh the value, plus
> > it feels more like a wart than a feature.
>
> I think we can mark this as, "We thought about it, and we decided it is
> probably not a good idea."
>
>
+1

David J.


Re: Proposition for autoname columns

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 01:52:11PM -0500, Tom Lane wrote:
> On the whole, I'm on the side of the people who don't want to change this.
> The implementation cost seems likely to greatly outweigh the value, plus
> it feels more like a wart than a feature.

I think we can mark this as, "We thought about it, and we decided it is
probably not a good idea."

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposition for autoname columns

2020-11-12 Thread Tom Lane
"David G. Johnston"  writes:
> The query rewriter would only rewrite these expressions and provide an
> expression-related explicit alias clause if the expression is a single
> operator (same as single function today) and the right-hand side of the
> operator is a constant (meaning the constant is a reasonable representation
> of every output value that is going to appear in the result column).

I haven't been paying too close attention to this thread, but it seems
like there is a lot of misapprehension here about how this could
reasonably be implemented.  There is zero (not epsilon, but zero)
chance of changing column aliases at rewrite time.  Those have to be
assigned in the parser, else we will not understand how to resolve
references to sub-select output columns.  Specifically it has to happen
in FigureColname(), which means that resolving non-constant arguments
to constants isn't terribly practical.
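
For context, what FigureColname produces today:

    SELECT '{"x": 1}'::jsonb ->> 'x';   -- column name is "?column?"
    SELECT lower('ABC');                -- column name is "lower"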

Actually, since FigureColname() works on the raw parse tree, I'm not
even sure how you could make this happen in that context, unless you're
willing to say that "j ->> 'x'" resolves as "x" just based on the name
of the operator, without any info about its semantics.  That doesn't
seem very cool.  Now, in a quick look at the callers, it looks like it'd
be no problem from the callers' standpoint to switch things around to do
colname selection on the parsed tree instead, ie the existing choice is
for FigureColname's benefit not the callers'.  But it'd likely cost
a good deal to do it the other way, since now FigureColname would need
to perform catalog lookups to get column and function names.

Maybe you could do something like passing *both* trees to FigureColname,
and let it obtain the actual operator OID from the parsed tree when the
raw tree contains AEXPR_OP.  But the recursion in FigureColname would be
difficult to manage because the two trees often don't match one-to-one.

On the whole, I'm on the side of the people who don't want to change this.
The implementation cost seems likely to greatly outweigh the value, plus
it feels more like a wart than a feature.

regards, tom lane




Re: Delay of standby shutdown

2020-11-12 Thread Soumyadeep Chakraborty
Thanks! Marking this as ready for committer.

Regards,
Soumyadeep (VMware)




Re: Proposition for autoname columns

2020-11-12 Thread David G. Johnston
On Thu, Nov 12, 2020 at 9:32 AM Andrew Dunstan  wrote:

>
> On 11/12/20 11:12 AM, David G. Johnston wrote:
> > On Thu, Nov 12, 2020 at 8:59 AM Andrew Dunstan  > > wrote:
> >
> >
> >
> > So if we then say:
> >
> >
> > select x, j->>x from mytable;
> >
> >
> > you want both result columns named x? That seems like a recipe for
> > serious confusion. I really don't think this proposal has been
> > properly
> > thought through.
> >
> >
> > IMO it's no worse than today's:
> >
> > select count(*), count(*) from (values (1), (2)) vals (v);
> > count | count
> > 2 | 2
> >
>
>
> I guess the difference here is that there's an extra level of
> indirection. So
>
>
> select x, j->>'x', j->>x from mytable
>
>
> would have 3 result columns all named x.
>
>
I totally missed the variable reference there - only two of those become
"x", the variable reference stays un-rewritten and thus results in
"?column?", similar to today:

select count(*), count(*) +1 from (values (1), (2)) vals (v);
count | ?column?
2 | 2

The query rewriter would only rewrite these expressions and provide an
expression-related explicit alias clause if the expression is a single
operator (same as single function today) and the right-hand side of the
operator is a constant (meaning the constant is a reasonable representation
of every output value that is going to appear in the result column).  If
the RHS is a variable then there is no good name that is known to cover all
output values and thus ?column? (i.e., do not rewrite/provide an alias
clause) is an appropriate choice.

My concerns in this area involve stored views and ruleutils, dump/reload by
extension.  Greenfield, this would have been nice, and worth the minimal
complexity given its usefulness in the common case, but is it useful enough
to introduce a whole new default naming mechanism and dealing with
dump/restore concerns?

David J.


Re: Proposition for autoname columns

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 11:32:49AM -0500, Andrew Dunstan wrote:
> On 11/12/20 11:12 AM, David G. Johnston wrote:
> > IMO It no worse than today's:
> >
> > select count(*), count(*) from (values (1), (2)) vals (v);
> > count | count
> > 2 | 2
> >
> 
> 
> I guess the difference here is that there's an extra level of
> indirection. So
> 
> select x, j->>'x', j->>x from mytable
> 
> would have 3 result columns all named x.

Yeah, I feel it would have to be something a user specifically asks for,
and we would have to say it would be the first or a random match of one
of the keys.  Ultimately, it might be so awkward as to be useless.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposition for autoname columns

2020-11-12 Thread Andrew Dunstan


On 11/12/20 11:12 AM, David G. Johnston wrote:
> On Thu, Nov 12, 2020 at 8:59 AM Andrew Dunstan  > wrote:
>
>
>
> So if we then say:
>
>
>     select x, j->>x from mytable;
>
>
> you want both result columns named x? That seems like a recipe for
> serious confusion. I really don't think this proposal has been
> properly
> thought through.
>
>
> IMO it's no worse than today's:
>
> select count(*), count(*) from (values (1), (2)) vals (v);
> count | count
> 2 | 2
>


I guess the difference here is that there's an extra level of
indirection. So


select x, j->>'x', j->>x from mytable


would have 3 result columns all named x.


cheers


andrew








Re: Proposition for autoname columns

2020-11-12 Thread David G. Johnston
On Thu, Nov 12, 2020 at 8:59 AM Andrew Dunstan  wrote:

>
>
> So if we then say:
>
>
> select x, j->>x from mytable;
>
>
> you want both result columns named x? That seems like a recipe for
> serious confusion. I really don't think this proposal has been properly
> thought through.
>
>
IMO it's no worse than today's:

select count(*), count(*) from (values (1), (2)) vals (v);
count | count
2 | 2
David J.


Re: Proposition for autoname columns

2020-11-12 Thread Pavel Stehule
On Thu, Nov 12, 2020 at 16:59, Andrew Dunstan 
wrote:

>
> On 11/12/20 9:14 AM, Eugen Konkov wrote:
> > Hello Andrew,
> >
> > Thursday, November 12, 2020, 3:19:39 PM, you wrote:
> >
> >
> >> On 11/11/20 7:55 PM, Bruce Momjian wrote:
> >>> On Thu, Nov 12, 2020 at 12:18:49AM +, Dagfinn Ilmari Mannsåker
> wrote:
>  Bruce Momjian  writes:
> > I think we could do it, but it would only work if the column was
> output
> > as a single json value, and not a multi-key/value field.  I am
> afraid if
> > we tried to do it, the result would be too inconsistent to be useful.
>  Could this be done via the support function, so that the top-level
>  operator/function in each select list item can return a suggested
> column
>  name if the relevant arguments are constants?
> >>> Yes, the user explicitly calling a function would be much easier to
> >>> predict.
> >>>
> >> I suspect this is doomed to failure. There is no guarantee that the path
> >> expression is going to be static or constant across rows. Say you have
> >> this table:
> >> x: foo, j: {"foo": 1, "bar": 2}
> >> x: bar  j: {"foo": 3, "bar": 4}
> >> and you say:
> >>   select j->>x from mytable;
> >> What should the column be named?
> > Suppose it should be named 'as x'
>
>
> So if we then say:
>
>
> select x, j->>x from mytable;
>
>
> you want both result columns named x? That seems like a recipe for
> serious confusion. I really don't think this proposal has been properly
> thought through.
>

Why? It is consistent - you will get the value of key x, which is what anybody
expects; only the values will be different.

Regards

Pavel


>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>
>


Re: Proposition for autoname columns

2020-11-12 Thread Andrew Dunstan


On 11/12/20 9:14 AM, Eugen Konkov wrote:
> Hello Andrew,
>
> Thursday, November 12, 2020, 3:19:39 PM, you wrote:
>
>
>> On 11/11/20 7:55 PM, Bruce Momjian wrote:
>>> On Thu, Nov 12, 2020 at 12:18:49AM +, Dagfinn Ilmari Mannsåker wrote:
 Bruce Momjian  writes:
> I think we could do it, but it would only work if the column was output
> as a single json value, and not a multi-key/value field.  I am afraid if
> we tried to do it, the result would be too inconsistent to be useful.
 Could this be done via the support function, so that the top-level
 operator/function in each select list item can return a suggested column
 name if the relevant arguments are constants?
>>> Yes, the user explicitly calling a function would be much easier to
>>> predict.
>>>
>> I suspect this is doomed to failure. There is no guarantee that the path
>> expression is going to be static or constant across rows. Say you have
>> this table:
>> x: foo, j: {"foo": 1, "bar": 2}
>> x: bar  j: {"foo": 3, "bar": 4}
>> and you say:
>>   select j->>x from mytable;
>> What should the column be named?
> Suppose it should be named 'as x'


So if we then say:


select x, j->>x from mytable;


you want both result columns named x? That seems like a recipe for
serious confusion. I really don't think this proposal has been properly
thought through.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: remove spurious CREATE INDEX CONCURRENTLY wait

2020-11-12 Thread Dmitry Dolgov
> On Mon, Nov 09, 2020 at 10:02:27PM -0500, Tom Lane wrote:
>
> Alvaro Herrera  writes:
> > Yeah ... it would be much better if we can make it use atomics instead.
>
> I was thinking more like "do we need any locking at all".
>
> Assuming that a proc's vacuumFlags can be set by only the process itself,
> there's no write conflicts to worry about.  On the read side, there's a
> hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
> that's not any different from what the outcome would be if they looked
> just before this stanza executes.  And even if they don't see it, at worst
> we lose the optimization being proposed.
>
> There is a question of whether it's important that both copies of the flag
> appear to update atomically ... but that just begs the question "why in
> heaven's name are there two copies?"

Sounds right, but after reading the thread about GetSnapshotData
scalability more thoroughly there seems to be an assumption that those
copies have to be updated at the same time under the same lock, and
claims that in some cases the justification for correctness around not
taking ProcArrayLock is too complicated, at least for now.

Interestingly enough, a similar discussion happened about vacuumFlags before,
with the same conclusion that theoretically it's fine to update without
holding the lock, but this assumption could change one day and it's
better to avoid such risks. Having said that, I believe it makes sense to
continue with locking. Are there any other opinions? I'll try to
benchmark it in the meantime.
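
(For readers not following the GetSnapshotData scalability work, the "two
copies" in question are, roughly, a flag in the backend's own PGPROC mirrored
into a dense array in ProcGlobal. A rough sketch of the update pattern being
discussed; the field and array names here are approximations and may not match
the tree exactly:)

    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
    /* keep the dense mirror copy in ProcGlobal in sync */
    ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
    LWLockRelease(ProcArrayLock);

Dropping the lock (or going lock-free) would mean readers could briefly see
the two copies disagree, which is the correctness question above.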




Re: Add docs stub for recovery.conf

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 09:47:42AM -0500, Stephen Frost wrote:
> > Fortunately, this has already been discussed in the renaming of default
> > roles to predefined roles:
> > 
> > 
> > https://www.postgresql.org/message-id/flat/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org
> > 
> > This naming change has not happened yet, even though the issue is 11
> > months old, but the agreed-upon way to handle this is to use a website
> > redirect that links to the new text.  You can add a "tip" there so they
> > understand the renaming has happened.
> 
> That rename will suffer the same problem that Craig is concerned about
> here regarding the 'current' link, once it's done.  I tend to agree with
> Craig that it'd be good to improve on this situation, and I've reached
> out to Jonathan to ask about using his new feature to have those
> /current/ links redirect to the renamed page.  I'm actually wondering if
> maybe we should just always do that for all the doc page aliases..
> 
> Might make more sense to discuss this over on -www though.

Yes, I am thinking someone could go back and add redirects for previous
renames too.  It would be interesting also to scrape the web logs for
404 errors to see which renames cause the most failures and do those
first.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add docs stub for recovery.conf

2020-11-12 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Nov 12, 2020 at 10:21:02AM +0800, Craig Ringer wrote:
> > > Here's how the rendered docs look: https://imgur.com/a/VyjzEw5
> > > 
> > > Think. You're used to recovery.conf. You've recently switched to pg 12. 
> > > You
> > > search for "recovery.conf" or "primary_slot_name" or "standby_mode" or
> > > something. You of course land up at https://www.postgresql.org/docs/11/
> > > recovery-config.html or another version.
> > > 
> > > What do you do now? There's no "12" or "current" link for your version. 
> > > You
> > > don't follow postgres development closely, and you have no idea we 
> > > removed the
> > > file. You might work it out from the release notes. But even then, if you 
> > > try
> > > searching for "standby_mode" in the updated docs will tell you basically
> > > nothing, it has just vanished
> > > 
> > > Also by simply deleting the page, we've broken web links to https://
> > > www.postgresql.org/docs/current/recovery-config.html
> > > 
> > > So there are three really good reasons:
> > > 
> > > * Help users of the web docs navigate to the right place when we remove 
> > > things
> > > * Don't break web links (breaking links without redirects is bad, the web 
> > > is
> > > sad)
> > > * Help upgrading users who know the old terms find the new terms
> > > 
> > > I'd welcome your suggestions on other ways to arrange this, so long as it 
> > > meets
> > > the basic requirement "retain the linktable target 'recovery-config' "
> > 
> > This is certainly not the first or last time we will rename things. 
> 
> Indeed, we've renamed things a number of times before, eg:
> 
> https://www.postgresql.org/docs/current/pgwaldump.html
> 
> where the 9.6 link goes to:
> 
> https://www.postgresql.org/docs/9.6/pgxlogdump.html
> 
> and the 'current' link from the 9.6 page goes to the pgwaldump page,
> which all works pretty well, if all you're looking at is our
> documentation and not considering external links into the documentation.
> 
> However, that isn't what Craig's raising a concern over here (at least,
> not exclusively), it's this issue:
> 
> https://www.postgresql.org/docs/current/pgxlogdump.html
> 
> Which currently goes to a 404.
> 
> Now, the pgweb feature that Jonathan wrote recently might actually be
> exactly what we need to fix that, and to address the issue with
> recovery config documentation that Craig raises.

After chatting with Jonathan about this for a bit and testing it out in
our test environment, I've gone ahead and added an entry for
pgxlogdump.html to redirect to pgwaldump.html, and that seems to be
working well.

With that then- Craig, can you look at how the pgxlogdump -> pgwaldump
pages work and see if using that would address the concerns you've
raised here..?

Though we need to decide which page 'recovery-config' should go to in
newer versions.

I'm continuing to chat with Jonathan about if it'd make sense to do the
same for the other doc aliases.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposition for autoname columns

2020-11-12 Thread David G. Johnston
On Thu, Nov 12, 2020 at 7:18 AM Eugen Konkov  wrote:

> Hello Andrew,
>
> Thursday, November 12, 2020, 3:19:39 PM, you wrote:
>
>
> > On 11/11/20 7:55 PM, Bruce Momjian wrote:
>
> >   select j->>x from mytable;
> > What should the column be named?
>
> Suppose it should be named 'as x'
>

+1

>
>
> > I think we'd be trying to manage a set of corner cases, and all because
> > someone didn't want to put "as foo" in their query. And if we generate a
> > column name in some cases and not in others there will be complaints of
> > inconsistency.
>
>
Yes, this is suggesting a behavior that is contrary to (but not prohibited
by) the natural expression and expectations of SQL.  That said, we already
take a function's name and use it to specify the name of it output column
as opposed to using "?column?" and requiring a user to apply a specific
alias.  This is only a step beyond that, choosing the default name for an
operator's output column based upon not the name of the operator (or its
underlying function) but based upon its one (and only possible) right-hand
argument.  It is purely a user convenience feature and can be rejected on
that grounds but I'm not seeing any fundamental issue with only having some
operator combinations doing this. It's nice when it works and you are no
worse off than today when it doesn't.
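
(To make that concrete, a rough sketch using the j/mytable example from
up-thread; today the operator's output column falls back to "?column?", while
the proposal would derive the name from the right-hand argument:)

    select j->>'x' from mytable;   -- today: column named "?column?"
    select j->>'x' from mytable;   -- proposed: column named "x"
    select j->>x   from mytable;   -- proposed: column named "x", taken from the RHS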

David J.


Re: [HACKERS] xlog min recovery request ... is past current point ...

2020-11-12 Thread Robert Haas
[ blast-from-the-past department ]

On Fri, Feb 3, 2012 at 11:33 AM Christophe Pettus  wrote:
> While bringing up a streaming replica, and while it is working its way 
> through the WAL segments before connecting to the primary, I see a lot of 
> messages of the form:
>
> 2012-02-01 21:26:13.978 PST,,,24448,,4f2a1e61.5f80,54,,2012-02-01 21:25:53 
> PST,1/0,0,LOG,0,"restored log file ""00010DB40065"" from 
> archive",""
> 2012-02-01 21:26:14.032 PST,,,24448,,4f2a1e61.5f80,55,,2012-02-01 21:25:53 
> PST,1/0,0,WARNING,01000,"xlog min recovery request DB5/42E15098 is past 
> current point DB4/657FA490","writing block 5 of relation 
> base/155650/156470_vm
> xlog redo insert: rel 1663/155650/1658867; tid 9640/53"""
> 2012-02-01 21:26:14.526 PST,,,24448,,4f2a1e61.5f80,56,,2012-02-01 21:25:53 
> PST,1/0,0,LOG,0,"restored log file ""00010DB40066"" from 
> archive",""
>
> All of these are on _vm relations.  The recovery completed successfully and 
> the secondary connected to the primary without issue, so: Are these messages 
> something to be concerned over?

It appears that we never did anything about this, and I just came
across some evidence that it's still a live issue. And, looking at the
code, I can't see any reason why it shouldn't be. I found Heikki's
list of steps (in his 2012 reply) a bit difficult to understand, so
here's my attempt to restate the scenario he describes:

Suppose that a user begins a base backup. While the base backup is
running, the user performs a DML-operation on an all-visible page,
clearing the visibility map bit. Then, they run VACUUM, resetting the
bit. The page is copied after both of these things happen. Then, the
base backup ends. Next, the user tries to start up the standby. When
the DML operation is replayed, the visibility map bit is cleared
without changing the page LSN (e.g. heap_xlog_delete ->
visibilitymap_clear). If the buffer is now flushed out of
shared_buffers before the VACUUM is replayed, then we can get the
complaint mentioned in $SUBJECT.

OK, so what could/should we do about this?

VM pages are subject to a somewhat unusual scheme with regard to
write-ahead logging: to minimize WAL traffic, records that clear the
visibilitymap bit in passing do not register the buffer, meaning that
such records generate no full-page images for the visibility map.
Records that set the visibility map bit do register the buffer, but
suppress full-page images unless checksums or wal_log_hints are in
use. At least if checksums are not in use, this is reasonable, because
modifications to VM pages don't rely on the existing page contents, so
even if the page is torn, replay need not get confused, as long as it
ignores the page LSN when clearing the bit. And that's just what
heap_xlog_delete() and friends actually do.  It's acceptable but
useless to pay attention to the page LSN when setting a bit, because
the only thing we can do is decide to not set a bit that should have
been set, which doesn't break anything, and in fact
heap_xlog_visible() does pay attention to the page LSN "out of an
abundance of conservatism" -- a comment I wrote originally, and with
which I'm no longer very impressed, because it seems like it's just
making things more confusing.

The overall picture here is that the LSN of the VM page matters very
little. If heap_xlog_visible() didn't have that "if (lsn >
PageGetLSN(vmpage))" test guarding visibilitymap_set here, the rule
would be simple: the LSN of a VM page would have no effect on replay
of related WAL records, period. However, it would probably make the
warning that is the subject of this thread more common. Imagine the
same scenario outlined above but, instead of the bit starting out set
and then being cleared and set again, it starts out cleared and gets
set and cleared again. The rest is the same: the page is copied by the
base backup in its final form. As things stand, that if-test prevents
the VM bit from being set, which also means the page LSN is not set
nor is the page dirtied, which means that evicting the page can't
trigger this warning.

But, regardless of whether we actually remove that if-test,
considering it as a hypothetical seems useful. If we actually had a
rule that the page LSN doesn't really affect whether actions affecting
VM pages are replayed, then we would just need visibilitymap_clear()
to set the page LSN just like visibilitymap_set() does. The only thing
the page LSN would be doing is causing the warning which is the
subject of the thread to get generated, or not, so it would make sense
to just set in a way that avoids the warning, since it matters to
nothing else.

However, checksums seem to complicate the situation considerably.
Ignore everything proposed above and suppose that, with checksums
enabled, the user deletes a tuple from a page marked all-visible,
clearing the VM bit. The dirty page is flushed to the OS. As it is
being written back to durable storage, the system 

Re: Add docs stub for recovery.conf

2020-11-12 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Nov 12, 2020 at 10:21:02AM +0800, Craig Ringer wrote:
> > Here's how the rendered docs look: https://imgur.com/a/VyjzEw5
> > 
> > Think. You're used to recovery.conf. You've recently switched to pg 12. You
> > search for "recovery.conf" or "primary_slot_name" or "standby_mode" or
> > something. You of course land up at https://www.postgresql.org/docs/11/
> > recovery-config.html or another version.
> > 
> > What do you do now? There's no "12" or "current" link for your version. You
> > don't follow postgres development closely, and you have no idea we removed 
> > the
> > file. You might work it out from the release notes. But even then, if you 
> > try
> > searching for "standby_mode" in the updated docs will tell you basically
> > nothing, it has just vanished
> > 
> > Also by simply deleting the page, we've broken web links to https://
> > www.postgresql.org/docs/current/recovery-config.html
> > 
> > So there are three really good reasons:
> > 
> > * Help users of the web docs navigate to the right place when we remove 
> > things
> > * Don't break web links (breaking links without redirects is bad, the web is
> > sad)
> > * Help upgrading users who know the old terms find the new terms
> > 
> > I'd welcome your suggestions on other ways to arrange this, so long as it 
> > meets
> > the basic requirement "retain the linktable target 'recovery-config' "
> 
> This is certainly not the first or last time we will rename things. 

Indeed, we've renamed things a number of times before, eg:

https://www.postgresql.org/docs/current/pgwaldump.html

where the 9.6 link goes to:

https://www.postgresql.org/docs/9.6/pgxlogdump.html

and the 'current' link from the 9.6 page goes to the pgwaldump page,
which all works pretty well, if all you're looking at is our
documentation and not considering external links into the documentation.

However, that isn't what Craig's raising a concern over here (at least,
not exclusively), it's this issue:

https://www.postgresql.org/docs/current/pgxlogdump.html

Which currently goes to a 404.

Now, the pgweb feature that Jonathan wrote recently might actually be
exactly what we need to fix that, and to address the issue with
recovery config documentation that Craig raises.

> Fortunately, this has already been discussed in the renaming of default
> roles to predefined roles:
> 
>   
> https://www.postgresql.org/message-id/flat/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org
> 
> This naming change has not happened yet, even though the issue is 11
> months old, but the agreed-upon way to handle this is to use a website
> redirect that links to the new text.  You can add a "tip" there so they
> understand the renaming has happened.

That rename will suffer the same problem that Craig is concerned about
here regarding the 'current' link, once it's done.  I tend to agree with
Craig that it'd be good to improve on this situation, and I've reached
out to Jonathan to ask about using his new feature to have those
/current/ links redirect to the renamed page.  I'm actually wondering if
maybe we should just always do that for all the doc page aliases..

Might make more sense to discuss this over on -www though.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Add docs stub for recovery.conf

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 10:21:02AM +0800, Craig Ringer wrote:
> Here's how the rendered docs look: https://imgur.com/a/VyjzEw5
> 
> Think. You're used to recovery.conf. You've recently switched to pg 12. You
> search for "recovery.conf" or "primary_slot_name" or "standby_mode" or
> something. You of course land up at https://www.postgresql.org/docs/11/
> recovery-config.html or another version.
> 
> What do you do now? There's no "12" or "current" link for your version. You
> don't follow postgres development closely, and you have no idea we removed the
> file. You might work it out from the release notes. But even then, if you try
> searching for "standby_mode" in the updated docs will tell you basically
> nothing, it has just vanished
> 
> Also by simply deleting the page, we've broken web links to https://
> www.postgresql.org/docs/current/recovery-config.html
> 
> So there are three really good reasons:
> 
> * Help users of the web docs navigate to the right place when we remove things
> * Don't break web links (breaking links without redirects is bad, the web is
> sad)
> * Help upgrading users who know the old terms find the new terms
> 
> I'd welcome your suggestions on other ways to arrange this, so long as it 
> meets
> the basic requirement "retain the linktable target 'recovery-config' "

This is certainly not the first or last time we will rename things. 
Fortunately, this has already been discussed in the renaming of default
roles to predefined roles:


https://www.postgresql.org/message-id/flat/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org

This naming change has not happened yet, even though the issue is 11
months old, but the agreed-upon way to handle this is to use a website
redirect that links to the new text.  You can add a "tip" there so they
understand the renaming has happened.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Additional improvements to extended statistics

2020-11-12 Thread Dean Rasheed
On Thu, 12 Nov 2020 at 14:18, Tomas Vondra  wrote:
>
> Here is an improved WIP version of the patch series, modified to address
> the issue with repeatedly applying the extended statistics, as discussed
> with Dean in this thread. It's a bit rough and not committable, but I
> need some feedback so I'm posting it in this state.
>

Cool. I haven't forgotten that I promised to look at this.

> Dean, does this address the issue you had in mind? Can you come up with
> an example of that issue in the form of a regression test or something?
>

I'm quite busy with my day job at the moment, but I expect to have
time to review this next week.

Regards,
Dean




Re: Additional improvements to extended statistics

2020-11-12 Thread Tomas Vondra
Hi,

Here is an improved WIP version of the patch series, modified to address
the issue with repeatedly applying the extended statistics, as discussed
with Dean in this thread. It's a bit rough and not committable, but I
need some feedback so I'm posting it in this state.

(Note: The WIP patch is expected to fail regression tests. A couple
stats_ext regression tests fail due to changed estimate - I've left it
like that to make the changes more obvious for now.)

Earlier in this thread I used this example:


   CREATE TABLE t (a int, b int);
   INSERT INTO t SELECT mod(i, 10), mod(i, 10)
 FROM generate_series(1,10) s(i);
   CREATE STATISTICS s (mcv) ON a,b FROM t;
   ANALYZE t;

   EXPLAIN SELECT * FROM t WHERE a = 0 OR b = 0;

which had this call graph with two statext_mcv_clauselist_selectivity
calls (which was kinda the issue):

   clauselist_selectivity
 statext_clauselist_selectivity
   statext_mcv_clauselist_selectivity  <--- (1)
 clauselist_selectivity_simple
   clause_selectivity
 clauselist_selectivity_or
   statext_clauselist_selectivity
 statext_mcv_clauselist_selectivity  <--- (2)
   clauselist_selectivity_simple_or
 clause_selectivity
 clause_selectivity
   mcv_clauselist_selectivity
   clauselist_selectivity_simple_or
 mcv_clauselist_selectivity
   clauselist_selectivity_simple
 (already estimated)

with the patches applied, the call looks like this:

   clauselist_selectivity_internal (use_extended_stats=1)
 statext_clauselist_selectivity
   statext_mcv_clauselist_selectivity (is_or=0)
 clauselist_selectivity_simple
   clause_selectivity_internal (use_extended_stats=0)
 clauselist_selectivity_or (use_extended_stats=0)
   clauselist_selectivity_simple_or
 clause_selectivity_internal (use_extended_stats=0)
 clause_selectivity_internal (use_extended_stats=0)
 mcv_clauselist_selectivity (is_or=0)
   clauselist_selectivity_simple

The nested call is removed, which I think addresses the issue. As for
the effect on estimates, there's a couple regression tests where the
estimates change - not much though, an example is:

 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial
WHERE a = 0 OR b = 0 OR c = 10');
  estimated | actual
 ---+
-   412 |104
+   308 |104
 (1 row)

This is on top of 0001, though. Interestingly enough, this ends up with
the same estimate as current master, but I consider that a coincidence.


As for the patches:

0001 is the original patch improving estimates of OR clauses

0002 adds thin wrappers for clause[list]_selectivity, with "internal"
functions allowing to specify whether to keep considering extended stats

0003 does the same for the "simple" functions


I've kept it like this to demonstrate that 0002 is not sufficient. In my
response from March 24 I wrote this:

> Isn't it the case that clauselist_selectivity_simple (and the OR 
> variant) should ignore extended stats entirely? That is, we'd need
> to add a flag (or _simple variant) to clause_selectivity, so that it
> calls clauselist_selectivity_simple_or.
But that's actually wrong, as 0002 shows (as it breaks a couple of
regression tests), because of the way we handle OR clauses. At the top
level, an OR-clause is actually just a single clause and it may get
passed to clauselist_selectivity_simple. So entirely disabling extended
stats for the "simple" functions would also mean disabling extended
stats for a large number of OR clauses. Which is clearly wrong.

So 0003 addresses that, by adding a flag to the two "simple" functions.
Ultimately, this should probably do the same thing as 0002 and add thin
wrappers, because the existing functions are part of the public API.
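
(To illustrate the thin-wrapper idea in 0002, a sketch only; the public
signature below is the existing one, and the _internal name and flag are taken
from the call graph above:)

    Selectivity
    clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
                           JoinType jointype, SpecialJoinInfo *sjinfo)
    {
        /* public entry point keeps considering extended statistics */
        return clauselist_selectivity_internal(root, clauses, varRelid,
                                               jointype, sjinfo,
                                               true);  /* use_extended_stats */
    }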

Dean, does this address the issue you had in mind? Can you come up with
an example of that issue in the form of a regression test or something?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From a6b4ddc2777e29170f6caef41ec25a75899d14d3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 12 Nov 2020 00:39:17 +0100
Subject: [PATCH 1/4] Improve estimation of OR clauses using extended
 statistics

Until now, OR clauses were estimated using extended statistics only when
the whole clause (all the arguments) are compatible. If even just one
argument was found to be incompatible, the whole clause was estimated
ignoring extended statistics. Estimation errors for OR clauses tend to
be fairly mild, so this was considered acceptable, but it may become an
issue for OR clauses with more complex arguments, etc.

This commit relaxes the restriction, using mostly the same logic as AND
clauses. We first apply extended statistics to as many arguments as
possible, and 

Re: Proposition for autoname columns

2020-11-12 Thread Eugen Konkov
Hello Andrew,

Thursday, November 12, 2020, 3:19:39 PM, you wrote:


> On 11/11/20 7:55 PM, Bruce Momjian wrote:
>> On Thu, Nov 12, 2020 at 12:18:49AM +, Dagfinn Ilmari Mannsåker wrote:
>>> Bruce Momjian  writes:
 I think we could do it, but it would only work if the column was output
 as a single json value, and not a multi-key/value field.  I am afraid if
 we tried to do it, the result would be too inconsistent to be useful.
>>> Could this be done via the support function, so that the top-level
>>> operator/function in each select list item can return a suggested column
>>> name if the relevant arguments are constants?
>> Yes, the user explicitly calling a function would be much easier to
>> predict.
>>


> I suspect this is doomed to failure. There is no guarantee that the path
> expression is going to be static or constant across rows. Say you have
> this table:


> x: foo, j: {"foo": 1, "bar": 2}

> x: bar  j: {"foo": 3, "bar": 4}


> and you say:


>   select j->>x from mytable;
> What should the column be named?

Suppose it should be named 'as x'


> I think we'd be trying to manage a set of corner cases, and all because
> someone didn't want to put "as foo" in their query. And if we generate a
> column name in some cases and not in others there will be complaints of
> inconsistency.


> cheers


> andrew


> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com




-- 
Best regards,
Eugen Konkov





Re: truncating timestamps on arbitrary intervals

2020-11-12 Thread Peter Eisentraut

On 2020-06-30 06:34, John Naylor wrote:

In v9, I've simplified the patch somewhat to make it easier for future
work to build on.

- When truncating on month-or-greater intervals, require the origin to
align on month. This removes the need to handle weird corner cases
that have no straightforward behavior.
- Remove hackish and possibly broken code to allow origin to be after
the input timestamp. The default origin is Jan 1, 1 AD, so only AD
dates will behave correctly by default. This is not enforced for now,
since it may be desirable to find a way to get this to work in a nicer
way.
- Rebase docs over PG13 formatting changes.


This looks pretty solid now.  Are there any more corner cases and other 
areas with unclear behavior that you are aware of?


A couple of thoughts:

- After reading the discussion a few times, I'm not so sure anymore 
whether making this a cousin of date_trunc is the right way to go.  As 
you mentioned, there are some behaviors specific to date_trunc that 
don't really make sense in date_trunc_interval, and maybe we'll have 
more of those.  Also, date_trunc_interval isn't exactly a handy name. 
Maybe something to think about.  It's obviously fairly straightforward 
to change it.


- There were various issues with the stride interval having months and 
years.  I'm not sure we even need that.  It could be omitted unless you 
are confident that your implementation is now sufficient.


- Also, negative intervals could be prohibited, but I suppose that 
matters less.


- I'm curious about the origin being set to 0001-01-01.  This seems to 
work correctly in that it sets the origin to a Monday, which is what we 
wanted, but according to Google that day was a Saturday.  Something to 
do with Julian vs. Gregorian calendar?  Maybe we should choose a date 
that is a bit more recent and easier to reason with.


- Then again, I'm thinking that maybe we should make the origin 
mandatory.  Otherwise, the default answers when having strides larger 
than a day are entirely arbitrary, e.g.,


=> select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
0190-01-01 00:00:00 BC

=> select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
0191-01-01 00:00:00

Perhaps the origin could be defaulted if the interval is less than a day 
or something like that.


- I'm wondering whether we need the date_trunc_interval(interval, 
timestamptz, timezone) variant.  Isn't that the same as 
date_trunc_interval(foo AT ZONE xyz, value)?





Re: PATCH: Batch/pipelining support for libpq

2020-11-12 Thread Alvaro Herrera
(Adding previous reviewers to CC)

On 2020-Nov-03, David G. Johnston wrote:

> Given the caveats around blocking mode connections why not just require
> non-blocking mode, in a similar fashion to how synchronous functions are
> disallowed?

This is a very good question.  Why indeed?  Does anybody have a good
answer to this?  If not, I propose we just require that non-blocking
mode is in use in order for batch mode to be used.

I've been doing a review pass over this patch and have an updated
version, which I intend to share later today (after I fix what appears
to be a misunderstanding in the "singlerow" test in testlibpqbatch.c)




Re: Allow matching whole DN from a client certificate

2020-11-12 Thread Daniel Gustafsson
> On 11 Nov 2020, at 21:44, Andrew Dunstan  wrote:

> If people like this idea I'll add tests and docco and add it to the next CF.

Sounds like a good idea, please do.

Can this case really happen in non-ancient OpenSSL version?
+   if (!x509name)

Doesn't this return path need a pfree(peer_cn)?
+   bio = BIO_new(BIO_s_mem());
+   if (!bio)
+   {
+   return -1;
+   }
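
(Presumably the fix would look something like the sketch below, assuming
peer_cn has already been allocated at that point:)
+   bio = BIO_new(BIO_s_mem());
+   if (!bio)
+   {
+       pfree(peer_cn);     /* don't leak the already-allocated copy */
+       return -1;
+   }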

cheers ./daniel



Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code

2020-11-12 Thread Andrew Gierth
> "Alexander" == Alexander Korotkov  writes:

 >> Another issue I don't understand yet is that even though this code
 >> is largely unchanged since 8.x, the original reporter could not
 >> reproduce the crash on any version before 13.0.

 Alexander> I think this is related to my commit 911e702077. It has
 Alexander> changed the memory allocation for the signatures to support
 Alexander> the signatures of variable length. So, it seems that despite
 Alexander> the error existing since 8.x, it started causing segfaults
 Alexander> only since 911e702077.

Aha. Prior to that change, cache[i].sign was an array rather than a
pointer, so it would not crash even when accessed without
initialization. What would happen instead is that an incorrect signature
would be used, which might lead to problems later in index lookups
(though I haven't tested that).

 Alexander> I would rather propose to rip off special handling of the
 Alexander> last item completely (see the attached patch).

Yeah. I'll go with that, once I finish testing it.

-- 
Andrew (irc:RhodiumToad)




Re: Proposition for autoname columns

2020-11-12 Thread Andrew Dunstan


On 11/11/20 7:55 PM, Bruce Momjian wrote:
> On Thu, Nov 12, 2020 at 12:18:49AM +, Dagfinn Ilmari Mannsåker wrote:
>> Bruce Momjian  writes:
>>> I think we could do it, but it would only work if the column was output
>>> as a single json value, and not a multi-key/value field.  I am afraid if
>>> we tried to do it, the result would be too inconsistent to be useful.
>> Could this be done via the support function, so that the top-level
>> operator/function in each select list item can return a suggested column
>> name if the relevant arguments are constants?
> Yes, the user explicitly calling a function would be much easier to
> predict.
>


I suspect this is doomed to failure. There is no guarantee that the path
expression is going to be static or constant across rows. Say you have
this table:


x: foo, j: {"foo": 1, "bar": 2}

x: bar  j: {"foo": 3, "bar": 4}


and you say:


  select j->>x from mytable;


What should the column be named?


I think we'd be trying to manage a set of corner cases, and all because
someone didn't want to put "as foo" in their query. And if we generate a
column name in some cases and not in others there will be complaints of
inconsistency.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Online checksums patch - once again

2020-11-12 Thread Daniel Gustafsson
> On 5 Oct 2020, at 14:14, Heikki Linnakangas  wrote:
> 
> Replying to an older message in this thread:
> 
>>> + /*
>>> + * If we reach this point with checksums in inprogress state, we notify
>>> + * the user that they need to manually restart the process to enable
>>> + * checksums. This is because we cannot launch a dynamic background worker
>>> + * directly from here, it has to be launched from a regular backend.
>>> + */
>>> + if (ControlFile->data_checksum_version == 
>>> PG_DATA_CHECKSUM_INPROGRESS_VERSION)
>>> + ereport(WARNING,
>>> + (errmsg("checksum state is \"inprogress\" with no worker"),
>>> + errhint("Either disable or enable checksums by calling the
>>> pg_disable_data_checksums() or pg_enable_data_checksums()
>>> functions.")));
>>> This seems pretty half-baked.
>> I don't disagree with that.  However, given that enabling checksums is a 
>> pretty
>> intensive operation it seems somewhat unfriendly to automatically restart.  
>> As
>> a DBA I wouldn't want that to kick off without manual intervention, but there
>> is also the risk of this being missed due to assumptions that it would 
>> restart.
>> Any ideas on how to treat this?
>> If/when we can restart the processing where it left off, without the need to 
>> go
>> over all data again, things might be different wrt the default action.
> 
> The later patch version do support restarting, so I think we should revisit 
> this issue.

Agreed, now it makes sense to restart automatically.

> I would expect the checksums worker to be automatically started at postmaster 
> startup. Can we make that happen?

A dynamic background worker has to be registered from a regular backend, so
it's not entirely clear to me where in startup processing that would take
place.  Do you have any good suggestions?

cheers ./daniel



Phrase search vs. multi-lexeme tokens

2020-11-12 Thread Alexander Korotkov
Hackers,

I'm investigating the bug report [1] about the behavior of
websearch_to_tsquery() with quotes and multi-lexeme tokens.  See the
example below.

# select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class
foo"');
 ?column?
--
 f

So, the tsvector doesn't match the tsquery, even though exactly the same text
was put into to_tsvector() and into the quotes of websearch_to_tsquery().
Looks wrong to me.  Let's examine the output of to_tsvector() and
websearch_to_tsquery().

# select to_tsvector('pg_class foo');
   to_tsvector
--
 'class':2 'foo':3 'pg':1

# select websearch_to_tsquery('"pg_class foo"');
 websearch_to_tsquery
--
 ( 'pg' & 'class' ) <-> 'foo'
(1 row)

So, 'pg_class' token was split into two lexemes 'pg' and 'class'.  But
the output websearch_to_tsquery() connects 'pg' and 'class' with &
operator.  tsquery expects 'pg' and 'class' to be both neighbors of
'foo'.  So, 'pg' and 'class' are expected to share the same position,
and that isn't true for tsvector.  Let's see how phraseto_tsquery()
handles that.

# select to_tsvector('pg_class foo') @@ phraseto_tsquery('pg_class foo');
 ?column?
--
 t

# select phraseto_tsquery('pg_class foo');
  phraseto_tsquery

 'pg' <-> 'class' <-> 'foo'

phraseto_tsquery() connects all the lexemes with phrase operators and
everything works OK.

For me it's obvious that phraseto_tsquery() and websearch_to_tsquery()
with quotes should work the same way.  Notably, the current behavior of
websearch_to_tsquery() is recorded in the regression tests.  So it
might look like this behavior is intended, but it's too ridiculous, and
I think the regression tests contain an oversight as well.
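
(In other words, after the proposed fix the quoted form would be expected to
behave like phraseto_tsquery() above; a sketch of the intended output, not the
current one:)

# select websearch_to_tsquery('"pg_class foo"');
    websearch_to_tsquery
----------------------------
 'pg' <-> 'class' <-> 'foo'

# select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
 ?column?
----------
 t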

I've prepared a fix, which doesn't break the fts parser abstractions
too much (attached patch), but I've faced another similar issue in
to_tsquery().

# select to_tsvector('pg_class foo') @@ to_tsquery('pg_class <-> foo');
 ?column?
--
 f

# select to_tsquery('pg_class <-> foo');
  to_tsquery
--
 ( 'pg' & 'class' ) <-> 'foo'

I think if a user writes 'pg_class <-> foo', then it's expected to
match 'pg_class foo' regardless of which lexemes 'pg_class' is
split into.

This issue looks like a much more complex design bug in phrase
search.  Fixing this would require some kind of readahead or multipass
processing, because we don't know how to process 'pg_class' in
advance.

Is this really a design bug that has existed in phrase search from the
beginning?  Or am I missing something?

Links
1. https://www.postgresql.org/message-id/16592-70b110ff9731c07d%40postgresql.org

--
Regards,
Alexander Korotkov


websearch_fix_p2.patch
Description: Binary data


Re: Refactor pg_rewind code and make it work against a standby

2020-11-12 Thread Heikki Linnakangas

On 04/11/2020 11:23, Heikki Linnakangas wrote:

I read through the patches one more time, fixed a bunch of typos and
such, and pushed patches 1-4. I'm going to spend some more time on
testing the last patch. It allows using a standby server as the source,
and we don't have any tests for that yet. Thanks for the review!


Did some more testing, fixed one bug, and pushed.

To test this, I set up a cluster with one primary, a standby, and a 
cascaded standby. I launched a test workload against the primary that 
creates tables, inserts rows, and drops tables continuously. In another 
shell, I promoted the cascaded standby, ran some updates on the promoted 
server, and finally ran pg_rewind pointed at the standby, and started it 
again as a cascaded standby. Repeat.


Attached are the scripts I used. I edited them between test runs to test 
slightly different scenarios. I don't expect them to be very useful to 
anyone else, but the Internet is my backup.


I did find one bug in the patch with that, so the time was well spent: 
the code in process_queued_fetch_requests() got confused and errored 
out, if a file was removed in the source system while pg_rewind was 
running. There was code to deal with that, but it was broken. Fixed that.


- Heikki


rewind-cascading-test.tar.gz
Description: application/gzip


Remove unused variable from SharedSort

2020-11-12 Thread Dilip Kumar
While going through the code I noticed that the nTapes member in
SharedSort is unused.  This is just initialized with nworkers but
never used.  The attached patch removes this variable.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Remove-unused-structure-member-from-Sharedsort.patch
Description: Binary data


Re: Implementing Incremental View Maintenance

2020-11-12 Thread Tatsuo Ishii
> 1. Create pgbench database with scale 100.
> pgbench speed at my desktop is about 10k TPS:
> 
> pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
> tps = 10194.951827 (including connections establishing)
> 
> 2. Then I created incremental materialized view:
> 
> create incremental materialized view teller_sums as select
> t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on
> a.bid=t.bid group by t.tid;
> SELECT 1000
> Time: 20805.230 ms (00:20.805)
> 
> 20 second is reasonable time, comparable with time of database
> initialization.
> 
> Then obviously we see advantages of precalculated aggregates:
> 
> postgres=# select * from teller_sums where tid=1;
>  tid |  sum
> -+
>    1 | -96427
> (1 row)
> 
> Time: 0.871 ms
> postgres=# select t.tid,sum(abalance) from pgbench_accounts a join
> pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1
> ;
>  tid |  sum
> -+
>    1 | -96427
> (1 row)
> 
> Time: 915.508 ms
> 
> Amazing. Almost 1000 times difference!
> 
> 3. Run pgbench once again:
> 
> Ooops! Now TPS are much lower:
> 
> tps = 141.767347 (including connections establishing)
> 
> Speed of updates is reduced more than 70 times!
> Looks like we loose parallelism because almost the same result I get
> with just one connection.

How much TPS do you get if you execute pgbench -c 1 without the
incremental materialized view defined? If it's around 141 then we
could surely confirm that the major bottleneck is locking contention.
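
(For reference, that would be the same workload as above with a single client,
something like:)

    pgbench -M prepared -N -c 1 -j 1 -T 30 -P 1 postgres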

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Skip ExecCheckRTPerms in CTAS with no data

2020-11-12 Thread Bharath Rupireddy
Thanks for the comments.

On Thu, Nov 12, 2020 at 2:36 PM Michael Paquier  wrote:
>
> On Wed, Nov 11, 2020 at 07:31:49PM +0530, Bharath Rupireddy wrote:
> > Yes we do not have anything related to permissions mentioned in the
> > documentation. So, I'm not adding it now.
>
> It would be good to clarify that in the docs while we are on it.
>

Added.

>
> I don't think this is sufficient.  Could you add more test cases here?
> I can think of, coming down actually to the callers of
> CreateIntoRelDestReceiver:
> - A plain CTAS WITH NO DATA, that should pass,
> - CTAS EXECUTE WITH NO DATA, that should pass.
> - CTAS EXECUTE WITH DATA, that should not pass.
> - EXPLAIN CTAS WITH DATA, that should not pass.
>

Done.

On HEAD/master, the behaviour is as follows: a) for plain CTAS WITH NO
DATA, ExecCheckRTPerms() is not called; b) for EXPLAIN ANALYZE CTAS
WITH NO DATA, ExecCheckRTPerms() is called. We want behaviour a) in both
cases, and that is exactly what this patch does, i.e. ExecCheckRTPerms()
is not called in either case.
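
(For readers following along, the scenarios being exercised look roughly like
this; only a sketch, the actual regression test also sets up a role without
the relevant INSERT privilege, and the object names here are made up:)

    CREATE TABLE ctas_source (a int);
    PREPARE ctas_stmt AS SELECT * FROM ctas_source;
    -- plain CTAS WITH NO DATA: no INSERT privilege check expected
    CREATE TABLE ctas_nodata AS SELECT * FROM ctas_source WITH NO DATA;
    -- CTAS EXECUTE WITH NO DATA: no INSERT privilege check expected
    CREATE TABLE ctas_exec_nodata AS EXECUTE ctas_stmt WITH NO DATA;
    -- CTAS EXECUTE WITH DATA: INSERT privilege still checked
    CREATE TABLE ctas_exec_data AS EXECUTE ctas_stmt WITH DATA;
    -- EXPLAIN ANALYZE CTAS WITH DATA: INSERT privilege still checked
    EXPLAIN ANALYZE CREATE TABLE ctas_explain AS SELECT * FROM ctas_source;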

Attaching V3 patch, please review it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v3-0001-Skip-Insert-Perm-Check-Bulk-Insert-State-alloc-in.patch
Description: Binary data


Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-12 Thread Alexey Kondratov

On 2020-11-11 06:59, Tom Lane wrote:

Alexey Kondratov  writes:
After looking at the autoprewarm code more closely I have realised that
this 'double dump' issue was not an issue at all. I have just
misplaced a debug elog(), so its second output in the log was only
indicating that we calculated delay_in_ms one more time.


Ah --- that explains why I couldn't see a problem.

I've pushed 0001+0002 plus some followup work to fix other places
that could usefully use TimestampDifferenceMilliseconds().  I have
not done anything with 0003 (the TAP test for pg_prewarm), and will
leave that to the judgment of somebody who's worked with pg_prewarm
before.  To me it looks like it's not really testing things very
carefully at all; on the other hand, we have exactly zero test
coverage of that module today, so maybe something is better than
nothing.



Great, thank you for generalising the issue and working on it.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Issue with server side statement-level rollback

2020-11-12 Thread Gilles Darold

Hi,


We're facing an issue in a new extension we use at LzLabs to
emulate server-side statement-level rollback in PostgreSQL;
see https://github.com/lzlabs/pg_statement_rollback/ for full details.

The problem we are encountering is when PostgreSQL is compiled in debug
mode with --enable-cassert. At line 1327 of src/backend/tcop/pquery.c
the following assert fails:

    /*
     * Clear subsidiary contexts to recover temporary memory.
     */
    Assert(portal->portalContext == CurrentMemoryContext);

    MemoryContextDeleteChildren(portal->portalContext);

This extension, although it is a risky implementation, works extremely
well when used in a fully controlled environment. It avoids the latency
of the extra communication for the RELEASE+SAVEPOINT usually controlled at
the client side. The client is only responsible for issuing the "ROLLBACK TO
autosavepoint" when needed.  The extension allows a large performance gain
for this feature, which helps customers using Oracle or DB2 to migrate to
PostgreSQL.
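
(For context, the client-driven pattern the extension emulates on the server
side looks roughly like this; "autosavepoint" is the name used above and
savepoint_test is from our test case:)

    BEGIN;
    SAVEPOINT autosavepoint;
    INSERT INTO savepoint_test SELECT 1;
    RELEASE SAVEPOINT autosavepoint;       -- statement succeeded
    SAVEPOINT autosavepoint;
    SELECT 1/0;                            -- statement fails...
    ROLLBACK TO autosavepoint;             -- ...and only that statement is undone
    COMMIT;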


Actually with the extension the memory context is not CurrentMemoryContext
as expected by the assert.

    (gdb) b pquery.c:1327
    Breakpoint 1 at 0x55792fd7a04d: file pquery.c, line 1327.
    (gdb) c
    Continuing.

    Breakpoint 1, PortalRunMulti (portal=portal@entry=0x5579316e3e10, 
isTopLevel=isTopLevel@entry=true,
        setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x557931755ce8, altdest=altdest@entry=0x557931755ce8,

        qc=qc@entry=0x7ffc4aa1f8a0) at pquery.c:1327
    1327            Assert(portal->portalContext == CurrentMemoryContext);
    (gdb) p portal->sourceText
    $1 = 0x557931679c80 "INSERT INTO savepoint_test SELECT 1;"
    (gdb) p MemoryContextStats(portal->portalContext)
    $2 = void
    (gdb)

The memory context dump output from log:

    PortalContext: 1024 total in 1 blocks; 704 free (1 chunks); 320 
used: 

    Grand total: 1024 bytes in 1 blocks; 704 free (1 chunks); 320 used

If I naively remove the assert in pquery.c everything works without any
new assert error.

As I said I am aware that this is clearly not a standard PostgreSQL use
but we have no choice. We are emulating DB2 statement-level rollback
behavior and we have chosen to not create a new fork of PostgreSQL and
only work with extensions. As there is no hook or API that could allow a 
perfect

server side integration of this feature we have done what is possible to do
in the extension.


So my question is: should we allow such use through an extension, and in
that case, what change to the PostgreSQL code could avoid the assert
failure? Or perhaps we have missed something in this extension that would
make the assert happy, but I don't think so.

Cheers

--
Gilles Darold





generated columns in MergeAttributesIntoExisting

2020-11-12 Thread Sergei Kornilov
Hello

I found a simple test case:

CREATE TABLE test(id int NOT NULL, gen int GENERATED ALWAYS AS (id + 1) STORED) 
partition by range (id);
create table test_part_create partition of test for values from ( 0) to (10);
create table test_part_attach (id int NOT NULL, gen int);
alter table test attach partition test_part_attach for values from (10) to (20);

postgres=# \d test_part_create 
Table "public.test_part_create"
 Column |  Type   | Collation | Nullable |   Default   
+-+---+--+-
 id | integer |   | not null | 
 gen| integer |   |  | generated always as (id + 1) stored
Partition of: test FOR VALUES FROM (0) TO (10)

postgres=# \d test_part_attach 
  Table "public.test_part_attach"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 id | integer |   | not null | 
 gen| integer |   |  | 
Partition of: test FOR VALUES FROM (10) TO (20)

Both partitions are attached, but alter table attach partition did not check nor 
enforce the generated column. Same for the table inheritance stuff. While looking at 
MergeAttributesIntoExisting in src/backend/commands/tablecmds.c I did not 
notice any special handling or comments for attgenerated. Is this an oversight and 
a bug?

Also,
postgres=# alter table test alter COLUMN gen drop expression ;
ERROR:  column "gen" of relation "test_part_attach" is not a stored generated 
column

Regards, Sergei




Re: Asynchronous Append on postgres_fdw nodes.

2020-11-12 Thread Etsuro Fujita
On Thu, Oct 8, 2020 at 8:40 PM Andrey Lepikhov
 wrote:
> I want to suggest one more improvement. Currently the
> is_async_capable_path() routine allow only ForeignPath nodes as async
> capable path. But in some cases we can allow SubqueryScanPath as async
> capable too.

> The patch in attachment tries to improve this situation.

Seems like a good idea.  Will look at the patch in detail.

Best regards,
Etsuro Fujita




Re: Asynchronous Append on postgres_fdw nodes.

2020-11-12 Thread Etsuro Fujita
On Thu, Oct 8, 2020 at 6:39 PM Andrey V. Lepikhov
 wrote:
> I found a small problem. If we have a mix of async and sync subplans
> when we catch an assertion on a busy connection. Just for example:
>
> PLAN
> 
> Nested Loop  (cost=100.00..174316.95 rows=975 width=8) (actual
> time=5.191..9.262 rows=9 loops=1)
> Join Filter: (frgn.a = l.a)
> Rows Removed by Join Filter: 8991
> ->  Append  (cost=0.00..257.20 rows=11890 width=4) (actual
> time=0.419..2.773 rows=1000 loops=1)
>   Async subplans: 4
>   ->  Async Foreign Scan on f_1 l_2  (cost=100.00..197.75
> rows=2925 width=4) (actual time=0.381..0.585 rows=211 loops=1)
>   ->  Async Foreign Scan on f_2 l_3  (cost=100.00..197.75
> rows=2925 width=4) (actual time=0.005..0.206 rows=195 loops=1)
>   ->  Async Foreign Scan on f_3 l_4  (cost=100.00..197.75
> rows=2925 width=4) (actual time=0.003..0.282 rows=187 loops=1)
>   ->  Async Foreign Scan on f_4 l_5  (cost=100.00..197.75
> rows=2925 width=4) (actual time=0.003..0.316 rows=217 loops=1)
>   ->  Seq Scan on l_0 l_1  (cost=0.00..2.90 rows=190 width=4)
> (actual time=0.017..0.057 rows=190 loops=1)
> ->  Materialize  (cost=100.00..170.94 rows=975 width=4) (actual
> time=0.001..0.002 rows=9 loops=1000)
>   ->  Foreign Scan on frgn  (cost=100.00..166.06 rows=975
> width=4) (actual time=0.766..0.768 rows=9 loops=1)

Actually I also found a similar issue before [1].  But in the first
place I'm not sure the way of handling concurrent data fetches by
multiple ForeignScan nodes using the same connection in postgres_fdw
implemented in Horiguchi-san's patch would be really acceptable,
because that would impact performance *negatively* in some cases as
mentioned in [1].  So I feel inclined to just disable this feature in
problematic cases including the above one in the first cut.  Even with
such a limitation, I think it would be useful, because it would cover
typical use cases such as partitionwise joins and partitionwise
aggregates.

Thanks for the report!

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAPmGK16E1erFV9STg8yokoewY6E-zEJtLzHUJcQx%2B3dyivCT%3DA%40mail.gmail.com




Re: logical streaming of xacts via test_decoding is broken

2020-11-12 Thread Dilip Kumar
On Tue, Nov 10, 2020 at 7:20 PM Amit Kapila  wrote:
>
> On Tue, Nov 10, 2020 at 2:25 PM Dilip Kumar  wrote:
> >
> > On Tue, Nov 10, 2020 at 11:18 AM Dilip Kumar  wrote:
> > >
> > > On Tue, Nov 10, 2020 at 10:52 AM Amit Kapila  
> > > wrote:
> > > > For this case, users can use skip_empty_xacts = true and
> > > > skip_empty_streams = false. I am just asking if the user has only used
> > > > skip_empty_xacts = true and didn't use the 'skip_empty_streams'
> > > > option.
> > >
> > > Ok, thanks for the clarification.
> > >
> >
> > I have prepared a patch for the same.
> >
>
> Few comments:
> 1.
> + else if (strcmp(elem->defname, "skip-empty-streams") == 0)
> + {
> + if (elem->arg == NULL)
> + data->skip_empty_streams = true;
> + else if (!parse_bool(strVal(elem->arg), >skip_empty_streams))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("could not parse value \"%s\" for parameter \"%s\"",
> + strVal(elem->arg), elem->defname)));
> + if (!data->skip_empty_xacts && data->skip_empty_streams)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("the skip-empty-streams can not be true if skip-empty-xacts
> is false")));
>   }
>
> You can probably add a comment as to why we are disallowing this case.
> I thought of considering 'stream-changes' parameter here because it
> won't make sense to give this parameter without it, however, it seems
> that is not necessary but maybe adding a comment
> here in that regard would be a good idea.

As per our latest discussion, I have removed the extra input parameter
so this comment is not needed now.

> 2.
> pg_decode_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
>  {
>   TestDecodingData *data = ctx->output_plugin_private;
> + TestDecodingTxnData *txndata =
> + MemoryContextAllocZero(ctx->context, sizeof(TestDecodingTxnData));
> +
>
> Shall we free this memory at commit time for the sake of consistency,
> otherwise also it would be freed with decoding context?

Done

> 3. Can you please prepare a separate patch for test case changes so
> that it would be easier to verify that it fails without the patch and
> passed after the patch?

Done

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v2-0001-Bug-fix-skip-empty-xacts-in-streaming-mode.patch
Description: Binary data


v2-0002-Test-case-to-test-the-interleaved-empty-transacti.patch
Description: Binary data


Re: PATCH: Batch/pipelining support for libpq

2020-11-12 Thread Matthieu Garrigues
Hi David,

Thanks for the feedback. I did rework a bit the doc based on your
remarks. Here is the v24 patch.

Matthieu Garrigues

On Tue, Nov 3, 2020 at 6:21 PM David G. Johnston
 wrote:
>
> On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera  wrote:
>>
>> On 2020-Nov-02, Alvaro Herrera wrote:
>>
>> > In v23 I've gone over docs; discovered that PQgetResults docs were
>> > missing the new values.  Added those.  No significant other changes yet.
>>
>
> Just reading the documentation of this patch, haven't been following the 
> longer thread:
>
> Given the caveats around blocking mode connections why not just require 
> non-blocking mode, in a similar fashion to how synchronous functions are 
> disallowed?
>
> "Batched operations will be executed by the server in the order the client
> sends them. The server will send the results in the order the statements
> executed."
>
> Maybe:
>
> "The server executes statements, and returns results, in the order the client 
> sends them."
>
> Using two sentences and relying on the user to mentally link the two "in the 
> order" descriptions together seems to add unnecessary cognitive load.
>
> + The client interleaves result
> + processing with sending batch queries, or for small batches may
> + process all results after sending the whole batch.
>
> Suggest: "The client may choose to interleave result processing with sending 
> batch queries, or wait until the complete batch has been sent."
>
> I would expect to process the results of a batch only after sending the 
> entire batch to the server.  That I don't have to is informative but knowing 
> when I should avoid doing so, and why, is informative as well.  To the 
> extreme while you can use batch mode and interleave if you just poll 
> getResult after every command you will make the whole batch thing pointless.  
> Directing the reader from here to the section "Interleaving Result Processing 
> and Query Dispatch" seems worth considering.  The dynamics of small sizes and 
> sockets remains a bit unclear as to what will break (if anything, or is it 
> just process memory on the server) if interleaving it not performed and sizes 
> are large.
>
> I would suggest placing commentary about "all transactions subsequent to a 
> failed transaction in a batch are ignored while previous completed 
> transactions are retained" in the "When to Use Batching".  Something like 
> "Batching is less useful, and more complex, when a single batch contains 
> multiple transactions (see Error Handling)."
>
> My imagined use case would be to open a batch, start a transaction, send all 
> of its components, end the transaction, end the batch, check for batch 
> failure and if it doesn't fail have the option to easily continue without 
> processing individual pgResults (or if it does fail, have the option to 
> extract the first error pgResult and continue, ignoring the rest, knowing 
> that the transaction as a whole was reverted and the batch unapplied).  I've 
> never interfaced with libpq directly.  Though given how the existing C API 
> works what is implemented here seems consistent.
>
> The "queueing up queries into a pipeline to be executed as a batch on the 
> server" can be read as a client-side behavior where nothing is sent to the 
> server until the batch has been completed.  Reading further it becomes clear 
> that all it basically is is a sever-side toggle that instructs the server to 
> continue processing incoming commands even while prior commands have their 
> results waiting to be ingested by the client.
>
> Batch seems like the user-visible term to describe this feature.  Pipeline 
> seems like an implementation detail that doesn't need to be mentioned in the 
> documentation - especially given that pipeline doesn't get a mentioned beyond 
> the first two paragraphs of the chapter and never without being linked 
> directly to "batch".  I would probably leave the indexterm and have a 
> paragraph describing that batching is implemented using a query pipeline so 
> that people with the implementation detail on their mind can find this 
> chapter, but the prose for the user should just stick to batching.
>
> Sorry, that all is a bit unfocused, but the documentation for the user of the 
> API could be cleaned up a bit and some more words spent on what trade-offs 
> are being made when using batching versus normal command-response processing. 
>  That said, while I don't see all of this purely a matter of style I'm also 
> not seeing anything demonstrably wrong with the documentation at the moment.  
> Hopefully my perspective helps though, and depending on what happens next I 
> may try and make my thoughts more concrete with an actual patch.
>
> David J.
>
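
As a rough illustration of the interleaving pattern David describes above, here is a
minimal client-side sketch. The PQenterBatchMode()/PQexitBatchMode() entry points are
assumptions about the patch's API (the final names may well differ); everything else is
standard libpq, and a real program would wait on the connection's socket with
select()/poll() rather than spinning.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");
    const char *queries[] = {"SELECT 1", "SELECT pg_sleep(0.1)", "SELECT 2"};
    int         nqueries = 3;
    int         sent = 0;
    int         received = 0;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    PQsetnonblocking(conn, 1);
    PQenterBatchMode(conn);     /* hypothetical: provided by the proposed patch */

    while (received < nqueries)
    {
        /* Keep dispatching while results are still outstanding. */
        if (sent < nqueries &&
            PQsendQueryParams(conn, queries[sent], 0,
                              NULL, NULL, NULL, NULL, 0) == 1)
            sent++;

        PQflush(conn);          /* push queued queries to the server */
        PQconsumeInput(conn);   /* absorb any results that have arrived */

        /* Drain whatever is ready without blocking. */
        while (received < sent && !PQisBusy(conn))
        {
            PGresult   *res = PQgetResult(conn);

            if (res == NULL)    /* end of one query's result set */
                received++;
            else
            {
                printf("got %s\n", PQresStatus(PQresultStatus(res)));
                PQclear(res);
            }
        }
    }

    PQexitBatchMode(conn);      /* hypothetical: provided by the proposed patch */
    PQfinish(conn);
    return 0;
}
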
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9ce32fb39b..044e3a705f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3061,6 +3061,30 @@ ExecStatusType PQresultStatus(const PGresult *res);

   
  

Re: Feature request: Improve allowed values for generate series

2020-11-12 Thread Eugen Konkov


Hello David,

I have a table of services; each service has a period after which it auto-renews.

Services can also be one-time. In that case their interval is '00:00:00'.

The renewal dates are calculated via generate_series; when the interval '00:00:00' is passed to that function,
the query dies =(

Generate dates for one time service:
test=> SELECT * FROM generate_series( '2020-11-09', '2020-11-09', INTERVAL '00:00:00' );
    generate_series
------------------------
 2020-11-09 00:00:00+02
(1 row)

Generate dates for auto-renew service:
test=> SELECT * FROM generate_series( '2020-11-09', '2020-11-10', INTERVAL '1 day' );
    generate_series
------------------------
 2020-11-09 00:00:00+02
 2020-11-10 00:00:00+02
(2 rows)

So it would be useful in my case, and the behavior would not be surprising.
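
For what it's worth, under the current behavior the zero-interval case can be worked
around on the caller side. A rough sketch, assuming a hypothetical services table with
start_date, end_date and period columns (period = '00:00:00' for one-time services):

SELECT s.*, d AS renewal_date
FROM services AS s
CROSS JOIN LATERAL generate_series(
        s.start_date,
        CASE WHEN s.period = INTERVAL '0' THEN s.start_date ELSE s.end_date END,
        CASE WHEN s.period = INTERVAL '0' THEN INTERVAL '1 day' ELSE s.period END
    ) AS d;
-- one-time services yield exactly one row (start_date); recurring services
-- yield one row per period, as in the examples above.
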



Wednesday, November 11, 2020, 9:17:28 PM, you wrote:

On Wed, Nov 11, 2020 at 12:12 PM Eugen Konkov  wrote:

> So I feature request to allow a zero-size step for cases when the start point is equal to the finish
> What do you think?

hm, probably with step 0 we should always generate a series of one
value and exit, regardless of the finish value.
Because with step 0 we always stay at the current position, there
should always be just one value.

How is this better than writing "VALUES (start date)"?

David J.

--
Best regards,
Eugen Konkov





Re: Skip ExecCheckRTPerms in CTAS with no data

2020-11-12 Thread Michael Paquier
On Wed, Nov 11, 2020 at 07:31:49PM +0530, Bharath Rupireddy wrote:
> Yes we do not have anything related to permissions mentioned in the
> documentation. So, I'm not adding it now.

It would be good to clarify that in the docs while we are on it.

> Apart from the above, I also noticed that we unnecessarily allocate
> bulk insert state(16MB memory) in case of WITH NO DATA, just to free
> it in intorel_shutdown() without actually using it. So, in the v2
> patch I have made changes to not allocate bulk insert state.
> 
> Attaching v2 patch. Consider it for further review.

+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
+   CREATE TABLE selinto_schema.tmp0 (a) AS
+SELECT oid FROM pg_class WHERE relname like '%c%' WITH NO DATA; -- OK

I don't think this is sufficient.  Could you add more test cases here?
Here is what I can think of, coming down to the callers of
CreateIntoRelDestReceiver:
- A plain CTAS WITH NO DATA, that should pass,
- CTAS EXECUTE WITH NO DATA, that should pass.
- CTAS EXECUTE WITH DATA, that should not pass.
- EXPLAIN CTAS WITH DATA, that should not pass.
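
For reference, a rough sketch of what those regression cases might look like, run as a
role without INSERT privilege on the target schema; object names are illustrative,
loosely mirroring the existing select_into.sql test, and the EXPLAIN case is written
with the same ANALYZE options as the test quoted above:

PREPARE data_sel AS SELECT oid FROM pg_class;

CREATE TABLE selinto_schema.tmp_plain (a) AS
  SELECT oid FROM pg_class WITH NO DATA;          -- expected to pass

CREATE TABLE selinto_schema.tmp_exec_nodata (a) AS
  EXECUTE data_sel WITH NO DATA;                  -- expected to pass

CREATE TABLE selinto_schema.tmp_exec_data (a) AS
  EXECUTE data_sel WITH DATA;                     -- expected to fail

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
  CREATE TABLE selinto_schema.tmp_explain (a) AS
    SELECT oid FROM pg_class WITH DATA;           -- expected to fail
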
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] logical decoding of two-phase transactions

2020-11-12 Thread Ajin Cherian
On Wed, Nov 11, 2020 at 12:35 AM Amit Kapila  wrote:
> I have rearranged the code in DecodeCommit/Abort/Prepare so
> that it does only the required things (like in DecodeCommit was still
> processing subtxns even when it has to just perform FinishPrepared,
> also the stats were not updated properly which I have fixed.) and
> added/edited the comments. Apart from 0001 and 0002, I have not
> changed anything in the remaining patches.

One small comment on the patch:

- DecodeCommit(ctx, buf, &parsed, xid);
+ /*
+ * If we have already decoded this transaction data then
+ * DecodeCommit doesn't need to decode it again. This is
+ * possible iff output plugin supports two-phase commits and
+ * doesn't skip the transaction at prepare time.
+ */
+ if (info == XLOG_XACT_COMMIT_PREPARED && ctx->twophase)
+ {
+ already_decoded = !(ctx->callbacks.filter_prepare_cb &&
+ ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed.twophase_gid));
+ }
+

Just a small nitpick, but the way already_decoded is assigned here is a
bit misleading. It appears that the callbacks determine whether the
transaction is already decoded, when in reality the callbacks only decide
whether the transaction should skip two-phase handling. I think it's
better to either move it into the if condition or, if that is too long,
introduce one more variable, skip_twophase.

if (info ==  XLOG_XACT_COMMIT_PREPARED && ctx->twophase &&
 !(ctx->callbacks.filter_prepare_cb &&
ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed.twophase_gid)))
already_decoded = true;

OR
bool skip_twophase = false;
skip_twophase =  !(ctx->callbacks.filter_prepare_cb &&
ReorderBufferPrepareNeedSkip(ctx->reorder, xid, parsed.twophase_gid));
if (info ==  XLOG_XACT_COMMIT_PREPARED && ctx->twophase && skip_twophase)
already_decoded = true;

regards,
Ajin Cherian
Fujitsu Australia




Re: Implementing Incremental View Maintenance

2020-11-12 Thread Yugo NAGATA
On Thu, 5 Nov 2020 22:58:25 -0600
Justin Pryzby  wrote:

> On Mon, Oct 05, 2020 at 06:16:18PM +0900, Yugo NAGATA wrote:
> This needs to be rebased again - the last version doesn't apply anymore.
> http://cfbot.cputube.org/yugo-nagata.html

I attached the rebased patch (v19).
 
> I looked though it a bit and attach some fixes to the user-facing docs.

Thank you for pointing out a lot of typos and providing a patch to fix them!
Your fixes are included in the latest patch.
 
> There's some more typos in the source that I didn't fix:
> constrains
> materliazied
> cluase
> immediaite
> clumn
> Temrs
> migth
> recalculaetd
> speified
> secuirty
> 
> commit message: comletion
>
> psql and pg_dump say 13 but should say 14 now:
> pset.sversion >= 13 

These were also fixed.
 
> # bag union
> big union?

"bag union" is union operation of bag (multi-set) that does not eliminate 
duplicate of tuples.
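
A tiny SQL illustration of the difference (plain UNION would collapse the duplicates,
while UNION ALL, the bag union, keeps them):

-- UNION ALL behaves as a bag union: duplicates are preserved
SELECT x FROM (VALUES (1), (1), (2)) AS a(x)
UNION ALL
SELECT x FROM (VALUES (1), (2)) AS b(x);
-- five rows: 1, 1, 2, 1, 2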
 
> +   relisivm bool
> +  
> +  
> +   True if materialized view enables incremental view maintenance
> 
> This isn't clear, but I think it should say "True for materialized views which
> are enabled for incremental view maintenance (IVM)."

Yes, you are right. I fixed it that way as well.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 


IVM_patches_v19.tar.gz
Description: application/gzip


Re: Add session statistics to pg_stat_database

2020-11-12 Thread Georgios



‐‐‐ Original Message ‐‐‐
On Thursday, November 12, 2020 9:31 AM, Laurenz Albe  
wrote:

> I wrote:
>
> > On Tue, 2020-11-10 at 15:03 +, Georgios Kokolatos wrote:
> >
> > > I noticed that the cfbot fails for this patch.
> > > For this, I am setting the status to: 'Waiting on Author'.
> >
> > Thanks for noticing, it was only the documentation build.
> > Version 5 attached, status changed back to "waiting for review".
>
> The patch is still failing, so I looked again:
>
> make[3]: Entering directory 
> '/home/travis/build/postgresql-cfbot/postgresql/doc/src/sgml'
> { \
>   echo ""; \
>   echo ""; \
> } > version.sgml
> '/usr/bin/perl' ./mk_feature_tables.pl YES 
> ../../../src/backend/catalog/sql_feature_packages.txt 
> ../../../src/backend/catalog/sql_features.txt > features-supported.sgml
> '/usr/bin/perl' ./mk_feature_tables.pl NO 
> ../../../src/backend/catalog/sql_feature_packages.txt 
> ../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
> '/usr/bin/perl' ./generate-errcodes-table.pl 
> ../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
> '/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> error : Unknown IO error
> postgres.sgml:21: /usr/bin/bison -Wno-deprecated -d -o gram.c gram.y
> warning: failed to load external entity 
> "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd;
> ]>
>
> ^
>
>
> postgres.sgml:23: element book: validity error : No declaration for attribute 
> id of element book
> 
>
>  ^
>
>
> postgres.sgml:24: element title: validity error : No declaration for element 
> title
> PostgreSQL  Documentation
>
> I have the impression that this is not the fault of my patch, something seems 
> to be
> wrong with the cfbot.
>
> I see that other patches are failing with the same error.

You are indeed correct. Unfortunately the cfbot is a bit unstable due
to some issues related to the documentation. I alerted a contributor
and he was quick to try to address the issue in pgsql-www [1].

Thank you very much for looking and apologies for the chatter.

>
> Yours,
> Laurenz Albe

[1] 
https://www.postgresql.org/message-id/E2EE6B76-2D96-408A-B961-CAE47D1A86F0%40yesql.se




Re: Add session statistics to pg_stat_database

2020-11-12 Thread Laurenz Albe
I wrote:
> On Tue, 2020-11-10 at 15:03 +, Georgios Kokolatos wrote:
> > I noticed that the cfbot fails for this patch.
> > 
> > For this, I am setting the status to: 'Waiting on Author'.
> 
> Thanks for noticing, it was only the documentation build.
> 
> Version 5 attached, status changed back to "waiting for review".

The patch is still failing, so I looked again:

  make[3]: Entering directory 
'/home/travis/build/postgresql-cfbot/postgresql/doc/src/sgml'
  { \
echo ""; \
echo ""; \
  } > version.sgml
  '/usr/bin/perl' ./mk_feature_tables.pl YES 
../../../src/backend/catalog/sql_feature_packages.txt 
../../../src/backend/catalog/sql_features.txt > features-supported.sgml
  '/usr/bin/perl' ./mk_feature_tables.pl NO 
../../../src/backend/catalog/sql_feature_packages.txt 
../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
  '/usr/bin/perl' ./generate-errcodes-table.pl 
../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
  '/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml
  /usr/bin/xmllint --path . --noout --valid postgres.sgml
  error : Unknown IO error
  postgres.sgml:21: /usr/bin/bison -Wno-deprecated  -d -o gram.c gram.y
  warning: failed to load external entity 
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd;
  ]>
^
  postgres.sgml:23: element book: validity error : No declaration for attribute 
id of element book
  
 ^
  postgres.sgml:24: element title: validity error : No declaration for element 
title
   PostgreSQL  Documentation

I have the impression that this is not the fault of my patch, something seems 
to be
wrong with the cfbot.

I see that other patches are failing with the same error.

Yours,
Laurenz Albe





Re: Skip ExecCheckRTPerms in CTAS with no data

2020-11-12 Thread Michael Paquier
On Wed, Nov 11, 2020 at 01:34:05PM +0530, Bharath Rupireddy wrote:
> The ExecCheckRTPerms() with ACL_INSERT permission will be called
> before inserting the data to the table that's created with CREATE AS
> WITH NO DATA.

Perhaps you meant s/WITH NO DATA/WITH DATA/ here?

> The insertion into the table can happen either with
> INSERT command(ExecCheckRTPerms() with ACL_INSERT permission will be
> called from InitPlan()) or with COPY FROM command(ExecCheckRTPerms()
> with ACL_INSERT permission will be called from DoCopy()).
> 
> Effectively, we are not bypassing the call to
> ExecutorCheckPerms_hook_type. Unless I miss anything else, I think it
> makes sense to skip ExecCheckRTPerms() for CTAS WITH NO DATA.

Oh, I see what you mean here.  If you have an EXPLAIN ANALYZE CTAS or
CTAS EXECUTE, then we forbid the creation of the table if the user has
no INSERT rights, while we actually allow the creation of the table
when using WITH NO DATA for a plain CTAS:
--- a/src/test/regress/sql/select_into.sql
+++ b/src/test/regress/sql/select_into.sql
@@ -34,6 +34,9 @@ SELECT oid AS clsoid, relname, relnatts + 10 AS x
 CREATE TABLE selinto_schema.tmp3 (a,b,c)
AS SELECT oid,relname,relacl FROM pg_class
WHERE relname like '%c%';-- Error
+CREATE TABLE selinto_schema.tmp4 (a,b,c)
+  AS SELECT oid,relname,relacl FROM pg_class
+  WHERE relname like '%c%' WITH NO DATA; -- ok
+EXPLAIN ANALYZE CREATE TABLE selinto_schema.tmp5 (a,b,c)
+   AS SELECT oid,relname,relacl FROM pg_class
+  WHERE relname like '%c%' WITH NO DATA; -- error
 RESET SESSION AUTHORIZATION;

What your patch set does is to allow the second case to pass (or even
the EXECUTE case to pass).  HEAD is indeed a bit inconsistent as it is
now in this area.
--
Michael


signature.asc
Description: PGP signature


Re: ModifyTable overheads in generic plans

2020-11-12 Thread Amit Langote
On Wed, Nov 11, 2020 at 10:14 PM Heikki Linnakangas  wrote:
> I'm still a bit confused and unhappy about the initialization of
> ResultRelInfos and the various fields in them. We've made progress in
> the previous patches, but it's still a bit messy.
>
> >   /*
> >* If transition tuples will be captured, initialize a map to 
> > convert
> >* child tuples into the format of the table mentioned in the 
> > query
> >* (root relation), because the transition tuple store can 
> > only store
> >* tuples in the root table format.  However for INSERT, the 
> > map is
> >* only initialized for a given partition when the partition 
> > itself is
> >* first initialized by ExecFindPartition.  Also, this map is 
> > also
> >* needed if an UPDATE ends up having to move tuples across
> >* partitions, because in that case the child tuple to be 
> > moved first
> >* needs to be converted into the root table's format.  In 
> > that case,
> >* we use GetChildToRootMap() to either create one from 
> > scratch if
> >* we didn't already create it here.
> >*
> >* Note: We cannot always initialize this map lazily, that 
> > is, use
> >* GetChildToRootMap(), because AfterTriggerSaveEvent(), 
> > which needs
> >* the map, doesn't have access to the "target" relation that 
> > is
> >* needed to create the map.
> >*/
> >   if (mtstate->mt_transition_capture && operation != CMD_INSERT)
> >   {
> >   Relationrelation = 
> > resultRelInfo->ri_RelationDesc;
> >   RelationtargetRel = 
> > mtstate->rootResultRelInfo->ri_RelationDesc;
> >
> >   resultRelInfo->ri_ChildToRootMap =
> >   
> > convert_tuples_by_name(RelationGetDescr(relation),
> >  
> > RelationGetDescr(targetRel));
> >   /* First time creating the map for this result 
> > relation. */
> >   Assert(!resultRelInfo->ri_ChildToRootMapValid);
> >   resultRelInfo->ri_ChildToRootMapValid = true;
> >   }
>
> The comment explains that AfterTriggerSaveEvent() cannot use
> GetChildToRootMap(), because it doesn't have access to the root target
> relation. But there is a field for that in ResultRelInfo:
> ri_PartitionRoot. However, that's only set up when we do partition routing.
>
> How about we rename ri_PartitionRoot to e.g ri_RootTarget, and set it
> always, even for non-partition inheritance? We have that information
> available when we initialize the ResultRelInfo, so might as well.

Yeah, I agree it's better to use ri_PartitionRoot more generally like
you describe here.

> Some code currently checks ri_PartitionRoot, to determine if a tuple
> that's been inserted, has been routed. For example:
>
> >   /*
> >* Also check the tuple against the partition constraint, if 
> > there is
> >* one; except that if we got here via tuple-routing, we 
> > don't need to
> >* if there's no BR trigger defined on the partition.
> >*/
> >   if (resultRelationDesc->rd_rel->relispartition &&
> >   (resultRelInfo->ri_PartitionRoot == NULL ||
> >(resultRelInfo->ri_TrigDesc &&
> > 
> > resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
> >   ExecPartitionCheck(resultRelInfo, slot, estate, true);
>
> So if we set ri_PartitionRoot always, we would need some other way to
> determine if the tuple at hand has actually been routed or not. But
> wouldn't that be a good thing anyway? Isn't it possible that the same
> ResultRelInfo is sometimes used for routed tuples, and sometimes for
> tuples that have been inserted/updated "directly"?
> ExecLookupUpdateResultRelByOid() sets that field lazily, so I think it
> would be possible to get here with ri_PartitionRoot either set or not,
> depending on whether an earlier cross-partition update was routed to the
> table.

ri_RelationDesc != ri_PartitionRoot gives whether the result relation
is the original target relation of the query or not, so checking that
should be enough here.
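
To make that concrete, a rough sketch of how the optimization check quoted above
could look once the root is always set. This assumes ri_PartitionRoot keeps its
current Relation type and is populated for every result relation (the field might be
renamed as you suggest), and it reuses the variables from the snippet quoted above:

    /*
     * Sketch only: the relation is the query's original target iff it is
     * its own root; otherwise the tuple got here via tuple routing or a
     * cross-partition UPDATE.
     */
    bool        is_original_target =
        (resultRelInfo->ri_RelationDesc == resultRelInfo->ri_PartitionRoot);

    if (resultRelationDesc->rd_rel->relispartition &&
        (is_original_target ||
         (resultRelInfo->ri_TrigDesc &&
          resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
        ExecPartitionCheck(resultRelInfo, slot, estate, true);
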

> The above check is just an optimization, to skip unnecessary
> ExecPartitionCheck() calls, but I think this snippet in
> ExecConstraints() needs to get this right:
>
> >   /*
> >* If the tuple has been routed, it's been 
> > converted to the
> >* partition's rowtype, which might differ 
> > from the root
> >* table's.  We must convert it back to