Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-05 Thread Amit Kapila
On Fri, Aug 6, 2021 at 10:09 AM Amit Kapila  wrote:
>
> On Fri, Aug 6, 2021 at 5:09 AM Masahiko Sawada  wrote:
> >
>
> But isn't this happening because of your suggestion to compare the
> current set of relations with relations from publications that don't
> need to be removed? Do we have better ideas to fix it, or shall we just
> say that we will refresh based on whatever current set of relations is
> present in the publication to be dropped?
>

The other options could be: (a) for DROP PUBLICATION, we refresh all
the publications, or (b) for both ADD and DROP PUBLICATION, we refresh
all the publications.

Any better ideas?

As this is introduced in PG-14 via the below commit, I am adding
authors and committer to see if they have any better ideas.

commit 82ed7748b710e3ddce3f7ebc74af80fe4869492f
Author: Peter Eisentraut 
Date:   Tue Apr 6 10:44:26 2021 +0200

ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION

 Author: Japin Li 
 Author: Bharath Rupireddy 


-- 
With Regards,
Amit Kapila.




Re: RFC: Improve CPU cache locality of syscache searches

2021-08-05 Thread Yura Sokolov

Andres Freund wrote 2021-08-06 06:49:
> Hi,
>
> On 2021-08-06 06:43:55 +0300, Yura Sokolov wrote:
>> Why not use simplehash or something like that? Open-addressing
>> schemes show superior cache locality.
>
> I thought about that as well - but it doesn't really resolve the
> question of what we want to store in-line in the hashtable and what
> not. We can't store the tuples themselves in the hashtable for a
> myriad of reasons (need pointer stability, they're variably sized,
> way too large to move around frequently).
>
>> Well, a simplehash entry will be 24 bytes this way. If the simplehash
>> template supported external key/element storage, then it could be
>> shrunk to 16 bytes, and syscache entries would not need a dlist_node.
>> (But it doesn't at the moment.)
>
> I think storing keys outside of the hashtable entry defeats the
> purpose of the open addressing, given that they are always checked
> and that our conflict ratio should be fairly low.

It's the opposite: if the conflict ratio were high, then keys outside
the hashtable would be expensive, since a lookup hitting a non-matching
key would cost an extra memory access. But with a low conflict ratio we
will usually hit the matching entry on the first probe. And since we
will use the entry soon anyway, it doesn't matter whether it goes into
the CPU L1 cache during the lookup or during actual use.
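As a rough illustration of this point, here is a toy linear-probing
simulation (plain Python, invented for this sketch, not PostgreSQL code):
at low fill factors a successful lookup almost always terminates on its
first or second probe, so the extra pointer chase to an externally stored
key is rarely spent on a non-matching entry.

```python
import random

def avg_probes(load_factor, slots=1 << 14, seed=42):
    """Average probes for a successful lookup in a linear-probing table."""
    rng = random.Random(seed)
    table = [None] * slots
    keys = rng.sample(range(1 << 30), int(slots * load_factor))
    for k in keys:                       # insert with linear probing
        i = k % slots
        while table[i] is not None:
            i = (i + 1) % slots
        table[i] = k
    probes = 0
    for k in keys:                       # look every key up again
        i, p = k % slots, 1
        while table[i] != k:
            i, p = (i + 1) % slots, p + 1
        probes += p
    return probes / len(keys)

# At a low load factor most lookups hit the matching entry immediately;
# clustering only starts to hurt as the table fills up.
low, high = avg_probes(0.30), avg_probes(0.85)
```

Under the classic analysis a successful linear-probing search averages
(1 + 1/(1 - a))/2 probes at load a, i.e. about 1.2 at a = 0.30, which is
what the simulation reports.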

regards,
Yura Sokolov




Re: .ready and .done files considered harmful

2021-08-05 Thread Kyotaro Horiguchi
At Fri, 6 Aug 2021 02:34:24 +, "Bossart, Nathan"  
wrote in 
> On 8/5/21, 6:26 PM, "Kyotaro Horiguchi"  wrote:
> > It works the current way always at the first iteration of
> > pgarch_ArchiverCopyLoop() because in the last iteration of
> > pgarch_ArchiverCopyLoop(), pgarch_readyXlog() erases the last
> > anticipated segment.  The shortcut works only when
> > pgarch_ArchiverCopyLoop archives more than one successive segment at
> > a time.  If the anticipated next segment is found to be missing a
> > .ready file while archiving multiple files, pgarch_readyXlog falls
> > back to the regular way.
> >
> > So I don't see the danger you are perhaps anticipating.
> 
> I think my concern is that there's no guarantee that we will ever do
> another directory scan.  A server that's generating a lot of WAL could
> theoretically keep us in the next-anticipated-log code path
> indefinitely.

Theoretically possible. Supposing that .ready files may be created out
of order (for the following reason, as a possibility): once the fast
path bails out and the fallback path finds that the second-oldest file
has a .ready file, the succeeding fast path keeps running and leaves
the oldest file behind.

> > In the first place, .ready files are added while holding WALWriteLock
> > in XLogWrite, and while removing old segments after a checkpoint
> > (which also happens during recovery). Assuming that no one manually
> > removes .ready files on an active server, the former is the sole
> > place doing that. So I don't see a chance that .ready files are
> > created in an out-of-order way.
> 
> Perhaps a more convincing example is when XLogArchiveNotify() fails.
> AFAICT this can fail without ERROR-ing, in which case the server can
> continue writing WAL and creating .ready files for later segments.  At
> some point, the checkpointer process will call RemoveOldXlogFiles()
> and try to create the missing .ready file.

Mmm. Assuming that could happen, a history file would forever lose its
chance to be archived once that disaster befell it.  Apart from this
patch, maybe we need a way to re-notify history files that have missed
their chance.

Assuming that all such forgotten files are eventually re-marked as
.ready somewhere, the archiver can re-find them by explicitly
triggering the fallback path.  Currently the trigger fires implicitly
by checking shared timeline movement, but firing the trigger by, for
example, a signal as mentioned in a nearby message, would make that
behavior easy to implement.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-05 Thread Amit Kapila
On Fri, Aug 6, 2021 at 5:09 AM Masahiko Sawada  wrote:
>
> On Thu, Aug 5, 2021 at 11:40 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > > To summarize, I think that what we want to do in DROP SUBSCRIPTION
> > > cases is to drop relations from pg_subscription_rel that are no
> > > longer included in the set of after-calculated publications. In the
> > > latter example, when ALTER SUBSCRIPTION ... DROP PUBLICATION pub12,
> > > we should compare the current set of relations associated with
> > > {pub12, pub13} to the set of relations associated with pub13, not
> > > pub12. Then we can find out that t2 is no longer associated with
> > > the after-calculated publication (i.e., pub13), so remove it.
> >
> > Thanks for the review. You are right, I missed this point.
> > Attached is a new version patch which fixes these problems (also
> > adds a new pl-test).
> >
>
> Thank you for updating the patch!
>
> Here are some comments:
>
> I think that it still could happen that DROP PUBLICATION misses
> dropping relations. Please consider the following case:
>
> On publisher and subscriber:
> create table t1 (a int);
> create table t2 (a int);
> create table t3 (a int);
>
> On publisher:
> create publication pub12 for table t1, t2;
> create publication pub3 for table t3;
>
> On subscriber:
> create subscription sub connection 'dbname=postgres' publication pub12, pub3;
>
> On publisher:
> alter publication pub3 add table t1;
>
> On subscriber:
> alter subscription sub drop publication pub12;
> select srsubid, srrelid::regclass::text, srsubstate, srsublsn from
> pg_subscription_rel;
>  srsubid | srrelid | srsubstate | srsublsn
> ---------+---------+------------+-----------
>    16393 | t3      | r          | 0/1707CE0
>    16393 | t1      | r          | 0/1707D18
>
> With v3 patch, pg_subscription_rel has entries for t3 and t1. But if
> ALTER SUBSCRIPTION ... DROP PUBLICATION pub12 refreshes only relations
> associated with pub12, t1 should be removed and be added when we
> refresh pub3.
>

But isn't this happening because of your suggestion to compare the
current set of relations with relations from publications that don't
need to be removed? Do we have better ideas to fix it, or shall we just
say that we will refresh based on whatever current set of relations is
present in the publication to be dropped?

One more thing: I think it would be better to write a separate refresh
function for add/drop rather than adding checks to the current refresh
functionality. We can extract the common code so it can be called from
the different refresh functions.

-- 
With Regards,
Amit Kapila.




Re: Commitfest overflow

2021-08-05 Thread Noah Misch
On Thu, Aug 05, 2021 at 03:06:54PM +0200, Tomas Vondra wrote:
> On 8/5/21 8:39 AM, Andrey Borodin wrote:
> >>...
> >>
> >>Early commitfests recognized a rule that patch authors owed one review per
> >>patch registered in the commitfest.  If authors were holding to that, then
> >>both submissions and reviews would slow during vacations, but the neglected
> >>fraction of the commitfest would be about the same.  I think it would
> >>help to track each author's balance (reviews_done - reviews_owed).
> >
> >+1 for tracking this.
> 
> Yeah, I agree we should be stricter about this rule, but I'm somewhat
> skeptical about tracking it in the CF app - judging patch and review
> complexity seems quite subjective, etc.

The CF app presently lacks the data to track this, but with relevant feature
additions, I suspect it would be the best place to manage tracking.  Something
like this: when a CF app user changes a patch status from Needs Review or
Ready for Committer to anything else, invite that user to name a recipient of
review credit.

> >BTW, when is a review "done"? When the first revision is published?
> >Or when the patch is committed/rolled back?
> >When is a review owed? At the moment the patch is submitted? Or when
> >it is committed?

Those are important questions.  Here are answers that feel reasonable to me,
but they may need refinement.  Anyone can increase their reviews_done by
sending a review that causes a patch to leave Needs Review status in the CF
app for the first time in a given commitfest.  Trivial defect reports,
e.g. that the patch doesn't compile, are not reviews; the person setting
Waiting on Author shall not assign review credit.  Committers can increase
their reviews_done by sending a review that causes a patch to leave Ready for
Committer status for the first time in a given commitfest; however, nobody
receives two credits for the same patch, in the same commitfest.  Whenever a
CF entry is increasing someone's reviews_done, it also increases reviews_owed
for the entry's author.  What do you think?

(Consequences: each CF entry can increment reviews_done of up to two users per
commitfest, but it increments reviews_owed just once.  If the patch bounces
between Needs Review and Waiting on Author several times in a single CF, that
counts as a total of one review.)
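To make the proposed bookkeeping concrete, here is a small model of those
rules (hypothetical Python, not CF-app code; names like `record_exit` are
invented): credit is granted only the first time an entry leaves a given
status in a commitfest, a user can collect at most one credit per entry
per commitfest, and the entry's author owes exactly one review per
crediting entry.

```python
from collections import defaultdict

class CreditLedger:
    def __init__(self):
        self.reviews_done = defaultdict(int)
        self.reviews_owed = defaultdict(int)
        self._phase_used = set()    # (cf, entry, status): only first exit counts
        self._user_credit = set()   # (cf, entry, user): one credit per patch/CF
        self._owed = set()          # (cf, entry): author owes once per CF

    def record_exit(self, cf, entry, author, reviewer, status):
        """`reviewer`'s review made `entry` leave `status` in commitfest `cf`."""
        if status not in ("Needs Review", "Ready for Committer"):
            return  # e.g. trivial "doesn't compile" reports earn no credit
        if (cf, entry, status) in self._phase_used:
            return  # bouncing through the same status again doesn't count
        self._phase_used.add((cf, entry, status))
        if (cf, entry, reviewer) not in self._user_credit:
            self._user_credit.add((cf, entry, reviewer))
            self.reviews_done[reviewer] += 1
        if (cf, entry) not in self._owed:
            self._owed.add((cf, entry))
            self.reviews_owed[author] += 1

ledger = CreditLedger()
ledger.record_exit("2021-07", 42, "author", "alice", "Needs Review")
ledger.record_exit("2021-07", 42, "author", "alice", "Needs Review")  # bounce: no-op
ledger.record_exit("2021-07", 42, "author", "bob", "Ready for Committer")
```

After these transitions alice and bob each hold one credit while the
author owes a single review, matching the consequences spelled out above:
up to two reviews_done increments per entry per commitfest, reviews_owed
incremented just once.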

> I think the rule is roughly that when you submit a patch to a CF, you're
> expected to review a patch of comparable complexity in the same CF. It's not
> tied to whether the patch is committed, etc.

Yep.  Would we get reasonable incentives without tracking "comparable
complexity" explicitly?  That bit is harder to judge.




Re: RFC: Improve CPU cache locality of syscache searches

2021-08-05 Thread Andres Freund
Hi,

On 2021-08-06 06:43:55 +0300, Yura Sokolov wrote:
> Why not use simplehash or something like that? Open-addressing schemes
> show superior cache locality.

I thought about that as well - but it doesn't really resolve the question of
what we want to store in-line in the hashtable and what not. We can't store
the tuples themselves in the hashtable for a myriad of reasons (need pointer
stability, they're variably sized, way too large to move around frequently).


> Well, simplehash entry will be 24 bytes this way. If simplehash template
> supports external key/element storage, then it could be shrunk to 16 bytes,
> and syscache entries will not need dlist_node. (But it doesn't at the
> moment).

I think storing keys outside of the hashtable entry defeats the purpose of the
open addressing, given that they are always checked and that our conflict
ratio should be fairly low.

Greetings,

Andres Freund




Re: RFC: Improve CPU cache locality of syscache searches

2021-08-05 Thread Yura Sokolov

Andres Freund wrote 2021-08-05 23:12:
> Hi,
>
> On 2021-08-05 12:27:49 -0400, John Naylor wrote:
>> On Wed, Aug 4, 2021 at 3:44 PM Andres Freund wrote:
>>> On 2021-08-04 12:39:29 -0400, John Naylor wrote:
>>>> typedef struct cc_bucket
>>>> {
>>>>   uint32 hashes[4];
>>>>   catctup *ct[4];
>>>>   dlist_head;
>>>> };
>>>
>>> I'm not convinced that the above is the right idea though. Even if
>>> the hash matches, you're still going to need to fetch at least
>>> catctup->keys[0] from a separate cacheline to be able to return the
>>> cache entry.
>>
>> I see your point. It doesn't make sense to inline only part of the
>> information needed.
>
> At least not for the unconditionally needed information.
>
>> Although I'm guessing inlining just two values in the 4-key case
>> wouldn't buy much.
>
> Not so sure about that. I'd guess that two key comparisons take more
> cycles than a cacheline fetch of the further keys (perhaps not if we
> had inlined key comparisons). I.e. I'd expect out-of-order +
> speculative execution to hide the latency of fetching the second
> cacheline for later key values.
>
>>> If we stuffed four values into one bucket we could potentially SIMD
>>> the hash and Datum comparisons ;)
>>
>> ;-) That's an interesting future direction to consider when we
>> support building with x86-64-v2. It'd be pretty easy to compare a
>> vector of hashes and quickly get the array index for the key
>> comparisons (ignoring for the moment how to handle the rare case of
>> multiple identical hashes). However, we currently don't memcmp() the
>> Datums and instead call an "eqfast" function, so I don't see how
>> that part would work in a vector setting.
>
> It definitely couldn't work unconditionally - we have to deal with
> text, oidvector comparisons after all. But we could use it for the
> other types. However, looking at the syscaches, I think it'd not very
> often be applicable for caches with enough columns.
>
> I have wondered before whether we should have syscache definitions
> generate code specific to each syscache definition. I do think that'd
> give a good bit of performance boost. But I don't see a trivial way
> to get there without notational overhead.
>
> We could define syscaches in a header using a macro that's defined
> differently in syscache.c than everywhere else. The header would
> declare a set of functions for each syscache, syscache.c would define
> them to call an always_inline function with the relevant constants.
>
> Or perhaps we should move syscache definitions into the pg_*.h
> headers, and generate the relevant code as part of their processing.
> That seems like it could be nice from a modularity POV alone. And it
> could do better than the current approach, because we could hardcode
> the types for columns in the syscache definition without increasing
> notational overhead.

Why not use simplehash or something like that? Open-addressing schemes
show superior cache locality.

It could be combined: use hashValue as the key in simplehash and a
dlist for hashValue collision handling. 99.99% of the time there will
be no collisions on hashValue itself, therefore it will almost always
take two-three memory lookups: one-two for the dlist_head in simplehash
by hashValue, and then fetching the first element of the dlist.

And the code will remain almost the same. Just the "bucket" search will
change a bit.

(And I'd recommend a lower fill factor for this simplehash. At most
0.85.)

Well, a simplehash entry will be 24 bytes this way. If the simplehash
template supported external key/element storage, then it could be
shrunk to 16 bytes, and syscache entries would not need a dlist_node.
(But it doesn't at the moment.)

And a custom open-addressing table could get down to 8 bytes per
element:
- the element is a pointer to the entry,
- a missing node is encoded as NULL,
- with a fill factor as low as 0.66 there will be a small number of
  collisions, therefore a non-empty slot will be the matching entry
  most of the time, and the memory lookup for the key comparison is
  amortized free.
Note that an 8-byte entry with fill factor 0.66 consumes an amortized
12.12 bytes, while a 16-byte entry with fill factor 0.99 consumes an
amortized 16.16 bytes.
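The amortized figures above follow directly from entry size divided by
fill factor; a quick sanity check (plain arithmetic, not PostgreSQL code):

```python
def amortized_bytes(entry_size, fill_factor):
    """Memory consumed per stored element in an open-addressing table."""
    return entry_size / fill_factor

# 8-byte pointer entries at 0.66 load vs 16-byte entries at 0.99 load
small = amortized_bytes(8, 0.66)    # ~12.12 bytes per element
large = amortized_bytes(16, 0.99)   # ~16.16 bytes per element
```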

regards,
Yura Sokolov

y.soko...@postgrespro.ru
funny.fal...@gmail.com




Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2021-08-05 Thread Andres Freund
Hi,

When testing EXEC_BACKEND on linux I see occasional test failures as long as I
don't disable ASLR. There's a code comment to that effect:

 * If testing EXEC_BACKEND on Linux, you should run this as root before
 * starting the postmaster:
 *
 * echo 0 >/proc/sys/kernel/randomize_va_space

but I don't like doing that on a system wide basis.

Linux allows disabling ASLR on a per-process basis using
personality(ADDR_NO_RANDOMIZE). There's a wrapper binary to do that as well,
setarch --addr-no-randomize.

I was wondering if we should have postmaster do personality(ADDR_NO_RANDOMIZE)
for EXEC_BACKEND builds? It seems nicer to make it automatically work than
have people remember that they need to call "setarch --addr-no-randomize make 
check".
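For reference, the per-process knob looks roughly like this from
userspace (a Linux-only ctypes sketch, not proposed PostgreSQL code; in
the postmaster it would be a plain personality() call made early, before
child processes are launched):

```python
import ctypes

ADDR_NO_RANDOMIZE = 0x0040000       # flag value from <sys/personality.h>

libc = ctypes.CDLL(None, use_errno=True)
libc.personality.argtypes = [ctypes.c_ulong]
libc.personality.restype = ctypes.c_int

# personality(0xffffffff) is the conventional query-only call: it returns
# the current persona without changing it.
old = libc.personality(0xFFFFFFFF)

# OR in the flag to disable ASLR for this process and everything it
# subsequently forks/execs. Sandboxes (e.g. a default Docker seccomp
# profile) may refuse this with EPERM, so check the return value.
result = libc.personality(old | ADDR_NO_RANDOMIZE)
```

`setarch --addr-no-randomize` does essentially the same thing before
exec'ing its argument; doing it inside the postmaster just moves the call
past the postmaster's own (still randomized) startup.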

Not that it actually matters for EXEC_BACKEND, but theoretically doing
personality(ADDR_NO_RANDOMIZE) in postmaster is a tad more secure than doing
it via setarch, as in the personality() case postmaster's layout itself is
still randomized...


Or perhaps we should just add a comment mentioning setarch.

Greetings,

Andres Freund




Re: Numeric x^y for negative x

2021-08-05 Thread Tom Lane
Dean Rasheed  writes:
> On Thu, 5 Aug 2021 at 17:04, Tom Lane  wrote:
>> It looks like castoroides is not happy with this patch:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=castoroides=2021-08-01%2008%3A52%3A43

> Hmm, there's something very weird going on there.

Yeah.  I tried to reproduce this on the gcc compile farm's Solaris 10
machine, but the test passed fine for me.  The only obvious configuration
difference I can find is that that machine has

$ cc -V
cc: Sun C 5.10 SunOS_sparc Patch 141861-10 2012/11/07

whereas castoroides' compiler seems to be a few years older.  So this
does seem like it could be a compiler bug.

regards, tom lane




Re: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS

2021-08-05 Thread Andres Freund
Hi,

On 2021-08-05 20:02:02 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to 
> > treat
> > it as a required flag going forward.
> 
> +1
> 
> > The second question is what we want to do in the backbranches. I think the
> > reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an
> > error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used.
> 
> I think doing nothing is fine.  Given the lack of complaints, we're
> more likely to break something than fix anything useful.

Done in the attached patch. I don't think we need to add more to the docs than
the flag being required?

Greetings,

Andres Freund
>From a07e45853abd231b4ec54fc1e82b7c4a669ce593 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 5 Aug 2021 19:33:00 -0700
Subject: [PATCH] Remove support for background workers without
 BGWORKER_SHMEM_ACCESS.

Background workers without shared memory access have been broken on
EXEC_BACKEND / Windows builds since shortly after background workers
were introduced, without that being reported. Clearly they are not
commonly used.

The problem is that bgworker startup requires being attached to shared
memory in EXEC_BACKEND child processes. StartBackgroundWorker() detaches
from shared memory for unconnected workers, but at that point we have
already initialized subsystems referencing shared memory.

Fixing this problem is not entirely trivial, so removing the option seems the
best way forward. In most use cases the advantages of being connected to
shared memory far outweigh the disadvantages.

As there have been no reports about this issue so far, we have decided that it
is not worth trying to address the problem in the back branches.

Per discussion with Alvaro Herrera, Robert Haas and Tom Lane.

Author: Andres Freund 
Discussion: https://postgr.es/m/20210802065116.j763tz3vz4egq...@alap3.anarazel.de
---
 src/include/postmaster/bgworker.h   |  1 +
 src/backend/postmaster/bgworker.c   | 67 +++--
 src/backend/postmaster/postmaster.c | 23 --
 doc/src/sgml/bgworker.sgml  | 10 ++---
 4 files changed, 38 insertions(+), 63 deletions(-)

diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index 8e9ef7c7bfa..6d4122b4c7e 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -48,6 +48,7 @@
 
 /*
  * Pass this flag to have your worker be able to connect to shared memory.
+ * This flag is required.
  */
 #define BGWORKER_SHMEM_ACCESS		0x0001
 
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 11c4ceddbf6..c05f5006393 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -652,17 +652,24 @@ static bool
 SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
 {
 	/* sanity check for flags */
+
+	/*
+	 * We used to support workers not connected to shared memory, but don't
+	 * anymore. Thus this is a required flag now. We're not removing the flag
+	 * for compatibility reasons and because the flag still provides some
+	 * signal when reading code.
+	 */
+	if (!(worker->bgw_flags & BGWORKER_SHMEM_ACCESS))
+	{
+		ereport(elevel,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("background worker \"%s\": background workers without shared memory access are not supported",
+		worker->bgw_name)));
+		return false;
+	}
+
 	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
 	{
-		if (!(worker->bgw_flags & BGWORKER_SHMEM_ACCESS))
-		{
-			ereport(elevel,
-	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-	 errmsg("background worker \"%s\": must attach to shared memory in order to request a database connection",
-			worker->bgw_name)));
-			return false;
-		}
-
 		if (worker->bgw_start_time == BgWorkerStart_PostmasterStart)
 		{
 			ereport(elevel,
@@ -745,20 +752,6 @@ StartBackgroundWorker(void)
 	MyBackendType = B_BG_WORKER;
 	init_ps_display(worker->bgw_name);
 
-	/*
-	 * If we're not supposed to have shared memory access, then detach from
-	 * shared memory.  If we didn't request shared memory access, the
-	 * postmaster won't force a cluster-wide restart if we exit unexpectedly,
-	 * so we'd better make sure that we don't mess anything up that would
-	 * require that sort of cleanup.
-	 */
-	if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0)
-	{
-		ShutdownLatchSupport();
-		dsm_detach_all();
-		PGSharedMemoryDetach();
-	}
-
 	SetProcessingMode(InitProcessing);
 
 	/* Apply PostAuthDelay */
@@ -832,29 +825,19 @@ StartBackgroundWorker(void)
 	PG_exception_stack = _sigjmp_buf;
 
 	/*
-	 * If the background worker request shared memory access, set that up now;
-	 * else, detach all shared memory segments.
+	 * Create a per-backend PGPROC struct in shared memory, except in the
+	 * EXEC_BACKEND case where this was done in SubPostmasterMain. We must
+	 * do this before we can use LWLocks (and in the 

Re: .ready and .done files considered harmful

2021-08-05 Thread Kyotaro Horiguchi
At Thu, 5 Aug 2021 21:53:30 +0530, Dipesh Pandit  
wrote in 
> > I'm not sure. I think we need the value to be accurate during
> > recovery, so I'm not sure whether replayEndTLI would get us there.
> > Another approach might be to set ThisTimeLineID on standbys also.
> > Actually just taking a fast look at the code I'm not quite sure why
> > that isn't happening already. Do you have any understanding of that?
> 
> During investigation I found that the current timeline ID (ThisTimeLineID)
> gets updated in XLogCtl’s ThisTimeLineID once it gets finalised as part
> of archive recovery.
> 
> /*
>  * Write the timeline history file, and have it archived. After this
>  * point (or rather, as soon as the file is archived), the timeline
>  * will appear as "taken" in the WAL archive and to any standby
>  * servers.  If we crash before actually switching to the new
>  * timeline, standby servers will nevertheless think that we
> switched
>  * to the new timeline, and will try to connect to the new timeline.
>  * To minimize the window for that, try to do as little as possible
>  * between here and writing the end-of-recovery record.
>  */
> 
> In case of Standby this happens only when it gets promoted.
> 
> If Standby is in recovery mode then replayEndTLI points to the most
> recent TLI corresponding to the replayed records. Also, if replying a
> record causes timeline switch then replayEndTLI gets updated with
> the new timeline. As long as it is in recovery mode replayEndTLI should
> point to the current timeline ID on Standby. Thoughts?

As I mentioned in another branch of this thread, pgarch_readyXlog()
always goes into the fallback path at the first iteration of
pgarch_ArchiverCopyLoop(), and the current (or expected) TLI is learned
there.  So there is no need for a shared timeline ID at that time.

When pgarch_ArchiverCopyLoop meets a timeline switch, the shortcut
path fails to find the next anticipated .ready file and then goes into
the fallback path, which should find the history file for the next TLI
(unless some timing misalignment I'm not aware of happens).

So the shared timeline ID serves only to let the fast path give way to
the fallback path so that the just-created history file is found as
early as possible.  Notifying the archiver of the timeline ID that the
startup process recognizes makes things more complex than required.
Currently the archiver doesn't use SIGINT, so I think we can use SIGINT
for that purpose.

Furthermore, it seems to me that we can make the TLI and the next
anticipated segment number function-local static variables.  That would
be workable assuming that the only caller, pgarch_ArchiverCopyLoop,
obeys the contract that it must call pgarch_readyXlog() until it
returns false.  However, there seems to be no reason for it not to
work even otherwise, unless I'm missing something (which is likely).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: .ready and .done files considered harmful

2021-08-05 Thread Bossart, Nathan
On 8/5/21, 6:26 PM, "Kyotaro Horiguchi"  wrote:
> It works the current way always at the first iteration of
> pgarch_ArchiverCopyLoop() because in the last iteration of
> pgarch_ArchiverCopyLoop(), pgarch_readyXlog() erases the last
> anticipated segment.  The shortcut works only when
> pgarch_ArchiverCopyLoop archives more than one successive segment at
> a time.  If the anticipated next segment is found to be missing a
> .ready file while archiving multiple files, pgarch_readyXlog falls
> back to the regular way.
>
> So I don't see the danger you are perhaps anticipating.

I think my concern is that there's no guarantee that we will ever do
another directory scan.  A server that's generating a lot of WAL could
theoretically keep us in the next-anticipated-log code path
indefinitely.

> In the first place, .ready files are added while holding WALWriteLock
> in XLogWrite, and while removing old segments after a checkpoint
> (which also happens during recovery). Assuming that no one manually
> removes .ready files on an active server, the former is the sole place
> doing that. So I don't see a chance that .ready files are created in
> an out-of-order way.

Perhaps a more convincing example is when XLogArchiveNotify() fails.
AFAICT this can fail without ERROR-ing, in which case the server can
continue writing WAL and creating .ready files for later segments.  At
some point, the checkpointer process will call RemoveOldXlogFiles()
and try to create the missing .ready file.

Nathan



Re: .ready and .done files considered harmful

2021-08-05 Thread Kyotaro Horiguchi
At Tue, 3 Aug 2021 20:46:57 +, "Bossart, Nathan"  
wrote in 
> + /*
> +  * Perform a full directory scan to identify the next log segment. There
> +  * may be one of the following scenarios which may require us to 
> perform a
> +  * full directory scan.
> +  *
> +  * 1. This is the first cycle since archiver has started and there is no
> +  * idea about the next anticipated log segment.
> +  *
> +  * 2. There is a timeline switch, i.e. the timeline ID tracked at 
> archiver
> +  * does not match with current timeline ID. Archive history file as 
> part of
> +  * this timeline switch.
> +  *
> +  * 3. The next anticipated log segment is not available.
> 
> One benefit of the current implementation of pgarch_readyXlog() is
> that .ready files created out of order will be prioritized before
> segments with greater LSNs.  IIUC, with this patch, as long as there
> is a "next anticipated" segment available, the archiver won't go back
> and archive segments it missed.  I don't think the archive status
> files are regularly created out of order, but XLogArchiveCheckDone()
> has handling for that case, and the work to avoid creating .ready
> files too early [0] seems to make it more likely.  Perhaps we should
> also force a directory scan when we detect that we are creating a
> .ready file for a segment that is older than the "next anticipated"
> segment.
> 
> Nathan
> 
> [0] https://postgr.es/m/DA71434B-7340-4984-9B91-F085BC47A778%40amazon.com

It works the current way always at the first iteration of
pgarch_ArchiverCopyLoop() because in the last iteration of
pgarch_ArchiverCopyLoop(), pgarch_readyXlog() erases the last
anticipated segment.  The shortcut works only when
pgarch_ArchiverCopyLoop archives more than one successive segment at a
time.  If the anticipated next segment is found to be missing a .ready
file while archiving multiple files, pgarch_readyXlog falls back to
the regular way.

So I don't see the danger you are perhaps anticipating.

In the first place, .ready files are added while holding WALWriteLock
in XLogWrite, and while removing old segments after a checkpoint (which
also happens during recovery). Assuming that no one manually removes
.ready files on an active server, the former is the sole place doing
that. So I don't see a chance that .ready files are created in an
out-of-order way.
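The two paths under discussion can be sketched as follows (hypothetical
Python, not the actual pgarch.c code; the file naming is simplified to a
bare hex segment number): remember the segment archived last, first try
the next anticipated segment's .ready file, and fall back to a full
directory scan, oldest first, whenever the anticipated file is missing.

```python
import os

def pgarch_ready_xlog(status_dir, last_segno):
    """Return the next segment number to archive, or None.

    Fast path: if we archived segment N last time, check N+1's .ready
    file directly.  Fallback: scan the whole directory and take the
    oldest .ready file, which also picks up out-of-order notifications
    and (in the real code) timeline history files.
    """
    if last_segno is not None:
        candidate = os.path.join(status_dir, f"{last_segno + 1:08X}.ready")
        if os.path.exists(candidate):
            return last_segno + 1
    ready = sorted(f for f in os.listdir(status_dir) if f.endswith(".ready"))
    if not ready:
        return None
    return int(ready[0][: -len(".ready")], 16)
```

The concern upthread is that as long as the fast path keeps succeeding,
an older segment whose .ready file appeared late is never revisited;
explicitly forcing the full-scan case is what the signal idea would
accomplish.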

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Assert triggered during RE_compile_and_cache

2021-08-05 Thread Tom Lane
I wrote:
> Hm.  I thought that this might be an old issue, but I'm not seeing the
> crash in pre-v14 branches.  That means it's some bug introduced in
> the performance improvements I did a few months ago.  Open item added.

git bisect says the trouble started with

ea1268f6301cc7adce571cc9c5ebe8d9342a2ef4 is the first bad commit
commit ea1268f6301cc7adce571cc9c5ebe8d9342a2ef4
Author: Tom Lane 
Date:   Sat Feb 20 19:26:41 2021 -0500

Avoid generating extra subre tree nodes for capturing parentheses.

I probably won't be able to dig deeper for a little while.

regards, tom lane




Re: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS

2021-08-05 Thread Tom Lane
Andres Freund  writes:
> First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat
> it as a required flag going forward.

+1

> The second question is what we want to do in the backbranches. I think the
> reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an
> error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used.

I think doing nothing is fine.  Given the lack of complaints, we're
more likely to break something than fix anything useful.

regards, tom lane




Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-05 Thread Masahiko Sawada
On Thu, Aug 5, 2021 at 11:40 PM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, August 5, 2021 1:09 PM Masahiko Sawada  
> wrote
> > I've reviewed v2 patch. Here are some comments:
> >
> > +   if (type == ALTER_SUBSCRIPTION_SET_PUBLICATION ||
> > +   type == ALTER_SUBSCRIPTION_REFRESH)
> > +   drop_table = !bsearch(, pubrel_local_oids,
> > + list_length(pubrel_names),
> > + sizeof(Oid), oid_cmp);
> > +   else if (type == ALTER_SUBSCRIPTION_DROP_PUBLICATION)
> > +   drop_table = bsearch(, pubrel_local_oids,
> > +list_length(pubrel_names),
> > +sizeof(Oid), oid_cmp);
> >
> > IIUC, in DROP PUBLICATION cases, we don't need to refer to the tables in the
> > publication on the publisher that we're dropping in order to check whether 
> > to
> > remove the relation. If we drop a table from the publication on the 
> > publisher
> > before dropping the publication from the subscription on the subscriber, the
> > pg_subscription_rel entry for the table remains. For example, please 
> > consider the
> > following case:
> >
> > On publisher and subscriber:
> > create table t1 (a int);
> > create table t2 (a int);
> > create table t3 (a int);
> >
> > On publisher:
> > create publication pub12 for table t1, t2; create publication pub3 for 
> > table t3;
> >
> > On subscriber:
> > create subscription sub connection 'dbname=postgres' publication pub12, 
> > pub3;
> >
> > On publisher:
> > alter publication pub12 drop table t2;
> >
> > On subscriber:
> > alter subscription sub drop publication pub12; select srsubid,
> > srrelid::regclass::text, srsubstate, srsublsn from pg_subscription_rel;
> >
> >  srsubid | srrelid | srsubstate | srsublsn
> > -+-++---
> >16393 | t3  | r  | 0/1707CE0
> >16393 | t2  | r  | 0/1707CE0
> >
> > With v2 patch, the pg_subscription_rel has the entry for 't2' but it should 
> > not
> > exist.
> >
> > But that doesn't mean that drop_table should always be true in DROP
> > PUBLICATION cases. Since the tables associated with two publications can
> > overlap, we cannot remove the relation from pg_subscription_rel if it is 
> > also
> > associated with the remaining publication. For example:
> >
> > On publisher and subscriber:
> > create table t1 (a int);
> > create table t2 (a int);
> > create table t3 (a int);
> >
> > On publisher:
> > create publication pub12 for table t1, t2; create publication pub13 for 
> > table t1, t3;
> >
> > On subscriber:
> > create subscription sub connection 'dbname=postgres' publication pub12,
> > pub13;
> >
> > On subscriber:
> > alter subscription sub drop publication pub12; select srsubid,
> > srrelid::regclass::text, srsubstate, srsublsn from pg_subscription_rel;  
> > srsubid |
> > srrelid | srsubstate | srsublsn
> > -+-++---
> >16393 | t3  | r  | 0/17080B0
> > (1 row)
> >
> > With v2 patch, the pg_subscription_rel has only one entry for t3 but it 
> > should
> > have also the one for t1 since it's still subscribing to pub13.
> >
> > To summarize, I think that what we want to do in DROP PUBLICATION cases is
> > to drop relations from pg_subscription_rel that are no longer included in
> > the set of after-calculated publications. In the latter example, when ALTER
> > SUBSCRIPTION ... DROP PUBLICATION pub12, we should compare the current set
> > of relations associated with {pub12, pub13} to the set of relations
> > associated with pub13, not pub12. Then we can find out t2 is no longer
> > associated with the after-calculated publication (i.e., pub13) so remove it.
>
> Thanks for the review. You are right, I missed this point.
> Attach new version patch which fix these problems(Also add a new pl-test).
>

Thank you for updating the patch!

Here are some comments:

I think that it could still happen that DROP PUBLICATION misses
dropping relations. Please consider the following case:

On publisher and subscriber:
create table t1 (a int);
create table t2 (a int);
create table t3 (a int);

On publisher:
create publication pub12 for table t1, t2;
create publication pub3 for table t3;

On subscriber:
create subscription sub connection 'dbname=postgres' publication pub12, pub3;

On publisher:
alter publication pub3 add table t1;

On subscriber:
alter subscription sub drop publication pub12;
select srsubid, srrelid::regclass::text, srsubstate, srsublsn from
pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
-+-++---
   16393 | t3  | r  | 0/1707CE0
   16393 | t1  | r  | 0/1707D18

With v3 patch, pg_subscription_rel has entries for t3 and t1. But if
ALTER SUBSCRIPTION ... DROP PUBLICATION pub12 refreshes only relations
associated with pub12, t1 should be removed and be added when 
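The refresh rule being worked out in this thread — after dropping publications, remove exactly those subscribed relations that no longer belong to any remaining publication — can be sketched as standalone C (illustrative only; the Oid typedef and function names are stand-ins, not the patch's actual code):

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

static int
oid_cmp(const void *a, const void *b)
{
	Oid		oa = *(const Oid *) a;
	Oid		ob = *(const Oid *) b;

	return (oa > ob) - (oa < ob);
}

/*
 * Collect the relations to drop from pg_subscription_rel: those currently
 * subscribed but absent from the union of relations belonging to the
 * publications that remain after ALTER SUBSCRIPTION ... DROP PUBLICATION.
 * Returns the number of Oids written to to_drop.
 */
static int
rels_to_drop(const Oid *subscribed, int nsub,
			 Oid *remaining, int nremain, Oid *to_drop)
{
	int			ndrop = 0;

	/* sort so each membership test is a bsearch, as in the patch under discussion */
	qsort(remaining, nremain, sizeof(Oid), oid_cmp);

	for (int i = 0; i < nsub; i++)
	{
		if (!bsearch(&subscribed[i], remaining, nremain, sizeof(Oid), oid_cmp))
			to_drop[ndrop++] = subscribed[i];
	}
	return ndrop;
}
```

With the earlier example (subscription on pub12 = {t1, t2} and pub13 = {t1, t3}, then dropping pub12), the remaining set is {t1, t3}, so only t2 is dropped.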

Re: A qsort template

2021-08-05 Thread Peter Geoghegan
On Sun, Aug 1, 2021 at 5:41 PM Thomas Munro  wrote:
> On Fri, Jul 30, 2021 at 12:34 PM John Naylor
>  wrote:
> > I got around to getting a benchmark together to serve as a starting point. 
> > I based it off something I got from the archives, but don't remember where 
> > (I seem to remember Tomas Vondra wrote the original, but not sure). To 
> > start I just used types that were there already -- int, text, numeric. The 
> > latter two won't be helped by this patch, but I wanted to keep something 
> > like that so we can see what kind of noise variation there is. I'll 
> > probably cut text out in the future and just keep numeric for that purpose.
>
> Thanks, that's very useful.

If somebody wants to get a sense of what the size hit is from all of
these specializations, I can recommend the diff feature of bloaty:

https://github.com/google/bloaty/blob/master/doc/using.md#size-diffs

Obviously you'd approach this by building postgres without the patch,
and diffing that baseline to postgres with the patch. And possibly
variations of the patch, with less or more sort specializations.

-- 
Peter Geoghegan




Re: Assert triggered during RE_compile_and_cache

2021-08-05 Thread Tom Lane
Mark Dilger  writes:
> I have now found lots of cases of this failure.  I *believe* the 
> backreference is always greater than 1, and it is always in a capture group 
> which then has the {0} or {0,0} applied to it.

Hm.  I thought that this might be an old issue, but I'm not seeing the
crash in pre-v14 branches.  That means it's some bug introduced in
the performance improvements I did a few months ago.  Open item added.

regards, tom lane




Re: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS

2021-08-05 Thread Andres Freund
Hi,

On 2021-08-02 15:34:07 -0400, Alvaro Herrera wrote:
> Ah, that makes sense.  That doesn't sound super fragile, but it is odd
> and it's probably a good argument for removing the feature, particularly
> since nobody seems to be using it.

ISTM we concluded that we should remove unconnected workers. Writing a patch
to do so left me with two questions:

First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat
it as a required flag going forward. That way we don't silently start being
attached to shared memory in case somebody actually has an unattached
worker. And if we ever wanted to add the ability to have unattached workers
back, it'll also be easier this way.  Perhaps it also has a small amount of
signalling value reminding people that they need to be careful...

The second question is what we want to do in the backbranches. I think the
reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an
error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used.

Greetings,

Andres Freund




Re: Accidentally dropped constraints: bug?

2021-08-05 Thread Mark Dilger



> On Aug 5, 2021, at 12:35 AM, Simon Riggs  wrote:
> 
> Noting that all constraints have been removed, not just the ones
> wholly dependent on dropped columns.

I don't find this all that surprising.  If CHECK (a > 5 AND b IS NOT NULL
AND c > 10) is really meant as three independent checks, it should be
written that way.  However, for any row where c is null, the entire
expression will either be null or false, and the check will fail precisely
when (a > 5 AND b IS NOT NULL) is false.  So if you imagine the dropped
column as a column of phantom null values, you'd expect the check to still
reject rows where a <= 5 or b is null.
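The phantom-null reasoning above can be checked with a tiny three-valued-logic model in C (an illustrative sketch of SQL's semantics, not PostgreSQL code): a CHECK rejects a row only when the whole expression is false, so with c treated as null the constraint still fails exactly when (a > 5 AND b IS NOT NULL) is false.

```c
#include <assert.h>

/* SQL's three-valued logic for AND, in miniature. */
typedef enum { TRI_FALSE, TRI_TRUE, TRI_NULL } tri;

static tri
tri_and(tri a, tri b)
{
	if (a == TRI_FALSE || b == TRI_FALSE)
		return TRI_FALSE;
	if (a == TRI_NULL || b == TRI_NULL)
		return TRI_NULL;
	return TRI_TRUE;
}

/* A CHECK constraint rejects a row only when it evaluates to false. */
static int
check_passes(tri v)
{
	return v != TRI_FALSE;
}
```

With c null, "c > 10" is null, so the whole CHECK becomes "(a > 5 AND b IS NOT NULL) AND NULL": it passes when the left side is true or null, and fails only when the left side is false.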

Is it reasonable to imagine the dropped column as implicitly full of nulls?  
That's how an added column works, but do we think about dropped columns that 
way?

In any event, the documentation is pretty clear about this:

> DROP COLUMN [ IF EXISTS ]
> This form drops a column from a table. Indexes and table constraints 
> involving the column will be automatically dropped as well.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Assert triggered during RE_compile_and_cache

2021-08-05 Thread Mark Dilger


> On Aug 5, 2021, at 3:15 PM, Tom Lane  wrote:
> 
> I don't immediately see what's different about your failing case
> versus the not-failing ones.

I have now found lots of cases of this failure.  I *believe* the backreference 
is always greater than 1, and it is always in a capture group which then has 
the {0} or {0,0} applied to it.

You can find lots of cases using the attached regex generating script I whipped 
up for testing your work.  (Note this is just a quick and dirty tool for 
hacking, not anything refined.)

#!/usr/bin/perl

use strict;
use warnings;

our @alphabet = ('a'..'z');

sub rand_num
{
	my $result = 0;
	$result++ while(int(rand(3)));
	return $result;
}

sub rand_char
{
	return $alphabet[int(rand(@alphabet))];
}

our @strings;
sub rand_string
{
	if (scalar(@strings))
	{
		my $dice = int(rand(3));
		return $strings[int(rand(@strings))] if ($dice == 0);
		shift(@strings) if ($dice == 1);
		pop(@strings) if ($dice == 2);
	}

	my $result = join('', map { rand_char() } (1..rand_num()));
	push (@strings, $result) if (int(rand(2)));

	return $result;
}

sub rand_long_string
{
	my $result = "";
	$result .= rand_string() while(int(rand(10)));

	return $result;
}

sub rand_quantifier
{
	my $dice = int(rand(12));

	return "*"if ($dice == 0);
	return "+"if ($dice == 1);
	return "?"if ($dice == 2);
	return "*?"   if ($dice == 3);
	return "+?"   if ($dice == 4);
	return "??"   if ($dice == 5);

	my $beg = rand_num();
	return "{$beg}"   if ($dice == 6);
	return "{$beg,}"  if ($dice == 7);
	return "{$beg}?"  if ($dice == 8);
	return "{$beg,}?" if ($dice == 9);

	my $end = rand_num() + $beg;
	return "{$beg,$end}"  if ($dice == 10);
	return "{$beg,$end}?" if ($dice == 11);
	return "";
}

sub rand_escape
{
	my $dice = int(rand(5));

	return '\\0'if ($dice == 0);
	return '\\' . rand_char()   if ($dice == 1);
	return '\\' . uc(rand_char())   if ($dice == 2);
	return '\\' . rand_string() if ($dice == 3);
	return '\\' . uc(rand_string()) if ($dice == 4);

	return "";
}

our $max_capture = 0;

sub rand_rgx
{
	my ($depth) = @_;

	$depth = 0 unless defined $depth;

	# Choose option, but limit the choice if we're in danger of deep recursion
	my $dice = int(rand($depth < 5 ? 100 : 20));

	# Base cases
	return "" if ($dice == 0);
	return rand_escape()  if ($dice == 2);
	return rand_char()if ($dice < 5);
	if ($dice < 10 && $max_capture)
	{
		my $capgroup = 1 + int(rand($max_capture));
		return '\\' . $capgroup;
	}
	return "."if ($dice < 20);

	# Recursive cases
	return '[' . rand_escape() . ']'  if ($dice == 20);
	return '[^' . rand_escape() . ']' if ($dice == 21);
	return '[' . rand_string() . ']'  if ($dice == 22);
	return '[^' . rand_string() . ']' if ($dice == 23);
	if ($dice < 60)
	{
		my $result = '(' . rand_rgx($depth+1) . ')';
		$max_capture++;
		return $result;
	}
	return '(?:' . rand_rgx($depth+1) . ')'   if ($dice < 70);
	return '(?=' . rand_rgx($depth+1) . ')'   if ($dice == 71);
	return '(?!' . rand_rgx($depth+1) . ')'   if ($dice == 72);
	return '(?<=' . rand_rgx($depth+1) . ')'  if ($dice == 73);
	return '(?<!' . rand_rgx($depth+1) . ')'  if ($dice == 74);

	return rand_rgx($depth+1) . rand_quantifier();
}

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Assert triggered during RE_compile_and_cache

2021-08-05 Thread Tom Lane
Mark Dilger  writes:
> +select '' ~ '(())(\2){0}';
> +server closed the connection unexpectedly

> Any ideas?

Huh.  This seems like some deficiency in the part of parseqatom
starting with

/* annoying special case:  {0} or {0,0} cancels everything */
if (m == 0 && n == 0)

but I don't immediately see what's different about your failing case
versus the not-failing ones.

regards, tom lane




Re: Another regexp performance improvement: skip useless paren-captures

2021-08-05 Thread Mark Dilger



> On Aug 5, 2021, at 7:36 AM, Tom Lane  wrote:
> 
> Probably should add more cases...

The patch triggers an assertion that master does not:

+select 'azrlfkjbjgidgryryiglcabkgqluflu' !~ '(.(.)((.)))((?:(\3)))';
+server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+connection to server was lost

The relevant portion of the stack trace:

frame #3: 0x0001043bcf6d postgres`ExceptionalCondition(conditionName=, errorType=, fileName=, lineNumber=) at assert.c:69:2 [opt]
frame #4: 0x00010410168b postgres`cdissect(v=0x7ffeebdd2ad8, t=0x7f863cd055b0, begin=0x7f863d821528, end=0x7f863d82152c) at regexec.c:767:4 [opt]
frame #5: 0x00010410129b postgres`cdissect [inlined] ccondissect(v=, t=, begin=0x7f863d821524, end=) at regexec.c:835:10 [opt]
frame #6: 0x00010410123d postgres`cdissect(v=0x7ffeebdd2ad8, t=0x7f863cd05430, begin=0x7f863d821524, end=0x7f863d82152c) at regexec.c:752 [opt]
frame #7: 0x00010410129b postgres`cdissect [inlined] ccondissect(v=, t=, begin=0x7f863d821520, end=) at regexec.c:835:10 [opt]
frame #8: 0x00010410123d postgres`cdissect(v=0x7ffeebdd2ad8, t=0x7f863cd050f0, begin=0x7f863d821520, end=0x7f863d82152c) at regexec.c:752 [opt]
frame #9: 0x000104101282 postgres`cdissect [inlined] ccondissect(v=, t=, begin=0x7f863d821520, end=) at regexec.c:832:9 [opt]
frame #10: 0x00010410123d postgres`cdissect(v=0x7ffeebdd2ad8, t=0x7f863cd04ff0, begin=0x7f863d821520, end=0x7f863d821530) at regexec.c:752 [opt]
frame #11: 0x0001040ff508 postgres`pg_regexec [inlined] cfindloop(v=, cnfa=, cm=, d=0x7ffeebdd6d68, s=0x7ffeebdd2b48, coldp=) at regexec.c:600:10 [opt]
frame #12: 0x0001040ff36b postgres`pg_regexec [inlined] cfind(v=0x00010459c5f8, cnfa=, cm=) at regexec.c:515 [opt]
frame #13: 0x0001040ff315 postgres`pg_regexec(re=, string=, len=140732855577960, search_start=, details=, nmatch=0, pmatch=0x, flags=0) at regexec.c:293 [opt]
frame #14: 0x000104244d61 postgres`RE_wchar_execute(re=, data=, data_len=, start_search=, nmatch=, pmatch=) at regexp.c:274:19 [opt]
frame #15: 0x000104242c80 postgres`textregexne [inlined] RE_execute(dat=, dat_len=31, nmatch=0, pmatch=0x) at regexp.c:322:10 [opt]
frame #16: 0x000104242c50 postgres`textregexne [inlined] RE_compile_and_execute(text_re=, dat=, dat_len=31, cflags=19, collation=, nmatch=0, pmatch=) at regexp.c:357 [opt]

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Assert triggered during RE_compile_and_cache

2021-08-05 Thread Mark Dilger



> On Aug 5, 2021, at 1:38 PM, Mark Dilger  wrote:
> 
> +select 'vyrlvyrlwvqko' ~ '(?:(?:((.((\2)\1.){0,0}?';

I've boiled it down a bit more:

+select '' ~ '()\1{0}';
+ ?column? 
+--
+ t
+(1 row)
+
+select '' ~ '()(\1){0}';
+ ?column? 
+--
+ t
+(1 row)
+
+select '' ~ '(())\2{0}';
+ ?column? 
+--
+ t
+(1 row)
+
+select '' ~ '(())(\2){0}';
+server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+connection to server was lost

Any ideas?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Assert triggered during RE_compile_and_cache

2021-08-05 Thread Mark Dilger
Tom,

while testing your patch in [1], I stumbled upon the following assert in 
master, without your patch applied:

+select 'vyrlvyrlwvqko' ~ '(?:(?:((.((\2)\1.){0,0}?';
+server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+connection to server was lost


The assert itself dates back to a commit of yours from 2003.  I don't think you 
did anything wrong here, but I wonder if you might understand the code better 
than anyone else?

The relevant bit of the stack trace is:

frame #3: 0x0001020eff7d postgres`ExceptionalCondition(conditionName=, errorType=, fileName=, lineNumber=) at assert.c:69:2 [opt]
frame #4: 0x000101e2c682 postgres`delsub(nfa=0x7fc7e8407360, lp=0x7fc7e8831010, rp=0x7fc7e8827328) at regc_nfa.c:1265:2 [opt]
frame #5: 0x000101e28dd8 postgres`parsebranch [inlined] parseqatom(v=, stopper=101, type=112, lp=, rp=0x7fc7e8827328, top=) at regcomp.c:1084:3 [opt]
frame #6: 0x000101e27de9 postgres`parsebranch(v=0x7ffeee0a80b8, stopper=101, type=, left=, right=, partial=) at regcomp.c:759 [opt]
frame #7: 0x000101e29719 postgres`parsebranch [inlined] parseqatom(v=, stopper=101, type=112, lp=, rp=0x7fc7e8827328, top=) at regcomp.c:1282:23 [opt]
frame #8: 0x000101e29639 postgres`parsebranch(v=0x7ffeee0a80b8, stopper=101, type=, left=, right=, partial=) at regcomp.c:759 [opt]
frame #9: 0x000101e2457b postgres`parse(v=0x7ffeee0a80b8, stopper=101, type=112, init=0x7fc7e8827280, final=0x7fc7e88272b8) at regcomp.c:691:12 [opt]
frame #10: 0x000101e234d2 postgres`pg_regcomp(re=0x7ffeee0a8208, string=, len=, flags=, collation=) at regcomp.c:418:12 [opt]
frame #11: 0x000101f75790 postgres`RE_compile_and_cache(text_re=, cflags=, collation=100) at regexp.c:186:19 [opt]
frame #12: 0x000101f75b90 postgres`textregexeq [inlined] RE_compile_and_execute(text_re=, dat=, dat_len=13, cflags=3, collation=, nmatch=0, pmatch=) at regexp.c:351:7 [opt]

[1] https://www.postgresql.org/message-id/2219936.1628115334%40sss.pgh.pa.us

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: very long record lines in expanded psql output

2021-08-05 Thread Andrew Dunstan


On 8/5/21 12:48 PM, Platon Pronko wrote:
> Hi!
>>>> I also find this annoying and would be happy to be rid of it.
>>>
>>> Have you tried "\pset format wrapped"? Pavel suggested it, and it
>>> solved most of the problem for me, for example.
>>
>> Yes, but it changes the data line output. Ideally, you should be able
>> to  modify these independently.
>
> I agree, and I think this can be implemented, but I'm a bit afraid of
> introducing an additional psql option (there's already quite a lot of
> them).
> I suspect primary PostgreSQL maintainers won't be happy with such an
> approach.
>
>

I think I qualify as one of those ... :-)


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: very long record lines in expanded psql output

2021-08-05 Thread Andrew Dunstan


On 8/5/21 2:25 PM, Pavel Stehule wrote:
>
>
> On Thu, Aug 5, 2021 at 18:48, Platon Pronko
> mailto:platon7pro...@gmail.com>> wrote:
>
> Hi!
> >>> I also find this annoying and would be happy to be rid of it.
> >>
> >> Have you tried "\pset format wrapped"? Pavel suggested it, and it
> >> solved most of the problem for me, for example.
> >
> > Yes, but it changes the data line output. Ideally, you should be
> able
> > to  modify these independently.
>
> I agree, and I think this can be implemented, but I'm a bit afraid of
> introducing an additional psql option (there's already quite a lot
> of them).
> I suspect primary PostgreSQL maintainers won't be happy with such
> an approach.
>
>
> it can be a fully new format - designed for simple copy from terminal.
> Some like
>
> == record: 10 ==
>  proname 
> left
>  prosrc  
> $lalalal
> ewqrwqerw
> ewqrwqerqrewq
> $
> ===
>
>

I think that's really a different idea. The original proposal was simply
to deal with insanely long header lines. This seems like massive scope
creep.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: RFC: Improve CPU cache locality of syscache searches

2021-08-05 Thread Andres Freund
Hi,

On 2021-08-05 12:27:49 -0400, John Naylor wrote:
> On Wed, Aug 4, 2021 at 3:44 PM Andres Freund  wrote:
> > On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> > > typedef struct cc_bucket
> > > {
> > >   uint32 hashes[4];
> > >   catctup *ct[4];
> > >   dlist_head;
> > > };
> >
> > I'm not convinced that the above is the right idea though. Even if the hash
> > matches, you're still going to need to fetch at least catctup->keys[0] from
> > a separate cacheline to be able to return the cache entry.
> 
> I see your point. It doesn't make sense to inline only part of the
> information needed.

At least not for the unconditionally needed information.


> Although I'm guessing inlining just two values in the 4-key case wouldn't
> buy much.

Not so sure about that. I'd guess that two key comparisons take more cycles
than a cacheline fetch of the further keys (perhaps not if we had inlined key
comparisons). I.e. I'd expect out-of-order + speculative execution to hide the
latency for fetching the second cacheline for later key values.


> > If we stuffed four values into one bucket we could potentially SIMD the
> > hash and Datum comparisons ;)
> 
> ;-) That's an interesting future direction to consider when we support
> building with x86-64-v2. It'd be pretty easy to compare a vector of hashes
> and quickly get the array index for the key comparisons (ignoring for the
> moment how to handle the rare case of multiple identical hashes).
> However, we currently don't memcmp() the Datums and instead call an
> "eqfast" function, so I don't see how that part  would work in a vector
> setting.

It definitely couldn't work unconditionally - we have to deal with text,
oidvector, comparisons after all. But we could use it for the other
types. However, looking at the syscaches, I think it'd not very often be
applicable for caches with enough columns.
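A minimal sketch of the vectorized probe floated upthread, for the four inline hashes of a hypothetical cc_bucket (assumes SSE2 and a GCC/Clang builtin; this is not actual syscache code, and it ignores the rare duplicate-hash case):

```c
#include <assert.h>
#include <emmintrin.h>			/* SSE2 intrinsics */
#include <stdint.h>

/*
 * Compare a lookup hash against the four hashes stored inline in a bucket
 * and return the index of the first matching slot, or -1 if none matches.
 */
static int
bucket_probe(const uint32_t hashes[4], uint32_t lookup)
{
	__m128i		h = _mm_loadu_si128((const __m128i *) hashes);
	__m128i		k = _mm_set1_epi32((int) lookup);
	int			mask = _mm_movemask_epi8(_mm_cmpeq_epi32(h, k));

	if (mask == 0)
		return -1;
	return __builtin_ctz(mask) / 4;	/* each 32-bit lane yields 4 mask bits */
}
```

The key comparisons against catctup->keys[] would still follow in scalar code; only the hash filter is vectorized here.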


I have wondered before whether we should have syscache definitions generate
code specific to each syscache definition. I do think that'd give a good bit
of performance boost. But I don't see a trivial way to get there without
notational overhead.

We could define syscaches in a header using a macro that's defined differently
in syscache.c than everywhere else. The header would declare a set of
functions for each syscache, syscache.c would define them to call an
always_inline function with the relevant constants.

Or perhaps we should move syscache definitions into the pg_*.h headers, and
generate the relevant code as part of their processing. That seems like it
could be nice from a modularity POV alone. And it could do better than the
current approach, because we could hardcode the types for columns in the
syscache definition without increasing notational overhead.

Greetings,

Andres Freund




Re: Accidentally dropped constraints: bug?

2021-08-05 Thread Bruce Momjian
On Thu, Aug  5, 2021 at 08:35:46AM +0100, Simon Riggs wrote:
> If we drop a column we cascade that drop to all indexes and all
> constraints that mention that column, even if they include other
> columns also. We might expect that indexes should be dropped
> automatically, but the latter behavior for constraints seems like a
> bug, since it can silently remove constraints that might still be
> valid without the dropped column. (Example below). This is even more
> surprising if the user specifies RESTRICT explicitly. I note that this
> is acting as documented, it's just the docs don't explain the full
> implications, so I'm guessing we didn't think about this before.

Wow, I certainly never thought of it or heard anyone complain about it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: straightening out backend process startup

2021-08-05 Thread Andres Freund
Hi,

On 2021-08-03 16:50:24 +0900, Kyotaro Horiguchi wrote:
> At Mon, 2 Aug 2021 09:41:24 -0700, Andres Freund  wrote in
> > - PostgresMainSingle() should probably not be in postgres.c. We could put it
> >   into postinit.c or ..?
>
> PostgresMainSingle() looks like the single-process version of
> PostgresMain, so it is natural that they are placed together in
> postgres.c.  If PostgresMainSingle is constructed as first initializing
> standalone mode and then calling PostgresMain, it might be right for
> PostgresMain to call an initialization function residing in postinit.c
> if !IsUnderPostmaster.
>
> PostgresMain()
> {
>   if (!IsUnderPostmaster)
>   InitSinglePostgres(argv[0]);
>   ...

I find passing argc/argv to functions that don't actually need them, like
PostgresMain during normal operation, confusing. Especially when the argc/argv
values are just manufactured stuff like in the case of PostgresMain(). So I
think it's better to have the order work the other way round.


> > - My first attempt at PostgresMainSingle() separated the single/multi user
> >   cases a bit more than the code right now, by having a PostgresMainCommon()
> >   which was called by PostgresMainSingle(), PostgresMain(). *Common only
> >   started with the MessageContext allocation, which did have the advantage 
> > of
> >   splitting out a few of the remaining conditional actions in PostgresMain()
> >   (PostmasterContext, welcome banner, Log_disconnections). But lead to a bit
> >   more duplication. I don't really have an opinion on what's better.
>
> I'm not sure what it looked like, but isn't it reasonable that quickdie()
> and log_disconnections() handle IsUnderPostmaster instead?  Or, for
> log_disconnections, should Log_disconnections be turned off at
> standalone initialization?

I was wondering about log_disconnections too. The conditional addition of the
callback is all that forces log_disconnections to be PGC_SU_BACKEND rather
than PGC_SUSET too. So I agree that moving a check for Log_disconnections and
IsUnderPostmaster into log_disconnections is a good idea.
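The shape being agreed on here can be sketched as follows (hypothetical stand-ins for the GUC and global, not the actual postgres.c code): register the callback unconditionally and let it decide for itself, which also frees Log_disconnections to be PGC_SUSET.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the Log_disconnections GUC and IsUnderPostmaster global. */
static bool Log_disconnections = true;
static bool IsUnderPostmaster = true;

/*
 * The callback checks both conditions itself instead of being registered
 * only when they happened to hold at backend start.
 */
static bool
log_disconnections_would_log(void)
{
	return Log_disconnections && IsUnderPostmaster;
}
```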

I don't understand your reference to quickdie() though?


> > - I had to move the PgStartTime computation to a bit earlier for single user
> >   mode. That seems to make sense to me anyway, given that postmaster does so
> >   fairly early too.
> >
> >   Any reason that'd be a bad idea?
> >
> >   Arguably it should even be a tad earlier to be symmetric.
>
> Why don't you move the code for multiuser as earlier as standalone does?

I think it's the other way round - right now the standalone case is much later
than the multiuser case. Postmaster determines PgStartTime after creating
shared memory, just before starting checkpointer / startup process - whereas
single user mode in HEAD does it just before accepting input for the first
time.

Greetings,

Andres Freund




Re: very long record lines in expanded psql output

2021-08-05 Thread Daniel Verite
Platon Pronko wrote:

> Maybe we can avoid making the header line longer than terminal width
> for \pset border 0 and \pset border 1? We already have terminal
> width calculated. Please see attached a patch with the proposed
> implementation.

+1 for doing something against these long lines.

Rather than padding up to the terminal's width, we could simply end
the line after the "-[RECORD N]-" marker, with the idea that the
rest of it does not add much to readability anyway. 
And when writing into a file as opposed to a terminal, getting rid of
these lines is useful too.

In that case, your example could be displayed like this:

=> \pset expanded on
=> \pset border 1

=> select n, repeat('x', n) as long_column_name from unnest(array[42,210])
 as n;

-[ RECORD 1 ]-
n| 42
long_column_name | xx
-[ RECORD 2 ]-
n| 210
long_column_name |
xx

=> \pset border 0
=> select n, repeat('x', n) as long_column_name from unnest(array[42,210])
 as n;

* Record 1
n42
long_column_name xx
* Record 2
n210
long_column_name
xx
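Both variants boil down to capping the dash padding of the record header; a standalone sketch (illustrative only, not psql's print.c, which also has to account for the column separator):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build an expanded-mode record header such as "-[ RECORD 1 ]------",
 * padding with '-' only up to target_width (e.g. the terminal width).
 * With target_width <= the marker length, this degenerates to ending
 * the line right after the marker.
 */
static void
record_header(char *buf, size_t buflen, int recno, int target_width)
{
	int			len = snprintf(buf, buflen, "-[ RECORD %d ]", recno);

	while (len < target_width && (size_t) (len + 1) < buflen)
		buf[len++] = '-';
	buf[len] = '\0';
}
```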



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: straightening out backend process startup

2021-08-05 Thread Andres Freund
Hi,

Thanks Robert, Horiguchi-san for looking.

On 2021-08-04 16:34:52 -0400, Robert Haas wrote:
> On Mon, Aug 2, 2021 at 12:41 PM Andres Freund  wrote:
> > - AuxiliaryProcessMain() is used for two independent tasks: Start bootstrap 
> > /
> >   checker mode and starting auxiliary processes. In HEAD there's maybe 5 
> > lines
> >   out 250 that are actually common to both uses.
> >
> >   A related oddity is that we reserve shared memory resources for bootstrap 
> > &
> >   checker aux processes, despite those never existing.
> >
> >   This is addressed in patches 1-7
>
> This all looks pretty mechanical and, I would guess, not very controversial.

Pushed these patches.


> > - The order of invocation of InitProcess()/InitAuxiliaryProcess() and
> >   BaseInit() depends on EXEC_BACKEND. Due to that there often is no single
> >   place initialization code can be put if it needs any locks.
> >
> >   This is addressed in patches 8-9
> >
> > - PostgresMain() has code for single user and multi user interleaved, making
> >   it unnecessarily hard to understand what's going on.
> >
> >   This is addressed in patches 10
>
> This stuff I'd need to study more in order to have an intelligent opinion.

Unless somebody protests soon I plan to push at least the
"process startup: Always call Init[Auxiliary]Process() before BaseInit()."
portion, as the inconsistent order between EXEC_BACKEND/!EB is making life
hard for me in other patches.

I don't have a dependency on
"process startup: Split single user code out of PostgresMain()."
but it does make the code a good bit more readable imo...

Greetings,

Andres Freund




Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
On Thu, Aug 5, 2021 at 20:56, Platon Pronko 
wrote:

> > it can be a fully new format - designed for simple copy from terminal.
> Some
> > like
> >
> > == record: 10 ==
> >  proname 
> > left
> >  prosrc  
> > $lalalal
> > ewqrwqerw
> > ewqrwqerqrewq
> > $
> > ===
>
> > no, it was proposed for psql
>
> How is this better than "copy (...) to stdout"? E.g.:
>
> $ copy (select * from pg_class limit 1) to stdout;
> 2619pg_statistic11  12016   0   10  2   26190
>  18  402 18  2840t   f   p   r   31
>   0   f   f   f   f   f   t   n   f0
> 478 1   {postgres=arwdDxt/postgres} \N  \N
>
> You can still copy the necessary parts (using mouse selection) -
> seems that it achieves the the same goal.
>

I think the copy format can get pretty unclean when lines grow long with
multiline values or with a lot of values.  When the length of the line
exceeds the terminal width, searching for the wanted column can be
pretty difficult.

When you know that every value is on its own line or lines, then selection is
trivial, and there is enough information to find the wanted data
quickly.

For this case I am imagining hypothetical longer records with possibly
multiline fields - just something that creates chaos on the terminal when you
want to select some part of the data by mouse from the terminal screen.

Regards

Pavel





> Best regards,
> Platon Pronko
>


Re: very long record lines in expanded psql output

2021-08-05 Thread Platon Pronko

it can be a fully new format - designed for simple copy from terminal. Some
like

== record: 10 ==
 proname 
left
 prosrc  
$lalalal
ewqrwqerw
ewqrwqerqrewq
$
===



no, it was proposed for psql


How is this better than "copy (...) to stdout"? E.g.:

$ copy (select * from pg_class limit 1) to stdout;
2619pg_statistic11  12016   0   10  2   26190   
18  402 18  2840t   f   p   r   31  0   
f   f   f   f   f   t   n   f0  478 1   
{postgres=arwdDxt/postgres} \N  \N

You can still copy the necessary parts (using mouse selection) -
seems that it achieves the the same goal.

Best regards,
Platon Pronko




Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
On Thu, Aug 5, 2021 at 20:48, Platon Pronko 
wrote:

> Hi!
>
> >> I don't think that would be a good idea. If somebody really just needs
> to
> >> copy,
> >> then wrapping the query in "copy (...) to stdout" already works nicely,
> >> no need to create a special mode just for that.
> >>
> >
> > It is working well, but it is copy to file, not copy to clipboard.
>
> Maybe I misunderstood - are you suggesting this as a special mode in pspg? I
> thought
> that you were suggesting to implement a mode in psql, something like
> "\pset format_for_clipboard".
>

no, it was proposed for psql


> >> I think the question was more about being able to copy in an ad-hoc way,
> >> in the middle of scrolling through "normal" output.
> >
> > I understand your motivation, but I don't see a significant benefit in
> > increasing the complexity of the main format (more so when it breaks pspg,
> > with no possibility of fixing it).  But it can depend on the usual data,
> > and it is true that when I use pspg, I don't use extended mode too often.
>
> To clarify: I'm not suggesting my patch anymore - it definitely breaks
> pspg and this
> is completely unacceptable.
> We're just discussing the possible alternate solutions, I think.
>

yes

Pavel


> Best regards,
> Platon Pronko
>


Re: very long record lines in expanded psql output

2021-08-05 Thread Platon Pronko

Hi!


I don't think that would be a good idea. If somebody really just needs to
copy,
then wrapping the query in "copy (...) to stdout" already works nicely,
no need to create a special mode just for that.



It is working well, but it is copy to file, not copy to clipboard.


Maybe I misunderstood - are you suggesting this a special mode in pspg? I 
thought
that you were suggesting to implement a mode in psql, something like
"\pset format_for_clipboard".
 

I think the question was more about being able to copy in an ad-hoc way,
in the middle of scrolling through "normal" output.


I understand your motivation, but I don't feel a significant benefit to
increase the complexity of the main format (more, when it breaks pspg,
without possibility to fix it). but it can depend on usual data, and it is
true, so when I use pspg, I don't use extended mode too often.


To clarify: I'm not suggesting my patch anymore - it definitely breaks pspg and 
this
is completely unacceptable.
We're just discussing the possible alternate solutions, I think.

Best regards,
Platon Pronko




Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
On Thu, Aug 5, 2021 at 20:30, Platon Pronko  wrote:

> Hi!
>
> > I also find this annoying and would be happy to be rid of it.
> 
>  Have you tried "\pset format wrapped"? Pavel suggested it, and it
>  solved most of the problem for me, for example.
> >>>
> >>> Yes, but it changes the data line output. Ideally, you should be able
> >>> to  modify these independently.
> >>
> >> I agree, and I think this can be implemented, but I'm a bit afraid of
> >> introducing an additional psql option (there's already quite a lot of
> >> them).
> >> I suspect primary PostgreSQL maintainers won't be happy with such an
> >> approach.
> >>
> >
> > it can be a fully new format - designed for simple copy from terminal.
> Some
> > like
> >
> > == record: 10 ==
> >  proname 
> > left
> >  prosrc  
> > $lalalal
> > ewqrwqerw
> > ewqrwqerqrewq
> > $
> > ===
>
> I don't think that would be a good idea. If somebody really just needs to
> copy,
> then wrapping the query in "copy (...) to stdout" already works nicely,
> no need to create a special mode just for that.
>

It is working well, but it is copy to file, not copy to clipboard.

I think the question was more about being able to copy in an ad-hoc way,
> in the middle of scrolling through "normal" output.
>

I understand your motivation, but I don't feel a significant benefit to
increase the complexity of the main format (more, when it breaks pspg,
without possibility to fix it). but it can depend on usual data, and it is
true, so when I use pspg, I don't use extended mode too often.


> Best regards,
> Platon Pronko
>


Re: very long record lines in expanded psql output

2021-08-05 Thread Platon Pronko

Hi!


I also find this annoying and would be happy to be rid of it.


Have you tried "\pset format wrapped"? Pavel suggested it, and it
solved most of the problem for me, for example.


Yes, but it changes the data line output. Ideally, you should be able
to  modify these independently.


I agree, and I think this can be implemented, but I'm a bit afraid of
introducing an additional psql option (there's already quite a lot of
them).
I suspect primary PostgreSQL maintainers won't be happy with such an
approach.



it can be a fully new format - designed for simple copy from terminal. Some
like

== record: 10 ==
 proname 
left
 prosrc  
$lalalal
ewqrwqerw
ewqrwqerqrewq
$
===


I don't think that would be a good idea. If somebody really just needs to copy,
then wrapping the query in "copy (...) to stdout" already works nicely,
no need to create a special mode just for that.
I think the question was more about being able to copy in an ad-hoc way,
in the middle of scrolling through "normal" output.

Best regards,
Platon Pronko




Re: Numeric x^y for negative x

2021-08-05 Thread Dean Rasheed
On Thu, 5 Aug 2021 at 17:04, Tom Lane  wrote:
>
> It looks like castoroides is not happy with this patch:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=castoroides&dt=2021-08-01%2008%3A52%3A43
>
> Maybe there's some hidden C99 dependency in what you did?
> Although pademelon, which is one of our other pre-C99
> dinosaurs, doesn't seem to be unhappy.
>

Hmm, there's something very weird going on there. The 0.99 ^
70 test, for example, is one that would have thrown an
overflow error before, but it's not doing that.

So somehow, when it hits the overflow/underflow test, Abs(val) is not
greater than NUMERIC_MAX_RESULT_SCALE * 3.01, which is 6020. The thing
is, when I step through it, I get val = -7000, which should trigger
that comfortably. Even if I play with the return value from
estimate_ln_dweight(), which relies on some double precision
arithmetic, making it -11 or -9 instead of -10, I still get val =
-7000. And even if I force val to be -6000, or even val = 0, so that
it doesn't trigger the overflow/underflow test, it still returns zero
in the end. The end result in this case just isn't very sensitive to
changes in these values.

So I'm wondering if it's somehow not even getting that far. Maybe if
the earlier test to see if exp can be represented as an integer is
failing, it might be going through power_var_int() instead, which
would explain it returning a non-zero value. That hypothesis would be
easy to test, by changing the test to 0.99 ^ 70.5.

In any case, it would be interesting to see what castoroides returns for

select 0.99 ^ 70;

and

select 0.99 ^ 70.5;

Regards,
Dean




Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
čt 5. 8. 2021 v 18:48 odesílatel Platon Pronko 
napsal:

> Hi!
> >>> I also find this annoying and would be happy to be rid of it.
> >>
> >> Have you tried "\pset format wrapped"? Pavel suggested it, and it
> >> solved most of the problem for me, for example.
> >
> > Yes, but it changes the data line output. Ideally, you should be able
> > to  modify these independently.
>
> I agree, and I think this can be implemented, but I'm a bit afraid of
> introducing an additional psql option (there's already quite a lot of
> them).
> I suspect primary PostgreSQL maintainers won't be happy with such an
> approach.
>

it can be a fully new format - designed for simple copy from terminal. Some
like

== record: 10 ==
 proname 
left
 prosrc  
$lalalal
ewqrwqerw
ewqrwqerqrewq
$
===

Regards

Pavel


> Best regards,
> Platon Pronko
>


Re: very long record lines in expanded psql output

2021-08-05 Thread Platon Pronko

Hi!

I also find this annoying and would be happy to be rid of it.


Have you tried "\pset format wrapped"? Pavel suggested it, and it
solved most of the problem for me, for example.


Yes, but it changes the data line output. Ideally, you should be able
to  modify these independently.


I agree, and I think this can be implemented, but I'm a bit afraid of
introducing an additional psql option (there's already quite a lot of them).
I suspect primary PostgreSQL maintainers won't be happy with such an approach.

Best regards,
Platon Pronko




Re: RFC: Improve CPU cache locality of syscache searches

2021-08-05 Thread John Naylor
On Wed, Aug 4, 2021 at 3:44 PM Andres Freund  wrote:
> On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> > typedef struct cc_bucket
> > {
> >   uint32 hashes[4];
> >   catctup *ct[4];
> >   dlist_head;
> > };
>
> I'm not convinced that the above the right idea though. Even if the hash
> matches, you're still going to need to fetch at least catctup->keys[0]
from
> a separate cacheline to be able to return the cache entry.

I see your point. It doesn't make sense to inline only part of the
information needed.

> struct cc_bucket_1
> {
> uint32 hashes[3]; // 12
> // 4 bytes alignment padding
> Datum key0s[3]; // 24
> catctup *ct[3]; // 24
> // cacheline boundary
> dlist_head conflicts; // 16
> };
>
> would be better for 1 key values?
>
> It's obviously annoying to need different bucket types for different key
> counts, but given how much 3 unused key Datums waste, it seems worth
paying
> for?

Yeah, it's annoying, but it does make a big difference to keep out unused
Datums:

keys   cachelines
       3 values   4 values

 1     1 1/4      1 1/2
 2     1 5/8      2
 3     2          2 1/2
 4     2 3/8      3

Or, looking at it another way, limiting the bucket size to 2 cachelines, we
can fit:

keys   values
 1     5
 2     4
 3     3
 4     2

Although I'm guessing inlining just two values in the 4-key case wouldn't
buy much.

> If we stuffed four values into one bucket we could potentially SIMD the
hash
> and Datum comparisons ;)

;-) That's an interesting future direction to consider when we support
building with x86-64-v2. It'd be pretty easy to compare a vector of hashes
and quickly get the array index for the key comparisons (ignoring for the
moment how to handle the rare case of multiple identical hashes).
However, we currently don't memcmp() the Datums and instead call an
"eqfast" function, so I don't see how that part  would work in a vector
setting.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: very long record lines in expanded psql output

2021-08-05 Thread Andrew Dunstan


On 8/5/21 12:04 PM, Platon Pronko wrote:
> Hi!
>
>> I also find this annoying and would be happy to be rid of it.
>
> Have you tried "\pset format wrapped"? Pavel suggested it, and it
> solved most of the problem for me, for example.


Yes, but it changes the data line output. Ideally, you should be able
to  modify these independently.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: .ready and .done files considered harmful

2021-08-05 Thread Dipesh Pandit
> I'm not sure. I think we need the value to be accurate during
> recovery, so I'm not sure whether replayEndTLI would get us there.
> Another approach might be to set ThisTimeLineID on standbys also.
> Actually just taking a fast look at the code I'm not quite sure why
> that isn't happening already. Do you have any understanding of that?

During investigation I found that the current timeline ID (ThisTimeLineID)
gets updated in XLogCtl’s ThisTimeLineID once it gets finalised as part
of archive recovery.

/*
 * Write the timeline history file, and have it archived. After this
 * point (or rather, as soon as the file is archived), the timeline
 * will appear as "taken" in the WAL archive and to any standby
 * servers.  If we crash before actually switching to the new
 * timeline, standby servers will nevertheless think that we switched
 * to the new timeline, and will try to connect to the new timeline.
 * To minimize the window for that, try to do as little as possible
 * between here and writing the end-of-recovery record.
 */

In case of Standby this happens only when it gets promoted.

If Standby is in recovery mode then replayEndTLI points to the most
recent TLI corresponding to the replayed records. Also, if replying a
record causes timeline switch then replayEndTLI gets updated with
the new timeline. As long as it is in recovery mode replayEndTLI should
point to the current timeline ID on Standby. Thoughts?

Thanks,
Dipesh


Re: Lowering the ever-growing heap->pd_lower

2021-08-05 Thread Peter Geoghegan
On Thu, Aug 5, 2021 at 6:28 AM Simon Riggs  wrote:
> Hmm, there is no information in WAL to describe the line pointers
> being truncated by PageTruncateLinePointerArray(). We just truncate
> every time we see a XLOG_HEAP2_VACUUM record and presume it does the
> same thing as the original change.
>
> If that is safe, then we don't need to put the truncation on a WAL
> record at all, we just truncate after every XLOG_HEAP2_PRUNE record.

I agree that that's how we'd do it. This approach is no different to
assuming that PageRepairFragmentation() reliably produces a final
defragmented page image deterministically when called after we prune.

These days we automatically verify assumptions like this via
wal_consistency_checking. It would absolutely be able to catch any
bugs in PageTruncateLinePointerArray(), since the truncate code path
has plenty of coverage within the regression tests.

-- 
Peter Geoghegan




Re: Numeric x^y for negative x

2021-08-05 Thread Tom Lane
Dean Rasheed  writes:
> On Thu, 22 Jul 2021 at 16:19, Dean Rasheed  wrote:
>> Thanks for looking. Barring any further comments, I'll push this in a few 
>> days.

> So I have been testing this a lot over the last couple of days, and I
> have concluded that the patch works well as far as it goes, but I did
> manage to construct another case where numeric_power() loses
> precision. I think, though, that it would be better to tackle that as
> a separate patch.

It looks like castoroides is not happy with this patch:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=castoroides&dt=2021-08-01%2008%3A52%3A43

Maybe there's some hidden C99 dependency in what you did?
Although pademelon, which is one of our other pre-C99
dinosaurs, doesn't seem to be unhappy.

regards, tom lane




Re: very long record lines in expanded psql output

2021-08-05 Thread Platon Pronko

Hi!


I also find this annoying and would be happy to be rid of it.


Have you tried "\pset format wrapped"? Pavel suggested it, and it solved most 
of the problem for me, for example.

Best regards,
Platon Pronko




Re: very long record lines in expanded psql output

2021-08-05 Thread Andrew Dunstan


On 8/5/21 6:56 AM, Pavel Stehule wrote:
> Hi
>
> On Thu, Aug 5, 2021 at 12:36, Platon Pronko  wrote:
>
> In expanded mode psql calculates the width of the longest field in
> all the output rows,
> and prints the header line according to it. This often results in
> record header wrapping
> over several lines (see example below).
>
> This huge record header is printed the same way before each
> record, even if all the fields
> in current record are small and fit into terminal width. This
> often leads to situations
> when record header occupies the entirety of the screen, for all
> records, even though there are
> only a few records with huge fields in a record set.
>
> Maybe we can avoid making the header line longer than terminal
> width for \pset border 0
> and \pset border 1? We already have terminal width calculated.
> Please see attached a patch
> with the proposed implementation.
>
> Example output before the modification, in a terminal with a
> 100-column width:
>
> $ psql template1 -c "\x on" -c "\pset border 1;" -c "select n,
> repeat('x', n) as long_column_name from unnest(array[42,210]) as n"
> Expanded display is on.
> Border style is 1.
> ─[ RECORD 1
> 
> ]┬──
> 
> 
> ─
> n                │ 42
> long_column_name │ xx
> ─[ RECORD 2
> 
> ]┼──
> 
> 
> ─
> n                │ 210
> long_column_name │
> 
> x
> 
> 
> x
>
> And here's the same command after the patch:
>
> $ psql template1 -c "\x on" -c "\pset border 1;" -c "select n,
> repeat('x', n) as long_column_name from unnest(array[42,210]) as n"
> Expanded display is on.
> Border style is 1.
> ─[ RECORD 1
> 
> ]┬──
> n                │ 42
> long_column_name │ xx
> ─[ RECORD 2
> 
> ]┼──
> n                │ 210
> long_column_name │
> 
> x
> 
> 
> x
>
>
> the length of lines should be consistent.
>
> Your proposal breaks pspg
>
> https://github.com/okbob/pspg 
>
> It can be separate option, but it should not be default
>
>


I also find this annoying and would be happy to be rid of it.  I could
see setting an option like


\pset xheader_width column|page|nnn


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Another regexp performance improvement: skip useless paren-captures

2021-08-05 Thread Andrew Dunstan


On 8/5/21 10:39 AM, Robert Haas wrote:
> On Thu, Aug 5, 2021 at 9:43 AM Andrew Dunstan  wrote:
>> On 8/4/21 6:15 PM, Tom Lane wrote:
>>> Here's a little finger exercise that improves a case that's bothered me
>>> for awhile.  In a POSIX regexp, parentheses cause capturing by default;
>>> you have to write the very non-obvious "(?:...)" if you don't want the
>>> matching substring to be reported by the regexp engine.
>> It's not obscure to perl programmers :-)
> Well, I consider myself a pretty fair perl programmer, 


I also consider you one :-)


Perhaps I should have said "many perl programmers".


> and I know
> there's a way to do that, but I never do it, and I would have had to
> look up the exact syntax. So +1 from me for anything automatic that
> avoids paying the overhead in some cases.


Yeah, I'm not arguing against the idea. I also have to look it up,
mainly because there is such a huge amount of stuff that can follow
"(?", so "perldoc perlre" happens a lot when I'm doing that sort of work.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
On Thu, Aug 5, 2021 at 16:50, Platon Pronko  wrote:

> Hi!
>
> > pspg supports copy cell to clipboard - you can copy complete cell
> >
> > but I am not sure if it is working well in extended mode. This mode is
> not
> > extra tested in pspg
>
> Thanks for the tip!
>
> I tried it out, in non-extended mode copy indeed works well, in extended
> mode vertical cursor
> selects both columns (and copies both), and in extended+wrapped mode copy
> includes wrapping
> symbol.
>

Yes, this is probably not finished yet. It was a surprise for me that I am
able to use the clipboard from the terminal application, so support for
copy is a relatively new feature. I'll check it when I have time.

Pavel







> Best regards,
> Platon Pronko
>


Re: Another regexp performance improvement: skip useless paren-captures

2021-08-05 Thread Tom Lane
Robert Haas  writes:
> Well, I consider myself a pretty fair perl programmer, and I know
> there's a way to do that, but I never do it, and I would have had to
> look up the exact syntax. So +1 from me for anything automatic that
> avoids paying the overhead in some cases.

That's my feeling about it too --- I never really think of this
point when writing a regexp.  It seems like something the engine
ought to handle gracefully, so this patch is an attempt to do so.

regards, tom lane




RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-05 Thread houzj.f...@fujitsu.com
On Thursday, August 5, 2021 1:09 PM Masahiko Sawada  
wrote:
> I've reviewed v2 patch. Here are some comments:
> 
> +   if (type == ALTER_SUBSCRIPTION_SET_PUBLICATION ||
> +   type == ALTER_SUBSCRIPTION_REFRESH)
> +   drop_table = !bsearch(, pubrel_local_oids,
> + list_length(pubrel_names),
> + sizeof(Oid), oid_cmp);
> +   else if (type == ALTER_SUBSCRIPTION_DROP_PUBLICATION)
> +   drop_table = bsearch(, pubrel_local_oids,
> +list_length(pubrel_names),
> +sizeof(Oid), oid_cmp);
> 
> IIUC, in DROP PUBLICATION cases, we don't need to refer to the tables in the
> publication on the publisher that we're dropping in order to check whether to
> remove the relation. If we drop a table from the publication on the publisher
> before dropping the publication from the subscription on the subscriber, the
> pg_subscription_rel entry for the table remains. For example, please consider 
> the
> following case:
> 
> On publisher and subscriber:
> create table t1 (a int);
> create table t2 (a int);
> create table t3 (a int);
> 
> On publisher:
> create publication pub12 for table t1, t2; create publication pub3 for table 
> t3;
> 
> On subscriber:
> create subscription sub connection 'dbname=postgres' publication pub12, pub3;
> 
> On publisher:
> alter publication pub12 drop table t2;
> 
> On subscriber:
> alter subscription sub drop publication pub12; select srsubid,
> srrelid::regclass::text, srsubstate, srsublsn from pg_subscription_rel;
> 
>  srsubid | srrelid | srsubstate | srsublsn
> ---------+---------+------------+-----------
>    16393 | t3      | r          | 0/1707CE0
>    16393 | t2      | r          | 0/1707CE0
> 
> With v2 patch, the pg_subscription_rel has the entry for 't2' but it should 
> not
> exist.
> 
> But that doesn't mean that drop_table should always be true in DROP
> PUBLICATION cases. Since the tables associated with two publications can
> overlap, we cannot remove the relation from pg_subscription_rel if it is also
> associated with the remaining publication. For example:
> 
> On publisher and subscriber:
> create table t1 (a int);
> create table t2 (a int);
> create table t3 (a int);
> 
> On publisher:
> create publication pub12 for table t1, t2; create publication pub13 for table 
> t1, t3;
> 
> On subscriber:
> create subscription sub connection 'dbname=postgres' publication pub12,
> pub13;
> 
> On subscriber:
> alter subscription sub drop publication pub12; select srsubid,
> srrelid::regclass::text, srsubstate, srsublsn from pg_subscription_rel;
>
>  srsubid | srrelid | srsubstate | srsublsn
> ---------+---------+------------+-----------
>    16393 | t3      | r          | 0/17080B0
> (1 row)
> 
> With v2 patch, the pg_subscription_rel has only one entry for t3 but it should
> have also the one for t1 since it's still subscribing to pub13.
> 
> To summarize, I think that what we want to do in DROP PUBLICATION cases is to
> drop relations from pg_subscription_rel that are no longer included in the 
> set of
> after-calculated publications. In the latter example, when ALTER
> SUBSCRIPTION ... DROP PUBLICATION pub12, we should compare the current
> set of relations associated with {pub12, pub13} to the set of relations 
> associated
> with pub13, not pub12. Then we can find out t2 is no longer associated with 
> the
> after-calculated publication (i.e., pub13) so remove it.

Thanks for the review. You are right, I missed this point.
Attached a new version of the patch which fixes these problems (it also adds a new perl test).

Best regards,
Houzj



v3-0001-fix-ALTER-SUB-ADD-DROP-PUBLICATION.patch
Description: v3-0001-fix-ALTER-SUB-ADD-DROP-PUBLICATION.patch


Re: Another regexp performance improvement: skip useless paren-captures

2021-08-05 Thread Robert Haas
On Thu, Aug 5, 2021 at 9:43 AM Andrew Dunstan  wrote:
> On 8/4/21 6:15 PM, Tom Lane wrote:
> > Here's a little finger exercise that improves a case that's bothered me
> > for awhile.  In a POSIX regexp, parentheses cause capturing by default;
> > you have to write the very non-obvious "(?:...)" if you don't want the
> > matching substring to be reported by the regexp engine.
> It's not obscure to perl programmers :-)

Well, I consider myself a pretty fair perl programmer, and I know
there's a way to do that, but I never do it, and I would have had to
look up the exact syntax. So +1 from me for anything automatic that
avoids paying the overhead in some cases.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
On Thu, Aug 5, 2021 at 16:32, Platon Pronko  wrote:

> Hi!
>
> My apologies - I was using pspg incorrectly. If it's called via the pipe
> then all the column wrapping is ignored, and that's why I couldn't
> reproduce
> the issues. If instead pspg is used via "\setenv PAGER pspg", then I
> indeed can't
> scroll the table horizontally more than 1 character, and that's definitely
> broken.
>
> I'll be using "\pset format wrapped" from now on, it's good enough. My only
> issue would be with copying the long cell contents, but that's an uncommon
> case and can be worked around without too much trouble.
>

pspg supports copy cell to clipboard - you can copy complete cell, although
some parts are invisible

https://github.com/okbob/pspg#export--clipboard

but I am not sure if it is working well in extended mode. This mode is not
extra tested in pspg

Regards

Pavel





> Best regards,
> Platon Pronko
>


Re: Another regexp performance improvement: skip useless paren-captures

2021-08-05 Thread Tom Lane
Andrew Dunstan  writes:
> I'm a bit worried about how you'll keep track of back-ref numbering
> since back-refs only count capturing groups, and you're silently turning
> a capturing group into a non-capturing group.

They're already numbered at this point, and we aren't changing the numbers
of the capturing groups that remain live.  There will be unused entries in
the regmatch_t array at runtime (corresponding to the zapped groups), but
that doesn't cost anything worth mentioning.

Now that you mention it, I am not sure whether there are any regression
test cases that specifically cover still being able to match \2 when
the first capture group went away.  Probably should add more cases...

regards, tom lane




Re: Commitfest overflow

2021-08-05 Thread Robert Haas
On Tue, Aug 3, 2021 at 2:52 PM Tomas Vondra
 wrote:
> How would this be different from the CFM just rejecting patches? It does
> not matter if there's an explicit number of patches that we allow to be
> moved to the next CF - someone still needs to make the decision, and I
> agree with Tom it probably should not be CFM's job.
>
> IMO what the CFM can do is make an assessment, share it on the mailing
> list with a proposal to reject the patch, and leave the decision up to
> the patch author. Either the author accepts the view it's futile, or
> decides to keep working on it, solicits some reviews, etc.

I don't really agree with this. Not all decisions can or should be
made collaboratively.

In particular, I think CfM's have been pretty timid about bouncing
stuff that isn't really showing signs of progress. If a patch has been
reviewed at some point in the past, and say a month has gone by, and
now we're beginning or ending a CommitFest, the patch should just be
closed. Otherwise the CommitFest just fills up with clutter. It's not
the worst thing in the world to ask on the thread, "hey, can we RwF
this?" but it risks delaying a decision that ought to have been
automatic. Patches that are not being updated regularly have no
business being part of a CommitFest.

Also, if it's clear that an idea has quite a bit of opposition, it
should be RwF'd or rejected, even if the patch author doesn't agree.
Nothing whatever keeps the patch author from continuing to argue about
it, or indeed resubmitting the patch. But we don't get any value out
of having a list of things that somebody should take a look at which
includes things that have been looked at already and judged by
multiple people to be not acceptable. I don't think a patch should be
rejected on the strength of a single -1, but when 2 or 3 people have
shown up to say either that they don't like the idea or that the
implementation is way off base, it's not helpful to leave things in a
state that suggests it needs more eyeballs.

Arguing with people about the merits of a patch that has a 0%
probability of being committed by anybody is one of my least favorite
things. I have only so much time, and I want to invest the time I have
in things that have a chance of getting someplace. I don't mind the
time spent telling somebody whose patch is bad why it has no hope -
and I think that's an important thing for us to do. But if somebody
thinks, say, that full page writes are not needed and we should just
remove them, I only want to explain why they are wrong. I don't want
to argue about whether they are actually right. There are plenty of
patches where somebody has ideas that may or may not be good and we
should talk about it - but there are also some where talking isn't
going to help, and those can easily consume massive amounts of time
and energy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: when the startup process doesn't (logging startup delays)

2021-08-05 Thread Robert Haas
On Thu, Aug 5, 2021 at 7:41 AM Nitin Jadhav
 wrote:
> This seems a little confusing. As we are setting
> last_startup_progress_timeout = now and then calling
> reset_startup_progress_timeout() which will calculate the next_time
> based on the value of last_startup_progress_timeout initially and
> checks whether next_timeout is less than now. It doesn't make sense to
> me. I feel we should calculate the next_timeout based on the time that
> it is supposed to fire next time. So we should set
> last_startup_progress_timeout = next_timeout after enabling the timer.
> Also I feel with the 2 functions mentioned above, we also need
> InitStartupProgress() which sets the initial value to
> startupProcessOpStartTime which is used to calculate the time
> difference and display in the logs. I could see those functions like
> below.
>
> InitStartupProgress(void)
> {
> startupProcessOpStartTime = GetCurrentTimestamp();
> ResetStartupProgressTimeout(startupProcessOpStartTime);
> }

This makes sense, but I think I'd like to have all the functions in
this patch use names_like_this() rather than NamesLikeThis().

> reset_startup_progress_timeout(TimeStampTz now)
> {
>   next_timeout = last_startup_progress_timeout + interval;
>   if (next_timeout < now)
>   {
>  // Either the timeout was processed so late that we missed an entire 
> cycle,
>  // or the system clock was set backwards.
>  next_timeout = now + interval;
>   }
>   enable_timeout_at(next_timeout);
>   last_startup_progress_timeout = next_timeout;
> }

Hmm, yeah, that seems good, but given this change, maybe the variables
need a little renaming. Like change last_startup_progress_timeout to
scheduled_startup_progress_timeout, perhaps.

> startup_progress_timeout_has_expired()
> {
>if (!startup_progress_timer_expired)
>  return;
>   now = GetCurrentTimestamp();
>   // compute timestamp difference based on startupProcessOpStartTime
>   reset_startup_progress_timeout(now);
> }

I guess this one needs to return a Boolean, actually.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Commitfest overflow

2021-08-05 Thread Andrew Dunstan


On 8/5/21 8:49 AM, Tomas Vondra wrote:
> On 8/5/21 11:27 AM, Michael Banck wrote:
>> On Tue, Aug 03, 2021 at 11:55:50AM -0400, Bruce Momjian wrote:
>>> On Tue, Aug  3, 2021 at 04:53:40PM +0100, Simon Riggs wrote:
 There are 273 patches in the queue for the Sept Commitfest already, so
 it seems clear the queue is not being cleared down each CF as it was
 before. We've been trying hard, but it's overflowing.

 ...
>>>
>>> I wonder if our lack of in-person developer meetings is causing some of
>>> our issues to not get closed.
>>
>> Well, or the lack of virtual developer meetings? Sure, in-person
>> meetings are difficult now, but OTOH virtual meetings are very common
>> these days and maybe could be tried as quarterly developer meetings or
>> so and have the advantage that, modulo timezone problems, every
>> (invited) developer could attend.
>>
>
> I'm quite tired of virtual meetings, but I agree the lack of meetings
> is noticeable so maybe we should give it a try. In fact, we'll
> probably need to do that anyway, because even if there's an in-person
> meeting early next year the in-person participation is likely to be
> limited.
>
>
>

+1


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
On Thu, Aug 5, 2021 at 15:30, Platon Pronko  wrote:

> Hi!
>
> > a) with line cursor
>
> Tried in different configurations, seems that line cursor works fine.
>
> > b) format detection - I try to detect the header line - and I expect this
> > line has the most length of all lines. I use a line with the same length
> as
> > the column's type info (data, border).
>
> I looked at pspg source, and here's what I found (please correct me if
> some/all
> of my assumptions are wrong):
>
> 1. Input handling and format detection happens in table.c readfile().
>
> 2. There's some variables in DataDesc - maxx and maxbytes - that store the
> longest line seen so far. But they are re-updated on each new row, so the
> fact
> that header is shorter shouldn't affect them.
>
> 3. Expanded header detection is handled by is_expanded_header function. It
> has
> ei_minx and ei_maxx return pointers, but when it is used from readfile()
> these
> pointers are set to NULL in both cases - so header length is simply
> ignored.
>

The problem may be in translate_headline, with a bad desc->headline_size.


> > Did you test the wrapped format? It is working
> >
> > \pset format wrapped
> > \x
> > select  * from pg_class;
>
> There's no difference in outputs in wrapped format, so pspg behavior is
> also unaffected.
>
> By the way, it seems that pspg recommends setting \pset border 2 anyway,
> so in the vast
> majority of cases there should be no possibility of difference at all -
> proposed patch
> doesn't change output for \pset border 2 (because there's no sane way of
> making it
> look okay in presence of long fields anyway).
>

Although pspg prefers border 2, it currently works for all configurations.

I don't think your proposal is a good idea, because it breaks consistency;
if people want similar output, they can use the wrapped format.

Regards

Pavel




>
> Best regards,
> Platon Pronko
>


Re: .ready and .done files considered harmful

2021-08-05 Thread Robert Haas
On Thu, Aug 5, 2021 at 7:39 AM Dipesh Pandit  wrote:
> Yes, we can avoid storing another copy of information. We can
> use XLogCtl's ThisTimeLineID on Primary. However,
> XLogCtl's ThisTimeLineID is not set to the current timeline ID on
> Standby server. Its value is set to '0'. Can we use XLogCtl's
> replayEndTLI on the Standby server to get the current timeline ID?

I'm not sure. I think we need the value to be accurate during
recovery, so I'm not sure whether replayEndTLI would get us there.
Another approach might be to set ThisTimeLineID on standbys also.
Actually, taking a quick look at the code, I'm not quite sure why
that isn't happening already. Do you have any understanding of that?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Another regexp performance improvement: skip useless paren-captures

2021-08-05 Thread Andrew Dunstan


On 8/4/21 6:15 PM, Tom Lane wrote:
> Here's a little finger exercise that improves a case that's bothered me
> for awhile.  In a POSIX regexp, parentheses cause capturing by default;
> you have to write the very non-obvious "(?:...)" if you don't want the
> matching substring to be reported by the regexp engine.  


It's not obscure to perl programmers :-)


> That'd be fine
> if capturing were cheap, but with our engine it is not particularly
> cheap.  In many situations, the initial DFA check is sufficient to
> tell whether there is an overall match, but it does not tell where any
> subexpression match boundaries are.  To identify exactly which substring
> is deemed to match a parenthesized subexpression, we have to recursively
> break down the match, which takes at the very least a few more DFA
> invocations; and with an uncooperative regex, it can easily result in
> O(N^2) behavior where there was none at the DFA stage.
>
> Therefore, we really ought to expend some effort to not capture
> subexpressions if the sub-match data is not actually needed, which in
> many invocations we know that it isn't.  Spencer's original code has
> a REG_NOSUB option that looks like it ought to be good for this ... but
> on closer inspection it's basically useless, because it turns *all*
> parens into non-capturing ones.  That breaks back-references, so unless
> you know that the regexp contains no back-refs, you can't use it.


In perl you can use the 'n' modifier for this effect (since 5.22)


I would expect to know if a back-ref were present.


I'm a bit worried about how you'll keep track of back-ref numbering
since back-refs only count capturing groups, and you're silently turning
a capturing group into a non-capturing group.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Lowering the ever-growing heap->pd_lower

2021-08-05 Thread Simon Riggs
On Wed, 4 Aug 2021 at 15:39, Robert Haas  wrote:
>
> On Tue, Aug 3, 2021 at 8:44 PM Peter Geoghegan  wrote:
> > This time it's quite different: we're truncating the line pointer
> > array during pruning. Pruning often occurs opportunistically, during
> > regular query processing. In fact I'd say that it's far more common
> > than pruning by VACUUM. So the chances of line pointer array
> > truncation hurting rather than helping seems higher.
>
> How would it hurt?
>
> It's easy to see the harm caused by not shortening the line pointer
> array when it is possible to do so: we're using up space in the page
> that could have been made free. It's not so obvious to me what the
> downside of shortening it might be. I suppose there is a risk that we
> shorten it and get no benefit, or even shorten it and have to lengthen
> it again almost immediately. But neither of those things really
> matters unless shortening is expensive. If we can piggy-back it on an
> existing WAL record, I don't really see what the problem is.

Hmm, there is no information in WAL to describe the line pointers
being truncated by PageTruncateLinePointerArray(). We just truncate
every time we see an XLOG_HEAP2_VACUUM record and presume it does the
same thing as the original change.

If that is safe, then we don't need to put the truncation on a WAL
record at all, we just truncate after every XLOG_HEAP2_PRUNE record.

If that is not safe... then we have a PG14 bug.

If we do want to see this in WAL, both xl_heap_vacuum and
xl_heap_prune lend themselves to just adding one more OffsetNumber
onto the existing array, to represent the highest offset after
truncation. So we don't need to change the WAL format.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: A varint implementation for PG?

2021-08-05 Thread Andrew Dunstan


On 8/4/21 7:21 PM, Tomas Vondra wrote:
> On 8/5/21 1:05 AM, Andres Freund wrote:
>
>>
>>> The first one seems quite efficient in how it encodes the length
>>> into very
>>> few bits (which matters especially for small values). It's designed for
>>> integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary
>>> lengths
>>> fairly easily, I think:
>>
>>> Look at the first byte, and
>>>
>>> 0 - 243 - encoded as is
>>> 244 - 1 byte
>>> 245 - 2 bytes
>>> 246 - 3 bytes
>>> 247 - 4 bytes
>>> 248 - 5 bytes
>>> 249 - 6 bytes
>>> 250 - 7 bytes
>>> 251 - 8 bytes
>>> 252 - next 1 byte is length
>>> 253 - next 2 bytes are length
>>> 254 - next 3 bytes are length
>>> 255 - next 4 bytes are length
>>
>>> If we want to support longer lengths, we'd have to reserve an extra
>>> value
>>> (which reduces the number of values that require a single byte).
>>
>> I think that's not a bad scheme. I think it may end up being a bit more
>> expensive to decode because you need more branches instead of using
>> find-first-set type instructions.
>>
>
> I don't think it requires many branches, because you can essentially do
>
>   if (byte[0] <= 243)
>     length = 0
>   else if (byte[0] <= 251)
>     length = byte[0] - 243
>   else
>   {
>     length_bytes = byte[0] - 251;
>     ... read length_bytes, decode length
>   }
>
> but I haven't tried implementing it and maybe my intuition is wrong.
>
> Or maybe it'd be a good scheme for on-disk format, but poor for memory.
>
>
>

This seems like quite an elegant scheme. Certainly worth trying out. I
find it hard to believe that more than 4 length bytes would be needed
(although that reminds me of the famous and possibly apocryphal quote
"640K ought to be enough for anybody.")
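A minimal C sketch of the decode branching described above (hypothetical
helper name, little-endian host assumed; an illustration of the scheme, not
a proposed on-disk format):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/*
 * Decode the length-prefix scheme sketched above: first byte 0-243 is
 * the value itself, 244-251 mean 1-8 value bytes follow, and 252-255
 * mean 1-4 explicit length bytes follow (the caller then reads that
 * many payload bytes).
 */
static uint64_t
varint_decode(const uint8_t *buf, size_t *consumed)
{
	uint8_t		first = buf[0];

	if (first <= 243)
	{
		*consumed = 1;
		return first;
	}
	else if (first <= 251)
	{
		int			nbytes = first - 243;	/* 1..8 value bytes */
		uint64_t	value = 0;

		memcpy(&value, buf + 1, nbytes);	/* little-endian host */
		*consumed = 1 + (size_t) nbytes;
		return value;
	}
	else
	{
		int			nbytes = first - 251;	/* 1..4 length bytes */
		uint32_t	payload_len = 0;

		memcpy(&payload_len, buf + 1, nbytes);
		*consumed = 1 + (size_t) nbytes;
		return payload_len;		/* caller reads this many payload bytes */
	}
}
```

As the pseudocode above suggested, only two comparisons separate the three
cases, at the cost of not being able to use find-first-set tricks.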


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Commitfest overflow

2021-08-05 Thread Tomas Vondra

On 8/5/21 8:39 AM, Andrey Borodin wrote:

...

Early commitfests recognized a rule that patch authors owed one review per
patch registered in the commitfest.  If authors were holding to that, then
both submissions and reviews would slow during vacations, but the neglected
fraction of the commitfest would be about the same.  I think it would help to
track each author's balance (reviews_done - reviews_owed).


+1 for tracking this.


Yeah, I agree we should be stricter about this rule, but I'm somewhat 
skeptical about tracking it in the CF app - judging patch and review 
complexity seems quite subjective, etc.



BTW, when is a review done? When the first revision is published, or when
the patch is committed/rolled back?
And when is a review owed? At the moment the patch is submitted, or when it
is committed?



I think the rule is roughly that when you submit a patch to a CF, you're 
expected to review a patch of comparable complexity in the same CF. It's 
not tied to whether the patch is committed, etc.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Commitfest overflow

2021-08-05 Thread Tomas Vondra

On 8/5/21 11:27 AM, Michael Banck wrote:

On Tue, Aug 03, 2021 at 11:55:50AM -0400, Bruce Momjian wrote:

On Tue, Aug  3, 2021 at 04:53:40PM +0100, Simon Riggs wrote:

There are 273 patches in the queue for the Sept Commitfest already, so
it seems clear the queue is not being cleared down each CF as it was
before. We've been trying hard, but it's overflowing.

...


I wonder if our lack of in-person developer meetings is causing some of
our issues to not get closed.


Well, or the lack of virtual developer meetings? Sure, in-person
meetings are difficult now, but OTOH virtual meetings are very common
these days and maybe could be tried as quarterly developer meetings or
so and have the advantage that, modulo timezone problems, every
(invited) developer could attend.



I'm quite tired of virtual meetings, but I agree the lack of meetings is 
noticeable so maybe we should give it a try. In fact, we'll probably 
need to do that anyway, because even if there's an in-person meeting 
early next year the in-person participation is likely to be limited.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
On Thu, Aug 5, 2021 at 13:14, Platon Pronko wrote:

> Hi!
>
> pspg looks nice :)
>
> > Your proposal breaks pspg
> >
> > https://github.com/okbob/pspg
>
> I tried the following (after the patch):
>
> ./psql template1 -c "select n, repeat('x', n) as long_column_name from
> unnest(array[42,210]) as n" | pspg
>
> And it seems that pspg handled the situation without any issue. Also I
> tried inserting newlines into the output, and pspg still performed without
> issue. Can you give an example of scenario which should break after
> changing the record header length?
>

I think so there can be two basic problems:

a) with line cursor

b) format detection - I try to detect the header line - and I expect this
line to be the longest of all lines. I use a line with the same length as
the column's type info (data, border).

Did you test the wrapped format? It is working

\pset format wrapped
\x
select  * from pg_class;



Regards

Pavel




> Best regards,
> Platon Pronko
>


Re: when the startup process doesn't (logging startup delays)

2021-08-05 Thread Nitin Jadhav
Thanks for the detailed explanation.

> +elapsed_ms = (seconds * 1000) + (useconds / 1000);
> +interval_in_ms = log_startup_progress_interval * 1000;
> +enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT,
> + (interval_in_ms - (elapsed_ms % interval_in_ms)));
>
> This will work correctly only if elapsed_ms is equal to interval_in_ms
> or slightly greater than interval_ms. But if elapsed_ms is greater
> than two times interval_ms, then it will produce pretty much random
> results. If elapsed_ms is negative because the system clock gets set
> backward, a possibility I've already mentioned to you in a previous
> review, then it will also misbehave. I actually don't think
> enable_timeout_after() is very easy to use for this kind of thing. At
> least for me, it's way easier to think about calculating the timestamp
> at which I want the timer to expire. Maybe something along these
> lines:
>
> next_timeout = last_startup_progress_timeout + interval;
> if (next_timeout < now)
> {
>// Either the timeout was processed so late that we missed an entire cycle,
>// or the system clock was set backwards.
>next_timeout = now + interval;
> }
> enable_timeout_at(next_timeout);
>
> So I think we should take this out, which would permit simplifying a
> bunch of code. The four places where you call
> ereport_startup_progress(true, ...) would go away.
> ereport_startup_progress() would no longer need a Boolean argument,
> and neither would LogStartupProgressTimerExpired() /
> startup_progress_timer_has_expired(). Note that there's no real need
> to disable the timeout when we're done with it. It's fine if we do,
> but if we don't, it's also not a big deal; all that happens if we
> leave the timer scheduled and let it expire is that it will set a
> Boolean flag that nobody will care about. So what I'm thinking about
> is that we could just have, say, reset_startup_progress_timeout() and
> startup_progress_timeout_has_expired().
> reset_startup_progress_timeout() would just do exactly what I showed
> above to reset the timeout, and you'd call it at the beginning of each
> phase. And startup_progress_timeout_has_expired() would look roughly
> like this:
>
> if (!startup_progress_timer_expired)
>return;
> now = GetCurrentTimestamp();
> // compute timestamp difference
> last_startup_progress_timeout = now;
> reset_startup_progress_timeout();

This seems a little confusing. We set
last_startup_progress_timeout = now and then call
reset_startup_progress_timeout(), which calculates next_timeout
based on the initial value of last_startup_progress_timeout and
checks whether next_timeout is less than now. That doesn't make
sense to me. I feel we should calculate next_timeout based on the
time at which the timer is supposed to fire next, so we should set
last_startup_progress_timeout = next_timeout after enabling the
timer. Also, in addition to the 2 functions mentioned above, I feel
we need InitStartupProgress(), which sets the initial value of
startupProcessOpStartTime, used to calculate the time difference
displayed in the logs. I could see those functions looking like the
code below.

InitStartupProgress(void)
{
startupProcessOpStartTime = GetCurrentTimestamp();
ResetStartupProgressTimeout(startupProcessOpStartTime);
}

reset_startup_progress_timeout(TimeStampTz now)
{
  next_timeout = last_startup_progress_timeout + interval;
  if (next_timeout < now)
  {
    // Either the timeout was processed so late that we missed an entire cycle,
    // or the system clock was set backwards.
    next_timeout = now + interval;
  }
  enable_timeout_at(next_timeout);
  last_startup_progress_timeout = next_timeout;
}

startup_progress_timeout_has_expired()
{
  if (!startup_progress_timer_expired)
    return;
  now = GetCurrentTimestamp();
  // compute timestamp difference based on startupProcessOpStartTime
  reset_startup_progress_timeout(now);
}

Kindly share your thoughts and correct me if I am wrong.
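The clock-backwards guard under discussion can be sketched as self-contained
C (microsecond timestamps as in PostgreSQL's TimestampTz; hypothetical
names, and enable_timeout_at() plus the rest of the timer machinery are
omitted):

```c
#include <stdint.h>
#include <assert.h>

typedef int64_t TimestampTz;	/* microseconds, as in PostgreSQL */

static TimestampTz last_startup_progress_timeout = 0;

/*
 * Compute (and remember) when the progress timer should fire next,
 * guarding against a missed cycle and against the system clock being
 * set backwards: if the nominal next firing time is already in the
 * past, restart the cadence from 'now'.
 */
static TimestampTz
next_startup_progress_timeout(TimestampTz now, TimestampTz interval)
{
	TimestampTz next = last_startup_progress_timeout + interval;

	if (next < now)
		next = now + interval;	/* missed a cycle, or clock moved back */
	last_startup_progress_timeout = next;
	return next;				/* would be passed to enable_timeout_at() */
}
```

Note that keeping last_startup_progress_timeout equal to the scheduled
firing time (rather than the observed "now") is what preserves a steady
cadence when the timeout handler runs slightly late.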

> I think we also should consider where to put the new functions
> introduced by this patch, and the GUC. You put them in xlog.c which is
> reasonable since that is where StartupXLOG() lives. However, xlog.c is
> also a gigantic file, and xlog.h is included in a lot of places, most
> of which aren't going to care about the new things you're adding to
> that file at all. So I'm thinking it might make more sense to put the
> new code in src/backend/postmaster/startup.c. That is actually a
> better thematic fit given that this is really about the startup
> process specifically, not WAL-logging in general. Then reinit.c would
> include startup.h instead of xlog.h, which seems better, because I
> don't think we want any actual xlog operations to happen from within
> that file, so better not to be including xlog.h.
>
> The patch currently lacks documentation. It needs to update config.sgml.

I agree and I will take care in the next patch.

Thanks & Regards,
Nitin Jadhav



On Tue, Aug 3, 

Re: .ready and .done files considered harmful

2021-08-05 Thread Dipesh Pandit
Hi,

> I don't really understand why you are storing something in shared
> memory specifically for the archiver. Can't we use XLogCtl's
> ThisTimeLineID instead of storing another copy of the information?

Yes, we can avoid storing another copy of information. We can
use XLogCtl's ThisTimeLineID on Primary. However,
XLogCtl's ThisTimeLineID is not set to the current timeline ID on
Standby server. It's value is set to '0'. Can we use XLogCtl's
replayEndTLI on the Standby server to get the current timeline ID?

Thanks,
Dipesh


Re: very long record lines in expanded psql output

2021-08-05 Thread Pavel Stehule
Hi

On Thu, Aug 5, 2021 at 12:36, Platon Pronko wrote:

> In expanded mode psql calculates the width of the longest field in all the
> output rows,
> and prints the header line according to it. This often results in record
> header wrapping
> over several lines (see example below).
>
> This huge record header is printed the same way before each record, even
> if all the fields
> in current record are small and fit into terminal width. This often leads
> to situations
> when record header occupies the entirety of the screen, for all records,
> even though there are
> only a few records with huge fields in a record set.
>
> Maybe we can avoid making the header line longer than terminal width for
> \pset border 0
> and \pset border 1? We already have terminal width calculated. Please see
> attached a patch
> with the proposed implementation.
>
> Example output before the modification, in a terminal with a 100-column
> width:
>
> $ psql template1 -c "\x on" -c "\pset border 1;" -c "select n, repeat('x',
> n) as long_column_name from unnest(array[42,210]) as n"
> Expanded display is on.
> Border style is 1.
> ─[ RECORD 1
> ]┬──
>
> 
> ─
> n│ 42
> long_column_name │ xx
> ─[ RECORD 2
> ]┼──
>
> 
> ─
> n│ 210
> long_column_name │
> x
>
> 
> x
>
> And here's the same command after the patch:
>
> $ psql template1 -c "\x on" -c "\pset border 1;" -c "select n, repeat('x',
> n) as long_column_name from unnest(array[42,210]) as n"
> Expanded display is on.
> Border style is 1.
> ─[ RECORD 1
> ]┬──
> n│ 42
> long_column_name │ xx
> ─[ RECORD 2
> ]┼──
> n│ 210
> long_column_name │
> x
>
> 
> x
>
>
The length of the lines should be consistent.

Your proposal breaks pspg

https://github.com/okbob/pspg

It can be separate option, but it should not be default

Pavel


Best regards,
> Platon Pronko
>


very long record lines in expanded psql output

2021-08-05 Thread Platon Pronko

In expanded mode psql calculates the width of the longest field in all the 
output rows,
and prints the header line according to it. This often results in record header 
wrapping
over several lines (see example below).

This huge record header is printed the same way before each record, even if all 
the fields
in current record are small and fit into terminal width. This often leads to 
situations
when record header occupies the entirety of the screen, for all records, even 
though there are
only a few records with huge fields in a record set.

Maybe we can avoid making the header line longer than terminal width for \pset 
border 0
and \pset border 1? We already have terminal width calculated. Please see 
attached a patch
with the proposed implementation.

Example output before the modification, in a terminal with a 100-column width:

$ psql template1 -c "\x on" -c "\pset border 1;" -c "select n, repeat('x', n) as 
long_column_name from unnest(array[42,210]) as n"
Expanded display is on.
Border style is 1.
─[ RECORD 1 
]┬──

─
n│ 42
long_column_name │ xx
─[ RECORD 2 
]┼──

─
n│ 210
long_column_name │ 
x

x

And here's the same command after the patch:

$ psql template1 -c "\x on" -c "\pset border 1;" -c "select n, repeat('x', n) as 
long_column_name from unnest(array[42,210]) as n"
Expanded display is on.
Border style is 1.
─[ RECORD 1 
]┬──
n│ 42
long_column_name │ xx
─[ RECORD 2 
]┼──
n│ 210
long_column_name │ 
x

x

Best regards,
Platon Pronko
diff --git a/src/fe_utils/print.c b/src/fe_utils/print.c
index d48fcc4a03..9483d0d852 100644
--- a/src/fe_utils/print.c
+++ b/src/fe_utils/print.c
@@ -1157,6 +1157,7 @@ print_aligned_vertical_line(const printTextFormat *format,
 			unsigned long record,
 			unsigned int hwidth,
 			unsigned int dwidth,
+			int output_columns,
 			printTextRule pos,
 			FILE *fout)
 {
@@ -1200,6 +1201,14 @@ print_aligned_vertical_line(const printTextFormat *format,
 	}
 	if (reclen < 0)
 		reclen = 0;
+
+	if (output_columns > 0) {
+		if (opt_border == 0)
+			dwidth = Min(dwidth, output_columns - hwidth);
+		if (opt_border == 1)
+			dwidth = Min(dwidth, output_columns - hwidth - 3);
+	}
+
 	for (i = reclen; i < dwidth; i++)
 		fputs(opt_border > 0 ? lformat->hrule : " ", fout);
 	if (opt_border == 2)
@@ -1501,10 +1510,11 @@ print_aligned_vertical(const printTableContent *cont,
 
 			if (!opt_tuples_only)
 print_aligned_vertical_line(format, opt_border, record++,
-			lhwidth, dwidth, pos, fout);
+			lhwidth, dwidth, output_columns,
+			pos, fout);
 			else if (i != 0 || !cont->opt->start_table || opt_border == 2)
 print_aligned_vertical_line(format, opt_border, 0, lhwidth,
-			dwidth, pos, fout);
+			dwidth, output_columns, pos, fout);
 		}
 
 		/* Format the header */
@@ -1691,7 +1701,7 @@ print_aligned_vertical(const printTableContent *cont,
 	{
 		if (opt_border == 2 && !cancel_pressed)
 			print_aligned_vertical_line(format, opt_border, 0, hwidth, dwidth,
-		PRINT_RULE_BOTTOM, fout);
+		output_columns, PRINT_RULE_BOTTOM, fout);
 
 		/* print footers */
 		if (!opt_tuples_only && cont->footers != NULL && !cancel_pressed)


Re: Added schema level support for publication.

2021-08-05 Thread Amit Kapila
On Wed, Aug 4, 2021 at 4:10 PM Amit Kapila  wrote:
>
> On Tue, Aug 3, 2021 at 8:38 PM vignesh C  wrote:
> >
> > Thanks for reporting this, this is fixed in the v18 patch attached.
> >
>
> I have started looking into this patch and below are some initial comments.
>

Few more comments:
===
1. Do we need the previous column 'puballtables' after adding pubtype
or pubkind in pg_publication?

2.
@@ -224,6 +279,20 @@ CreatePublication(ParseState *pstate,
CreatePublicationStmt *stmt)
..
+ nspcrel = table_open(NamespaceRelationId, ShareUpdateExclusiveLock);
+ PublicationAddSchemas(puboid, schemaoidlist, true, NULL);
+ table_close(nspcrel, ShareUpdateExclusiveLock);

What is the reason for opening and taking a lock on
NamespaceRelationId? Do you want to avoid dropping the corresponding
schema during this duration? If so, that is not sufficient because
what if somebody drops it just before you took lock on
NamespaceRelationId. I think you need to use LockDatabaseObject to
avoid dropping the schema and note that it should be unlocked only at
the end of the transaction similar to what we do for tables. I guess
you need to add this locking inside the function
PublicationAddSchemas. Also, it would be good if you can add few
comments in this part of the code to explain the reason for locking.

3. The function PublicationAddSchemas is called from multiple places
in the patch but the locking protection is not there at all places. I
think if you add locking as suggested in the previous point then it
should be okay. I think you need similar locking for
PublicationDropSchemas.

4.
@@ -421,16 +537,84 @@ AlterPublicationTables(AlterPublicationStmt
*stmt, Relation rel,
  PublicationAddTables(pubid, rels, true, stmt);

  CloseTableList(delrels);
+ if (pubform->pubtype == PUBTYPE_EMPTY)
+ UpdatePublicationTypeTupleValue(rel, tup,
+ Anum_pg_publication_pubtype,
+ PUBTYPE_TABLE);
  }

At the above and all other similar places, the patch first updates the
pg_publication after performing the rel/schema operation. Isn't it
better to first update pg_publication to remain in sync with how
CreatePublication works? I am not able to see any issue with the way
you have it in the patch but it is better to keep the code consistent
across various similar functions to avoid confusion in the future.

-- 
With Regards,
Amit Kapila.




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2021-08-05 Thread Paul Guo
Rebased.


v12-0002-Tests-to-replay-create-database-operation-on-sta.patch
Description:  v12-0002-Tests-to-replay-create-database-operation-on-sta.patch


v12-0001-Support-node-initialization-from-backup-with-tab.patch
Description:  v12-0001-Support-node-initialization-from-backup-with-tab.patch


v12-0003-Fix-replay-of-create-database-records-on-standby.patch
Description:  v12-0003-Fix-replay-of-create-database-records-on-standby.patch


v12-0004-Fix-database-create-drop-wal-description.patch
Description: v12-0004-Fix-database-create-drop-wal-description.patch


Re: Two patches to speed up pg_rewind.

2021-08-05 Thread Paul Guo
On Tue, Jun 22, 2021 at 11:08 AM Paul Guo  wrote:
>
> On Thu, Jun 17, 2021 at 3:19 PM Michael Paquier  wrote:
> >
> > On Wed, Jun 02, 2021 at 05:02:10PM +0900, Michael Paquier wrote:
> > > On Wed, Jun 02, 2021 at 06:20:30PM +1200, Thomas Munro wrote:
> > > > The main thing I noticed was that Linux < 5.3 can fail with EXDEV if
> > > > you cross a filesystem boundary, is that something we need to worry
> > > > about there?
> > >
> > > Hmm.  Good point.  That may justify having a switch to control that.
> >
> > Paul, the patch set still needs some work, so I am switching it as
> > waiting on author.  I am pretty sure that we had better have a
> > fallback implementation of copy_file_range() in src/port/, and that we
> > are going to need an extra switch in pg_rewind to allow users to
> > bypass copy_file_range()/EXDEV if they do a local rewind operation
> > across different FSes with a kernel < 5.3.
> > --
>
> I modified the copy_file_range() patch yesterday to simply fall back
> to read()+write(), but I think it could be improved further.
>
> We may add a function to determine whether two files/paths are
> copy_file_range()-capable (using the POSIX-standard statvfs() f_fsid?) -
> that could be used by other copy_file_range() users, although in the long
> run the function would not be needed.
> And even having this, we may still need the fallback code.
>
> - For pg_rewind, we may just determine that ability once on src/dst
>   pgdata, but since there might be soft links (tablespace/WAL) in pgdata,
>   we should still allow fallback when copying those files that are not
>   copy_file_range()-capable.
> - Also it seems that sometimes copy_file_range() could return
>   ENOTSUP/EOPNOTSUPP (the file system does not support it and the kernel
>   does not fall back to simple copying?), although this is not documented
>   and seems unusual.
>
> Any idea?

I modified the copy_file_range() patch using the below logic:

If the first call of copy_file_range() fails with errno EXDEV or
ENOTSUP, pg_rewind does not use copy_file_range() in the rest of the
code, and if copy_file_range() fails we fall back to the previous
read()+write() logic for the file.
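A rough self-contained sketch of that fallback strategy (hypothetical helper,
not the actual patch; as an extra guard for this sketch, ENOSYS is also
treated as "unavailable" to cover kernels without the syscall):

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <stdbool.h>
#include <assert.h>

static bool try_copy_file_range = true;	/* cleared on first EXDEV/ENOTSUP */

/*
 * Copy 'len' bytes from src_fd to dst_fd, preferring copy_file_range()
 * but permanently switching to read()+write() once it fails with
 * EXDEV or ENOTSUP (plus ENOSYS, for kernels lacking the syscall).
 */
static int
copy_range_with_fallback(int src_fd, int dst_fd, size_t len)
{
	while (len > 0)
	{
		ssize_t		n;

		if (try_copy_file_range)
		{
			n = copy_file_range(src_fd, NULL, dst_fd, NULL, len, 0);
			if (n < 0 && (errno == EXDEV || errno == ENOTSUP ||
						  errno == ENOSYS))
			{
				try_copy_file_range = false;	/* never retry it */
				continue;
			}
		}
		else
		{
			char		buf[8192];
			size_t		chunk = (len < sizeof(buf)) ? len : sizeof(buf);

			n = read(src_fd, buf, chunk);
			if (n > 0 && write(dst_fd, buf, (size_t) n) != n)
				return -1;
		}

		if (n <= 0)
			return -1;			/* hard error or unexpected EOF */
		len -= (size_t) n;
	}
	return 0;
}
```

Because the flag is flipped only on the "unsupported" errnos, a rewind that
crosses a filesystem boundary on Linux < 5.3 pays the EXDEV failure once and
then proceeds with plain copying.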


v4-0001-Fsync-the-affected-files-directories-only-in-pg_r.patch
Description: Binary data


v4-0002-Use-copy_file_range-if-possible-for-file-copying-.patch
Description: Binary data


RE: Parallel INSERT SELECT take 2

2021-08-05 Thread houzj.f...@fujitsu.com
Hi,

Based on the discussion in another thread[1], we plan to change the design like
the following:

allow users to specify a parallel-safety option for both partitioned and
non-partitioned relations, but for non-partitioned relations, if users didn't
specify it, it would be computed automatically. If the user has specified the
parallel-safety option for a non-partitioned relation then we would consider
that instead of computing the value ourselves.

This approach makes it more convenient for users to get the
benefit of parallel select for insert.

Since most of the technical discussion happened in another thread, I
posted the new version patch including the new design to that thread[2].
Comments are also welcome in that thread.

[1] 
https://www.postgresql.org/message-id/CAA4eK1%2BMQnm6RkqooHA7R-y7riRa84qsh5j3FZDScw71m_n4OA%40mail.gmail.com

[2] 
https://www.postgresql.org/message-id/OS0PR01MB5716DB1E3F723F86314D080094F09%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best regards,
houzj


Re: Commitfest overflow

2021-08-05 Thread Michael Banck
On Tue, Aug 03, 2021 at 11:55:50AM -0400, Bruce Momjian wrote:
> On Tue, Aug  3, 2021 at 04:53:40PM +0100, Simon Riggs wrote:
> > There are 273 patches in the queue for the Sept Commitfest already, so
> > it seems clear the queue is not being cleared down each CF as it was
> > before. We've been trying hard, but it's overflowing.
> > 
> > Of those, about 50 items have been waiting more than one year, and
> > about 25 entries waiting for more than two years.
> > 
> > One problem is that many entries don't have official reviewers, though
> > many people have commented on Hackers, making it difficult to track
> > down items that actually need work. That wastes a lot of time for
> > would-be reviewers (or at least, it has done for me).
> > 
> > Please can there be some additional coordination to actively clear
> > down this problem? I won't attempt to come up with ideas to do this,
> > but others may wish to suggest ways that avoid Committer burn-out?
> > 
> > I will be happy to volunteer to be part of an organized approach that
> > spreads out the work.
> 
> I wonder if our lack of in-person developer meetings is causing some of
> our issues to not get closed.

Well, or the lack of virtual developer meetings? Sure, in-person
meetings are difficult now, but OTOH virtual meetings are very common
these days and maybe could be tried as quarterly developer meetings or
so and have the advantage that, modulo timezone problems, every
(invited) developer could attend.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Sascha Heuer, Geoff Richardson,
Peter Lilley

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




Re: Fix around conn_duration in pgbench

2021-08-05 Thread Yugo NAGATA
Hello Fujii-san,

On Thu, 5 Aug 2021 16:16:48 +0900
Fujii Masao  wrote:

> 
> 
> On 2021/08/01 14:50, Yugo NAGATA wrote:
> > When -C is not specified, the disconnection time is not measured even in
> > the patch for v14+. I don't think it is a problem because the
> > disconnection delay at the end of benchmark almost doesn't affect the tps.
> 
> What about v13 or before? That is, in v13, even when -C is not specified,
> both the connection and disconnection delays are measured. Right?

No. Although there is code measuring thread->conn_time around
disconnect_all() in v13 or before:

done:
INSTR_TIME_SET_CURRENT(start);
disconnect_all(state, nstate);
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(thread->conn_time, end, start);

this is a no-op because finishCon() is already called at CSTATE_ABORTED or 
CSTATE_FINISHED. Therefore, in the end, the disconnection delay is not
measured even in v13.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




RE: Skipping logical replication transactions on subscriber side

2021-08-05 Thread osumi.takami...@fujitsu.com
On Tuesday, August 3, 2021 3:49 PM  Masahiko Sawada  
wrote:
> I've attached new patches that incorporate all comments I got so far.
> Please review them.
Hi, I had a chance to look at the patch-set during my other development.
Just let me share some minor cosmetic things.


[1] unnatural wording ? in v5-0002.
+ * create tells whether to create the new subscription entry if it is not
+ * create tells whether to create the new subscription relation entry if it is

I'm not sure whether this wording is correct.
Did you mean just "tells whether to create "?
(Although we already have one other "create tells" in HEAD.)

[2] typo "kep" in v05-0002.

I think you meant "kept" in below sentence.

+/*
+ * Subscription error statistics kep in the stats collector.  One entry 
represents
+ * an error that happened during logical replication, reported by the apply 
worker
+ * (subrelid is InvalidOid) or by the table sync worker (subrelid is a valid 
OID).

[3] typo "lotigcal" in the v05-0004 commit message.

If incoming change violates any constraint, lotigcal replication stops
until it's resolved. This commit introduces another way to skip the
transaction in question.

It should be "logical".

[4] warning of doc build

I got the output below while running make html.
Could you please check it?

Link element has no content and no Endterm. Nothing to show in the link to 
monitoring-pg-stat-subscription-errors


Best Regards,
Takamichi Osumi



RE: Implementing Incremental View Maintenance

2021-08-05 Thread r.takahash...@fujitsu.com
Hi Nagata-san,


Thank you for your reply.

> I'll investigate this more, but we may have to prohibit views on partitioned
> table and partitions.

I think this restriction is too strict.
This feature is useful when the base table is large, and partitioning is also
useful in exactly that case.


I have several additional comments on the patch.


(1)
The following features are added to transition table.
- Prolong lifespan of transition table
- If table has row security policies, set them to the transition table
- Calculate pre-state of the table

Are these features only for IVM?
If there are other useful cases, they should be separated from the IVM patch
and become an independent patch for transition tables.


(2)
DEPENDENCY_IMMV (m) is added to deptype of pg_depend.
What is the difference compared with existing deptype such as 
DEPENDENCY_INTERNAL (i)?


(3)
Converting a normal materialized view to IVM, or an IVM back to a normal
materialized view, is not implemented yet.
Is it difficult?

I think creating/dropping the triggers and the __ivm_ columns could achieve
this feature.


Regards,
Ryohei Takahashi




Re: param 'txn' not used in function maybe_send_schema()

2021-08-05 Thread Fujii Masao




On 2021/07/31 10:08, Fujii Masao wrote:

Barring any objection, I will commit the patch.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Possible dependency issue in makefile

2021-08-05 Thread Filip Janus
Thanks for your advice, Tom.
I tried writing to a temp file followed by mv, but it didn't have the
desired effect.
I am going to follow your minimal approach and build src/port, src/common,
src/interfaces/libpq, then src/interfaces/ecpg, in series.

Regards,

-Filip-

st 4. 8. 2021 v 15:46 odesílatel Tom Lane  napsal:

> Filip Janus  writes:
> > I am building libecpg 13.1 but 13.3 behaves in the same manner and my
> build
> > fails with:
> > ...
> > Complete build log: log <
> https://fjanus.fedorapeople.org/postgresql/build.log>
>
> It looks like you're just running configure and then trying to do this:
>
> /usr/bin/make -O -j40 V=1 VERBOSE=1 -C src/interfaces/ecpg
>
> which is probably not a great idea.  It bypasses the .NOTPARALLEL:
> in src/Makefile, which would normally ensure that src/port gets
> built before src/interfaces.  If you want to not build the entire
> system, I think a minimal approach would be to make src/port,
> src/common, src/interfaces/libpq, then src/interfaces/ecpg, in series.
>
> As-is, it looks like two different sub-makes are recursing to build
> pg_config_paths.h concurrently.  Since the rule for that is
>
> pg_config_paths.h: $(top_builddir)/src/Makefile.global
> echo "#define PGBINDIR \"$(bindir)\"" >$@
> echo "#define PGSHAREDIR \"$(datadir)\"" >>$@
> ...
> echo "#define HTMLDIR \"$(htmldir)\"" >>$@
> echo "#define MANDIR \"$(mandir)\"" >>$@
>
> it's not too surprising that concurrent builds misbehave.  I don't
> know of any way to prevent make from doing that other than
> sprinkling .NOTPARALLEL around a lot more, which would defeat
> your purpose in using -j in the first place.
>
> We could possibly adjust this specific rule to create pg_config_paths.h
> atomically (say, write to a temp file and then "mv" it into place), but
> I don't have any faith that there's not other similar issues behind
> this one.  Building ecpg by itself is not a case that we test or consider
> supported.
>
> regards, tom lane
>
>
>


Re: [Proposal] Global temporary tables

2021-08-05 Thread ZHU XIAN WEN
Hi WenJing


Thanks for the feedback,

I have tested the code; it seems okay and the regression tests pass.

I have also reviewed the code and did not find any remaining issues.


Hello all


Review and comments for the patches V51 is welcome.


If there is no feedback, I'm going to change the status to 'Ready for
Committer' on Aug 25.


big thanks

Tony



On 2021/7/29 23:19, wenjing zeng wrote:
>
>> On Jul 28, 2021, at 23:09, Tony Zhu  wrote:
>>
>> Hi Wenjing
>>
>> would you please rebase the code?
> Thank you for your attention.
> According to the test, the latest pgmaster code can merge the latest patch 
> and pass the test.
> https://www.travis-ci.com/github/wjzeng/postgres/builds 
> 
> If you have any questions, please give me feedback.
>
>
> Wenjing
>
>
>> Thank you very much
>> Tony
>>
>> The new status of this patch is: Waiting on Author
>





Accidentally dropped constraints: bug?

2021-08-05 Thread Simon Riggs
If we drop a column we cascade that drop to all indexes and all
constraints that mention that column, even if they include other
columns also. We might expect that indexes should be dropped
automatically, but the latter behavior for constraints seems like a
bug, since it can silently remove constraints that might still be
valid without the dropped column. (Example below). This is even more
surprising if the user specifies RESTRICT explicitly. I note that this
is acting as documented, it's just the docs don't explain the full
implications, so I'm guessing we didn't think about this before.

The effect of accidentally removing a constraint can be fairly
dramatic, for example, tables suddenly start refusing updates/deletes
if they are part of a publication, or partitioning schemes that depend
upon check constraints can suddenly stop working. As well as the more
obvious loss of protection from bad input data.

ISTM that we should refuse to drop constraints, if the constraint is
also dependent upon other columns that will remain in the table,
unless the user requests CASCADE.

- - -

create table abc (a int, b int, c int, primary key (a,b,c), check (a >
5 and b is not null and c > 10));

create index bc on abc (b, c);

\d abc
Table "public.abc"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   | not null |
 b  | integer |   | not null |
 c  | integer |   | not null |

Indexes:
"abc_pkey" PRIMARY KEY, btree (a, b, c)
"bc" btree (b, c)
Check constraints:
"abc_c_check" CHECK (c > 9)
"abc_check" CHECK (a > 5 AND b IS NOT NULL AND c > 10)

alter table abc drop column c restrict;

\d abc
Table "public.abc"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   | not null |
 b  | integer |   | not null |

Noting that all constraints have been removed, not just the ones
wholly dependent on dropped columns.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: archive status ".ready" files may be created too early

2021-08-05 Thread Kyotaro Horiguchi
At Thu, 5 Aug 2021 05:15:04 +, "Bossart, Nathan"  
wrote in 
> On 8/4/21, 9:05 PM, "Kyotaro Horiguchi"  wrote:
> > By the way about the v3 patch,
> >
> > +#define InvalidXLogSegNo   ((XLogSegNo) 0x)
> >
> > Like InvalidXLogRecPtr, the first valid segment number is 1 so we can
> > use 0 as InvalidXLogSegNo.
> 
> It's been a while since I wrote this, but if I remember correctly, the
> issue with using 0 is that we could end up initializing
> lastNotifiedSeg to InvalidXLogSegNo in XLogWrite().  Eventually, we'd
> initialize it to 1, but we will have skipped creating the .ready file
> for the first segment.

Maybe this?

+   SetLastNotifiedSegmentIfInvalid(openLogSegNo - 1);

Hmm. Theoretically 0 is invalid as a segment number, so we'd better not
use 0 as a valid value of lastNotifiedSeg.

Honestly, I don't like having this initialization in XLogWrite.  We
should, and I think can, initialize it earlier.  It seems to me the most
appropriate timing to initialize the variable is just before running the
end-of-recovery checkpoint, since StartupXLOG knows the first segment to
write.  If the variable were set to 0 there, that wouldn't matter at all.
We can get rid of the new symbol by doing this.

Maybe something like this:

>   {
>   /*
>* There is no partial block to copy. Just set InitializedUpTo, 
> and
>* let the first attempt to insert a log record to initialize 
> the next
>* buffer.
>*/
>   XLogCtl->InitializedUpTo = EndOfLog;
>   }
> 
+   /*
+* EndOfLog resides on the next segment of the last finished one. Set 
the
+* last finished segment as lastNotifiedSeg now.  In the case where the
+* last crash has left the last several segments not being marked as
+* .ready, the checkpoint just after does that for all finished 
segments.
+* There's a corner case where the checkpoint advances segment, but that
+* ends up at most with a duplicate archive notification.
+*/
+   XLByteToSeg(EndOfLog, EndOfLogSeg, wal_segment_size);
+   Assert(EndOfLogSeg > 0);
+   SetLastNotifiedSegment(EndOfLogSeg - 1);
+
>   LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;

Does this make sense?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: Fix around conn_duration in pgbench

2021-08-05 Thread Fujii Masao




On 2021/08/01 14:50, Yugo NAGATA wrote:

When -C is not specified, the disconnection time is not measured even in
the patch for v14+. I don't think it is a problem because the
disconnection delay at the end of benchmark almost doesn't affect the tps.


What about v13 or before? That is, in v13, even when -C is not specified,
both the connection and disconnection delays are measured. Right?
If right, the time required to close the connection in CSTATE_FINISHED
state should also be measured?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: row filtering for logical replication

2021-08-05 Thread Peter Smith
v21 --> v22

(This small change is only to keep the patch up-to-date with HEAD)

Changes:

* A recent commit [1] added a new TAP subscription test file 023, so
now this patch's test file (previously "023_row_filter.pl") has been
bumped to "024_row_filter.pl".

--
[1] 
https://github.com/postgres/postgres/commit/63cf61cdeb7b0450dcf3b2f719c553177bac85a2

Kind Regards,
Peter Smith.
Fujitsu Australia


v22-0001-Row-filter-for-logical-replication.patch
Description: Binary data


Re: Commitfest overflow

2021-08-05 Thread Andrey Borodin



> On Aug 5, 2021, at 09:25, Noah Misch  wrote:
> 
> On Tue, Aug 03, 2021 at 12:13:44PM -0400, Tom Lane wrote:
>> Bruce Momjian  writes:
>>> On Tue, Aug  3, 2021 at 04:53:40PM +0100, Simon Riggs wrote:
 There are 273 patches in the queue for the Sept Commitfest already, so
 it seems clear the queue is not being cleared down each CF as it was
 before. We've been trying hard, but it's overflowing.
> 
>> 1. There wasn't that much getting done during this CF because it's
>> summer and many people are on vacation (in the northern hemisphere
>> anyway).
> 
>> I don't think there's much to be done about the vacation effect;
>> we just have to accept that the summer CF is likely to be less
>> productive than others.
> 
> Early commitfests recognized a rule that patch authors owed one review per
> patch registered in the commitfest.  If authors were holding to that, then
> both submissions and reviews would slow during vacations, but the neglected
> fraction of the commitfest would be about the same.  I think it would help to
> track each author's balance (reviews_done - reviews_owed).

+1 for tracking this.
BTW, when is a review considered done? When the first revision is
published? Or when the patch is committed/rejected?
And when is a review owed? At the moment the patch is submitted? Or when
it is committed?

Best regards, Andrey Borodin.