Re: Fast DSM segments
On Mon, Jul 27, 2020 at 2:45 PM Thomas Munro wrote: > Here's a new version, using the name min_dynamic_shared_memory, which > sounds better to me. Any objections? I also fixed the GUC's maximum > setting so that it's sure to fit in size_t. I pushed it like that. Happy to rename the GUC if someone has a better idea. I don't really love the way dsm_create()'s code flows, but I didn't see another way to do this within the existing constraints. I think it'd be nice to rewrite this thing to get rid of the random number-based handles that are directly convertible to key_t/pathname, and instead use something holding {slot number, generation number}. Then you could improve that code flow and get rid of several cases of linear array scans under an exclusive lock. The underlying key_t/pathname would live in the slot. You'd need a new way to find the control segment itself after a restart, where dsm_cleanup_using_control_segment() cleans up after the previous incarnation, but I think that just requires putting the key_t/pathname directly in PGShmemHeader, instead of a new {slot number, generation number} style handle. Or maybe a separate mapped file opened by well known pathname, or something like that.
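To make the {slot number, generation number} idea concrete, here is a minimal sketch of how such a handle could be packed; the bit widths, macro names, and layout are purely illustrative assumptions, not taken from any existing patch:

#include <stdint.h>

/*
 * Illustrative only: encode a control-segment slot index plus a generation
 * counter in a single 32-bit handle.  The handle is no longer convertible
 * to a key_t/pathname; the key_t/pathname lives in the slot itself, and a
 * stale handle is detected because its generation no longer matches.
 */
typedef uint32_t dsm_handle_sketch;

#define DSM_SKETCH_SLOT_BITS   10   /* up to 1024 slots, say */
#define DSM_SKETCH_MAKE(slot, gen) \
    (((uint32_t) (gen) << DSM_SKETCH_SLOT_BITS) | (uint32_t) (slot))
#define DSM_SKETCH_SLOT(h)        ((h) & ((1U << DSM_SKETCH_SLOT_BITS) - 1))
#define DSM_SKETCH_GENERATION(h)  ((h) >> DSM_SKETCH_SLOT_BITS)

With something like this, dsm_attach() could index straight into the slot array and check the generation, rather than scanning the whole array under an exclusive lock.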
Re: Fix minor source code comment mistake
On Thu, Jul 30, 2020 at 05:57:40PM +0900, Michael Paquier wrote: > Indeed. Let's fix this. And done. -- Michael
Re: Threading in BGWorkers (!)
> > I see no good reason to believe that the signal handler issue is the only > one. Even if it is, > not being able to call any postgres infrastructure is a pretty huge > handicap. > (changed emails to get rid of the nasty employer notice...) It's at least a workable handicap that I'm happy to deal with. I can say with 100% confidence that people coming from non C languages will be using threading in Postgres backends as interop matures (and it's happening fast now). A lot of the time they won't even know they are using threads as it will be done by libraries they make use of transparently. Let's help them to avoid unsafe code now, not wait until they show up on this list with a critical failure and tap at the big sign that says "NO THREADING". - james
RE: [Proposal] Add accumulated statistics for wait event
> This patch fails to apply to HEAD, please submit a rebased version. I've > marked this as Waiting on Author. Sorry for my absence. Unfortunately I haven't had time to work on this patch in this CF. I plan to be back in the next CF to work on this patch and also review other patches. --- Yoshikazu IMAI
RE: [Patch] Optimize dropping of relation buffers using dlist
On Friday, July 31, 2020 2:37 AM, Konstantin Knizhnik wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: not tested > Documentation: not tested > > I have tested this patch under various workloads and hardware (including Power2 > server with 384 virtual cores) and didn't find any performance regression. > > The new status of this patch is: Ready for Committer Thank you very much, Konstantin, for testing the patch with different workloads. I wonder if I need to modify some of the documentation. I'll leave the final review to the committer(s) as well. Regards, Kirk Jamison
Re: Add Information during standby recovery conflicts
On Sat, 11 Jul 2020 at 21:56, Drouvot, Bertrand wrote: > > > On 7/7/20 9:43 AM, Masahiko Sawada wrote: > > Fully makes sense, the new patch version attached is now producing: > >> 2020-07-06 06:10:36.022 UTC [14035] LOG: waiting for recovery conflict > >> on snapshot > > How about adding the subject? that is, "recovery is waiting for > > recovery conflict on %s" or "recovery process is waiting for > > conflict on %s". > > > The subject is now added in the new attached patch (I did not include > the pid as it is part of the log prefix). > > It now looks like: > > 2020-07-11 12:00:41.092 UTC [23217] LOG: recovery is waiting for > recovery conflict on snapshot > 2020-07-11 12:00:41.092 UTC [23217] DETAIL: There is 1 blocking > connection(s). > 2020-07-11 12:00:41.092 UTC [23217] CONTEXT: WAL redo at 0/4A0A6BF0 for > Heap2/CLEAN: remxid 1128 > WAL record received at 2020-07-11 12:00:41.092231+00 > tbs 1663 db 13586 rel 16805, fork main, blkno 0 > > > >> 2020-07-06 06:10:36.022 UTC [14035] DETAIL: WAL record received at > >> 2020-07-06 06:10:36.021963+00. > >> Tablespace/database/relation are 1663/13586/16672, fork is main > >> and block is 0. > >> There is 1 blocking connection(s). > > To follow the existing log message, perhaps this can be something like > > "WAL record received at %s, rel %u/%u/%u, fork %s, blkno %u. %d > > processes". But I'm not sure the errdetail is the best place to > > display the WAL information as I mentioned in the latter part of this > > email. > > moved to the context and formatted the same way as the current > Standby/LOCK context. > > > > Ok. Let's keep this thread for the new attached patch that focus on the > >> recovery process waiting. > > Thank you for updating the patch! > > > > I've tested the latest patch. > > > Thank you for testing and reviewing! > > > > On recovery conflict on lock and on > > bufferpin, if max_standby_streaming_delay is disabled (set to -1), the > > logs don't appear even if log_recovery_conflicts is true. > > > Nice catch! it is fixed in the new attached patch (the log reporting has > been moved out of StandbyTimeoutHandler()). > > > > > Here is random comments on the code: > > > > + recovery_conflict_main_message = psprintf("waiting for > > recovery conflict on %s", > > + > > get_procsignal_reason_desc(reason)); > > : > > + ereport(LOG, > > + (errmsg("%s", > > recovery_conflict_main_message), > > + errdetail("%s\n" "There is %d blocking > > connection(s).", wal_record_detail_str, num_waitlist_entries))); > > > > It's not translation-support-friendly. I think the message "waiting > > for recovery conflict on %s" should be surrounded by _(). Or we can > > just put it to ereport as follows: > > > > ereport(LOG, > > (errmsg("waiting for recovery conflicts on %s", > > get_procsignal_reason_desc(reason)) > > ... > > > changed in the new attached patch. > > > > > > --- > > + oldcontext = MemoryContextSwitchTo(ErrorContext); > > + econtext = error_context_stack; > > + > > + if (XLogRecGetBlockTag(econtext->arg, 0, , , )) > > > > I don't think it's a good idea to rely on error_context_stack because > > other codes might set another error context before reaching here in > > the future. > > > right, changed in the new attached patch: this is now done in > rm_redo_error_callback() and using the XLogReaderState passed as argument. 
> > > > > > --- > > + if (XLogRecGetBlockTag(econtext->arg, 0, , , )) > > + wal_record_detail_str = psprintf("WAL record received > > at %s.\nTablespace/database/relation are %u/%u/%u, fork is %s and > > block is %u.", > > + receipt_time_str, > > rnode.spcNode, rnode.dbNode, rnode.relNode, > > + forkNames[forknum], > > + blknum); > > > > There might be a block tag in block ids other than 0. > > > right, fixed in the new attached patch. > > > > I'm not sure the > > errdetail is the best place where we display WAL information. > > > moved to context in the new attached patch. > > > > For > > instance, we display both the relation oid and block number depending > > on RM as follows: > > > > 2020-07-07 15:50:30.733 JST [13344] LOG: waiting for recovery conflict on > > lock > > 2020-07-07 15:50:30.733 JST [13344] DETAIL: WAL record received at > > 2020-07-07 15:50:27.73378+09. > > There is 1 blocking connection(s). > > 2020-07-07 15:50:30.733 JST [13344] CONTEXT: WAL redo at 0/328 > > for Standby/LOCK: xid 506 db 13586 rel 16384 > > > > I wonder if we can display the details of redo WAL information by > > improving xlog_outdesc() or rm_redo_error_callback() so that it > > displays relfilenode, forknum, and block number. What do you think? > > > I think that fully make sense to move this to rm_redo_error_callback(). >
Re: windows config.pl question
sorry I meant file src/tools/msvc/Solution.pm routine sub GetOpenSSLVersion has the follwing line: qq("$self->{options}->{openssl}\\bin\\openssl.exe" version 2>&1); in our distribution openssl.exe isn’t in the $self->{options}->{openssl}\bin\ location dm > On Jul 30, 2020, at 10:25 PM, Dmitry Markman wrote: > > Hi Michael, thanks a lot > > I figured it out, UNC works indeed > > however I found at least 2 problems (at least in our 3p harness) > > 1. in our configuration openssl executable went to lib (I don’t know why), so > I had to change line in Configure script > and point to openssl.exe file. Sure I probably can change our makefile to > create bin directory and put openssl.exe there > but it’s not my file, maybe later > > 2. if PostgreSQL is situated on network drive that mapped to say disk z:, > then build script failed: > > Z:\3p\derived\win64\PostgreSQL\source\src\tools\msvc>build > Detected hardware platform: Win32 > Generating win32ver.rc for src/backend > Generating win32ver.rc for src/timezone > Generating win32ver.rc for src/backend/snowball > Generating win32ver.rc for src/pl/plpgsql/src > Generating win32ver.rc for src/backend/replication/libpqwalreceiver > Generating win32ver.rc for src/backend/replication/pgoutput > Generating win32ver.rc for src/interfaces/ecpg/pgtypeslib > > . . . . . . . . . . . . > > Building the projects in this solution one at a time. To enable parallel > build, please add the "/m" switch. > Build started 7/30/2020 5:52:12 PM. > Project "Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln" on node 1 (default > targets). > Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln.metaproj : error MSB4126: The > specified solution configuration "Release > |x64" is invalid. Please specify a valid solution configuration using the > Configuration and Platform properties (e.g. M > SBuild.exe Solution.sln /p:Configuration=Debug /p:Platform="Any CPU") or > leave those properties blank to use the defaul > t solution configuration. [Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln] > Done Building Project "Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln" > (default targets) -- FAILED. > > > Build FAILED. > > "Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln" (default target) (1) -> > (ValidateSolutionConfiguration target) -> > Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln.metaproj : error MSB4126: > The specified solution configuration "Relea > se|x64" is invalid. Please specify a valid solution configuration using the > Configuration and Platform properties (e.g. > MSBuild.exe Solution.sln /p:Configuration=Debug /p:Platform="Any CPU") or > leave those properties blank to use the defa > ult solution configuration. [Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln] > >0 Warning(s) >1 Error(s) > > > Time Elapsed 00:00:00.37 > > Z:\3p\derived\win64\PostgreSQL\source\src\tools\msvc> > > > the same works just fine if it’s on c: drive > > all PostgreSQL distribution is in the Z:\3p\derived\win64\PostgreSQL\source > folder > > > > network UNC path is mapped to Z: > > thanks again for your help > > dm > > > >> On Jul 30, 2020, at 9:59 PM, Michael Paquier wrote: >> >> On Thu, Jul 30, 2020 at 06:55:28AM -0400, Dmitry Markman wrote: >>> icu => , >>> >>> is it correct? >> >> Exactly. >> >>> if it’s correct does build support UNC paths? >> >> Yes, these work. One take to be aware of is that the quoting of the >> paths can be annoying. 
FWIW, I just use single quotes with normal >> slashes as that's a no-brainer, say: >> openssl => 'C:/OpenSSL-hoge/', >> >> If you are able to break the scripts with a given path, this would >> likely be a bug we should address. For example, we had lately >> complains about the build scripts breaking once you inserted spaces in >> the python or OpenSSL paths (see beb2516 and ad7595b). >> -- >> Michael >
Re: Index Skip Scan (new UniqueKeys)
On Mon, 13 Jul 2020 at 10:18, Floris Van Nee wrote: > One question about the unique keys - probably for Andy or David: I've looked > in the archives to find arguments for/against using Expr nodes or > EquivalenceClasses in the Unique Keys patch. However, I couldn't really find > a clear answer about why the current patch uses Expr rather than > EquivalenceClasses. At some point David mentioned "that probably Expr nodes > were needed rather than EquivalenceClasses", but it's not really clear to me > why. What were the thoughts behind this? I'm still not quite sure on this either way. I did think EquivalenceClasses were more suitable before I wrote the POC patch for unique keys. But after that, I had in mind that Exprs might be better. The reason I thought this was due to the fact that the DISTINCT clause list is a bunch of Exprs and if the UniqueKeys were EquivalenceClasses then checking to see if the DISTINCT can be skipped turned into something more complex that required looking through lists of ec_members rather than just checking if the uniquekey exprs were a subset of the DISTINCT clause. Thinking about it a bit harder, if we did use Exprs then it would mean it a case like the following wouldn't work for Andy's DISTINCT no-op stuff. CREATE TABLE xy (x int primary key, y int not null); SELECT DISTINCT y FROM xy WHERE x=y; whereas if we use EquivalenceClasses then we'll find that we have an EC with x,y in it and can skip the DISTINCT since we have a UniqueKey containing that EquivalenceClass. Also, looking at what Andy wrote to make a case like the following work in his populate_baserel_uniquekeys() function in the 0002 patch: CREATE TABLE ab (a int, b int, primary key(a,b)); SELECT DISTINCT a FROM ab WHERE b = 1; it's a bit uninspiring. Really what we want here when checking if we can skip doing the DISTINCT is a UniqueKey set using EquivalenceClasses as we can just insist that any unmatched UniqueKey items have an ec_is_const == true. However, that means we have to loop through the ec_members of the EquivalenceClasses in the uniquekeys during the DISTINCT check. That's particularly bad when you consider that in a partitioned table case there might be an ec_member for each child partition and there could be 1000s of child partitions and following those ec_members chains is going to be too slow. My current thoughts are that we should be using EquivalenceClasses but we should first add some infrastructure to make them perform better. My current thoughts are that we do something like what I mentioned in [1] or something more like what Andres mentions in [2]. After that, we could either make EquivalenceClass.ec_members a hash table or binary search tree. Or even perhaps just have a single hash table/BST for all EquivalenceClasses that allows very fast lookups from {Expr} -> {EquivalenceClass}. I think an Expr can only belong in a single non-merged EquivalenceClass. So when we do merging of EquivalenceClasses we could just repoint that data structure to point to the new EquivalenceClass. We'd never point to ones that have ec_merged != NULL. This would also allow us to fix the poor performance in regards to get_eclass_for_sort_expr() for partitioned tables. So, it seems the patch dependency chain for skip scans just got a bit longer :-( David [1] https://www.postgresql.org/message-id/CAApHDvrEXcadNYAAdq6RO0eKZUG6rRHXJGAbpzj8y432gCD9bA%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/20190920051857.2fhnvhvx4qdddviz%40alap3.anarazel.de#c3add3919c534591eae2179a6c82742c
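As a rough sketch of the {Expr} -> {EquivalenceClass} lookup structure mentioned above (names and layout are hypothetical; an actual implementation would presumably be built on simplehash.h or a similar facility):

/*
 * Hypothetical entry in a planner-global Expr -> EquivalenceClass map.
 * Because a given Expr can only belong to one non-merged EquivalenceClass,
 * merging two classes would simply repoint matching entries at the
 * surviving class (the one whose ec_merged is NULL).
 */
typedef struct ExprECLookupEntrySketch
{
    struct Expr *expr;                      /* hash key: the expression */
    struct EquivalenceClass *eclass;        /* current, non-merged class */
} ExprECLookupEntrySketch;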
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
On Thu, Jul 30, 2020 at 11:23:38PM +0200, Daniel Gustafsson wrote: > Looks good, thanks. Let's close the CF entry with this and open a new for the > pg_depend part when that's done. I have applied the patch, thanks. And actually, I have found just after that CREATE DATABASE gets severely impacted by the number of slots initialized when copying the template dependencies if there are few of them. The fix is as simple as delaying the initialization of the slots once we know they will be used. In my initial tests, I was using fsync = off, so I missed that. Sorry about that. The attached fixes this regression by delaying the slot initialization until we know that it will be used. This does not matter for pg_attribute as we know in advance the number of attributes to insert. -- Michael diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c index ef2b87927c..3c63f52a24 100644 --- a/src/backend/catalog/pg_shdepend.c +++ b/src/backend/catalog/pg_shdepend.c @@ -809,15 +809,18 @@ copyTemplateDependencies(Oid templateDbId, Oid newDbId) int slotCount; CatalogIndexState indstate; TupleTableSlot **slot; - int nslots; + int nslots, max_slots; + bool slot_init = true; sdepRel = table_open(SharedDependRelationId, RowExclusiveLock); sdepDesc = RelationGetDescr(sdepRel); - nslots = MAX_PGSHDEPEND_INSERT_BYTES / sizeof(FormData_pg_shdepend); - slot = palloc(sizeof(TupleTableSlot *) * nslots); - for (int i = 0; i < nslots; i++) - slot[i] = MakeSingleTupleTableSlot(sdepDesc, ); + /* + * Allocate the slots to use, but delay initialization until we know + * that they will be used. + */ + max_slots = MAX_PGSHDEPEND_INSERT_BYTES / sizeof(FormData_pg_shdepend); + slot = palloc(sizeof(TupleTableSlot *) * max_slots); indstate = CatalogOpenIndexes(sdepRel); @@ -842,6 +845,9 @@ copyTemplateDependencies(Oid templateDbId, Oid newDbId) { Form_pg_shdepend shdep; + if (slot_init) + slot[slotCount] = MakeSingleTupleTableSlot(sdepDesc, ); + ExecClearTuple(slot[slotCount]); shdep = (Form_pg_shdepend) GETSTRUCT(tup); @@ -858,10 +864,11 @@ copyTemplateDependencies(Oid templateDbId, Oid newDbId) slotCount++; /* If slots are full, insert a batch of tuples */ - if (slotCount == nslots) + if (slotCount == max_slots) { CatalogTuplesMultiInsertWithInfo(sdepRel, slot, slotCount, indstate); slotCount = 0; + slot_init = false; } } @@ -874,6 +881,8 @@ copyTemplateDependencies(Oid templateDbId, Oid newDbId) CatalogCloseIndexes(indstate); table_close(sdepRel, RowExclusiveLock); + /* drop only the number of slots used */ + nslots = slot_init ? slotCount : max_slots; for (int i = 0; i < nslots; i++) ExecDropSingleTupleTableSlot(slot[i]); pfree(slot); signature.asc Description: PGP signature
Re: Cache relation sizes?
On Sat, Jun 20, 2020 at 10:32 AM Thomas Munro wrote: > Rebased. I'll add this to the open commitfest. I traced the recovery process while running pgbench -M prepared -c16 -j16 -t10000 (= 160,000 transactions). With the patch, the number of lseeks went from 1,080,661 (6.75 per pgbench transaction) to just 85. I went ahead and pushed this patch. There's still the matter of crazy numbers of lseeks in regular backends; looking at all processes while running the above test, I get 1,469,060 (9.18 per pgbench transaction) without -M prepared, and 193,722 with -M prepared (1.21 per pgbench transaction). Fixing that with this approach will require bullet-proof shared invalidation, but I think it's doable in later work.
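For illustration only, the basic shape of the optimization is to consult a cached size rather than issuing lseek(SEEK_END) on every size check; this simplified sketch uses made-up names and ignores the shared-memory and invalidation aspects a real implementation has to deal with:

#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative cache entry; a real implementation would share this across backends. */
typedef struct RelSizeCacheEntrySketch
{
    off_t   nbytes;     /* last known size of the relation file */
    bool    valid;      /* false until first lookup, or after invalidation */
} RelSizeCacheEntrySketch;

static off_t
sketch_get_file_size(int fd, RelSizeCacheEntrySketch *entry)
{
    if (!entry->valid)
    {
        entry->nbytes = lseek(fd, 0, SEEK_END);
        entry->valid = true;
    }
    return entry->nbytes;
}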
Re: windows config.pl question
Hi Michael, thanks a lot I figured it out, UNC works indeed however I found at least 2 problems (at least in our 3p harness) 1. in our configuration openssl executable went to lib (I don’t know why), so I had to change line in Configure script and point to openssl.exe file. Sure I probably can change our makefile to create bin directory and put openssl.exe there but it’s not my file, maybe later 2. if PostgreSQL is situated on network drive that mapped to say disk z:, then build script failed: Z:\3p\derived\win64\PostgreSQL\source\src\tools\msvc>build Detected hardware platform: Win32 Generating win32ver.rc for src/backend Generating win32ver.rc for src/timezone Generating win32ver.rc for src/backend/snowball Generating win32ver.rc for src/pl/plpgsql/src Generating win32ver.rc for src/backend/replication/libpqwalreceiver Generating win32ver.rc for src/backend/replication/pgoutput Generating win32ver.rc for src/interfaces/ecpg/pgtypeslib . . . . . . . . . . . . Building the projects in this solution one at a time. To enable parallel build, please add the "/m" switch. Build started 7/30/2020 5:52:12 PM. Project "Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln" on node 1 (default targets). Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln.metaproj : error MSB4126: The specified solution configuration "Release |x64" is invalid. Please specify a valid solution configuration using the Configuration and Platform properties (e.g. M SBuild.exe Solution.sln /p:Configuration=Debug /p:Platform="Any CPU") or leave those properties blank to use the defaul t solution configuration. [Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln] Done Building Project "Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln" (default targets) -- FAILED. Build FAILED. "Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln" (default target) (1) -> (ValidateSolutionConfiguration target) -> Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln.metaproj : error MSB4126: The specified solution configuration "Relea se|x64" is invalid. Please specify a valid solution configuration using the Configuration and Platform properties (e.g. MSBuild.exe Solution.sln /p:Configuration=Debug /p:Platform="Any CPU") or leave those properties blank to use the defa ult solution configuration. [Z:\3p\derived\win64\PostgreSQL\source\pgsql.sln] 0 Warning(s) 1 Error(s) Time Elapsed 00:00:00.37 Z:\3p\derived\win64\PostgreSQL\source\src\tools\msvc> the same works just fine if it’s on c: drive all PostgreSQL distribution is in the Z:\3p\derived\win64\PostgreSQL\source folder network UNC path is mapped to Z: thanks again for your help dm > On Jul 30, 2020, at 9:59 PM, Michael Paquier wrote: > > On Thu, Jul 30, 2020 at 06:55:28AM -0400, Dmitry Markman wrote: >> icu => , >> >> is it correct? > > Exactly. > >> if it’s correct does build support UNC paths? > > Yes, these work. One take to be aware of is that the quoting of the > paths can be annoying. FWIW, I just use single quotes with normal > slashes as that's a no-brainer, say: > openssl => 'C:/OpenSSL-hoge/', > > If you are able to break the scripts with a given path, this would > likely be a bug we should address. For example, we had lately > complains about the build scripts breaking once you inserted spaces in > the python or OpenSSL paths (see beb2516 and ad7595b). > -- > Michael
Re: Use of "long" in incremental sort code
On Fri, 3 Jul 2020 at 07:47, James Coleman wrote: > Patch using int64 attached. I added this to the open items list for PG13. David
Re: windows config.pl question
On Thu, Jul 30, 2020 at 06:55:28AM -0400, Dmitry Markman wrote: > icu => , > > is it correct? Exactly. > if it’s correct does build support UNC paths? Yes, these work. One take to be aware of is that the quoting of the paths can be annoying. FWIW, I just use single quotes with normal slashes as that's a no-brainer, say: openssl => 'C:/OpenSSL-hoge/', If you are able to break the scripts with a given path, this would likely be a bug we should address. For example, we had lately complains about the build scripts breaking once you inserted spaces in the python or OpenSSL paths (see beb2516 and ad7595b). -- Michael
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
On Thu, Jul 30, 2020 at 6:40 PM Justin Pryzby wrote: > Feel free to close it out. I'm satisfied that we've had a discussion about > it. Closed it out. -- Peter Geoghegan
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
On Thu, Jul 30, 2020 at 6:39 PM James Coleman wrote: > I very much do not like this approach, and I think it's actually > fundamentally wrong, at least for the memory check. Quicksort is not the only > option that uses memory. For now, there's only one option that spills to disk > (external merge sort), but there's no reason it has to remain that way. I wouldn't be surprised if it was possible to get SORT_TYPE_EXTERNAL_SORT even today (though I'm not sure if that's truly possible). That will happen for a regular sort node if we require randomAccess to the sort, and it happens to spill -- we can randomly access the final tape, but cannot do a final on-the-fly merge. Say for a merge join. -- Peter Geoghegan
Re: fixing pg_ctl with relative paths
At 2014-01-28 21:11:54, "Bruce Momjian" wrote: >On Mon, Jul 1, 2013 at 08:10:14PM -0400, Josh Kupershmidt wrote: >> On Thu, Jun 27, 2013 at 11:47 AM, Fujii Masao wrote: >> > On Thu, Jun 27, 2013 at 10:36 AM, Josh Kupershmidt >> > wrote: >> >> On Wed, Jun 26, 2013 at 12:22 PM, Fujii Masao >> >> wrote: >> >>> Though this is a corner case, the patch doesn't seem to handle properly >> >>> the case >> >>> where "-D" appears as other option value, e.g., -k option value, in >> >>> postmaster.opts >> >>> file. >> >> >> >> Could I see a command-line example of what you mean? >> > >> > postmaster -k "-D", for example. Of course, it's really a corner case :) >> >> Oh, I see. I was able to trip up strip_datadirs() with something like >> >> $ PGDATA="/my/data/" postmaster -k "-D" -S 100 & >> $ pg_ctl -D /my/data/ restart >> >> that example causes pg_ctl to fail to start the server after stopping >> it, although perhaps you could even trick the server into starting >> with the wrong options. Of course, similar problems exists today in >> other cases, such as with the relative paths issue this patch is >> trying to address, or a datadir containing embedded quotes. >> >> I am eager to see the relative paths issue fixed, but maybe we need to >> bite the bullet and sort out the escaping of command-line options in >> the rest of pg_ctl first, so that a DataDir like "/tmp/here's a \" >> quote" can consistently be used by pg_ctl {start|stop|restart} before >> we can fix this wart. > >Where are we on this patch? > >-- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + Everyone has their own god. + > > >-- >Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-hackers > Hi, I encountered the same problem. I want to know is there a final conclusion? thank you very much!
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
On Thu, Jul 30, 2020 at 06:33:32PM -0700, Peter Geoghegan wrote: > On Thu, Jul 30, 2020 at 5:22 PM Justin Pryzby wrote: > > Because filtering out zero values is exactly what's intended to be avoided > > for > > nontext output. > > > > I think checking whether the method was used should result in the same > > output, > > without the literal check for zero value (which itself sets a bad example). > > It seems fine to me as-is. What about SORT_TYPE_TOP_N_HEAPSORT? Or any > other sort methods we add in the future? Feel free to close it out. I'm satisfied that we've had a discussion about it. -- Justin
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
On Thu, Jul 30, 2020 at 8:22 PM Justin Pryzby wrote: > On Wed, Jul 29, 2020 at 09:18:44PM -0700, Peter Geoghegan wrote: > > On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby > wrote: > > > So my 2ndary suggestion is to conditionalize based on the method > rather than > > > value of space used. > > > > What's the advantage of doing it that way? > > Because filtering out zero values is exactly what's intended to be avoided > for > nontext output. > > I think checking whether the method was used should result in the same > output, > without the literal check for zero value (which itself sets a bad example). > > --- a/src/backend/commands/explain.c > +++ b/src/backend/commands/explain.c > @@ -2824,13 +2824,13 @@ > show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, > appendStringInfo(, "%s Groups", groupLabel); > ExplainOpenGroup("Incremental Sort Groups", > groupName.data, true, es); > ExplainPropertyInteger("Group Count", NULL, > groupInfo->groupCount, es); > > ExplainPropertyList("Sort Methods Used", methodNames, es); > > - if (groupInfo->maxMemorySpaceUsed > 0) > + if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT) > { > longavgSpace = > groupInfo->totalMemorySpaceUsed / groupInfo->groupCount; > const char *spaceTypeName; > StringInfoData memoryName; > > spaceTypeName = > tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY); > @@ -2841,13 +2841,13 @@ > show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, > ExplainPropertyInteger("Average Sort Space Used", > "kB", avgSpace, es); > ExplainPropertyInteger("Peak Sort Space Used", > "kB", > > groupInfo->maxMemorySpaceUsed, es); > > ExplainCloseGroup("Sort Spaces", memoryName.data, > true, es); > } > - if (groupInfo->maxDiskSpaceUsed > 0) > + if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT) > { > longavgSpace = > groupInfo->totalDiskSpaceUsed / groupInfo->groupCount; > const char *spaceTypeName; > StringInfoData diskName; > > spaceTypeName = > tuplesort_space_type_name(SORT_SPACE_TYPE_DISK); > I very much do not like this approach, and I think it's actually fundamentally wrong, at least for the memory check. Quicksort is not the only option that uses memory. For now, there's only one option that spills to disk (external merge sort), but there's no reason it has to remain that way. And in the future we might accurately report memory consumed even when we've eventually spilled to disk also, so memory used would be relevant potentially even if no in-memory sort was ever performed. So I'm pretty confident checking the space used is the correct way to do this. James
Re: new heapcheck contrib module
> On Jul 30, 2020, at 5:53 PM, Robert Haas wrote: > > On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger > wrote: >> No, that wasn't my concern. I was thinking about CLOG entries disappearing >> during the scan as a consequence of concurrent vacuums, and the effect that >> would have on the validity of the cached [relfrozenxid..next_valid_xid] >> range. In the absence of corruption, I don't immediately see how this would >> cause any problems. But for a corrupt table, I'm less certain how it would >> play out. > > Oh, hmm. I wasn't thinking about that problem. I think the only way > this can happen is if we read a page and then, before we try to look > up the CID, vacuum zooms past, finishes the whole table, and truncates > clog. But if that's possible, then it seems like it would be an issue > for SELECT as well, and it apparently isn't, or we would've done > something about it by now. I think the reason it's not possible is > because of the locking rules described in > src/backend/storage/buffer/README, which require that you hold a > buffer lock until you've determined that the tuple is visible. Since > you hold a share lock on the buffer, a VACUUM that hasn't already > processed that freeze the tuples in that buffer; it would need an > exclusive lock on the buffer to do that. Therefore it can't finish and > truncate clog either. > > Now, you raise the question of whether this is still true if the table > is corrupt, but I don't really see why that makes any difference. > VACUUM is supposed to freeze each page it encounters, to the extent > that such freezing is necessary, and with Andres's changes, it's > supposed to ERROR out if things are messed up. We can postulate a bug > in that logic, but inserting a VACUUM-blocking lock into this tool to > guard against a hypothetical vacuum bug seems strange to me. Why would > the right solution not be to fix such a bug if and when we find that > there is one? Since I can't think of a plausible concrete example of corruption which would elicit the problem I was worrying about, I'll withdraw the argument. But that leaves me wondering about a comment that Andres made upthread: > On Apr 20, 2020, at 12:42 PM, Andres Freund wrote: > I don't think random interspersed uses of CLogTruncationLock are a good > idea. If you move to only checking visibility after tuple fits into > [relfrozenxid, nextXid), then you don't need to take any locks here, as > long as a lock against vacuum is taken (which I think this should do > anyway). — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
On Thu, Jul 30, 2020 at 5:22 PM Justin Pryzby wrote: > Because filtering out zero values is exactly what's intended to be avoided for > nontext output. > > I think checking whether the method was used should result in the same output, > without the literal check for zero value (which itself sets a bad example). It seems fine to me as-is. What about SORT_TYPE_TOP_N_HEAPSORT? Or any other sort methods we add in the future? The way that we flatten maxDiskSpaceUsed and maxMemorySpaceUsed into "space used" on output might be kind of questionable, but it's something that we have to live with for the foreseeable future. I don't think that this is a bad example -- we don't output maxDiskSpaceUsed or maxMemorySpaceUsed at the conceptual level. -- Peter Geoghegan
Re: Missing CFI in hlCover()?
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Here's a proposed patch along that line. > I've back-patched this to 11 (which was just a bit of fuzz) and tested > it out with a couple of different queries that were causing issues > previously on the archive server, and they finish in a much more > reasonable time and react faster to cancel requests/signals. Yeah, I'd tried this locally using the data from the one test case you showed me, and it seemed to fix that. > So, looks good to me, and would certainly be nice to get this into the > next set of releases, so the archive server doesn't get stuck anymore. I'll push this tomorrow if nobody has objected to it. BTW, I had noticed last night that hlFirstIndex is being unreasonably stupid. Many of the "words" have null item pointers and hence can't possibly match any query item (I think because we have "words" for inter-word spaces/punctuation as well as the actual words). Checking that, as in the attached v2 patch, makes things a bit faster yet. regards, tom lane diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c index 76b6f9aef0..92517e4c17 100644 --- a/src/backend/tsearch/wparser_def.c +++ b/src/backend/tsearch/wparser_def.c @@ -2011,13 +2011,18 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) for (i = pos; i < prs->curwords; i++) { /* ... scan the query to see if this word matches any operand */ - QueryItem *item = GETQUERY(query); + QueryOperand *worditem = prs->words[i].item; + QueryItem *item; int j; + if (worditem == NULL) + continue; /* certainly not a match */ + + item = GETQUERY(query); for (j = 0; j < query->size; j++) { if (item->type == QI_VAL && -prs->words[i].item == >qoperand) +worditem == >qoperand) return i; item++; } @@ -2028,8 +2033,14 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) /* * hlCover: try to find a substring of prs' word list that satisfies query * - * At entry, *p must be the first word index to consider (initialize this to - * zero, or to the next index after a previous successful search). + * At entry, *p must be the first word index to consider (initialize this + * to zero, or to the next index after a previous successful search). + * We will consider all substrings starting at or after that word, and + * containing no more than max_cover words. (We need a length limit to + * keep this from taking O(N^2) time for a long document with many query + * words but few complete matches. Actually, since checkcondition_HL is + * roughly O(N) in the length of the substring being checked, it's even + * worse than that.) * * On success, sets *p to first word index and *q to last word index of the * cover substring, and returns true. @@ -2038,7 +2049,8 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) * words used in the query. 
*/ static bool -hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) +hlCover(HeadlineParsedText *prs, TSQuery query, int max_cover, + int *p, int *q) { int pmin, pmax, @@ -2084,7 +2096,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) nextpmin = nextpmax; pmax = nextpmax; } - while (pmax >= 0); + while (pmax >= 0 && pmax - pmin < max_cover); /* No luck here, so try next feasible startpoint */ pmin = nextpmin; } @@ -2186,7 +2198,7 @@ get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos, static void mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, int shortword, int min_words, - int max_words, int max_fragments) + int max_words, int max_fragments, int max_cover) { int32 poslen, curlen, @@ -2213,7 +2225,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, covers = palloc(maxcovers * sizeof(CoverPos)); /* get all covers */ - while (hlCover(prs, query, , )) + while (hlCover(prs, query, max_cover, , )) { startpos = p; endpos = q; @@ -2368,7 +2380,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, */ static void mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, - int shortword, int min_words, int max_words) + int shortword, int min_words, int max_words, int max_cover) { int p = 0, q = 0; @@ -2386,7 +2398,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, if (!highlightall) { /* examine all covers, select a headline using the best one */ - while (hlCover(prs, query, , )) + while (hlCover(prs, query, max_cover, , )) { /* * Count words (curlen) and interesting words (poslen) within @@ -2542,6 +2554,7 @@ prsd_headline(PG_FUNCTION_ARGS) int shortword = 3; int max_fragments = 0; bool highlightall = false; + int max_cover; ListCell *l; /* Extract configuration option values */ @@ -2581,6 +2594,15 @@
Re: new heapcheck contrib module
On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger wrote: > No, that wasn't my concern. I was thinking about CLOG entries disappearing > during the scan as a consequence of concurrent vacuums, and the effect that > would have on the validity of the cached [relfrozenxid..next_valid_xid] > range. In the absence of corruption, I don't immediately see how this would > cause any problems. But for a corrupt table, I'm less certain how it would > play out. Oh, hmm. I wasn't thinking about that problem. I think the only way this can happen is if we read a page and then, before we try to look up the CID, vacuum zooms past, finishes the whole table, and truncates clog. But if that's possible, then it seems like it would be an issue for SELECT as well, and it apparently isn't, or we would've done something about it by now. I think the reason it's not possible is because of the locking rules described in src/backend/storage/buffer/README, which require that you hold a buffer lock until you've determined that the tuple is visible. Since you hold a share lock on the buffer, a VACUUM that hasn't already processed that freeze the tuples in that buffer; it would need an exclusive lock on the buffer to do that. Therefore it can't finish and truncate clog either. Now, you raise the question of whether this is still true if the table is corrupt, but I don't really see why that makes any difference. VACUUM is supposed to freeze each page it encounters, to the extent that such freezing is necessary, and with Andres's changes, it's supposed to ERROR out if things are messed up. We can postulate a bug in that logic, but inserting a VACUUM-blocking lock into this tool to guard against a hypothetical vacuum bug seems strange to me. Why would the right solution not be to fix such a bug if and when we find that there is one? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: proposal: type info support functions for functions that use "any" type
Daniel Gustafsson writes: > This patch has been bumped in CFs for the past year, with the thread stalled > and the last review comment being in support of rejection. Tom, do you still > feel it should be rejected in light of Pavel's latest posts? I have seen no convincing response to the concerns I raised in my last message in the thread [1], to wit that 1. I think the "flexibility" of letting a support function resolve the output type in some unspecified way is mostly illusory, because if it doesn't do it in a way that's morally equivalent to polymorphism, it's doing it wrong. Also, I'm not that excited about improving polymorphism in a way that is only accessible with specialized C code. The example of Oracle-like DECODE() could be handled about as well if we had a second set of anycompatible-style polymorphic types, that is something like decode(expr anycompatible, search1 anycompatible, result1 anycompatible2, search2 anycompatible, result2 anycompatible2, search3 anycompatible, result3 anycompatible2, ... ) returns anycompatible2; Admittedly, you'd need to write a separate declaration for each number of arguments you wanted to support, but they could all point at the same C function --- which'd be a lot simpler than in this patch, since it would not need to deal with any type coercions, only comparisons. I also argue that to the extent that the support function is reinventing polymorphism internally, it's going to be inferior to the parser's version. As an example, with Pavel's sample implementation, if a particular query needs a coercion from type X to type Y, that's nowhere visible in the parse tree. So you could drop the cast without being told that view so-and-so depends on it, leading to a run-time failure next time you try to use that view. Doing the same thing with normal polymorphism, the X-to-Y cast function would be used in the parse tree and so we'd know about the dependency. 2. I have no faith that the proposed implementation is correct or complete. As I complained earlier, a lot of places have special-case handling for polymorphism, and it seems like every one of them would need to know about this feature too. That is, to the extent that this patch's footprint is smaller than commit 24e2885ee -- which it is, by a lot -- I think those are bugs of omission. It will not work to have a situation where some parts of the backend resolve a function's result type as one thing and others resolve it as something else thanks to failure to account for this new feature. As a concrete example, it looks like we'd fail pretty hard if someone tried to use this facility in an aggregate support function. So my opinion is still what it was in January. regards, tom lane [1] https://www.postgresql.org/message-id/31501.1579036195%40sss.pgh.pa.us
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
On Wed, Jul 29, 2020 at 09:18:44PM -0700, Peter Geoghegan wrote: > On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby wrote: > > So my 2ndary suggestion is to conditionalize based on the method rather than > > value of space used. > > What's the advantage of doing it that way? Because filtering out zero values is exactly what's intended to be avoided for nontext output. I think checking whether the method was used should result in the same output, without the literal check for zero value (which itself sets a bad example). --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2824,13 +2824,13 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, appendStringInfo(, "%s Groups", groupLabel); ExplainOpenGroup("Incremental Sort Groups", groupName.data, true, es); ExplainPropertyInteger("Group Count", NULL, groupInfo->groupCount, es); ExplainPropertyList("Sort Methods Used", methodNames, es); - if (groupInfo->maxMemorySpaceUsed > 0) + if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT) { longavgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount; const char *spaceTypeName; StringInfoData memoryName; spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY); @@ -2841,13 +2841,13 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es); ExplainPropertyInteger("Peak Sort Space Used", "kB", groupInfo->maxMemorySpaceUsed, es); ExplainCloseGroup("Sort Spaces", memoryName.data, true, es); } - if (groupInfo->maxDiskSpaceUsed > 0) + if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT) { longavgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount; const char *spaceTypeName; StringInfoData diskName; spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
Re: Missing CFI in hlCover()?
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> Stephen Frost writes: > >>> * Tom Lane (t...@sss.pgh.pa.us) wrote: > We could hard-code a rule like that, or we could introduce a new > explicit parameter for the maximum cover length. The latter would be > more flexible, but we need something back-patchable and I'm concerned > about the compatibility hazards of adding a new parameter in minor > releases. So on the whole I propose hard-wiring a multiplier of, > say, 10 for both these cases. > > >>> That sounds alright to me, though I do think we should probably still > >>> toss a CFI (or two) in this path somewhere as we don't know how long > >>> some of these functions might take... > > >> Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. > > > Works for me. > > Here's a proposed patch along that line. I've back-patched this to 11 (which was just a bit of fuzz) and tested it out with a couple of different queries that were causing issues previously on the archive server, and they finish in a much more reasonable time and react faster to cancel requests/signals. If you'd like to play with it more, the PG11 installed on ark2 now has this patch built into it. So, looks good to me, and would certainly be nice to get this into the next set of releases, so the archive server doesn't get stuck anymore. :) Thanks! Stephen signature.asc Description: PGP signature
logtape.c stats don't account for unused "prefetched" block numbers
Commit 896ddf9b added prefetching to logtape.c to avoid excessive fragmentation in the context of hash aggs that spill and have many batches/tapes. Apparently the preallocation doesn't actually perform any filesystem operations, so the new mechanism should be zero overhead when "preallocated" blocks aren't actually used after all (right?). However, I notice that this breaks the statistics shown by things like trace_sort, and even EXPLAIN ANALYZE. LogicalTapeSetBlocks() didn't get the memo about preallocation. The easiest way to spot the issue is to compare trace_sort output on v13 with output for the same case in v12 -- the "%u disk blocks used" statistics are consistently higher on v13, especially for cases with many tapes. I spotted the bug when I noticed that v13 external sorts reportedly use more or less disk space when fewer or more tapes are involved (again, this came from trace_sort). That doesn't make sense -- the total amount of space used for external sort temp files should practically be fixed, aside from insignificant rounding effects. Reducing the amount of memory by orders of magnitude in a Postgres 12 tuplesort will hardly affect the "%u disk blocks used" trace_sort output at all. That's what we need to get back to. This bug probably won't be difficult to fix. Actually, we have had similar problems in the past. The fix could be as simple as teaching LogicalTapeSetBlocks() about this new variety of "sparse allocation". Although maybe the preallocation stuff should somehow be rolled into the much older nHoleBlocks stuff. Unsure. -- Peter Geoghegan
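As a sketch of what such a fix could look like (the struct and field names below are assumptions for illustration, not the actual logtape.c members), the reported total would be based on blocks actually written rather than block numbers merely reserved by preallocation:

/*
 * Sketch only: illustrative stand-in for the bookkeeping in logtape.c.
 * Reporting nBlocksWritten - nHoleBlocks keeps "disk blocks used" in line
 * with what the sort really wrote, so unused preallocated block numbers
 * no longer inflate trace_sort / EXPLAIN ANALYZE statistics.
 */
typedef struct LogicalTapeSetStatsSketch
{
    long    nBlocksAllocated;   /* highest block number handed out, including
                                 * preallocated-but-unused blocks */
    long    nBlocksWritten;     /* blocks actually written out */
    long    nHoleBlocks;        /* blocks skipped over and never written */
} LogicalTapeSetStatsSketch;

static long
sketch_LogicalTapeSetBlocks(const LogicalTapeSetStatsSketch *lts)
{
    return lts->nBlocksWritten - lts->nHoleBlocks;
}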
Re: pg_dump --where option
> On 10 Jul 2020, at 02:03, Cary Huang wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: tested, failed > Implements feature: tested, failed > Spec compliant: tested, failed > Documentation:tested, failed > > Hi > > I had a look at the patch and it cleanly applies to postgres master branch. I > tried to do a quick test on the new "where clause" functionality and for the > most part it does the job as described and I'm sure some people will find > this feature useful to their database dump needs. However I tried the feature > with a case where I have a subquery in the where clause, but it seems to be > failing to dump the data. I ran the pg_dump like: > > $ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > > testdump2 > $ pg_dump: error: processing of table "public.test1" failed > > both test1 and test2 exist in the database and the same subquery works under > psql. > > I also notice that the regression tests for pg_dump is failing due to the > patch, I think it is worth looking into the failure messages and also add > some test cases on the new "where" clause to ensure that it can cover as many > use cases as possible. As this is being reviewed, but time is running out in this CF, I'm moving this to the next CF. The entry will be moved to Waiting for Author based on the above review. cheers ./daniel
Re: ModifyTable overheads in generic plans
> On 1 Jul 2020, at 15:38, Amit Langote wrote: > Another thing I could do is decouple the patches to discuss here from > the patches of the other thread, which should be possible and might be > good to avoid back and forth between the two threads. It sounds like it would make it easier for reviewers, so if it's possible with a reasonable effort it might be worth it. I've moved this entry to the next CF for now. cheers ./daniel
Re: Online verification of checksums
> On 5 Jul 2020, at 13:52, Daniel Gustafsson wrote: > >> On 6 Apr 2020, at 23:15, Michael Banck wrote: > >> Probably we need to take a step back; > > This patch has been Waiting on Author since the last commitfest (and no longer > applies as well), and by the sounds of the thread there are some open issues > with it. Should it be Returned with Feedback to be re-opened with a fresh > take > on it? Marked as Returned with Feedback, please open a new entry in case there is a renewed interest with a new patch. cheers ./daniel
Re: Online checksums patch - once again
> On 29 Jul 2020, at 19:58, Robert Haas wrote: > Here are a bunch of comments based on a partial read-through of this > patch. Thanks a lot Robert and Justin for the reviews! With the commitfest wrap-up imminent and being on vacation I will have a hard time responding properly before the end of CF so I'm moving it to the next CF for now. cheers ./daniel
Re: proposal: type info support functions for functions that use "any" type
> On 26 Jan 2020, at 16:33, Pavel Stehule wrote: > I reread all related mails and I think it should be safe - or there is > the same risk as with using any C extensions for functions or hooks. This patch has been bumped in CFs for the past year, with the thread stalled and the last review comment being in support of rejection. Tom, do you still feel it should be rejected in light of Pavel's latest posts? cheers ./daniel
Re: new heapcheck contrib module
> On Jul 30, 2020, at 1:47 PM, Andres Freund wrote: > > Hi, > > On 2020-07-30 13:18:01 -0700, Mark Dilger wrote: >> Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax >> committed via TransactionIdDidCommit. I am worried about concurrent >> truncation of clog entries causing I/O errors on SLRU lookup when performing >> that check. The three strategies I had for dealing with that were taking >> the XactTruncationLock (formerly known as CLogTruncationLock, for those >> reading this thread from the beginning), locking out vacuum, and the idea >> upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being >> dense and don't need to worry about this. But I haven't convinced myself of >> that, yet. > > I think it's not at all ok to look in the procarray or clog for xids > that are older than what you're announcing you may read. IOW I don't > think it's OK to just ignore the problem, or try to work around it by > holding XactTruncationLock. The current state of the patch is that concurrent vacuums are kept out of the table being checked by means of taking a ShareUpdateExclusive lock on the table being checked. In response to Robert's review, I was contemplating whether that was necessary, but you raise the interesting question of whether it is even sufficient. The logic in verify_heapam is currently relying on the ShareUpdateExclusive lock to prevent any of the xids in the range relfrozenxid..nextFullXid from being invalid arguments to TransactionIdDidCommit. Ignoring whether that is a good choice vis-a-vis performance, is that even a valid strategy? It sounds like you are saying it is not. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Jul 30, 2020, at 2:00 PM, Robert Haas wrote: > > On Thu, Jul 30, 2020 at 4:18 PM Mark Dilger > wrote: >>> Maybe I'm just being dense here -- exactly what problem are you worried >>> about? >> >> Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax >> committed via TransactionIdDidCommit. I am worried about concurrent >> truncation of clog entries causing I/O errors on SLRU lookup when performing >> that check. The three strategies I had for dealing with that were taking >> the XactTruncationLock (formerly known as CLogTruncationLock, for those >> reading this thread from the beginning), locking out vacuum, and the idea >> upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being >> dense and don't need to worry about this. But I haven't convinced myself of >> that, yet. > > I don't get it. If you've already checked that the XIDs are >= > relfrozenxid and <= ReadNewFullTransactionId(), then this shouldn't be > a problem. It could be, if CLOG is hosed, which is possible, because > if the table is corrupted, why shouldn't CLOG also be corrupted? But > I'm not sure that's what your concern is here. No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out. The kind of scenario I'm worried about may not be possible in practice. I think it would depend on how vacuum behaves when scanning a corrupt table that is corrupt in some way that vacuum doesn't notice, and whether vacuum could finish scanning the table with the false belief that it has frozen all tuples with xids less than some cutoff. I thought it would be safer if that kind of thing were not happening during verify_heapam's scan of the table. Even if a careful analysis proved it was not an issue with the current coding of vacuum, I don't think there is any coding convention requiring future versions of vacuum to be hardened against corruption, so I don't see how I can rely on vacuum not causing such problems. I don't think this is necessarily a too-rare-to-care-about type concern, either. If corruption across multiple tables prevents autovacuum from succeeding, and the DBA doesn't get involved in scanning tables for corruption until the lack of successful vacuums impacts the production system, I imagine you could end up with vacuums repeatedly happening (or trying to happen) around the time the DBA is trying to fix tables, or perhaps drop them, or whatever, using verify_heapam for guidance on which tables are corrupted. Anyway, that's what I was thinking. I was imagining that calling TransactionIdDidCommit might keep crashing the backend while the DBA is trying to find and fix corruption, and that could get really annoying. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Batch insert in CTAS/MatView code
In an off-list discussion with Paul, we decided to withdraw this patch for now and instead create a new entry when there is a re-worked patch. This has now been done in the CF app. cheers ./daniel
Re: Identifying user-created objects
> On 1 Jul 2020, at 14:15, Daniel Gustafsson wrote: > >> On 4 Mar 2020, at 12:06, Masahiko Sawada >> wrote: > >> Attached updated patch that incorporated comments from Amit and Vignesh. > > This patch fails to compile due to an Oid collision in pg_proc.dat. Please > submit a new version with an Oid from the recommended range for new patches: > 8000-. See the below documentation page for more information on this. > > https://www.postgresql.org/docs/devel/system-catalog-initial-data.html > > I'm marking the entry Waiting on Author in the meantime. As no new patch has been presented, and the thread contains doubts over the proposed functionality, I'm marking this returned with feedback. cheers ./daniel
Re: psql - improve test coverage from 41% to 88%
> On 5 Jul 2020, at 13:38, Daniel Gustafsson wrote: > >> On 24 Mar 2020, at 15:47, David Steele wrote: > >> This patch no longer applies: http://cfbot.cputube.org/patch_27_2262.log >> >> CF entry has been updated to Waiting on Author. > > This patch hasn't been updated and still doesn't apply, do you intend to > rebase > it during this commitfest or should we move it to returned with feedback? It > can always be re-opened at a later date. As the thread has stalled, I've marked this Returned with Feedback. cheers ./daniel
Re: [Patch] Optimize dropping of relation buffers using dlist
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested I have tested this patch at various workloads and hardware (including Power2 server with 384 virtual cores) and didn't find performance regression. The new status of this patch is: Ready for Committer
Re: Reducing WaitEventSet syscall churn
On Thu, Jul 30, 2020 at 5:50 PM Thomas Munro wrote: > I pushed those three patches, but will wait for more discussion on the rest. And here's a rebase, to make cfbot happy. From a760053ac6fea1b9a829e9a170902a555863ce66 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 24 Feb 2020 23:51:09 +1300 Subject: [PATCH v6 1/4] Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet. Previously, we'd give either a FATAL message or a silent exit() when we detected postmaster death, depending on which wait point we were at. Make the exit more uniform, by using the standard exit facility. This will allow a later patch to use FeBeWaitSet in a new location without having to add more explicit handling for postmaster death, in line with our policy of reducing such clutter. Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com --- src/backend/libpq/be-secure.c | 28 src/backend/libpq/pqcomm.c| 2 +- 2 files changed, 1 insertion(+), 29 deletions(-) diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index 2ae507a902..aec0926d93 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -184,28 +184,6 @@ retry: WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , , 1, WAIT_EVENT_CLIENT_READ); - /* - * If the postmaster has died, it's not safe to continue running, - * because it is the postmaster's job to kill us if some other backend - * exits uncleanly. Moreover, we won't run very well in this state; - * helper processes like walwriter and the bgwriter will exit, so - * performance may be poor. Finally, if we don't exit, pg_ctl will be - * unable to restart the postmaster without manual intervention, so no - * new connections can be accepted. Exiting clears the deck for a - * postmaster restart. - * - * (Note that we only make this check when we would otherwise sleep on - * our latch. We might still continue running for a while if the - * postmaster is killed in mid-query, or even through multiple queries - * if we never have to wait for read. We don't want to burn too many - * cycles checking for this very rare condition, and this should cause - * us to exit quickly in most cases.) - */ - if (event.events & WL_POSTMASTER_DEATH) - ereport(FATAL, - (errcode(ERRCODE_ADMIN_SHUTDOWN), - errmsg("terminating connection due to unexpected postmaster exit"))); - /* Handle interrupt. */ if (event.events & WL_LATCH_SET) { @@ -296,12 +274,6 @@ retry: WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , , 1, WAIT_EVENT_CLIENT_WRITE); - /* See comments in secure_read. */ - if (event.events & WL_POSTMASTER_DEATH) - ereport(FATAL, - (errcode(ERRCODE_ADMIN_SHUTDOWN), - errmsg("terminating connection due to unexpected postmaster exit"))); - /* Handle interrupt. */ if (event.events & WL_LATCH_SET) { diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index ac986c0505..6e91581c0b 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -222,7 +222,7 @@ pq_init(void) AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock, NULL, NULL); AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL); - AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL); + AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, -1, NULL, NULL); } /* -- 2.20.1 From 91c7a5a8a04363274ea0e74a69d89e261d7a0e4f Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 24 Feb 2020 23:48:29 +1300 Subject: [PATCH v6 2/4] Introduce symbolic names for FeBeWaitSet positions. 
Previously we used hard coded 0 and 1 to refer to the socket and latch. Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com --- src/backend/libpq/be-secure.c | 4 ++-- src/backend/libpq/pqcomm.c| 18 +++--- src/backend/utils/init/miscinit.c | 4 ++-- src/include/libpq/libpq.h | 3 +++ 4 files changed, 22 insertions(+), 7 deletions(-) diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index aec0926d93..2d5cfbd6f8 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -179,7 +179,7 @@ retry: Assert(waitfor); - ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL); WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , , 1, WAIT_EVENT_CLIENT_READ); @@ -269,7 +269,7 @@ retry: Assert(waitfor); - ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL); WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , , 1, WAIT_EVENT_CLIENT_WRITE); diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index
Re: Missing CFI in hlCover()?
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Stephen Frost writes: >>> * Tom Lane (t...@sss.pgh.pa.us) wrote: We could hard-code a rule like that, or we could introduce a new explicit parameter for the maximum cover length. The latter would be more flexible, but we need something back-patchable and I'm concerned about the compatibility hazards of adding a new parameter in minor releases. So on the whole I propose hard-wiring a multiplier of, say, 10 for both these cases. >>> That sounds alright to me, though I do think we should probably still >>> toss a CFI (or two) in this path somewhere as we don't know how long >>> some of these functions might take... >> Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. > Works for me. Here's a proposed patch along that line. regards, tom lane diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c index 76b6f9aef0..1df1c0362d 100644 --- a/src/backend/tsearch/wparser_def.c +++ b/src/backend/tsearch/wparser_def.c @@ -2028,8 +2028,10 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) /* * hlCover: try to find a substring of prs' word list that satisfies query * - * At entry, *p must be the first word index to consider (initialize this to - * zero, or to the next index after a previous successful search). + * At entry, *p must be the first word index to consider (initialize this + * to zero, or to the next index after a previous successful search). + * We will consider all substrings starting at or after that word, and + * containing no more than max_cover words. * * On success, sets *p to first word index and *q to last word index of the * cover substring, and returns true. @@ -2038,7 +2040,8 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) * words used in the query. 
*/ static bool -hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) +hlCover(HeadlineParsedText *prs, TSQuery query, int max_cover, + int *p, int *q) { int pmin, pmax, @@ -2084,7 +2087,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) nextpmin = nextpmax; pmax = nextpmax; } - while (pmax >= 0); + while (pmax >= 0 && pmax - pmin < max_cover); /* No luck here, so try next feasible startpoint */ pmin = nextpmin; } @@ -2186,7 +2189,7 @@ get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos, static void mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, int shortword, int min_words, - int max_words, int max_fragments) + int max_words, int max_fragments, int max_cover) { int32 poslen, curlen, @@ -2213,7 +2216,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, covers = palloc(maxcovers * sizeof(CoverPos)); /* get all covers */ - while (hlCover(prs, query, , )) + while (hlCover(prs, query, max_cover, , )) { startpos = p; endpos = q; @@ -2368,7 +2371,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, */ static void mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, - int shortword, int min_words, int max_words) + int shortword, int min_words, int max_words, int max_cover) { int p = 0, q = 0; @@ -2386,7 +2389,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, if (!highlightall) { /* examine all covers, select a headline using the best one */ - while (hlCover(prs, query, , )) + while (hlCover(prs, query, max_cover, , )) { /* * Count words (curlen) and interesting words (poslen) within @@ -2542,6 +2545,7 @@ prsd_headline(PG_FUNCTION_ARGS) int shortword = 3; int max_fragments = 0; bool highlightall = false; + int max_cover; ListCell *l; /* Extract configuration option values */ @@ -2581,6 +2585,15 @@ prsd_headline(PG_FUNCTION_ARGS) defel->defname))); } + /* + * We might eventually make max_cover a user-settable parameter, but for + * now, just compute a reasonable value based on max_words and + * max_fragments. + */ + max_cover = Max(max_words * 10, 100); + if (max_fragments > 0) + max_cover *= max_fragments; + /* in HighlightAll mode these parameters are ignored */ if (!highlightall) { @@ -2605,10 +2618,10 @@ prsd_headline(PG_FUNCTION_ARGS) /* Apply appropriate headline selector */ if (max_fragments == 0) mark_hl_words(prs, query, highlightall, shortword, - min_words, max_words); + min_words, max_words, max_cover); else mark_hl_fragments(prs, query, highlightall, shortword, - min_words, max_words, max_fragments); + min_words, max_words, max_fragments, max_cover); /* Fill in default values for string options */ if (!prs->startsel) diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index f01b1ee253..756a48a167 100644 ---
Re: OpenSSL randomness seeding
> On 26 Jul 2020, at 09:06, Michael Paquier wrote: > > On Wed, Jul 22, 2020 at 11:31:38PM +0200, Daniel Gustafsson wrote: >> Thanks for picking it up! > > For the archives, the patch set has been applied as ce4939f and > 15e4419 on HEAD. Thanks, Noah. Indeed, thanks! >>> Do you happen to know how OpenSSL 1.1.1 changed its reaction to forks in order >>> to make the RAND_poll() superfluous? (No need to research it if you don't.) >> >> I'm not entirely sure, but I do believe that 1.1.1 ported over the RNG from the >> FIPS module which re-seeds itself with fork() protection. There was however a >> bug, fixed in 1.1.1d or thereabouts as CVE-2019-1549, where the fork protection >> wasn't activated by default... so there is that. Since that bug was found, >> there have been tests introduced to catch any regression on that, which is >> comforting. > > No idea about this one actually. I did some more reading and AFAICT it won't be required in 1.1.1+, but it also won't cause any harm, so unless evidence of the latter emerges we may just as well leave it as an extra safeguard. Somewhat on topic though, 1.1.1 adds a RAND_priv_bytes function for random numbers that are supposed to be private and extra protected via its own DRBG. Maybe we should use that for SCRAM salts etc. in case we detect 1.1.1? cheers ./daniel
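To make the RAND_priv_bytes() idea concrete, here is a minimal sketch (not existing PostgreSQL code; the function name generate_secret_bytes is invented for the example) of preferring the private DRBG when building against OpenSSL 1.1.1 or newer, falling back to RAND_bytes() otherwise; both calls return 1 on success:

#include <openssl/opensslv.h>
#include <openssl/rand.h>

/* Illustrative only: draw secret material (e.g. a SCRAM salt) from the
 * DRBG reserved for private data when the OpenSSL version provides it. */
static int
generate_secret_bytes(unsigned char *buf, int len)
{
#if defined(OPENSSL_VERSION_NUMBER) && OPENSSL_VERSION_NUMBER >= 0x10101000L
    return RAND_priv_bytes(buf, len);
#else
    return RAND_bytes(buf, len);
#endif
}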
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
> On 30 Jul 2020, at 03:28, Michael Paquier wrote: > On Wed, Jul 29, 2020 at 11:34:07PM +0200, Daniel Gustafsson wrote: >> Fair enough, let's break out pg_depend and I'll have another go at that. > > Thanks. Attached is a polished version of the patch that I intend to > commit for pg_attribute and pg_shdepend. Let's look again at > pg_depend later, as there are also links with the handling of > dependencies for ALTER TABLE mainly. Looks good, thanks. Let's close the CF entry with this and open a new for the pg_depend part when that's done. cheers ./daniel
Re: new heapcheck contrib module
On Thu, Jul 30, 2020 at 4:18 PM Mark Dilger wrote: > > Maybe I'm just being dense here -- exactly what problem are you worried > > about? > > Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax > committed via TransactionIdDidCommit. I am worried about concurrent > truncation of clog entries causing I/O errors on SLRU lookup when performing > that check. The three strategies I had for dealing with that were taking the > XactTruncationLock (formerly known as CLogTruncationLock, for those reading > this thread from the beginning), locking out vacuum, and the idea upthread > from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and > don't need to worry about this. But I haven't convinced myself of that, yet. I don't get it. If you've already checked that the XIDs are >= relfrozenxid and <= ReadNewFullTransactionId(), then this shouldn't be a problem. It could be, if CLOG is hosed, which is possible, because if the table is corrupted, why shouldn't CLOG also be corrupted? But I'm not sure that's what your concern is here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [Proposal] Global temporary tables
On Thu, Jul 30, 2020 at 8:09 AM wenjing zeng wrote: > Please continue to review the code. This patch is pretty light on comments. Many of the new functions have no header comments, for example. There are comments here and there in the body of the new functions that are added, and in places where existing code is changed there are comments here and there, but overall it's not a whole lot. There's no documentation and no README, either. Since this adds a new feature and a bunch of new SQL-callable functions that interact with that feature, the feature itself should be documented, along with its limitations and the new SQL-callable functions that interact with it. I think there should be either a lengthy comment in some suitable file, or maybe various comments in various files, or else a README file, that clearly sets out the major design principles behind the patch, and explaining also what that means in terms of features and limitations. Without that, it's really hard for anyone to jump into reviewing this code, and it will be hard for people who have to maintain it in the future to understand it, either. Or for users, for that matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Hi, On 2020-07-30 13:18:01 -0700, Mark Dilger wrote: > Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax > committed via TransactionIdDidCommit. I am worried about concurrent > truncation of clog entries causing I/O errors on SLRU lookup when performing > that check. The three strategies I had for dealing with that were taking the > XactTruncationLock (formerly known as CLogTruncationLock, for those reading > this thread from the beginning), locking out vacuum, and the idea upthread > from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and > don't need to worry about this. But I haven't convinced myself of that, yet. I think it's not at all ok to look in the procarray or clog for xids that are older than what you're announcing you may read. IOW I don't think it's OK to just ignore the problem, or try to work around it by holding XactTruncationLock. Greetings, Andres Freund
Re: new heapcheck contrib module
> On Jul 29, 2020, at 12:52 PM, Robert Haas wrote: > > On Mon, Jul 20, 2020 at 5:02 PM Mark Dilger > wrote: >> I've made the options 'all-visible', 'all-frozen', and 'none'. It defaults >> to 'none'. > > That looks nice. > >>> I guess that >>> could still be expensive if there's a lot of them, but needing >>> ShareUpdateExclusiveLock rather than only AccessShareLock is a little >>> unfortunate. >> >> I welcome strategies that would allow for taking a lesser lock. > > I guess I'm not seeing why you need any particular strategy here. Say > that at the beginning you note the starting relfrozenxid of the table > -- I think I would lean toward just ignoring datfrozenxid and the > cluster-wide value completely. You also note the current value of the > transaction ID counter. Those are the two ends of the acceptable > range. > > Let's first consider the oldest acceptable XID, bounded by > relfrozenxid. If you see a value that is older than the relfrozenxid > value that you noted at the start, it is definitely invalid. If you > see a newer value, it could still be older than the table's current > relfrozenxid, but that doesn't seem very worrisome. If the user > vacuumed the table while they were running this tool, they can always > run the tool again afterward if they wish. Forcing the vacuum to wait > by taking ShareUpdateExclusiveLock doesn't actually solve anything > anyway: you STILL won't notice any problems the vacuum introduces, and > in fact you are now GUARANTEED not to notice them, plus now the vacuum > happens later. > > Now let's consider the newest acceptable XID, bounded by the value of > the transaction ID counter. Any time you see a newer XID than the last > value of the transaction ID counter that you observed, you go observe > it again. If the value from the table still looks invalid, then you > complain about it. Either way, you remember the new observation and > check future tuples against that value. I think the patch is already > doing this anyway; if it weren't, you'd need an even stronger lock, > one sufficient to prevent any insert/update/delete activity on the > table altogether. > > Maybe I'm just being dense here -- exactly what problem are you worried about? Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit. I am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check. The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and don't need to worry about this. But I haven't convinced myself of that, yet. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
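For illustration, a minimal sketch of the range check described above; this is not the patch's code, and the XidRangeCheck struct, its field names, and the function name are invented for the example:

#include "postgres.h"
#include "access/transam.h"

/* Hypothetical per-scan state noted when the table scan begins */
typedef struct XidRangeCheck
{
    TransactionId startup_relfrozenxid;  /* relfrozenxid observed at start */
    TransactionId observed_next_xid;     /* next-xid counter observed so far */
} XidRangeCheck;

/*
 * Sketch: xids older than the relfrozenxid noted at the start of the scan
 * are treated as corrupt; xids at or beyond the last-observed next-xid
 * cause the counter to be re-read once before deciding.
 */
static bool
xid_in_acceptable_range(XidRangeCheck *ctx, TransactionId xid)
{
    if (TransactionIdPrecedes(xid, ctx->startup_relfrozenxid))
        return false;           /* older than relfrozenxid: report corruption */

    if (!TransactionIdPrecedes(xid, ctx->observed_next_xid))
    {
        /* Newer than anything observed; the counter may have advanced. */
        ctx->observed_next_xid =
            XidFromFullTransactionId(ReadNextFullTransactionId());
        if (!TransactionIdPrecedes(xid, ctx->observed_next_xid))
            return false;       /* still in the future: report corruption */
    }

    return true;
}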
Re: Threading in BGWorkers (!)
On Thu, Jul 30, 2020 at 2:55 PM Tom Lane wrote: > > Why not? Our answer to threading inside functions has been, for quite a > > while, that it's kinda ok if the threads never call into postgres and > > can never escape the lifetime of a function. But that's not actually > > really safe due to the signal handler issue. > > In other words, it's *not* safe and never has been. I see no good reason > to believe that the signal handler issue is the only one. Even if it is, > not being able to call any postgres infrastructure is a pretty huge > handicap. I find this line of argument really unfair. It's true that there might be issues other than the signal handler one, but so what? That is not a principled argument against fixing the signal handler part of the problem, which is the only *known* problem with the use case Andres describes. It is also true that it would be more useful to enable additional use cases rather than just this one, but that is not a principled argument against enabling this one. My only present concern about the proposal actually in front of us -- that is to say, use pthread_sigmask() rather than sigprocmask() -- is Thomas's observation that on his system doing so breaks the world. That seems to be quite a serious problem. If we are deciding whether to use one function or another some purpose and they are equally good for the core code but one is better for people who want to play around with extensions, we may as well use the one that's better for that purpose. We need not give such experimentation our unqualified endorsement; we need only decide against obstructing it unnecessarily. But when such a substitution risks breaking things that work today, the calculus gets a lot more complicated. Unless we can find a way to avoid that risk, I don't think this is a good trade-off. But more broadly I think it is well past time that we look into making the backend more thread-friendly. The fact that "it's *not* safe and never has been" has not prevented people from doing it. We don't end up with people going "oh, well sigprocmask uh oh so I better give up now." What we end up with is people just going right ahead and doing it, probably without even thinking about sigprocmask, and ending up with low-probability failure conditions, which seems likely to hurt PostgreSQL's reputation for reliability with no compensating benefit. Or alternatively they hack core, which sucks, or they switch to some non-PG database, which also sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
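For reference, the substitution being discussed is mechanical at the call sites: pthread_sigmask() takes the same arguments as sigprocmask(), but its behavior is defined in a multithreaded process, whereas POSIX leaves sigprocmask() unspecified there. A hedged sketch (illustrative only, not PostgreSQL code) of how an extension's worker thread might shield itself from the backend's signal handlers:

#include <pthread.h>
#include <signal.h>

/* Block every signal in the calling (non-main) thread, saving the old mask. */
static void
block_signals_in_worker_thread(sigset_t *save_mask)
{
    sigset_t    block_all;

    sigfillset(&block_all);
    pthread_sigmask(SIG_BLOCK, &block_all, save_mask);
}

/* Restore the mask saved by block_signals_in_worker_thread(). */
static void
restore_signals_in_worker_thread(const sigset_t *save_mask)
{
    pthread_sigmask(SIG_SETMASK, save_mask, NULL);
}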
Re: Threading in BGWorkers (!)
Andres Freund writes: > On 2020-07-28 21:52:20 -0400, Tom Lane wrote: >> TBH, though, I do not buy this argument for a millisecond. I don't >> think that anything is going to come out of multithreading a bgworker >> but blood and tears. Perhaps someday we'll make a major push to >> make the backend code (somewhat(?)) thread safe ... but I'm not on >> board with making one-line-at-a-time changes in hopes of getting >> partway there. We need some kind of concrete plan for what is a >> usable amount of functionality and what has to be done to get it. > Why not? Our answer to threading inside functions has been, for quite a > while, that it's kinda ok if the threads never call into postgres and > can never escape the lifetime of a function. But that's not actually > really safe due to the signal handler issue. In other words, it's *not* safe and never has been. I see no good reason to believe that the signal handler issue is the only one. Even if it is, not being able to call any postgres infrastructure is a pretty huge handicap. So I stand by the position that we need an actual plan here, not just chipping away at one-liner things that might or might not improve matters noticeably. regards, tom lane
Re: Threading in BGWorkers (!)
Hi, On 2020-07-28 21:52:20 -0400, Tom Lane wrote: > >> The rationale here is that as far as I can tell this is the *only* blocker > >> to using multithreaded code in a BGWorker which can't be avoided by > >> adhering to strict code rules (eg: no PG calls from non-main threads, no > >> interaction with signals from non-main threads). > > TBH, though, I do not buy this argument for a millisecond. I don't > think that anything is going to come out of multithreading a bgworker > but blood and tears. Perhaps someday we'll make a major push to > make the backend code (somewhat(?)) thread safe ... but I'm not on > board with making one-line-at-a-time changes in hopes of getting > partway there. We need some kind of concrete plan for what is a > usable amount of functionality and what has to be done to get it. Why not? Our answer to threading inside functions has been, for quite a while, that it's kinda ok if the threads never call into postgres and can never escape the lifetime of a function. But that's not actually really safe due to the signal handler issue. Whether it's a normal backend or a bgworker doesn't really play a role here, no? Greetings, Andres Freund
Re: Threading in BGWorkers (!)
Hi, On 2020-07-29 13:41:02 +1200, Thomas Munro wrote: > One practical problem with this change is that some systems have a > stub definition of pthread_sigmask() that does nothing, when you don't > link against the threading library. Realistically, most *useful* > builds of PostgreSQL bring it in indirectly (via SSL, LDAP, blah > blah), but it so happens that a bare bones build and make check on > this here FreeBSD box hangs in CHECK DATABASE waiting for the > checkpointer to signal it. I can fix that by putting -lthr into > LDFLAGS, so we'd probably have to figure out how to do that on our > supported systems. Couldn't this be driven by --disable-thread-safety? Greetings, Andres Freund
Re: HyperLogLog.c and pg_leftmost_one_pos32()
On Thu, 2020-07-30 at 19:16 +0200, Tomas Vondra wrote: > > Essentially: > > initHyperLogLog(, 5) > > for i in 0 .. one billion > >addHyperLogLog(, hash(i)) > > estimateHyperLogLog > > > > The numbers are the same regardless of bwidth. > > > > Before my patch, it takes about 15.6s. After my patch, it takes > > about > > 6.6s, so it's more than a 2X speedup (including the hash > > calculation). > > > > Wow. That's a huge improvements. To be clear: the 2X+ speedup was on the tight loop test. > How does the whole test (data + query) look like? Is it particularly > rare / special case, or something reasonable to expect in practice? The whole-query test was: config: shared_buffers=8GB jit = off max_parallel_workers_per_gather=0 setup: create table t_1m_20(i int); vacuum (freeze, analyze) t_1m_20; insert into t_1m_20 select (random()*100)::int4 from generate_series(1,2000); query: set work_mem='2048kB'; SELECT pg_prewarm('t_1m_20', 'buffer'); -- median of the three runs select distinct i from t_1m_20 offset 1000; select distinct i from t_1m_20 offset 1000; select distinct i from t_1m_20 offset 1000; results: f2130e77 (before using HLL): 6787ms f1af75c5 (before my recent commit): 7170ms fd734f38 (master now):6990ms My previous results before I committed the patch (and therefore not on the same exact SHA1s) were 6812, 7158, and 6898. So my most recent batch of results is slightly worse, but the most recent commit (fd734f38) still does show an improvement of a couple percent. Regards, Jeff Davis
Re: new heapcheck contrib module
On Mon, Jul 27, 2020 at 1:02 PM Mark Dilger wrote: > Not at all! I appreciate all the reviews. Reviewing 0002, reading through verify_heapam.c: +typedef enum SkipPages +{ + SKIP_ALL_FROZEN_PAGES, + SKIP_ALL_VISIBLE_PAGES, + SKIP_PAGES_NONE +} SkipPages; This looks inconsistent. Maybe just start them all with SKIP_PAGES_. + if (PG_ARGISNULL(0)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("missing required parameter for 'rel'"))); This doesn't look much like other error messages in the code. Do something like git grep -A4 PG_ARGISNULL | grep -A3 ereport and study the comparables. + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("unrecognized parameter for 'skip': %s", skip), + errhint("please choose from 'all-visible', 'all-frozen', or 'none'"))); Same problem. Check pg_prewarm's handling of the prewarm type, or EXPLAIN's handling of the FORMAT option, or similar examples. Read the message style guidelines concerning punctuation of hint and detail messages. + * Bugs in pg_upgrade are reported (see commands/vacuum.c circa line 1572) + * to have sometimes rendered the oldest xid value for a database invalid. + * It seems unwise to report rows as corrupt for failing to be newer than + * a value which itself may be corrupt. We instead use the oldest xid for + * the entire cluster, which must be at least as old as the oldest xid for + * our database. This kind of reference to another comment will not age well; line numbers and files change a lot. But I think the right thing to do here is just rely on relfrozenxid and relminmxid. If the table is inconsistent with those, then something needs fixing. datfrozenxid and the cluster-wide value can look out for themselves. The corruption detector shouldn't be trying to work around any bugs in setting relfrozenxid itself; such problems are arguably precisely what we're here to find. +/* + * confess + * + * Return a message about corruption, including information + * about where in the relation the corruption was found. + * + * The msg argument is pfree'd by this function. + */ +static void +confess(HeapCheckContext *ctx, char *msg) Contrary to what the comments say, the function doesn't return a message about corruption or anything else. It returns void. I don't really like the name, either. I get that it's probably inspired by Perl, but I think it should be given a less-clever name like report_corruption() or something. + * corrupted table from using workmem worth of memory building up the This kind of thing destroys grep-ability. If you're going to refer to work_mem, you gotta spell it the same way we do everywhere else. + * Helper function to construct the TupleDesc needed by verify_heapam. Instead of saying it's the TupleDesc somebody needs, how about saying that it's the TupleDesc that we'll use to report problems that we find while scanning the heap, or something like that? + * Given a TransactionId, attempt to interpret it as a valid + * FullTransactionId, neither in the future nor overlong in + * the past. Stores the inferred FullTransactionId in *fxid. It really doesn't, because there's no such thing as 'fxid' referenced anywhere here. You should really make the effort to proofread your patches before posting, and adjust comments and so on as you go. Otherwise reviewing takes longer, and if you keep introducing new stuff like this as you fix other stuff, you can fail to ever produce a committable patch. + * Determine whether tuples are visible for verification. 
Similar to + * HeapTupleSatisfiesVacuum, but with critical differences. Not accurate, because it also reports problems, which is not mentioned anywhere in the function header comment that purports to be a detailed description of what the function does. + else if (TransactionIdIsCurrentTransactionId(raw_xmin)) + return true; /* insert or delete in progress */ + else if (TransactionIdIsInProgress(raw_xmin)) + return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */ + else if (!TransactionIdDidCommit(raw_xmin)) + { + return false; /* HEAPTUPLE_DEAD */ + } One of these cases is not punctuated like the others. + pstrdup("heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has a valid xmax")); 1. I don't think that's very grammatical. 2. Why abbreviate HEAP_XMAX_IS_MULTI to XMAX_IS_MULTI and HEAP_XMAX_IS_LOCKED_ONLY to LOCKED_ONLY? I don't even think you should be referencing C constant names here at all, and if you are I don't think you should abbreviate, and if you do abbreviate I don't think you should omit different numbers of words depending on which constant it is. I wonder what the intended division of responsibility is here, exactly. It seems like you've ended up with some sanity checks in check_tuple() before tuple_is_visible() is called, and others in tuple_is_visible() proper. As far as I can see the comments don't really discuss the logic behind the split, but there's clearly a close relationship between the two sets
Re: PG 13 release notes, first draft
On Thu, Jul 30, 2020 at 10:32 AM Bruce Momjian wrote: > I came up with a more verbose documentation suggestion, attached. I'm okay with this. Thanks -- Peter Geoghegan
Re: PG 13 release notes, first draft
On Wed, Jul 29, 2020 at 08:43:24PM -0700, David G. Johnston wrote: > On Wednesday, July 29, 2020, Peter Geoghegan wrote: > > On Wed, Jul 29, 2020 at 6:30 PM Bruce Momjian wrote: > > > There should be a note about this in the Postgres 13 release notes, > > > for the usual reasons. More importantly, the "Allow hash aggregation > > > to use disk storage for large aggregation result sets" feature should > > > reference the new GUC directly. Users should be advised that the GUC > > > may be useful in cases where they upgrade and experience a performance > > > regression linked to slower hash aggregation. Just including a > > > documentation link for the GUC would be very helpful. > > > > I came up with the attached patch. > > I was thinking something along like the following (after the existing > sentence about avoiding hash aggs in the planner): > > If you find that hash aggregation is slower than in previous major > releases of PostgreSQL, it may be useful to increase the value of > hash_mem_multiplier. This allows hash aggregation to use more memory > without affecting competing query operations that are generally less > likely to put any additional memory to good use. I came up with a more verbose documentation suggestion, attached. > How about adding wording for GROUP BY as well to cater to users who are more > comfortable thinking in terms of SQL statements instead of execution plans? Uh, it is unclear exactly what SQL generates what node types, so I want to avoid this. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee diff --git a/doc/src/sgml/release-13.sgml b/doc/src/sgml/release-13.sgml new file mode 100644 index 18e6497..0b1051b *** a/doc/src/sgml/release-13.sgml --- b/doc/src/sgml/release-13.sgml *** Author: Jeff Davis Previously, hash aggregation was avoided if it was expected to use ! more than memory. --- 649,658 Previously, hash aggregation was avoided if it was expected to use ! more than memory. To reduce the ! likelihood of using disk storage for hash aggregation and attain ! behavior similar to previous Postgres releases, increase .
Re: HyperLogLog.c and pg_leftmost_one_pos32()
On Thu, Jul 30, 2020 at 09:21:23AM -0700, Jeff Davis wrote: On Wed, 2020-07-29 at 17:32 -0700, Peter Geoghegan wrote: How did you test this? What kind of difference are we talking about? Essentially: initHyperLogLog(, 5) for i in 0 .. one billion addHyperLogLog(, hash(i)) estimateHyperLogLog The numbers are the same regardless of bwidth. Before my patch, it takes about 15.6s. After my patch, it takes about 6.6s, so it's more than a 2X speedup (including the hash calculation). Wow. That's a huge improvements. How does the whole test (data + query) look like? Is it particularly rare / special case, or something reasonable to expect in practice? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: HyperLogLog.c and pg_leftmost_one_pos32()
On Wed, 2020-07-29 at 17:32 -0700, Peter Geoghegan wrote: > How did you test this? What kind of difference are we talking about? Essentially: initHyperLogLog(, 5) for i in 0 .. one billion addHyperLogLog(, hash(i)) estimateHyperLogLog The numbers are the same regardless of bwidth. Before my patch, it takes about 15.6s. After my patch, it takes about 6.6s, so it's more than a 2X speedup (including the hash calculation). As a part of HashAgg, when I test the worst case, it goes from a 4-5% penalty to ~1% (within noise). > I think that you should change back the rhs() variable names to match > HyperLogLog upstream (as well as the existing rhs() comments). Done. > > I think it's overall an improvement, but > > addHyperLogLog() itself seemed to show up as a cost, so it can hurt > > spilled-but-still-in-memory cases. I'd also like to backpatch this > > to > > 13 (as I already did for 9878b643), if that's acceptable. > > I still wonder if it was ever necessary to add HLL to abbreviated > keys. It only served to avoid some pretty narrow worse cases, at the > expense of typical cases. Given that the existing users of HLL are > pretty narrow, and given the importance of preserving the favorable > performance characteristics of hash aggregate, I'm inclined to agree > that it's worth backpatching to 13 now. Assuming it is a really > measurable cost in practice. Yes, the difference (at least in a tight loop, on my machine) is not subtle. I went ahead and committed and backpatched. Regards, Jeff Davis
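For anyone trying to reproduce the numbers, here is a rough reconstruction of the tight-loop benchmark sketched above, using the lib/hyperloglog.h API; the variable names and the choice of hash_uint32() as the hash function are guesses, since the original harness was not posted:

#include "postgres.h"
#include "common/hashfn.h"
#include "lib/hyperloglog.h"

/* Add one billion hashed integers to an HLL state with bwidth = 5,
 * then report the cardinality estimate. */
static void
hll_tight_loop(void)
{
    hyperLogLogState hll;
    uint32      i;

    initHyperLogLog(&hll, 5);
    for (i = 0; i < 1000000000; i++)
        addHyperLogLog(&hll, DatumGetUInt32(hash_uint32(i)));
    elog(NOTICE, "HLL estimate: %.0f", estimateHyperLogLog(&hll));
    freeHyperLogLog(&hll);
}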
Re: Missing CFI in hlCover()?
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> We could hard-code a rule like that, or we could introduce a new > >> explicit parameter for the maximum cover length. The latter would be > >> more flexible, but we need something back-patchable and I'm concerned > >> about the compatibility hazards of adding a new parameter in minor > >> releases. So on the whole I propose hard-wiring a multiplier of, > >> say, 10 for both these cases. > > > That sounds alright to me, though I do think we should probably still > > toss a CFI (or two) in this path somewhere as we don't know how long > > some of these functions might take... > > Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. Works for me. Thanks! Stephen signature.asc Description: PGP signature
Re: Missing CFI in hlCover()?
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> We could hard-code a rule like that, or we could introduce a new >> explicit parameter for the maximum cover length. The latter would be >> more flexible, but we need something back-patchable and I'm concerned >> about the compatibility hazards of adding a new parameter in minor >> releases. So on the whole I propose hard-wiring a multiplier of, >> say, 10 for both these cases. > That sounds alright to me, though I do think we should probably still > toss a CFI (or two) in this path somewhere as we don't know how long > some of these functions might take... Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. regards, tom lane
Re: Missing CFI in hlCover()?
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > We could hard-code a rule like that, or we could introduce a new > explicit parameter for the maximum cover length. The latter would be > more flexible, but we need something back-patchable and I'm concerned > about the compatibility hazards of adding a new parameter in minor > releases. So on the whole I propose hard-wiring a multiplier of, > say, 10 for both these cases. That sounds alright to me, though I do think we should probably still toss a CFI (or two) in this path somewhere as we don't know how long some of these functions might take... Thanks, Stephen signature.asc Description: PGP signature
Re: [BUG] Error in BRIN summarization
On 27.07.2020 20:25, Alvaro Herrera wrote: On 2020-Jul-27, Anastasia Lubennikova wrote: Here is the updated version of the fix. The problem can be reproduced on all supported versions, so I suggest to backpatch it. Code slightly changed in v12, so here are two patches: one for versions 9.5 to 11 and another for versions from 12 to master. (I was also considering whether it needs to be a loop to reobtain root tuples, in case a concurrent transaction can create a new item while we're checking that item; but I don't think that can really happen for one individual tuple.) I don't think we need a recheck for a single tuple, because we only care about finding its root, which simply must exist somewhere on this page, as concurrent pruning is not allowed. We also may catch root_offsets[] for subsequent tuples, but it's okay if we don't. These tuples will do the same recheck on their turn. While testing this fix, Alexander Lakhin spotted another problem. I simplified the test case to this: 1) prepare a table with a brin index create table tbl (i int, b char(1000)) with (fillfactor=10); insert into tbl select i, md5(i::text) from generate_series(0,1000) as i; create index idx on tbl using brin(i, b) with (pages_per_range=1); 2) run brin_desummarize_range() in a loop: echo "-- desummarize all ranges SELECT FROM generate_series(0, pg_relation_size('tbl')/8192 - 1) as i, lateral (SELECT brin_desummarize_range('idx', i)) as d; -- summarize them back VACUUM tbl" > brin_desum_test.sql pgbench postgres -f brin_desum_test.sql -n -T 120 3) run a search that invokes bringetbitmap: set enable_seqscan to off; explain analyze select * from tbl where i>10 and i<100; \watch 1 After a few runs, it will fail with "ERROR: corrupted BRIN index: inconsistent range map" The problem is caused by a race in page locking in brinGetTupleForHeapBlock [1]: (1) bitmapscan locks revmap->rm_currBuf and finds the address of the tuple on a regular page "page", then unlocks revmap->rm_currBuf (2) in another transaction desummarize locks both revmap->rm_currBuf and "page", cleans up the tuple and unlocks both buffers (1) bitmapscan locks the buffer containing "page", attempts to access the tuple and fails to find it At first, I tried to fix it by holding the lock on revmap->rm_currBuf until we locked the regular page, but it causes a deadlock with brinsummarize(). It can be easily reproduced with the same test as above. Is there any rule about the order of locking revmap and regular pages in brin? I haven't found anything in the README. As an alternative, we can leave the locks as is and add a recheck before throwing an error. What do you think? [1] https://github.com/postgres/postgres/blob/master/src/backend/access/brin/brin_revmap.c#L269 -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: INSERT INTO SELECT, Why Parallelism is not selected?
On Thu, Jul 30, 2020 at 12:02 PM Amit Kapila wrote: > > On Wed, Jul 29, 2020 at 7:18 PM Robert Haas wrote: > > > > I still don't agree with this as proposed. > > > > + * For now, we don't allow parallel inserts of any form not even where the > > + * leader can perform the insert. This restriction can be uplifted once > > + * we allow the planner to generate parallel plans for inserts. We can > > > > If I'm understanding this correctly, this logic is completely > > backwards. We don't prohibit inserts here because we know the planner > > can't generate them. We prohibit inserts here because, if the planner > > somehow did generate them, it wouldn't be safe. You're saying that > > it's not allowed because we don't try to do it yet, but actually it's > > not allowed because we want to make sure that we don't accidentally > > try to do it. That's very different. > > > > Right, so how about something like: "To allow parallel inserts, we > need to ensure that they are safe to be performed in workers. We have > the infrastructure to allow parallel inserts in general except for the > case where inserts generate a new commandid (eg. inserts into a table > having a foreign key column)." We can extend this for tuple locking > if required as per the below discussion. Kindly suggest if you prefer > a different wording here. > > > > > + * We should be able to parallelize > > + * the later case if we can ensure that no two parallel processes can ever > > + * operate on the same page. > > > > I don't know whether this is talking about two processes operating on > > the same page at the same time, or ever within a single query > > execution. If it's the former, perhaps we need to explain why that's a > > concern for parallel query but not otherwise; > > > > I am talking about the former case and I know that as per current > design it is not possible that two worker processes try to operate on > the same page but I was trying to be pessimistic so that we can ensure > that via some form of Assert. > I think the two worker processes can operate on the same page for a parallel index scan case but it won't be for same tuple. I am not able to think of any case where we should be worried about tuple locking for Insert's case, so we can probably skip writing anything about it unless someone else can think of such a case. -- With Regards, Amit Kapila.
Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks
On Mon, Jul 20, 2020 at 3:47 PM Robert Haas wrote: > But I also agree that what pg_start_backup() was doing before v13 was > wrong; that's why I committed > 303640199d0436c5e7acdf50b837a027b5726594. The only reason I didn't > back-patch it is because the consequences are so minor I didn't think > it was worth worrying about. We could, though. I'd be somewhat > inclined to both do that and also change LLVM to use on_proc_exit() in > master, but I don't feel super-strongly about it. Unless somebody complains pretty soon, I'm going to go ahead and do what is described above. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
windows config.pl question
Hi I’d like to add icu/openssl support to my postgresql build on windows documentation says that I have to modify config.pl file, however it's not clear what exactly I have to do config-default.pl for example has the following line icu => undef,# --with-icu= so, if I want to add icu support what exactly should I do? should I replace undef with path? icu => , is it correct? if it’s correct does build support UNC paths? thanks in advance Dimitry Markman Dmitry Markman
Re: Doc patch: mention indexes in pg_inherits docs
Michael Paquier writes: > On Wed, Jul 29, 2020 at 03:06:58PM +0900, Michael Paquier wrote: >> This is actually a bug fix, because we include in the docs some >> incorrect information, so it should be backpatched. If there are no >> objections, I'll take care of that. > > And done. Thanks! - ilmari -- "A disappointingly low fraction of the human race is, at any given time, on fire." - Stig Sandbeck Mathisen
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Jul 30, 2020 at 12:28 PM Ajin Cherian wrote: > > I was running some tests on this patch. I was generally trying to see how the > patch affects logical replication when doing bulk inserts. This issue has > been raised in the past, for eg: this [1]. > My test setup is: > 1. Two postgres servers running - A and B > 2. Create a pgbench setup on A. (pgbench -i -s 5 postgres) > 3. replicate the 3 tables (schema only) on B. > 4. Three publishers on A for the 3 tables of pgbench; pgbench_accounts, > pgbench_branches and pgbench_tellers; > 5. Three subscribers on B for the same tables. (streaming on and off based on > the scenarios described below) > > run pgbench with : pgbench -c 4 -T 100 postgres > While pgbench is running, Do a bulk insert on some other table not in the > publication list (say t1); INSERT INTO t1 (select i FROM > generate_series(1,1000) i); > > Four scenarios: > 1. Pgbench with logical replication enabled without bulk insert > Avg TPS (out of 10 runs): 641 TPS > 2.Pgbench without logical replication enabled with bulk insert (no pub/sub) > Avg TPS (out of 10 runs): 665 TPS > 3, Pgbench with logical replication enabled with bulk insert > Avg TPS (out of 10 runs): 278 TPS > 4. Pgbench with logical replication streaming on with bulk insert > Avg TPS (out of 10 runs): 440 TPS > > As you can see, the bulk inserts, although on a totally unaffected table, > does impact the TPS. But what is good is that, enabling streaming improves > the TPS (about 58% improvement) > Thanks for doing these tests, it is a good win and probably the reason is that after patch we won't serialize such big transactions (as shown in Konstantin's email [1]) and they will be simply skipped. Basically, it will try to stream such transactions and will skip them as they are not required to be sent. [1] - https://www.postgresql.org/message-id/5f5143cc-9f73-3909-3ef7-d3895cc6cc90%40postgrespro.ru -- With Regards, Amit Kapila.
Re: [PATCH] Tab completion for VACUUM of partitioned tables
On Thu, Jul 30, 2020 at 02:16:04PM +0900, Masahiko Sawada wrote: > Yeah, that's the reason why I mentioned about toast tables. "VACUUM > " displays "pg_toast." but, "VACUUM pg_to" doesn't > complement anything. So am I OK with this situation. The same is true as well for CLUSTER and TRUNCATE, and "pg_to" would get completion with the toast tables only if we begin to add RELKIND_TOASTVALUE to the queries. Note that the schema completions come from _complete_from_query() where we would need to be smarter regarding the filtering of pg_namespace rows and I have not looked how to do that, but I feel that it may not be that complicated. For now I have applied the proposed patch. -- Michael signature.asc Description: PGP signature
Re: Fix minor source code comment mistake
On Thu, Jul 30, 2020 at 08:03:09AM +0000, k.jami...@fujitsu.com wrote: > Just found a minor error in source code comment. > src/include/executor/instrument.h > > Attached is the fix. > > - long local_blks_dirtied; /* # of shared blocks dirtied */ > + long local_blks_dirtied; /* # of local blocks dirtied */ Indeed. Let's fix this. -- Michael signature.asc Description: PGP signature
Re: problem with RETURNING and update row movement
On Wed, Jul 22, 2020 at 3:16 PM Amit Langote wrote: > On Mon, Jul 20, 2020 at 8:35 PM Amit Kapila wrote: > > IIUC, here the problem is related to below part of code: > > ExecInsert(..) > > { > > /* Process RETURNING if present */ > > if (resultRelInfo->ri_projectReturning) > > result = ExecProcessReturning(resultRelInfo, slot, planSlot); > > .. > > } > > > > The reason is that planSlot is for foo1 and slot is for foo2 and when > > it tries to access tuple during ExecProcessReturning(), it results in > > an error. Is my understanding correct? > > Yes. Specifically, the problem exists if there are any non-target > relation attributes in RETURNING which are computed by referring to > planSlot, the plan's output tuple, which may be shaped differently > among result relations due to their tuple descriptors being different. > > > If so, then can't we ensure > > someway that planSlot also belongs to foo2 instead of skipping return > > processing in Insert and then later do more work to perform in Update. > > I did consider that option but failed to see a way to make it work. > > I am not sure if there is a way to make a copy of the plan's output > tuple (planSlot) that is compatible with the destination partition. > Simple conversion using execute_attr_map_slot() is okay when we know > the source and the target slots contain relation tuples, but plan's > output tuples will also have other junk attributes. Also, not all > destination partitions have an associated plan and hence a slot to > hold plan tuples. Yeah, I think it might be possible to create planSlot to pass to ExecInsert() so that we can process RETURNING within that function, but even if so, that would be cumbersome not only because partitions can have different rowtypes but because they can have different junk columns as well, because e.g., subplan foreign partitions may have different row ID columns as junk columns. The proposed patch is simple, so I would vote for it. (Note: in case of a foreign partition, we call ExecForeignInsert() with the source partition’s planSlot in ExecInsert(), which is not correct, but that would sbe OK because it seems unlikely that the FDW would look at the planSlot for INSERT.) One thing I noticed is that the patch changes the existing behavior. Here is an example: create table range_parted (a text, b bigint) partition by range (a, b); create table part_a_1_a_10 partition of range_parted for values from ('a', 1) to ('a', 10); create table part_b_1_b_10 partition of range_parted for values from ('b', 1) to ('b', 10); create function trigfunc() returns trigger as $$ begin return null; end; $$ language plpgsql; create trigger trig before insert on part_b_1_b_10 for each row execute function trigfunc(); insert into range_parted values ('a', 1); In HEAD: postgres=# update range_parted r set a = 'b' from (values ('a', 1)) s(x, y) where s.x = r.a and s.y = r.b returning tableoid::regclass, r.*; tableoid | a | b --+---+--- (0 rows) UPDATE 0 But with the patch: postgres=# update range_parted r set a = 'b' from (values ('a', 1)) s(x, y) where s.x = r.a and s.y = r.b returning tableoid::regclass, r.*; tableoid| a | b ---+---+--- part_a_1_a_10 | b | 1 (1 row) UPDATE 1 This produces RETURNING, though INSERT on the destination partition was skipped by the trigger. 
Another thing is that the patch assumes that the tuple slot to pass to ExecInsert() would store the inserted tuple when doing that function, but that’s not always true, because in case of a foreign partition, the FDW may return a slot other than the passed-in slot when called from ExecForeignInsert(), in which case the passed-in slot would not store the inserted tuple anymore. To fix these, I modified the patch so that we 1) add to ExecInsert() an output parameter slot to store the inserted tuple, and 2) compute RETURNING based on the parameter. I also added a regression test case. Attached is an updated version of the patch. Sorry for the long delay. Best regards, Etsuro Fujita v3-0001-Fix-a-bug-with-RETURNING-when-UPDATE-moves-tuple.patch Description: Binary data
Fix minor source code comment mistake
Hi, Just found a minor error in source code comment. src/include/executor/instrument.h Attached is the fix. - long local_blks_dirtied; /* # of shared blocks dirtied */ + long local_blks_dirtied; /* # of local blocks dirtied */ Regards, Kirk Jamison 0001-Fix-comment-in-source-code.patch Description: 0001-Fix-comment-in-source-code.patch
RE: [Patch] Optimize dropping of relation buffers using dlist
On Wednesday, July 29, 2020 4:55 PM, Konstantin Knizhnik wrote: > On 17.06.2020 09:14, k.jami...@fujitsu.com wrote: > > Hi, > > > > Since the last posted version of the patch fails, attached is a rebased version. > > Written upthread were performance results and some benefits and challenges. > > I'd appreciate your feedback/comments. > > > > Regards, > > Kirk Jamison > As far as I understand, this patch can provide significant improvement of > performance only in case of recovery of truncates of a large number of tables. You > have added a shared hash of relation buffers and certainly it adds some extra > overhead. According to your latest results this overhead is quite small. But it will > be hard to prove that there will be no noticeable regression at some workloads. Thank you for taking a look at this. Yes, one of the aims is to speed up recovery of truncations, but at the same time the patch also improves autovacuum, vacuum and relation truncate index executions. I showed pgbench results above for different types of workloads, but I am not sure if those provide enough validation... > I wonder if you have considered the case of a local hash (maintained only during > recovery)? > If there is after-crash recovery, then there will be no concurrent access to shared > buffers and this hash will be up-to-date. > In case of a hot-standby replica we can use some simple invalidation (just one flag > or counter which indicates that buffer cache was updated). > This hash also can be constructed on demand when DropRelFileNodeBuffers is > called the first time (so we have to scan all buffers once, but subsequent drop > operations will be fast). > > I have not thought much about it, but it seems to me that as far as this problem > only affects recovery, we do not need a shared hash for it. > The idea of the patch is to mark the relation buffers to be dropped after scanning the whole shared buffers, and store them into shared memory maintained in a dlist, and traverse the dlist on the next scan. But I understand the point that it is expensive and may cause overhead, which is why I tried to define a macro to limit the number of pages that we can cache for cases where the lookup cost can be problematic (i.e. too many pages of relation). #define BUF_ID_ARRAY_SIZE 100 int buf_id_array[BUF_ID_ARRAY_SIZE]; int forknum_indexes[BUF_ID_ARRAY_SIZE]; In DropRelFileNodeBuffers do { nbufs = CachedBlockLookup(..., forknum_indexes, buf_id_array, lengthof(buf_id_array)); for (i = 0; i < nbufs; i++) { ... } } while (nbufs == lengthof(buf_id_array)); Perhaps the patch adds complexity, so we may want to keep it simpler or commit it piece by piece? I will look further into your suggestion of maintaining a local hash only during recovery. Thank you for the suggestion. Regards, Kirk Jamison
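As a rough illustration of the "local hash maintained only during recovery" suggestion (purely a sketch, not part of the patch; the RecoveryRelEntry type and the function below are hypothetical), the startup process could keep a backend-local dynahash keyed by RelFileNode and consult it from DropRelFileNodeBuffers instead of scanning all of shared buffers for every replayed truncate:

#include "postgres.h"
#include "storage/relfilenode.h"
#include "utils/hsearch.h"

/* Hypothetical entry: one per relation touched during recovery */
typedef struct RecoveryRelEntry
{
    RelFileNode rnode;           /* hash key */
    int         first_buf_id;    /* head of a chain of cached buffer ids */
} RecoveryRelEntry;

static HTAB *recovery_rel_buffers = NULL;

/* Build the recovery-only lookup table lazily, in local memory. */
static void
init_recovery_rel_hash(void)
{
    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(RelFileNode);
    ctl.entrysize = sizeof(RecoveryRelEntry);
    recovery_rel_buffers = hash_create("recovery relation buffers",
                                       1024,    /* initial size, arbitrary */
                                       &ctl,
                                       HASH_ELEM | HASH_BLOBS);
}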
RE: Is it useful to record whether plans are generic or custom?
>> Main purpose is to decide (1) the user interface and (2) the >> way to get the plan type from pg_stat_statements. >> >> (1) the user interface >> I added a new boolean column 'generic_plan' to both >> pg_stat_statements view and the member of the hash key of >> pg_stat_statements. >> >> This is because as Legrand pointed out the feature seems >> useful under the condition of differentiating all the >> counters for a queryid using a generic plan and the one >> using a custom one. > I don't like this because this may double the number of entries in pgss. > Which means that the number of entries can more easily reach > pg_stat_statements.max and some entries will be discarded. Not all the entries will be doubled, only the ones that are prepared. And even if auto prepare was implemented, having 5000, 1 or 2 max entries seems not a problem to me. >> I thought it might be preferable to make a GUC to enable >> or disable this feature, but changing the hash key makes >> it harder. > What happens if the server was running with this option enabled and then > restarted with the option disabled? Firstly two entries for the same query > were stored in pgss because the option was enabled. But when it's disabled > and the server is restarted, those two entries should be merged into one > at the startup of server? If so, that's problematic because it may take > a long time. What would you think about a third value for this flag to handle disabled case (no need to merge entries in this situation) ? > Therefore I think that it's better and simple to just expose the number of > times generic/custom plan was chosen for each query. I thought this feature was mainly needed to identifiy "under optimal generic plans". Without differentiating execution counters, this will be simpler but useless in this case ... isn't it ? > Regards, > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION Regards PAscal
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Jul 29, 2020 at 3:16 PM Dilip Kumar wrote:
>
> Thanks, please find the rebased patch set.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com

I was running some tests on this patch, generally trying to see how the patch
affects logical replication when doing bulk inserts. This issue has been
raised in the past, for eg: this [1].

My test setup is:
1. Two postgres servers running - A and B.
2. Create a pgbench setup on A. (pgbench -i -s 5 postgres)
3. Replicate the 3 tables (schema only) on B.
4. Three publishers on A for the 3 tables of pgbench: pgbench_accounts,
   pgbench_branches and pgbench_tellers.
5. Three subscribers on B for the same tables. (streaming on and off based on
   the scenarios described below)

Run pgbench with:
pgbench -c 4 -T 100 postgres

While pgbench is running, do a bulk insert on some other table not in the
publication list (say t1):
INSERT INTO t1 (select i FROM generate_series(1,1000) i);

Four scenarios:
1. Pgbench with logical replication enabled, without bulk insert
   Avg TPS (out of 10 runs): 641 TPS
2. Pgbench without logical replication enabled, with bulk insert (no pub/sub)
   Avg TPS (out of 10 runs): 665 TPS
3. Pgbench with logical replication enabled, with bulk insert
   Avg TPS (out of 10 runs): 278 TPS
4. Pgbench with logical replication streaming on, with bulk insert
   Avg TPS (out of 10 runs): 440 TPS

As you can see, the bulk inserts, although on a totally unaffected table, do
impact the TPS. But what is good is that enabling streaming improves the TPS
(about 58% improvement).

[1] - https://www.postgresql.org/message-id/flat/CAMsr%2BYE6aE6Re6smrMr-xCabRmCr%3DyzXEf2Yuv5upEDY5nMX8g%40mail.gmail.com#dbe51a181dd735eec8bb36f8a07bacf5

regards,
Ajin Cherian
Fujitsu Australia
Re: Doc patch: mention indexes in pg_inherits docs
On Wed, Jul 29, 2020 at 03:06:58PM +0900, Michael Paquier wrote:
> This is actually a bug fix, because we include in the docs some incorrect
> information, so it should be backpatched. If there are no objections, I'll
> take care of that.

And done.
--
Michael

signature.asc Description: PGP signature
Re: INSERT INTO SELECT, Why Parallelism is not selected?
On Wed, Jul 29, 2020 at 7:18 PM Robert Haas wrote:
>
> I still don't agree with this as proposed.
>
> + * For now, we don't allow parallel inserts of any form, not even where the
> + * leader can perform the insert. This restriction can be lifted once
> + * we allow the planner to generate parallel plans for inserts. We can
>
> If I'm understanding this correctly, this logic is completely backwards. We
> don't prohibit inserts here because we know the planner can't generate them.
> We prohibit inserts here because, if the planner somehow did generate them,
> it wouldn't be safe. You're saying that it's not allowed because we don't try
> to do it yet, but actually it's not allowed because we want to make sure that
> we don't accidentally try to do it. That's very different.
>

Right, so how about something like: "To allow parallel inserts, we need to
ensure that they are safe to be performed in workers. We have the
infrastructure to allow parallel inserts in general, except for the case where
inserts generate a new commandid (e.g. inserts into a table having a foreign
key column)."

We can extend this for tuple locking if required as per the discussion below.
Kindly suggest if you prefer a different wording here.

> + * We should be able to parallelize
> + * the latter case if we can ensure that no two parallel processes can ever
> + * operate on the same page.
>
> I don't know whether this is talking about two processes operating on the
> same page at the same time, or ever within a single query execution. If it's
> the former, perhaps we need to explain why that's a concern for parallel
> query but not otherwise;
>

I am talking about the former case, and I know that as per the current design
it is not possible that two worker processes try to operate on the same page,
but I was trying to be pessimistic so that we can ensure that via some form of
Assert. I don't know whether it is important to mention this case or not, but
for the sake of extra safety, I have mentioned it.

--
With Regards,
Amit Kapila.
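The distinction Robert draws (refusing parallelism because it is unsafe, not
merely because it is unimplemented) can be shown with a small stand-alone
sketch. The enum and function below are illustrative stand-ins, not the actual
PostgreSQL planner code.

#include <stdbool.h>
#include <stdio.h>

typedef enum CmdKind
{
    CMD_KIND_SELECT,
    CMD_KIND_INSERT,
    CMD_KIND_UPDATE,
    CMD_KIND_DELETE
} CmdKind;

/*
 * Decide whether entering parallel mode is even considered.  The INSERT case
 * is rejected because workers cannot safely do things such as assigning a new
 * command id (needed e.g. when a foreign-key trigger fires), not because a
 * parallel insert plan simply hasn't been implemented yet.
 */
static bool
parallel_mode_allowed(CmdKind cmd, bool has_modifying_cte)
{
    if (cmd != CMD_KIND_SELECT)
        return false;           /* unsafe in workers today, so refuse outright */
    if (has_modifying_cte)
        return false;           /* writes hidden inside CTEs are unsafe too */
    return true;
}

int
main(void)
{
    printf("SELECT: %d\n", parallel_mode_allowed(CMD_KIND_SELECT, false));
    printf("INSERT ... SELECT: %d\n", parallel_mode_allowed(CMD_KIND_INSERT, false));
    return 0;
}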
Re: Creating a function for exposing memory usage of backend process
Hi,

On Fri, Jul 10, 2020 at 5:32 PM torikoshia wrote:
> - whether information for identifying parent-child relation is necessary
>   or it's an overkill

I think it's important to understand the parent-child relationship of the
context. Personally, I often want to know the following two things:

- In which life cycle is the target context? (Remaining as long as the
  process is living? Per query?)
- Does the target context belong to the correct (parent) context?

> - if this information is necessary, memory address is suitable or other
>   means like assigning unique numbers are required

IMO, if each context can be uniquely identified (or easily guessed) by "name"
and "ident", then I don't think the address information is necessary. Instead,
I like the way that directly shows the context name of the parent, as in the
0005 patch.

Best regards

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com
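A stand-alone illustration of that last point: if each node carries its
parent's name, the parent-child relationship can be read directly from the
output, with no memory addresses exposed. The struct and the sample tree
below are made up for this sketch; they are not PostgreSQL's MemoryContext
code or the patch's view definition.

#include <stdio.h>

typedef struct Ctx
{
    const char *name;
    const char *ident;          /* optional detail, may be NULL */
    const struct Ctx *parent;   /* NULL for the top-level context */
} Ctx;

static void
dump_ctx(const Ctx *c)
{
    /* One line per context: its name, its ident, and its parent's name. */
    printf("name=%-18s ident=%-10s parent=%s\n",
           c->name,
           c->ident ? c->ident : "-",
           c->parent ? c->parent->name : "-");
}

int
main(void)
{
    /* A tiny, made-up context tree: top -> per-query -> executor state. */
    Ctx top = {"TopMemoryContext", NULL, NULL};
    Ctx query = {"MessageContext", NULL, &top};
    Ctx executor = {"ExecutorState", "portal", &query};

    dump_ctx(&top);
    dump_ctx(&query);
    dump_ctx(&executor);
    return 0;
}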