Re: Parallel worker hangs while handling errors.

2020-07-23 Thread vignesh C
Thanks for reviewing and adding your thoughts. My comments are inline.

On Fri, Jul 17, 2020 at 1:21 PM Bharath Rupireddy
 wrote:
>
> The same hang issue can occur (though I'm not able to back it up with a
> use case) anywhere EmitErrorReport() is called from an
> "if (sigsetjmp(local_sigjmp_buf, 1) != 0)" block, such as in
> autovacuum.c, bgwriter.c, bgworker.c, checkpointer.c, walwriter.c, and
> postgres.c.
>

I'm not sure if this can occur in other cases.

> >
> > One of the fixes could be to call BackgroundWorkerUnblockSignals just
> > after sigsetjmp. I'm not sure if this is the best solution.
> > Robert and I had a discussion about the problem yesterday. We felt
> > this is a genuine problem with the parallel worker error handling and
> > needs to be fixed.
> >
>
> Note that in all sigsetjmp blocks we intentionally HOLD_INTERRUPTS() so as
> not to cause any issues while performing error handling. My concern here
> is that if we directly call BackgroundWorkerUnblockSignals(), it will open
> up all the signals, and our main intention of holding interrupts or
> blocking signals may be defeated.
>
> Since the main cause of this hang is the blocking of SIGUSR1, can't we
> just unblock only SIGUSR1 in the sigsetjmp block, instead of unblocking
> all signals? I tried this with the parallel copy hang, and the issue is
> resolved.
>

After further thought, I feel that unblocking just SIGUSR1 would be the
right approach in this case. I'm attaching a new patch which unblocks the
SIGUSR1 signal. I have verified that the original issue with the WIP
parallel copy patch gets fixed. I have made changes only in bgworker.c,
since the parallel worker must receive this signal to continue processing.
I have not included changes for other processes, as I'm not sure this
scenario applies to them.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From b283bdd47e3bc132c1b8ef0ea59d1c12cb8c0ffb Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 23 Jul 2020 12:35:45 +0530
Subject: [PATCH] Fix for Parallel worker hangs while handling errors.

A parallel worker cannot receive signals while handling an error. The worker
hangs in this case because signals were masked using sigprocmask when the
worker was started; they are normally unblocked by calling
BackgroundWorkerUnblockSignals in ParallelWorkerMain. During error handling,
however, the worker jumps back to the sigsetjmp block in the
StartBackgroundWorker function, where signals are still blocked, so the
signal is never received by the worker process.
---
 src/backend/postmaster/bgworker.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85..a0af580 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -747,6 +747,17 @@ StartBackgroundWorker(void)
 	 */
 	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
 	{
+		/*
+		 * Unblock SIGUSR1; it was blocked when the postmaster forked us. In
+		 * the case of a parallel worker, the leader process sends SIGUSR1 to
+		 * the worker (which will be waiting because no space is available in
+		 * the shared memory error queue) to indicate that queue space has
+		 * been freed up. Once the signal is received, the worker process can
+		 * continue sending the error message.
+		 */
+		sigdelset(&BlockSig, SIGUSR1);
+		PG_SETMASK(&BlockSig);
+
 		/* Since not using PG_TRY, must reset error stack by hand */
 		error_context_stack = NULL;
 
-- 
1.8.3.1



Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-23 Thread Ahsan Hadi
On Fri, Jul 17, 2020 at 9:56 PM Fujii Masao 
wrote:

>
>
> On 2020/07/16 14:47, Masahiko Sawada wrote:
> > On Tue, 14 Jul 2020 at 11:19, Fujii Masao 
> wrote:
> >>
> >>
> >>
> >> On 2020/07/14 9:08, Masahiro Ikeda wrote:
>  I've attached the latest version patches. I've incorporated the review
>  comments I got so far and improved locking strategy.
> >>>
> >>> Thanks for updating the patch!
> >>
> >> +1
> >> I'm interested in these patches and now studying them. While checking
> >> the behaviors of the patched PostgreSQL, I got three comments.
> >
> > Thank you for testing this patch!
> >
> >>
> >> 1. We can access to the foreign table even during recovery in the HEAD.
> >> But in the patched version, when I did that, I got the following error.
> >> Is this intentional?
> >>
> >> ERROR:  cannot assign TransactionIds during recovery
> >
> > No, it should be fixed. I'm going to fix this by not collecting
> > participants for atomic commit during recovery.
>
> Thanks for trying to fix the issues!
>
> I'd like to report one more issue. When I started new transaction
> in the local server, executed INSERT in the remote server via
> postgres_fdw and then quit psql, I got the following assertion failure.
>
> TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
> 0   postgres0x00010d52f3c0
> ExceptionalCondition + 160
> 1   postgres0x00010cefbc49
> ForgetAllFdwXactParticipants + 313
> 2   postgres0x00010cefff14
> AtProcExit_FdwXact + 20
> 3   postgres0x00010d313fe3 shmem_exit + 179
> 4   postgres0x00010d313e7a
> proc_exit_prepare + 122
> 5   postgres0x00010d313da3 proc_exit + 19
> 6   postgres0x00010d35112f PostgresMain +
> 3711
> 7   postgres0x00010d27bb3a BackendRun + 570
> 8   postgres0x00010d27af6b BackendStartup
> + 475
> 9   postgres0x00010d279ed1 ServerLoop + 593
> 10  postgres0x00010d277940 PostmasterMain
> + 6016
> 11  postgres0x00010d1597b9 main + 761
> 12  libdyld.dylib   0x7fff7161e3d5 start + 1
> 13  ??? 0x0003 0x0 + 3
>

I have done a test with the latest set of patches shared by Sawada and I am
not able to reproduce this issue. I started a prepared transaction on the
local server, did a couple of inserts in a remote table using
postgres_fdw, and then quit psql, but could not reproduce the assertion
failure.



>
> Regards,
>
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.h...@highgo.ca


Re: Index Skip Scan (new UniqueKeys)

2020-07-23 Thread Dmitry Dolgov
> On Tue, Jul 14, 2020 at 06:18:50PM +, Floris Van Nee wrote:
>
> Due to the other changes I made in 
> create_distinct_paths/query_has_uniquekeys_for, it will choose a correct plan 
> now, even without the EC_MUST_BE_REDUNDANT check though, so it's difficult to 
> give an actual failing test case now. However, since all code filters out 
> constant keys, I think uniqueness should do it too - otherwise you could get 
> into problems later on. It's also more consistent. If you already know 
> something is unique by just (b), it doesn't make sense to store that it's 
> unique by (a,b). Now that I think of it, the best place to do this 
> EC_MUST_BE_REDUNDANT check is probably inside make_pathkeys_for_uniquekeys, 
> rather than build_uniquekeys though. It's probably good to move it there.

That would be my suggestion as well.

> > Along the lines I'm also curious about this part:
> >
> > -   ListCell   *k;
> > -   List *exprs = NIL;
> > -
> > -   foreach(k, ec->ec_members)
> > -   {
> > -   EquivalenceMember *mem = (EquivalenceMember *)
> > lfirst(k);
> > -   exprs = lappend(exprs, mem->em_expr);
> > -   }
> > -
> > -   result = lappend(result, makeUniqueKey(exprs, false, false));
> > +   EquivalenceMember *mem = (EquivalenceMember*)
> > +lfirst(list_head(ec->ec_members));
> >
> > I'm curious about this myself, maybe someone can clarify. It looks like
> > generally speaking there could be more than one member (if not
> > ec_has_volatile), "representing knowledge that multiple items are
> > effectively equal". Is this information not interesting enough to
> > preserve in unique keys?
>
> Yeah, that's a good question. Hence my question about the choice for Expr 
> rather than EquivalenceClass for the Unique Keys patch to Andy/David. When 
> storing just Expr, it is rather difficult to check equivalence in joins for 
> example. Suppose, later on we decide to add join support to the distinct skip 
> scan. Consider a query like this:
> SELECT DISTINCT t1.a FROM t1 JOIN t2 ON t1.a=t2.a
> As far as my understanding goes (I didn't look into it in detail though), I 
> think here the distinct_pathkey will have an EqClass {t1.a, t2.a}. That 
> results in a UniqueKey with expr (t1.a) (because currently we only take the 
> first Expr in the list to construct the UniqueKey). We could also construct 
> *two* unique keys, one with Expr (t1.a) and one with Expr (t2.a), but I don't 
> think that's the correct approach either, as it will explode when you have 
> multiple pathkeys, each having multiple Expr inside their EqClasses.

One UniqueKey can have multiple corresponding expressions, which also gives
us the possibility of having one unique key with (t1.a, t2.a); that makes
it look similar to an EquivalenceClass.

> > > - the distinct_pathkeys may be NULL, even though there's a possibility for
> > skipping. But it wouldn't create the uniquekeys in this case. This makes the
> > planner not choose skip scans even though it could. For example in queries
> > that do SELECT DISTINCT ON (a) * FROM t1 WHERE a=1 ORDER BY a,b; Since a
> > is constant, it's eliminated from regular pathkeys.
> >
> > What would be the point of skipping if it's a constant?
>
> For the query: SELECT DISTINCT ON (a) * FROM t1 WHERE a=1 ORDER BY a,b;
> There may be 1000s of records with a=1. We're only interested in the first 
> one though. The traditional non-skip approach would still scan all records 
> with a=1. Skip would just fetch the first one with a=1 and then skip to the 
> next prefix and stop the scan.

The idea behind this query sounds questionable to me; it would be more
transparent to do this without DISTINCT, since skipping will actually do
exactly the same thing just under another name. But if allowing skipping on
constants does not bring significant changes in the code, it's probably
fine.

> > > - to combat the issues mentioned earlier, there's now a check in
> > build_index_paths that checks if the query_pathkeys matches the
> > useful_pathkeys. Note that we have to use the path keys here rather than
> > any of the unique keys. The unique keys are only Expr nodes - they do not
> > contain the necessary information about ordering. Due to elimination of
> > some constant path keys, we have to search the attributes of the index to
> > find the correct prefix to use in skipping.
> >
> > IIUC here you mean this function, right?
> >
> > + prefix = find_index_prefix_for_pathkey(root,
> > +
> > index,
> > +
> > BackwardScanDirection,
> > +
> > llast_node(PathKey,
> > +
> > root->distinct_pathkeys));
> >
> > Doesn't it duplicate the job already done in build_index_pathkeys by 
> > building
> > those pathkeys again? If yes, probably it's possible to reuse 
> > useful_pathkeys.
> > Not sure about unordered indexes, but looks like query_pathkeys should
> > also match in this case.
> >
>
> Yeah, there's definitely some double work there, but the actual impact may be 
> limited - it doesn't actually allocate a new path key, but it looks it up in
> root->canon_pathkeys and returns that path key.

Building 12.3 from source on Mac

2020-07-23 Thread Paul Förster
Hi,

I'm not sure this is the right list, but I have a problem concerning building 
PostgreSQL 12.3 from source on a Mac.

I do:

./configure \
--prefix=${pgTargetDir} \
--enable-nls \
--with-perl \
--with-python \
--with-libxml \
--with-tclconfig=/usr/lib64 \
PG_SYSROOT=$(xcodebuild -version -sdk macosx Path)

and I get:

...
checking for __cpuid... no
checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=... no
checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=-msse4.2... yes
checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=... no
checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with 
CFLAGS=-march=armv8-a+crc... no
checking which CRC-32C implementation to use... SSE 4.2 with runtime check
checking which semaphore API to use... System V
checking for /dev/urandom... yes
checking which random number source to use... /dev/urandom
checking for library containing bind_textdomain_codeset... no
configure: error: a gettext implementation is required for NLS

If I leave out --enable-nls then building works fine and I get everything 
without error. But why is there a problem with gettext?

My Mac:

MacBook Pro (Retina, 15-inch, Late 2013)
macOS Catalina 10.15.6 (all updates installed)
Xcode 11.6 (11E708) w/ command line tools installed
No brew, no MacPorts, or other stuff like this is installed.

Does anyone have an idea? Thanks in advance.

Cheers,
Paul



Re: Building 12.3 from source on Mac

2020-07-23 Thread Daniel Gustafsson
> On 23 Jul 2020, at 12:01, Paul Förster  wrote:

> If I leave out --enable-nls then building works fine and I get everything 
> without error. But why is there a problem with gettext?

gettext is not shipped by default with macOS; you will have to install it
separately via your favourite package manager or by building from source.  To
verify, you can always search your system for the required header file:

mdfind -name libintl.h

See https://www.postgresql.org/docs/current/install-requirements.html for more
information on build-time requirements.

cheers ./daniel



Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Amul Sul
On Thu, Jul 23, 2020 at 3:33 AM Soumyadeep Chakraborty
 wrote:
>
> Hello,
>
> I think we should really term this feature, as it stands, as a means to
> solely stop WAL writes from happening.
>

True.

> The feature doesn't truly make the system read-only (e.g. dirty buffer
> flushes may still occur after the system is put into a read-only state),
> which does make it confusing to a degree.
>
> Ideally, if we were to have a read-only system, we should be able to run
> pg_checksums on it, or take file-system snapshots etc, without the need
> to shut down the cluster. It would also enable an interesting use case:
> we should also be able to do a live upgrade on any running cluster and
> entertain read-only queries at the same time, given that all the
> cluster's files will be immutable?
>

Read-only is for the queries.

The aim of this feature is preventing new WAL records from being generated, not
preventing them from being flushed to disk, or streamed to standbys, or anything
else. The rest should happen as normal.

If you can't flush WAL, then you might not be able to evict some number of
buffers, which in the worst case could be large. That's because you can't evict
a dirty buffer until WAL has been flushed up to the buffer's LSN (otherwise,
you wouldn't be following the WAL-before-data rule). And having a potentially
large number of unevictable buffers around sounds terrible, not only for
performance, but also for having the system keep working at all.
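
To illustrate the WAL-before-data rule, here is a rough sketch of what
happens when a dirty page is written out (modeled on FlushBuffer() in
bufmgr.c; simplified, with locking, checksums, and error handling omitted):

#include "postgres.h"
#include "access/xlog.h"
#include "storage/bufpage.h"
#include "storage/smgr.h"

static void
flush_page_sketch(SMgrRelation reln, BlockNumber blocknum, Page page)
{
	/* LSN of the last WAL record that modified this page */
	XLogRecPtr	recptr = PageGetLSN(page);

	/*
	 * WAL must be durable up to the page's LSN before the page itself may
	 * be written out; if WAL cannot be flushed, the dirty buffer cannot
	 * be evicted.
	 */
	XLogFlush(recptr);

	/* Only now is it safe to write the data page. */
	smgrwrite(reln, MAIN_FORKNUM, blocknum, (char *) page, false);
}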

> So if we are not going to address those cases, we should change the
> syntax and remove the notion of read-only. It could be:
>
> ALTER SYSTEM SET wal_writes TO off|on;
> or
> ALTER SYSTEM SET prohibit_wal TO off|on;
>
> If we are going to try to make it truly read-only, and cater to the
> other use cases, we have to:
>
> Perform a checkpoint before declaring the system read-only (i.e. before
> the command returns). This may be expensive of course, as Andres has
> pointed out in this thread, but it is a price that has to be paid. If we
> do this checkpoint, then we can avoid an additional shutdown checkpoint
> and an end-of-recovery checkpoint (if we restart the primary after a
> crash while in read-only mode). Also, we would have to prevent any
> operation that touches control files, which I am not sure we do today in
> the current patch.
>

The intention is to change the system to read-only ASAP; the checkpoint will
make it much slower.

I don't think we can skip the control file updates that are needed to make
the read-only state persistent across a restart.

> Why not have the best of both worlds? Consider:
>
> ALTER SYSTEM SET read_only to {off, on, wal};
>
> -- on: wal writes off + no writes to disk
> -- off: default
> -- wal: only wal writes off
>
> Of course, there can probably be better syntax for the above.
>

Sure, thanks for the suggestions. The syntax is not the hard part; we can
choose the better one later.

Regards,
Amul




Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Amul Sul
On Thu, Jul 23, 2020 at 4:34 AM SATYANARAYANA NARLAPURAM
 wrote:
>
> +1 to this feature; I have been thinking about it for some time. There are 
> several use cases for marking a database read only (no transaction log 
> generation). Some examples in a hosted service scenario are 1/ when the 
> customer runs out of storage space, 2/ upgrading the server to a different 
> major version (the current server can be set to read only, a new one can be 
> built, and then the DNS switched), 3/ when the user wants to force a database 
> to read only and not accept writes, maybe to import / export a database.
>
Thanks for voting & listing the realistic use cases.

Regards,
Amul




Re: Dumping/restoring fails on inherited generated column

2020-07-23 Thread Masahiko Sawada
On Thu, 16 Jul 2020 at 04:29, Tom Lane  wrote:
>
> Peter Eisentraut  writes:
> >> Right, there were a number of combinations that were not properly
> >> handled.  The attached patch should fix them all.  It's made against
> >> PG12 but also works on master.  See contained commit message and
> >> documentation for details.
>
> > committed to master and PG12
>
> So ... this did not actually fix the dump/restore problem.  In fact,
> it's worse, because in HEAD I see two failures not one when doing the
> same test proposed at the start of this thread:
>
> 1. make installcheck
> 2. pg_dump -Fc regression >r.dump
> 3. createdb r2
> 4. pg_restore -d r2 r.dump
>
> pg_restore: while PROCESSING TOC:
> pg_restore: from TOC entry 6253; 2604 226187 DEFAULT gtest1_1 b postgres
> pg_restore: error: could not execute query: ERROR:  column "b" of relation 
> "gtest1_1" is a generated column
> Command was: ALTER TABLE ONLY public.gtest1_1 ALTER COLUMN b SET DEFAULT (a * 
> 2);
>
>
> pg_restore: from TOC entry 6279; 2604 227276 DEFAULT gtest30_1 b postgres
> pg_restore: error: could not execute query: ERROR:  cannot use column 
> reference in DEFAULT expression
> Command was: ALTER TABLE ONLY public.gtest30_1 ALTER COLUMN b SET DEFAULT (a 
> * 2);
>
>
> pg_restore: warning: errors ignored on restore: 2
>

The minimal reproducer is:

create table a (a int, b int generated always as (a * 2) stored);
create table aa () inherits (a);

pg_dump produces the following DDLs:

CREATE TABLE public.a (
a integer,
b integer GENERATED ALWAYS AS ((a * 2)) STORED
);

CREATE TABLE public.aa (
)
INHERITS (public.a);

ALTER TABLE ONLY public.aa ALTER COLUMN b SET DEFAULT (a * 2);

However, the ALTER TABLE fails.

As of commit 086ffddf, a child table must have the same generation
expression as the one defined in the parent. So I think pg_dump
should not generate that last DDL. I've attached a patch fixing this
issue.
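
The idea of the fix, as a rough sketch (hypothetical code, not the attached
patch; array fields as in pg_dump's TableInfo): while emitting separate
column defaults, skip a generated column that was inherited from a parent,
because the parent's CREATE TABLE already carries the generation expression.

/* Hypothetical sketch of the check, in the loop over column defaults */
if (tbinfo->attgenerated[adnum - 1] &&	/* column is GENERATED ... STORED */
	!tbinfo->attislocal[adnum - 1])		/* and was inherited from a parent */
	continue;							/* the parent's definition covers it */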

Apart from the fix, I wonder if we can add a test that dumps the
database left behind by 'make check' and restores it to another
database.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fix_gcolumn_dump.patch
Description: Binary data


Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Amul Sul
On Thu, Jul 23, 2020 at 6:08 AM Soumyadeep Chakraborty
 wrote:
>
> Hi Amul,
>

Thanks, Soumyadeep for looking and putting your thoughts on the patch.

> On Tue, Jun 16, 2020 at 6:56 AM amul sul  wrote:
> > The proposed feature is built atop the super barrier mechanism commit [1]
> > to coordinate global state changes to all active backends.  A backend
> > which executes the ALTER SYSTEM READ { ONLY | WRITE } command places a
> > request with the checkpointer process to change the requested WAL
> > read/write state, aka the WAL-prohibited and WAL-permitted states
> > respectively.  When the checkpointer process sees the WAL-prohibit state
> > change request, it emits a global barrier and waits until all backends
> > that participate in the ProcSignal absorb it.
>
> Why should the checkpointer have the responsibility of setting the state
> of the system to read-only? Maybe this should be the postmaster's
> responsibility - the checkpointer should just handle requests to
> checkpoint.

Well, once we've initiated the change to a read-only state, we probably want to
always either finish that change or go back to read-write, even if the process
that initiated the change is interrupted. Leaving the system in a
half-way-in-between state long term seems bad. We could have used some other
background process, but we put the checkpointer in charge of making the state
change to avoid adding a new background process and to keep the first version
of the patch simple.  The checkpointer isn't likely to get killed, but if it
is, it will be relaunched and the new one can clean things up. On the other
hand, I agree that making the checkpointer responsible for more than one thing
might not be a good idea, but I don't think the postmaster should do work that
any background process can do.

>I think the backend requesting the read-only transition
> should signal the postmaster, which in turn, will take on the aforesaid
> responsibilities. The postmaster, could also additionally request a
> checkpoint, using RequestCheckpoint() (if we want to support the
> read-onlyness discussed in [1]). checkpointer.c should not be touched by
> this feature.
>
> Following on, any condition variable used by the backend to wait for the
> ALTER SYSTEM command to finish (the patch uses
> CheckpointerShmem->readonly_cv), could be housed in ProcGlobal.
>

Relevant only if we don't want to use the checkpointer process.

Regards,
Amul




Re: Parallel Seq Scan vs kernel read ahead

2020-07-23 Thread Amit Kapila
On Wed, Jul 22, 2020 at 10:03 AM Thomas Munro  wrote:
>
> On Wed, Jul 22, 2020 at 3:57 PM Amit Kapila  wrote:
> > Yeah, that is true but every time before the test the same amount of
> > data should be present in shared buffers (or OS cache) if any which
> > will help in getting consistent results.  However, it is fine to
> > reboot the machine as well if that is a convenient way.
>
> We really should have an extension (pg_prewarm?) that knows how to
> kick stuff out of PostgreSQL's shared buffers and the page cache
> (POSIX_FADV_DONTNEED).
>

+1.  Such an extension would be quite helpful for performance benchmarks.
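
For the OS page cache side, the mechanism would presumably look something
like this sketch (my assumptions: Linux/POSIX; POSIX_FADV_DONTNEED only
drops clean pages, so any dirty pages are written back first):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch: ask the kernel to drop its cached pages for one relation file. */
static int
drop_os_cache(const char *path)
{
	int		fd = open(path, O_RDONLY);
	int		rc;

	if (fd < 0)
		return -1;

	/* Write back any dirty pages first, then invalidate the clean ones. */
	(void) fdatasync(fd);
	rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	if (rc != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

	close(fd);
	return rc;
}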

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Compatible defaults for LEAD/LAG

2020-07-23 Thread Daniel Gustafsson
> On 13 Jul 2020, at 19:23, Pavel Stehule  wrote:

> On 31. 5. 2020 at 22:02, Vik Fearing wrote:
> On 5/31/20 9:53 PM, Tom Lane wrote:
> > Vik Fearing <v...@postgresfriends.org> writes:
> >>   postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >>   postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
> >>   postgres-# ORDER BY n;
> >>   ERROR:  function lag(numeric, integer, integer) does not exist
> >>   LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >>^
> > 
> > Yeah, we have similar issues elsewhere.
> > 
> >> Attached is a patch that fixes this issue using the new anycompatible
> >> pseudotype.  I am hoping this can be slipped into 13 even though it
> >> requires a catversion bump after BETA1.
> > 
> > When the anycompatible patch went in, I thought for a little bit about
> > trying to use it with existing built-in functions, but didn't have the
> > time to investigate the issue in detail.  I'm not in favor of hacking
> > things one-function-at-a-time here; we should look through the whole
> > library and see what we've got.
> > 
> > The main thing that makes this perhaps-not-trivial is that an
> > anycompatible-ified function will match more cases than it would have
> > before, possibly causing conflicts if the function or operator name
> > is overloaded.  We'd have to look at such cases and decide what we
> > want to do --- one answer would be to drop some of the alternatives
> > and rely on the parser to add casts, but that might slow things down.
> > 
> > Anyway, I agree that this is a good direction to pursue, but not in
> > a last-minute-hack-for-v13 way.
> 
> Fair enough.  I put it in the commitfest app for version 14.
> https://commitfest.postgresql.org/28/2574/ 
> 
> -- 
> Vik Fearing
> 
> The patch is ok. It is almost trivial. It solves one issue, but maybe it 
> introduces a new issue. 
> 
> Other databases try to coerce the default constant expression to the value 
> expression. I found a description of this behaviour for MSSQL, Sybase, and 
> BigQuery.
> 
> I didn't find related documentation for Oracle, and I do not have access to 
> a running instance of Oracle to test it.
> 
> ANSI SQL says the type of the default expression should be compatible with 
> the lag expression, and the declared type of the result should be the type 
> of the value expression.
> 
> IWD 9075-2:201?(E) 6.10  - j) v)
> 
> The current implementation is limited (and the behaviour is not user 
> friendly in all details), but the new behaviour (implemented by the patch) 
> is in half of the cases outside the standard :(.
> 
> These cases are probably not frequent - they are really corner cases - but 
> I cannot judge how important this fact is.
> 
> For users, the implemented feature is better, and it is safe. The 
> implementation is trivial - significantly simpler than an implementation 
> that is 100% standard, although that should not be a hard problem either 
> (in the analyze stage it probably needs only a few lines).
> 
> A decision has to be made, because now we can go in either direction. After 
> choosing, there will be no way back.

This patch is marked waiting for author, but it's not clear from this review
what we're waiting on.  Should this be moved to Needs review or Ready for
Committer instead to solicit the input from a committer?  ISTM the latter is
more suitable. Or did I misunderstand your review?

cheers ./daniel





Re: explain HashAggregate to report bucket and memory stats

2020-07-23 Thread Daniel Gustafsson
> On 12 Jul 2020, at 21:52, Daniel Gustafsson  wrote:

> This thread has stalled and the patch has been Waiting on Author since March,
> and skimming the thread there seems to be questions raised over the value
> proposition.  Is there progress happening behind the scenes or should we close
> this entry for now, to re-open in case there is renewed activity/interest?

With not too many days of the commitfest left, I'm closing this in 2020-07.
Please feel free to add a new entry if there is renewed interest in this patch.

cheers ./daniel



RE: Index Skip Scan (new UniqueKeys)

2020-07-23 Thread Floris Van Nee
> 
> One UniqueKey can have multiple corresponding expressions, which gives us
> also possibility of having one unique key with (t1.a, t2.a) and it looks now
> similar to EquivalenceClass.
> 

I believe the current definition of a unique key with two expressions (t1.a, 
t2.a) means that it's unique on the tuple (t1.a, t2.a) - this gives weaker 
guarantees than uniqueness on (t1.a) and uniqueness on (t2.a).

> 
> The idea behind this query sounds questionable to me, more transparent
> would be to do this without distinct, skipping will actually do exactly the 
> same
> stuff just under another name. But if allowing skipping on constants do not
> bring significant changes in the code probably it's fine.
> 

Yeah indeed, I didn't say it's a query that people should generally write. :-) 
It's better written as a regular SELECT with LIMIT 1, of course. However, the 
point is more to be consistent and predictable for the user: if a SELECT 
DISTINCT ON (a) * FROM t1 runs fast, then it doesn't make sense to the user if 
a SELECT DISTINCT ON (a) * FROM t1 WHERE a=2 runs slow. And supporting it also 
makes the implementation more consistent, with few code changes.

> >
> > Yeah, there's definitely some double work there, but the actual impact may
> be limited - it doesn't actually allocate a new path key, but it looks it up 
> in
> root->canon_pathkeys and returns that path key.
> > I wrote it like this, because I couldn't find a way to identify from a 
> > certain
> PathKey the actual location in the index of that column. The constructed path
> keys list filters out all redundant path keys. An index on (a,a,b,a,b) becomes
> path keys (a,b). Now if we skip on (a,b) we actually need to use prefix=3. But
> how to get from PathKey=b to that number 3, I didn't find a solid way except
> doing this. Maybe there is though?
> 
> I don't think there is a direct way, but why not modify build_index_paths to
> also provide this information, or compare index_pathkeys expressions with
> indextlist without actually create those pathkeys again?
> 

I agree there could be other ways - I don't currently have a strong preference 
for either. I can have a look at this later.

> And couple of words about this thread [1]. It looks to me like a strange way
> of interacting with the community. Are you going to duplicate there
> everything, or what are your plans? At the very least you could try to include
> everyone involved in the recipients list, not exclude some of the authors.
> 

When I wrote the first mail in the thread, I went to this thread [1] and 
included everyone from there, but I see now that I only included the to: and 
cc: people and forgot the original thread author, you. I'm sorry about that - I 
should've looked more carefully to make sure I had everyone.
In any case, my plan is to keep the patch at least applicable to master, as I 
believe it can be helpful for discussions about both patches.

[1] 
https://www.postgresql.org/message-id/20200609102247.jdlatmfyeecg52fi%40localhost




Re: Compatible defaults for LEAD/LAG

2020-07-23 Thread Pavel Stehule
On Thu, 23 Jul 2020 at 13:29, Daniel Gustafsson wrote:

> > On 13 Jul 2020, at 19:23, Pavel Stehule  wrote:
>
> > On 31. 5. 2020 at 22:02, Vik Fearing wrote:
> > On 5/31/20 9:53 PM, Tom Lane wrote:
> > > Vik Fearing <v...@postgresfriends.org> writes:
> > >>   postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> > >>   postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
> > >>   postgres-# ORDER BY n;
> > >>   ERROR:  function lag(numeric, integer, integer) does not exist
> > >>   LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> > >>^
> > >
> > > Yeah, we have similar issues elsewhere.
> > >
> > >> Attached is a patch that fixes this issue using the new anycompatible
> > >> pseudotype.  I am hoping this can be slipped into 13 even though it
> > >> requires a catversion bump after BETA1.
> > >
> > > When the anycompatible patch went in, I thought for a little bit about
> > > trying to use it with existing built-in functions, but didn't have the
> > > time to investigate the issue in detail.  I'm not in favor of hacking
> > > things one-function-at-a-time here; we should look through the whole
> > > library and see what we've got.
> > >
> > > The main thing that makes this perhaps-not-trivial is that an
> > > anycompatible-ified function will match more cases than it would have
> > > before, possibly causing conflicts if the function or operator name
> > > is overloaded.  We'd have to look at such cases and decide what we
> > > want to do --- one answer would be to drop some of the alternatives
> > > and rely on the parser to add casts, but that might slow things down.
> > >
> > > Anyway, I agree that this is a good direction to pursue, but not in
> > > a last-minute-hack-for-v13 way.
> >
> > Fair enough.  I put it in the commitfest app for version 14.
> > https://commitfest.postgresql.org/28/2574/
> > --
> > Vik Fearing
> >
> > The patch is ok. It is almost trivial. It solves one issue, but maybe it
> > introduces a new issue.
> >
> > Other databases try to coerce the default constant expression to the
> > value expression. I found a description of this behaviour for MSSQL,
> > Sybase, and BigQuery.
> >
> > I didn't find related documentation for Oracle, and I do not have access
> > to a running instance of Oracle to test it.
> >
> > ANSI SQL says the type of the default expression should be compatible
> > with the lag expression, and the declared type of the result should be
> > the type of the value expression.
> >
> > IWD 9075-2:201?(E) 6.10  - j) v)
> >
> > The current implementation is limited (and the behaviour is not user
> > friendly in all details), but the new behaviour (implemented by the
> > patch) is in half of the cases outside the standard :(.
> >
> > These cases are probably not frequent - they are really corner cases -
> > but I cannot judge how important this fact is.
> >
> > For users, the implemented feature is better, and it is safe. The
> > implementation is trivial - significantly simpler than an implementation
> > that is 100% standard, although that should not be a hard problem either
> > (in the analyze stage it probably needs only a few lines).
> >
> > A decision has to be made, because now we can go in either direction.
> > After choosing, there will be no way back.
>
> This patch is marked waiting for author, but it's not clear from this
> review what we're waiting on.  Should this be moved to Needs review or
> Ready for Committer instead to solicit the input from a committer?  ISTM
> the latter is more suitable. Or did I misunderstand your review?
>

Unfortunately, I don't know - I am not sure about committing this patch,
and I am not able to quantify the impact.

This is a nice example of using the anycompatible type (which is consistent
with other things in Postgres), but the standard says something else.

It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
but Tom doesn't like this patch.

I am more inclined to think that this feature should be implemented
differently - there is no strong reason to go against the standard or
against the implementations of other databases (and increase the cost of
porting). Right now the implementation is limited, but the allowed
behaviour is 100% ANSI.

On the other hand, the implemented feature (behaviour) is pretty small, and
really a corner case, so maybe a simple solution (although it isn't
perfect) is good enough.

So I would like to hear the opinions of other people.

Regards

Pavel



> cheers ./daniel
>
>


Re: Why it is not possible to create custom AM which behaves similar to btree?

2020-07-23 Thread Konstantin Knizhnik




On 23.07.2020 03:11, Tom Lane wrote:

Konstantin Knizhnik  writes:

But then I get an error for the btfloat48cmp and btfloat84cmp functions:
ERROR:  associated data types must be specified for index support function

You need to specify the amproclefttype and amprocrighttype types you
want the function to be registered under.  The core code knows that
for btree, those are the same as the actual parameter types of the
function; but there's no reason to make such an assumption for other AMs.
So you have to write it out; perhaps

  ...
  FUNCTION 1(float4,float8) btfloat48cmp(float4,float8),
  ...


Thank you very much.
It works!





Re: Parallel copy

2020-07-23 Thread Bharath Rupireddy
On Thu, Jul 23, 2020 at 9:22 AM Amit Kapila  wrote:
>
> > I ran tests for partitioned use cases - results are similar to those of
> > non-partitioned cases [1].
>
> I could see a gain of up to 10-11 times for non-partitioned cases [1]. Can
> we use a similar test case here as well (with one of the indexes on a text
> column, or having a gist index) to see its impact?
>
> [1] -
> https://www.postgresql.org/message-id/CALj2ACVR4WE98Per1H7ajosW8vafN16548O2UV8bG3p4D3XnPg%40mail.gmail.com
>

Thanks Amit! Please find the results of detailed testing done for
partitioned use cases:

Range Partitions: consecutive rows go into the same partitions.

test case 1 (exec time in sec): copy from csv file, 2 indexes on integer
columns and 1 index on text column, 4 range partitions
test case 2 (exec time in sec): copy from csv file, 1 gist index on text
column, 4 range partitions
test case 3 (exec time in sec): copy from csv file, 3 indexes on integer
columns, 4 range partitions

parallel workers   test case 1        test case 2        test case 3
0                  1051.924 (1X)      785.052 (1X)       205.403 (1X)
2                  589.576 (1.78X)    421.974 (1.86X)    114.724 (1.79X)
4                  321.960 (3.27X)    230.997 (3.4X)     99.017 (2.07X)
8                  199.245 (5.23X)    *156.132 (5.02X)*  99.722 (2.06X)
16                 127.343 (8.26X)    173.696 (4.52X)    98.147 (2.09X)
20                 *122.029 (8.62X)*  186.418 (4.21X)    97.723 (2.1X)
30                 142.876 (7.36X)    214.598 (3.66X)    *97.048 (2.11X)*

(* marks the best time for each test case)

On Thu, Jul 23, 2020 at 10:21 AM Ashutosh Sharma 
wrote:
>
> I think, when doing the performance testing for a partitioned table, it
> would be good to also mention the distribution of data in the input file.
> One possible data distribution could be that we have, let's say, 100
> tuples in the input file, and every consecutive tuple belongs to a
> different partition.
>

To address Ashutosh's point, I used hash partitioning. Hope this helps to
clear the doubt.

Hash Partitions: there is a high chance that consecutive rows go into
different partitions.

test case 1 (exec time in sec): copy from csv file, 2 indexes on integer
columns and 1 index on text column, 4 hash partitions
test case 2 (exec time in sec): copy from csv file, 1 gist index on text
column, 4 hash partitions
test case 3 (exec time in sec): copy from csv file, 3 indexes on integer
columns, 4 hash partitions

parallel workers   test case 1         test case 2        test case 3
0                  1060.884 (1X)       812.283 (1X)       207.745 (1X)
2                  572.542 (1.85X)     418.454 (1.94X)    107.850 (1.93X)
4                  298.132 (3.56X)     227.367 (3.57X)    *83.895 (2.48X)*
8                  169.449 (6.26X)     137.993 (5.89X)    85.411 (2.43X)
16                 112.297 (9.45X)     95.167 (8.53X)     96.136 (2.16X)
20                 *101.546 (10.45X)*  *90.552 (8.97X)*   97.066 (2.14X)
30                 113.877 (9.32X)     127.17 (6.38X)     96.819 (2.14X)

(* marks the best time for each test case)


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: Add session statistics to pg_stat_database

2020-07-23 Thread Ahsan Hadi
On Wed, Jul 8, 2020 at 4:17 PM Laurenz Albe 
wrote:

> Here is a patch that adds the following to pg_stat_database:
> - number of connections
>

Is it expected behaviour to not count idle connections? The connection is
included after it is aborted, but not while it is idle.


> - number of sessions that were not disconnected regularly
> - total time spent in database sessions
> - total time spent executing queries
> - total idle in transaction time
>
> This is useful to check if connection pooling is working.
> It also helps to estimate the size of the connection pool
> required to keep the database busy, which depends on the
> percentage of the transaction time that is spent idling.
>
> Yours,
> Laurenz Albe
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.h...@highgo.ca


Re: Building 12.3 from source on Mac

2020-07-23 Thread Paul Förster
Hi Daniel,

> On 23. Jul, 2020, at 12:37, Daniel Gustafsson  wrote:
> gettext is not shipped by default with macOS, you will have to install it
> separately via your favourite package manager or by building from source.  To
> verify you can always search your system for the required header file:
> 
>mdfind -name libintl.h
> 
> See https://www.postgresql.org/docs/current/install-requirements.html for more
> information on build-time requirements.

thanks for the answer and the pointer.

But I am still wondering: mdfind spits out libintl.h without me installing the 
gettext library:

paul@meerkat:~$ mdfind -name libintl.h
/usr/local/include/libintl.h

Why is that? Did I miss something?

Cheers,
Paul



Re: Building 12.3 from source on Mac

2020-07-23 Thread Tom Lane
Paul Förster writes:
>> On 23. Jul, 2020, at 12:37, Daniel Gustafsson  wrote:
>> gettext is not shipped by default with macOS, you will have to install it
>> separately via your favourite package manager or by building from source.

> But I am still wondering: mdfind spits out libintl.h without me installing 
> the gettext library:
> paul@meerkat:~$ mdfind -name libintl.h
> /usr/local/include/libintl.h

Kind of looks like you *did* install gettext as Daniel suggested
(macOS proper would never put anything under /usr/local).  Maybe
you did not ask for that specifically, but installed some package
that requires it?

However, Apple's toolchain doesn't search /usr/local by default,
I believe.  You'll need to add something along the line of

--with-includes=/usr/local/include --with-libs=/usr/local/lib

to your configure command.

regards, tom lane




Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-23 Thread Muhammad Usama
On Wed, Jul 22, 2020 at 12:42 PM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Sat, 18 Jul 2020 at 01:55, Fujii Masao 
> wrote:
> >
> >
> >
> > On 2020/07/16 14:47, Masahiko Sawada wrote:
> > > On Tue, 14 Jul 2020 at 11:19, Fujii Masao 
> wrote:
> > >>
> > >>
> > >>
> > >> On 2020/07/14 9:08, Masahiro Ikeda wrote:
> >  I've attached the latest version patches. I've incorporated the
> review
> >  comments I got so far and improved locking strategy.
> > >>>
> > >>> Thanks for updating the patch!
> > >>
> > >> +1
> > >> I'm interested in these patches and now studying them. While checking
> > >> the behaviors of the patched PostgreSQL, I got three comments.
> > >
> > > Thank you for testing this patch!
> > >
> > >>
> > >> 1. We can access to the foreign table even during recovery in the
> HEAD.
> > >> But in the patched version, when I did that, I got the following
> error.
> > >> Is this intentional?
> > >>
> > >> ERROR:  cannot assign TransactionIds during recovery
> > >
> > > No, it should be fixed. I'm going to fix this by not collecting
> > > participants for atomic commit during recovery.
> >
> > Thanks for trying to fix the issues!
> >
> > I'd like to report one more issue. When I started new transaction
> > in the local server, executed INSERT in the remote server via
> > postgres_fdw and then quit psql, I got the following assertion failure.
> >
> > TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
> > 0   postgres0x00010d52f3c0
> ExceptionalCondition + 160
> > 1   postgres0x00010cefbc49
> ForgetAllFdwXactParticipants + 313
> > 2   postgres0x00010cefff14
> AtProcExit_FdwXact + 20
> > 3   postgres0x00010d313fe3 shmem_exit +
> 179
> > 4   postgres0x00010d313e7a
> proc_exit_prepare + 122
> > 5   postgres0x00010d313da3 proc_exit + 19
> > 6   postgres0x00010d35112f PostgresMain
> + 3711
> > 7   postgres0x00010d27bb3a BackendRun +
> 570
> > 8   postgres0x00010d27af6b
> BackendStartup + 475
> > 9   postgres0x00010d279ed1 ServerLoop +
> 593
> > 10  postgres0x00010d277940
> PostmasterMain + 6016
> > 11  postgres0x00010d1597b9 main + 761
> > 12  libdyld.dylib   0x7fff7161e3d5 start + 1
> > 13  ??? 0x0003 0x0 + 3
> >
>
> Thank you for reporting the issue!
>
> I've attached the latest version patch that incorporated all comments
> I got so far. I've removed the patch adding the 'prefer' mode of
> foreign_twophase_commit to keep the patch set simple.
>

I have started to review the patchset. Just a quick comment.

Patch v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch
contains changes (adding fdwxact includes) to
src/backend/executor/nodeForeignscan.c,
src/backend/executor/nodeModifyTable.c,
and src/backend/executor/execPartition.c that don't seem to be
required with the latest version.


Thanks
Best regards
Muhammad Usama



>
> Regards,
>
> --
> Masahiko Sawada  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Building 12.3 from source on Mac

2020-07-23 Thread Paul Förster
Hi Tom,

> On 23. Jul, 2020, at 15:42, Tom Lane  wrote:
> 
> Kind of looks like you *did* install gettext as Daniel suggested
> (macOS proper would never put anything under /usr/local).  Maybe
> you did not ask for that specifically, but installed some package
> that requires it?
> 
> However, Apple's toolchain doesn't search /usr/local by default,
> I believe.  You'll need to add something along the line of
> 
>--with-includes=/usr/local/include --with-libs=/usr/local/lib
> 
> to your configure command.

I tried with your options. Still, same effect. Ok, worth a try.

I found:

paul@meerkat:~$ mdfind -name gettext | egrep -vi "/(share|man|bin|system)/"   
/usr/local/info/gettext.info
/usr/local/lib/gettext
/Library/i-Installer/Receipts/gettext.ii2receipt
/usr/local/include/gettext-po.h

paul@meerkat:~$ mdfind -name libintl | egrep -vi "/(share|man|bin|system)/"
/usr/local/lib/libintl.3.4.3.dylib
/usr/local/lib/libintl.a
/usr/local/lib/libintl.la
/usr/local/include/libintl.h

But I did not *knowingly* install that. I guess it comes as part of Xcode but I 
really don't know. I'm not a developer, I just want to build PostgreSQL for my 
Mac.

Cheers,
Paul



Re: Building 12.3 from source on Mac

2020-07-23 Thread Tom Lane
Paul Förster writes:
>> On 23. Jul, 2020, at 15:42, Tom Lane  wrote:
>> However, Apple's toolchain doesn't search /usr/local by default,
>> I believe.  You'll need to add something along the line of
>> --with-includes=/usr/local/include --with-libs=/usr/local/lib
>> to your configure command.

> I tried with your options. Still, same effect. Ok, worth a try.

I might be wrong about that.  But this shows another issue:

> paul@meerkat:~$ mdfind -name libintl | egrep -vi "/(share|man|bin|system)/"
> /usr/local/lib/libintl.3.4.3.dylib
> /usr/local/lib/libintl.a
> /usr/local/lib/libintl.la
> /usr/local/include/libintl.h

Looks like what you lack is a symlink libintl.dylib -> libintl.3.4.3.dylib
in /usr/local/lib.  It's not real clear to me why you'd have .a and .la
files and no versionless symlink, because all of those files would
just be used for linking dependent software.

> But I did not *knowingly* install that. I guess it comes as part of Xcode but 
> I really don't know. I'm not a developer, I just want to build PostgreSQL for 
> my Mac.

These files absolutely, positively, gold-plated 100% did not come
with XCode.  Homebrew installs stuff under /usr/local though.
Not sure about MacPorts.

regards, tom lane




Re: Open Item: Should non-text EXPLAIN always show properties?

2020-07-23 Thread Justin Pryzby
On Thu, Jun 25, 2020 at 08:41:43AM -0400, James Coleman wrote:
> On Thu, Jun 25, 2020 at 5:15 AM David Rowley  wrote:
> > Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should
> > always show the "Disk Usage" and "HashAgg Batches" properties.  I
> > agree with this. show_wal_usage() is a good example of how we normally
> > do things.  We try to keep the text format as humanly readable as
> > possible but don't really expect humans to be commonly reading the
> > other supported formats, so we care less about including additional
> > details there.
> >
> > There's also an open item regarding this for Incremental Sort, so I've
> > CC'd James and Tomas here. This seems like a good place to discuss
> > both.
> 
> Yesterday I'd replied [1] to Justin's proposal for this WRT
> incremental sort and expressed my opinion that including both
> unnecessarily (i.e., including disk when an in-memory sort was used)
> is undesirable and confusing and leads to shortcuts I believe to be
> bad habits when using the data programmatically.

I have gone back and forth about this.

The current non-text output for Incremental Sort is like:

   Sort Methods Used:+
 - "quicksort"   +
   Sort Space Memory:+
 Average Sort Space Used: 26 +
 Peak Sort Space Used: 26+

explain.c determines whether to output in non-text mode by checking:
| if (groupInfo->maxDiskSpaceUsed > 0)

Which I think is per se wrong.  Either it should use a test like:
| if ((groupInfo->sortMethods & SORT_TYPE_QUICKSORT) != 0)
or it should output the "Sort Space" unconditionally.

It does seem wrong if Incr Sort says "Sort Space Disk / Average: 0, Peak: 0"
when there was no disk sort at all, and it wasn't listed as a "Sort Method".

On the other hand, that's determined during execution, right (based on things
like table content, table order, and tuple width)?  David's argument for
making HashAgg's explain output unconditionally show Disk/Batches was that
this is not known until execution time (based on table content).

HashAgg now shows:

SET work_mem='64 MB'; explain(format yaml, analyze) SELECT a, COUNT(1) FROM 
generate_series(1,9) a GROUP BY 1;
...
 Disk Usage: 0   +
 HashAgg Batches: 0  +

So I still think incr sort should do the same, showing Disk: 0.

Otherwise, I think it should at least use a test like this, rather than 
(DiskSpaceUsed > 0):
| if ((groupInfo->sortMethods & SORT_TYPE_QUICKSORT) != 0)

-- 
Justin




Re: Building 12.3 from source on Mac

2020-07-23 Thread Paul Förster
Hi Tom,

> On 23. Jul, 2020, at 16:03, Tom Lane  wrote:
> 
> Looks like what you lack is a symlink libintl.dylib -> libintl.3.4.3.dylib
> in /usr/local/lib.  It's not real clear to me why you'd have .a and .la
> files and no versionless symlink, because all of those files would
> just be used for linking dependent software.

there is not a single symlink in /usr/local/lib:

paul@meerkat:~$ ll /usr/local/lib
total 113968
drwxr-xr-x  4 root  wheel  128 Oct 17  2014 ImageMagick-6.3.3
-rw-r--r--+ 1 root  wheel  606 May 14  2010 charset.alias
drwxr-xr-x  6 root  wheel  192 Oct 17  2014 gettext
-rwxr-xr-x+ 1 root  wheel  4244568 Mar  6  2007 libMagick++.10.0.7.dylib
-rw-r--r--+ 1 root  wheel  4722468 Mar  6  2007 libMagick++.a
-rwxr-xr-x+ 1 root  wheel  980 Mar  6  2007 libMagick++.la
-rwxr-xr-x+ 1 root  wheel  8414800 Mar  6  2007 libMagick.10.0.7.dylib
-rw-r--r--+ 1 root  wheel  8129604 Mar  6  2007 libMagick.a
-rwxr-xr-x+ 1 root  wheel  912 Mar  6  2007 libMagick.la
-rwxr-xr-x+ 1 root  wheel  2416164 Mar  6  2007 libWand.10.0.7.dylib
-rw-r--r--+ 1 root  wheel  3354004 Mar  6  2007 libWand.a
-rwxr-xr-x+ 1 root  wheel  926 Mar  6  2007 libWand.la
-rwxr-xr-x+ 1 root  wheel   737672 Sep 23  2006 libasprintf.0.0.0.dylib
-rw-r--r--+ 1 root  wheel47704 Sep 23  2006 libasprintf.a
-rwxr-xr-x+ 1 root  wheel  832 Sep 23  2006 libasprintf.la
-rwxr-xr-x+ 1 root  wheel  4024172 Mar  6  2007 libfreetype.6.3.12.dylib
-rw-r--r--+ 1 root  wheel  4240572 Mar  6  2007 libfreetype.a
-rwxr-xr-x+ 1 root  wheel  838 Mar  6  2007 libfreetype.la
-rwxr-xr-x+ 1 root  wheel  3429720 Mar 13  2007 libgdraw.1.0.14.dylib
-rwxr-xr-x+ 1 root  wheel  891 Mar 13  2007 libgdraw.la
-rwxr-xr-x+ 1 root  wheel   485908 Sep 23  2006 libgettextlib-0.14.5.dylib
-rwxr-xr-x+ 1 root  wheel  908 Sep 23  2006 libgettextlib.la
-rwxr-xr-x+ 1 root  wheel79480 Sep 23  2006 libgettextpo.0.1.0.dylib
-rw-r--r--+ 1 root  wheel62136 Sep 23  2006 libgettextpo.a
-rwxr-xr-x+ 1 root  wheel  954 Sep 23  2006 libgettextpo.la
-rwxr-xr-x+ 1 root  wheel  1097632 Sep 23  2006 libgettextsrc-0.14.5.dylib
-rwxr-xr-x+ 1 root  wheel  940 Sep 23  2006 libgettextsrc.la
-rwxr-xr-x+ 1 root  wheel  5713584 Mar 13  2007 libgunicode.2.0.3.dylib
-rwxr-xr-x+ 1 root  wheel  877 Mar 13  2007 libgunicode.la
-rw-r--r--+ 1 root  wheel   253512 Sep 23  2006 libintl.3.4.3.dylib
-rw-r--r--+ 1 root  wheel   286284 Sep 23  2006 libintl.a
-rw-r--r--+ 1 root  wheel  829 Sep 23  2006 libintl.la
-rwxr-xr-x+ 1 root  wheel  2121700 Mar 13  2007 libuninameslist-fr.0.0.1.dylib
-rwxr-xr-x+ 1 root  wheel  774 Mar 13  2007 libuninameslist-fr.la
-rwxr-xr-x+ 1 root  wheel  2148388 Mar 13  2007 libuninameslist.0.0.1.dylib
-rwxr-xr-x+ 1 root  wheel  756 Mar 13  2007 libuninameslist.la
-rw-r--r--+ 1 root  wheel  1670612 Sep 28  2006 libwmf.a
-rwxr-xr-x+ 1 root  wheel  913 Sep 28  2006 libwmf.la
-rw-r--r--+ 1 root  wheel   571300 Sep 28  2006 libwmflite.a
-rwxr-xr-x+ 1 root  wheel  751 Sep 28  2006 libwmflite.la
drwxr-xr-x  7 root  wheel  224 Oct 17  2014 pkgconfig

> These files absolutely, positively, gold-plated 100% did not come
> with XCode.  Homebrew installs stuff under /usr/local though.
> Not sure about MacPorts.

but I didn't install Homebrew, MacPorts, Fink or other package managers.

Are they leftovers from old OS versions? After all, I started with I don't 
know what macOS version (Mavericks or Yosemite?) back then and always upgraded 
the OS, one major release after the other, and it has always worked fine with 
no problems at all. And now, as I mentioned before, it's Catalina and it still 
works without a single reinstall.

paul@meerkat:~$ uname -a
Darwin meerkat.local 19.6.0 Darwin Kernel Version 19.6.0: Sun Jul  5 00:43:10 
PDT 2020; root:xnu-6153.141.1~9/RELEASE_X86_64 x86_64

Cheers,
Paul





Re: Building 12.3 from source on Mac

2020-07-23 Thread Tom Lane
Paul Förster writes:
> there is not a single symlink in /usr/local/lib:

Not only that, but look at the file dates:

> -rw-r--r--+ 1 root  wheel   253512 Sep 23  2006 libintl.3.4.3.dylib
> -rw-r--r--+ 1 root  wheel   286284 Sep 23  2006 libintl.a
> -rw-r--r--+ 1 root  wheel  829 Sep 23  2006 libintl.la

You should see what "file" reports these as, but there's a good
bet that these are 32-bit code and won't even run on Catalina.

>> These files absolutely, positively, gold-plated 100% did not come
>> with XCode.  Homebrew installs stuff under /usr/local though.
>> Not sure about MacPorts.

> but I didn't install Homebrew, MacPorts, Fink or other package managers.

You apparently installed *something*, or several somethings, back
in ought-six or so.  Do you really remember what you were doing
back then?

Anyway, now that we realize these are ancient history, you likely
need to install a more modern version anyway.

regards, tom lane




Re: Building 12.3 from source on Mac

2020-07-23 Thread Paul Förster
Hi Tom,

> On 23. Jul, 2020, at 16:50, Tom Lane  wrote:
> 
> You should see what "file" reports these as, but there's a good
> bet that these are 32-bit code and won't even run on Catalina.

yes, they seem pretty old:

paul@meerkat:/usr/local/lib$ file libintl.*  
libintl.3.4.3.dylib: Mach-O universal binary with 2 architectures: [i386:Mach-O 
dynamically linked shared library i386] [ppc:Mach-O dynamically linked shared 
library ppc]
libintl.3.4.3.dylib (for architecture i386):Mach-O dynamically linked 
shared library i386
libintl.3.4.3.dylib (for architecture ppc): Mach-O dynamically linked 
shared library ppc
libintl.a:   Mach-O universal binary with 2 architectures: 
[i386:current ar archive random library] [ppc:current ar archive random library]
libintl.a (for architecture i386):  current ar archive random library
libintl.a (for architecture ppc):   current ar archive random library
libintl.la:  libtool library file, ASCII text

> You apparently installed *something*, or several somethings, back
> in ought-six or so.  Do you really remember what you were doing
> back then?

I used to have an old iMac, and when I got this laptop, I made a Time Machine 
backup and restored it to this laptop so I wouldn't have to install all my 
software from scratch. Maybe it's old stuff from Java installations. I have 
also had XQuartz running for ages now, updated from version to version, 
currently 2.7.11. I don't know what it could be that is that old.

Seems like a lot of old stuff:

paul@meerkat:/usr/local/lib$ file * | egrep "(i386|ppc)" | awk '{ print $1 }' | 
tr -d ':' | sort -u
libMagick++.10.0.7.dylib
libMagick++.a
libMagick.10.0.7.dylib
libMagick.a
libWand.10.0.7.dylib
libWand.a
libasprintf.0.0.0.dylib
libasprintf.a
libfreetype.6.3.12.dylib
libfreetype.a
libgdraw.1.0.14.dylib
libgettextlib-0.14.5.dylib
libgettextpo.0.1.0.dylib
libgettextpo.a
libgettextsrc-0.14.5.dylib
libgunicode.2.0.3.dylib
libintl.3.4.3.dylib
libintl.a
libuninameslist-fr.0.0.1.dylib
libuninameslist.0.0.1.dylib
libwmf.a
libwmflite.a

> Anyway, now that we realize these are ancient history, you likely
> need to install a more modern version anyway.

will try, thanks. :-)

Cheers,
Paul





Re: Building 12.3 from source on Mac

2020-07-23 Thread Pavel Borisov
I'd like to add that MacPorts installs everything to /opt and /opt/local
unless someone configures a different path.
You can also easily check whether something is from a Homebrew installation by
running 'brew config' and looking at the HOMEBREW_PREFIX entry.

Regards,
Pavel

On Thu, 23 Jul 2020 at 19:05, Paul Förster wrote:

> Hi Tom,
>
> > On 23. Jul, 2020, at 16:50, Tom Lane  wrote:
> >
> > You should see what "file" reports these as, but there's a good
> > bet that these are 32-bit code and won't even run on Catalina.
>
> yes, they seem pretty old:
>
> paul@meerkat:/usr/local/lib$ file libintl.*
> libintl.3.4.3.dylib: Mach-O universal binary with 2 architectures:
> [i386:Mach-O dynamically linked shared library i386] [ppc:Mach-O
> dynamically linked shared library ppc]
> libintl.3.4.3.dylib (for architecture i386):Mach-O dynamically linked
> shared library i386
> libintl.3.4.3.dylib (for architecture ppc): Mach-O dynamically linked
> shared library ppc
> libintl.a:   Mach-O universal binary with 2 architectures:
> [i386:current ar archive random library] [ppc:current ar archive random
> library]
> libintl.a (for architecture i386):  current ar archive random library
> libintl.a (for architecture ppc):   current ar archive random library
> libintl.la:  libtool library file, ASCII text
>
> > You apparently installed *something*, or several somethings, back
> > in ought-six or so.  Do you really remember what you were doing
> > back then?
>
> I used to have an old iMac. And when I got this laptop, I did a time
> machine backup then and restored it to this laptop to not have to install
> all my software from scratch. Maybe it's old stuff from Java installations.
> I also have XQuartz running since ages now which has been updated from
> version to version, currently 2.7.11. I don't know which it could be that
> could be that old.
>
> Seems like a lot of old stuff:
>
> paul@meerkat:/usr/local/lib$ file * | egrep "(i386|ppc)" | awk '{ print
> $1 }' | tr -d ':' | sort -u
> libMagick++.10.0.7.dylib
> libMagick++.a
> libMagick.10.0.7.dylib
> libMagick.a
> libWand.10.0.7.dylib
> libWand.a
> libasprintf.0.0.0.dylib
> libasprintf.a
> libfreetype.6.3.12.dylib
> libfreetype.a
> libgdraw.1.0.14.dylib
> libgettextlib-0.14.5.dylib
> libgettextpo.0.1.0.dylib
> libgettextpo.a
> libgettextsrc-0.14.5.dylib
> libgunicode.2.0.3.dylib
> libintl.3.4.3.dylib
> libintl.a
> libuninameslist-fr.0.0.1.dylib
> libuninameslist.0.0.1.dylib
> libwmf.a
> libwmflite.a
>
> > Anyway, now that we realize these are ancient history, you likely
> > need to install a more modern version anyway.
>
> will try, thanks. :-)
>
> Cheers,
> Paul
>
>
>
>


Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Soumyadeep Chakraborty
On Thu, Jul 23, 2020 at 3:42 AM Amul Sul  wrote:
> The aim of this feature is preventing new WAL records from being
> generated, not preventing them from being flushed to disk, or streamed
> to standbys, or anything else. The rest should happen as normal.
>
> If you can't flush WAL, then you might not be able to evict some number of
> buffers, which in the worst case could be large. That's because you can't 
> evict
> a dirty buffer until WAL has been flushed up to the buffer's LSN (otherwise,
> you wouldn't be following the WAL-before-data rule). And having a potentially
> large number of unevictable buffers around sounds terrible, not only for
> performance, but also for having the system keep working at all.

With the read-only level I was suggesting, I wasn't proposing that we
stop WAL flushes; in fact, we should flush the WAL before we mark the
system as read-only. Once the system declares itself read-only, it
will not perform any more on-disk changes; it may perform all the
flushes it needs as part of handling the read-only request.

WAL should still stream to the secondary of course, even after you mark
the primary as read-only.

> Read-only is for the queries.

What I am saying is it doesn't have to be just the queries. I think we
can cater to all the other use cases simply by forcing a checkpoint
before marking the system as read-only.

> The intention is to change the system to read-only ASAP; the checkpoint will
> make it much slower.

I agree - if one needs that speed, then they can do the equivalent of:
ALTER SYSTEM SET read_only to 'wal';
and the expensive checkpoint you mentioned can be avoided.

> I don't think we can skip control file updates that need to make read-only
> state persistent across the restart.

I was referring to control file updates after the read-only state change.
Any updates done as part of the state change are totally cool.


Regards,
Soumyadeep (VMware)




'with' regression tests fails rarely (and spuriously)

2020-07-23 Thread Andres Freund
Hi,

I've twice seen the below failure when running tests in a loop (to
verify another rare issue in a patch is fixed):

diff -du10 /home/andres/src/postgresql/src/test/regress/expected/with.out 
/home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/with.out
--- /home/andres/src/postgresql/src/test/regress/expected/with.out  
2020-07-21 15:03:11.239754712 -0700
+++ 
/home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/with.out  
2020-07-23 04:25:07.955839299 -0700
@@ -2207,28 +2207,30 @@
Output: a_1.ctid, a_1.aa
  ->  CTE Scan on wcte
Output: wcte.*, wcte.q2
->  Nested Loop
  Output: a_2.ctid, wcte.*
  Join Filter: (a_2.aa = wcte.q2)
  ->  Seq Scan on public.c a_2
Output: a_2.ctid, a_2.aa
  ->  CTE Scan on wcte
Output: wcte.*, wcte.q2
-   ->  Nested Loop
+   ->  Hash Join
  Output: a_3.ctid, wcte.*
- Join Filter: (a_3.aa = wcte.q2)
+ Hash Cond: (a_3.aa = wcte.q2)
  ->  Seq Scan on public.d a_3
Output: a_3.ctid, a_3.aa
- ->  CTE Scan on wcte
+ ->  Hash
Output: wcte.*, wcte.q2
-(38 rows)
+   ->  CTE Scan on wcte
+ Output: wcte.*, wcte.q2
+(40 rows)
 
 -- error cases
 -- data-modifying WITH tries to use its own output
 WITH RECURSIVE t AS (
INSERT INTO y
SELECT * FROM t
 )
 VALUES(FALSE);
 ERROR:  recursive query "t" must not contain data-modifying statements
 LINE 1: WITH RECURSIVE t AS (


Searching the archives didn't unearth other reports of the same.


This was the first failure after 404 iterations of installcheck, so it's
clearly not a common occurrence on my machine.

Greetings,

Andres Freund




Making CASE error handling less surprising

2020-07-23 Thread Tom Lane
Every so often we get a complaint like [1] about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at run-time.
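
For instance (a minimal repro of the sort of complaint in [1]; the table
and column names are invented):

CREATE TABLE tab (x int);
-- fails at plan time with "division by zero", even if no row ever has
-- x < 0, because constant-folding evaluates the 1/0 subexpression:
SELECT CASE WHEN x < 0 THEN 1/0 ELSE 0 END FROM tab;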

It struck me that it would not be hard to improve this situation a great
deal.  If, within a CASE subexpression that isn't certain to be executed
at runtime, we refuse to pre-evaluate *any* function (essentially, treat
them all as volatile), then we should largely get the semantics that
users expect.  There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

Attached is a draft patch that handles CASE and COALESCE this way.

This is not a complete fix, because if you write a sub-SELECT the
contents of the sub-SELECT are not processed by the outer query's
eval_const_expressions pass; instead, we look at it within the
sub-SELECT itself, and in that context there's no apparent reason
to avoid const-folding.  So
   CASE WHEN x < 0 THEN (SELECT 1/0) END
fails even if x is never less than zero.  I don't see any great way
to avoid that, and I'm not particularly concerned about it anyhow;
usually the point of a sub-SELECT like this is to be decoupled from
outer query evaluation, so that the behavior should not be that
surprising.

One interesting point is that the join regression test contains a
number of uses of "coalesce(int8-variable, int4-constant)" which is
treated a little differently than before: we no longer constant-fold
the int4 constant to int8.  That causes the run-time cost of the
expression to be estimated slightly higher, which changes plans in
a couple of these tests; and in any case the EXPLAIN output looks
different since it shows the runtime coercion explicitly.  To avoid
those changes I made all these examples quote the constants, so that
the parser resolves them as int8 out of the gate.  (Perhaps it'd be
okay to just accept the changes, but I didn't feel like trying to
analyze in detail what each test case had been meant to prove.)
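
For instance, sketching the idiom with the regression database's int8_tbl
(whose q1 column is int8):

-- formerly the int4 constant was folded to int8 at plan time; with the
-- patch the coercion is performed (and costed) at runtime instead:
SELECT coalesce(q1, 42) FROM int8_tbl;
-- quoting the constant lets the parser resolve it as int8 up front:
SELECT coalesce(q1, '42') FROM int8_tbl;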

Also, I didn't touch the docs yet.  Sections 4.2.14 and 9.18.1
contain some weasel wording that could be backed off, but in light
of the sub-SELECT exception we can't just remove the issue
altogether I think.  Not quite sure how to word it.

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/16549-4991fbf36fcec234%40postgresql.org

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index e04b144072..8a41dce235 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -61,6 +61,15 @@ typedef struct
 	AggClauseCosts *costs;
 } get_agg_clause_costs_context;
 
+typedef enum
+{
+	/* Ordering is important here! */
+	ece_eval_nothing,			/* be unconditionally safe */
+	ece_eval_immutable,			/* eval immutable functions */
+	ece_eval_stable,			/* eval stable functions too */
+	ece_eval_volatile			/* eval volatile functions too */
+} ece_level;
+
 typedef struct
 {
 	ParamListInfo boundParams;
@@ -68,6 +77,7 @@ typedef struct
 	List	   *active_fns;
 	Node	   *case_val;
 	bool		estimate;
+	ece_level	eval_level;
 } eval_const_expressions_context;
 
 typedef struct
@@ -119,6 +129,8 @@ static Node *eval_const_expressions_mutator(Node *node,
 static bool contain_non_const_walker(Node *node, void *context);
 static bool ece_function_is_safe(Oid funcid,
  eval_const_expressions_context *context);
+static bool ece_provolatile_is_safe(char provolatile,
+	eval_const_expressions_context *context);
 static Node *apply_const_relabel(Node *arg, Oid rtype,
  int32 rtypmod, Oid rcollid,
  CoercionForm rformat, int rlocation);
@@ -2264,6 +2276,7 @@ eval_const_expressions(PlannerInfo *root, Node *node)
 	context.active_fns = NIL;	/* nothing being recursively simplified */
 	context.case_val = NULL;	/* no CASE being examined */
 	context.estimate = false;	/* safe transformations only */
+	context.eval_level = ece_eval_immutable;	/* eval immutable functions */
 	return eval_const_expressions_mutator(node, &context);
 }
 
@@ -2280,8 +2293,11 @@ eval_const_expressions(PlannerInfo *root, Node *node)
  *	  available by the caller of planner(), even if the Param isn't marked
  *	  constant.  This effectively means that we plan using the first supplied
  *	  value of the Param.
- * 2. Fold stable, as well as immutable, functions to constants.
+ * 2. Fold stable, as well as immutable, functions to constants.  The risk
+ *	  that the result might change from planning time to execution time is
+ *	  worth taking, as we otherwise couldn't get an estimate at all.
  * 3. Reduce PlaceHolderVar nodes to their contained expressions.
+ * 4. Ignore domain constraints, assuming that CoerceToDomain will succeed.
  *
  */
 Node *
@@ -2295,6 +2311,7 @@ estimate_expression_value(PlannerInfo *root, Node *node)

Re: Loaded footgun open_datasync on Windows

2020-07-23 Thread Jeff Janes
On Fri, Sep 14, 2018 at 3:32 AM Michael Paquier  wrote:

> On Fri, Sep 14, 2018 at 08:43:18AM +0200, Laurenz Albe wrote:
>
> > If it turns out not to break anything, would you consider backpatching?
> > On the one hand it fixes a bug, on the other hand it affects all
> > frontend executables...
>
> Yeah, for this reason I would not do a backpatch.  I have a very hard
> time to believe that any frontend tools on Windows developed by anybody
> rely on files to be opened only by a single process, still if they do
> they would be surprised to see a change of behavior after a minor
> update in case they rely on the concurrency limitations.
>

Reviving an old thread here.

Could it be back-patched in some pg_test_fsync specific variant?  I
don't think we should just ignore the fact that pg_test_fsync on Windows is
unfit for its intended purpose on 4 still-supported versions.


> > I wonder why nobody noticed the problem in pg_test_fsync earlier.
> > Is it that people running Windows care less if their storage is
> > reliable?
>
> likely so.
>

I have noticed this before, but since it wasn't a production machine I just
shrugged it off as being a hazard of using consumer-grade stuff; it didn't
seem to be worth investigating further.

 Cheers,

Jeff


Re: 'with' regression tests fails rarely (and spuriously)

2020-07-23 Thread Tom Lane
Andres Freund  writes:
> I've twice seen the below failure when running tests in a loop (to
> verify another rare issue in a patch is fixed):

Weird.  It sort of looks like autovacuum came along and changed the
stats for those tables, but I didn't think they were big enough to
draw autovac's attention.
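
One way to check after a failing run might be something like this (the
tables in that plan fragment are public.c and public.d):

SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('c', 'd');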

regards, tom lane




[BUG] Error in BRIN summarization

2020-07-23 Thread Anastasia Lubennikova
One of our clients caught the error 'failed to find parent tuple for 
heap-only tuple at (50661,130) in table "tbl"' in PostgreSQL v12.


Steps to reproduce (REL_12_STABLE):

1) Create table with primary key, create brin index, fill table with 
some initial data:


create table tbl (id int primary key, a int) with (fillfactor=50);
create index idx on tbl using brin (a) with (autosummarize=on);
insert into tbl select i, i from generate_series(0,10) as i;

2) Run the script brin_test.sql using pgbench:

 pgbench postgres -f ../review/brin_test.sql -n -T 120

The script is a bit messy because I was trying to reproduce a 
problematic workload, and I didn't manage to simplify it.
The idea is that it inserts new values into the table to produce 
unindexed pages and also updates some values to trigger HOT-updates on 
these pages.


3) Open psql session and run brin_summarize_new_values

select brin_summarize_new_values('idx'::regclass::oid); \watch 2

Wait a bit. And in psql you will see the ERROR.

This error is caused by a problem with the root_offsets array bounds. It 
occurs if a new HOT tuple was inserted after we collected root_offsets, 
so we don't have a root offset for the tuple's offnum. Concurrent 
insertions are possible because brin_summarize_new_values() only holds 
ShareUpdateExclusiveLock on the table and no lock (only a pin) on the page.


The draft fix is in the attachments. It saves root_offsets_size and 
checks that we only access valid fields.

The patch also adds some debug messages, just to confirm that the problem was caught.

TODO:

- check if  heapam_index_validate_scan() has the same problem
- code cleanup
- test other PostgreSQL versions

[1] 
https://www.postgresql.org/message-id/flat/CA%2BTgmoYgwjmmjK24Qxb_vWAu8_Hh7gfVFcr3%2BR7ocdLvYOWJXg%40mail.gmail.com


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 71bdcd268386c4240613b080b9ebd5ae935c1405
Author: anastasia 
Date:   Thu Jul 23 17:55:16 2020 +0300

WIP Fix root_offsets out of bound error in brin_summarize_new_values()

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index fc19f40a2e3..34f74a61ea3 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1176,6 +1176,7 @@ heapam_index_build_range_scan(Relation heapRelation,
 	BlockNumber previous_blkno = InvalidBlockNumber;
 	BlockNumber root_blkno = InvalidBlockNumber;
 	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
+	OffsetNumber root_offsets_size = 0;
 
 	/*
 	 * sanity checks
@@ -1355,7 +1356,7 @@ heapam_index_build_range_scan(Relation heapRelation,
 			Page		page = BufferGetPage(hscan->rs_cbuf);
 
 			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
+			root_offsets_size = heap_get_root_tuples(page, root_offsets);
 			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 			root_blkno = hscan->rs_cblock;
@@ -1643,13 +1644,31 @@ heapam_index_build_range_scan(Relation heapRelation,
 			rootTuple = *heapTuple;
 			offnum = ItemPointerGetOffsetNumber(&heapTuple->t_self);
 
+			elog(LOG, "HeapTupleIsHeapOnly. Check Offset last_valid %d offnum %d", root_offsets_size, offnum);
+
+			if (root_offsets_size <= offnum)
+			{
+				Page		page = BufferGetPage(hscan->rs_cbuf);
+
+				ereport(WARNING,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("DEBUG failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
+										 ItemPointerGetBlockNumber(&heapTuple->t_self),
+										 offnum,
+										 RelationGetRelationName(heapRelation))));
+
+				LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
+				root_offsets_size = heap_get_root_tuples(page, root_offsets);
+				LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+			}
+
 			if (!OffsetNumberIsValid(root_offsets[offnum - 1]))
 				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(&heapTuple->t_self),
-										 offnum,
-										 RelationGetRelationName(heapRelation))));
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
+									 ItemPointerGetBlockNumber(&heapTuple->t_self),
+									 offnum,
+									 RelationGetRelationName(heapRelation))));
 
 			ItemPointerSetOffsetNumber(&rootTuple.t_self,
 	   root_offsets[offnum - 1]);
@@ -1820,6 +1839,7 @@ heapam_index_validate_scan(Relation heapRelation,
 		if (HeapTupleIsHeapOnly(heapTuple))
 		{
 			root_offnum = root_offsets[root_offnum - 1];
+			// TODO add check of root_offsets_size like in heapam_index_build_range_scan()
 			if (!OffsetNumberIsValid(root_offnum))
 ereport(ERROR,
 		(errcode(ERRCODE_DATA_CORRUPTED),
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0efe3ce9995..9b2bff1

Re: Making CASE error handling less surprising

2020-07-23 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Every so often we get a complaint like [1] about how a CASE should have
> prevented a run-time error and didn't, because constant-folding tried
> to evaluate a subexpression that would not have been entered at run-time.
>
> It struck me that it would not be hard to improve this situation a great
> deal.  If, within a CASE subexpression that isn't certain to be executed
> at runtime, we refuse to pre-evaluate *any* function (essentially, treat
> them all as volatile), then we should largely get the semantics that
> users expect.  There's some potential for query slowdown if a CASE
> contains a constant subexpression that we formerly reduced at plan time
> and now do not, but that doesn't seem to me to be a very big deal.
[…]
> Thoughts?

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen




Re: 'with' regression tests fails rarely (and spuriously)

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-23 13:05:32 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I've twice seen the below failure when running tests in a loop (to
> > verify another rare issue in a patch is fixed):
> 
> Weird.  It sort of looks like autovacuum came along and changed the
> stats for those tables, but I didn't think they were big enough to
> draw autovac's attention.

Hm. I guess I could run them again after enabling more logging. Don't
really have a better idea. Probably not worth investing more energy into
it if I can't readily reproduce it overnight.
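
E.g. something like the following should make every autovacuum/autoanalyze
run show up in the log:

ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();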

Greetings,

Andres Freund




HOT vs freezing issue causing "cannot freeze committed xmax"

2020-07-23 Thread Andres Freund
Hi,

In a development branch of mine, Thomas / the CF bot found a relatively
rare regression failure. That turned out to be because there was an
edge case in which heap_page_prune() was a bit more pessimistic than
lazy_scan_heap(). But I wonder if this isn't an issue more broadly:

The issue I am concerned about is lazy_scan_heap()'s logic for DEAD HOT
updated tuples:

					/*
					 * Ordinarily, DEAD tuples would have been removed by
					 * heap_page_prune(), but it's possible that the tuple
					 * state changed since heap_page_prune() looked.  In
					 * particular an INSERT_IN_PROGRESS tuple could have
					 * changed to DEAD if the inserter aborted.  So this
					 * cannot be considered an error condition.
					 *
					 * If the tuple is HOT-updated then it must only be
					 * removed by a prune operation; so we keep it just as if
					 * it were RECENTLY_DEAD.  Also, if it's a heap-only
					 * tuple, we choose to keep it, because it'll be a lot
					 * cheaper to get rid of it in the next pruning pass than
					 * to treat it like an indexed tuple.  Finally, if index
					 * cleanup is disabled, the second heap pass will not
					 * execute, and the tuple will not get removed, so we
					 * must treat it like any other dead tuple that we
					 * choose to keep.
					 *
					 * If this were to happen for a tuple that actually
					 * needed to be deleted, we'd be in trouble, because it'd
					 * possibly leave a tuple below the relation's xmin
					 * horizon alive.  heap_prepare_freeze_tuple() is
					 * prepared to detect that case and abort the
					 * transaction, preventing corruption.
					 */
					if (HeapTupleIsHotUpdated(&tuple) ||
						HeapTupleIsHeapOnly(&tuple) ||
						params->index_cleanup == VACOPT_TERNARY_DISABLED)
						nkeep += 1;
					else
						tupgone = true; /* we can delete the tuple */
					all_visible = false;
					break;

In the case where the HOT logic triggers, we'll call
heap_prepare_freeze_tuple() even when the tuple is dead, which can then
lead us to:
	if (TransactionIdPrecedes(xid, cutoff_xid))
	{
		/*
		 * If we freeze xmax, make absolutely sure that it's not an XID that
		 * is important.  (Note, a lock-only xmax can be removed independent
		 * of committedness, since a committed lock holder has released the
		 * lock).
		 */
		if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
			TransactionIdDidCommit(xid))
			ereport(PANIC,
					(errcode(ERRCODE_DATA_CORRUPTED),
					 errmsg_internal("cannot freeze committed xmax %u",
									 xid)));
		freeze_xmax = true;

(before those errors we'd just have unset xmax)

Now obviously the question is whether it's possible that
heap_page_prune() left alive anything that could be seen as DEAD for the
check in lazy_scan_heap(), and that additionally is older than the
cutoff passed to heap_prepare_freeze_tuple().

I'm not sure - it seems like it could be possible in some corner cases,
when transactions abort after the heap_page_prune() but before the
second HeapTupleSatisfiesVacuum().

But regardless of whether it's possible today, it seems extremely
fragile. ISTM we should at least have a bunch of additional error checks
in the HOT branch for HEAPTUPLE_DEAD.

Greetings,

Andres Freund




Re: Making CASE error handling less surprising

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
> 
> > Every so often we get a complaint like [1] about how a CASE should have
> > prevented a run-time error and didn't, because constant-folding tried
> > to evaluate a subexpression that would not have been entered at run-time.
> >
> > It struck me that it would not be hard to improve this situation a great
> > deal.  If, within a CASE subexpression that isn't certain to be executed
> > at runtime, we refuse to pre-evaluate *any* function (essentially, treat
> > them all as volatile), then we should largely get the semantics that
> > users expect.  There's some potential for query slowdown if a CASE
> > contains a constant subexpression that we formerly reduced at plan time
> > and now do not, but that doesn't seem to me to be a very big deal.
> […]
> > Thoughts?
> 
> Would it be feasible to set up an exception handler when constant-
> folding cases that might not be reached, and leave the expression
> unfolded only if an error was thrown, or does that have too much
> overhead to be worthwhile?

That'd require using a subtransaction for expression
simplification. That'd be way too high overhead.

Given how often we've had a need to call functions while handling
errors, I do wonder if it'd be worthwhile and feasible to mark functions
as being safe to call without subtransactions, or mark them as not
erroring out (e.g. comparators would usually be safe).

Greetings,

Andres Freund




heap_abort_speculative() sets xmin to Invalid* without HEAP_XMIN_INVALID

2020-07-23 Thread Andres Freund
Hi,

After adding a few assertions to validate the connection scalability
patch I saw failures that also apply to master:

I added an assertion to TransactionIdIsCurrentTransactionId(),
*IsInProgress(), ... ensuring that the xid is within an expected range,
which promptly failed in isolation tests.

The reason for that is that heap_abort_speculative() sets xmin to
InvalidTransactionId but does *not* add HEAP_XMIN_INVALID to infomask.

The various HeapTupleSatisfies* routines avoid calling those routines
for invalid xmins by checking HeapTupleHeaderXminInvalid() first. But
since heap_abort_speculative() didn't set that, we end up calling
TransactionIdIsCurrentTransactionId(InvalidTransactionId) which then
triggered my assertion.


Obviously I can trivially fix that by adjusting the assertions to allow
InvalidTransactionId. But to me it seems fragile to only have xmin == 0
in one rare occasion, and to rely on TransactionIdIs* to return
precisely the right thing without those functions necessarily having
been designed with that in mind.


I think we should change heap_abort_speculative() to set
HEAP_XMIN_INVALID in master. But we can't really do anything about
existing tuples without it - therefore we will have to forever take care
about encountering that combination :(.


Perhaps we should instead or additionally make
HeapTupleHeaderXminInvalid() explicitly check for InvalidTransactionId?

Greetings,

Andres Freund




Re: Making CASE error handling less surprising

2020-07-23 Thread Tom Lane
Andres Freund  writes:
> On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
>> Would it be feasible to set up an exception handler when constant-
>> folding cases that might not be reached, and leave the expression
>> unfolded only if an error was thrown, or does that have too much
>> overhead to be worthwhile?

> That'd require using a subtransaction for expression
> simplification. That'd be way too high overhead.

That's my opinion as well.  It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

> Given how often we've had a need to call functions while handling
> errors, I do wonder if it'd be worthwhile and feasible to mark functions
> as being safe to call without subtransactions, or mark them as not
> erroring out (e.g. comparators would usually be safe).

Yeah.  I was wondering whether the existing "leakproof" marking would
be adequate for this purpose.  It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

regards, tom lane




Re: Making CASE error handling less surprising

2020-07-23 Thread Pavel Stehule
On Thu, 23 Jul 2020 at 21:43, Tom Lane wrote:

> Andres Freund  writes:
> > On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
> >> Would it be feasible to set up an exception handler when constant-
> >> folding cases that might not be reached, and leave the expression
> >> unfolded only if an error was thrown, or does that have too much
> >> overhead to be worthwhile?
>
> > That'd require using a subtransaction for expression
> > simplification. That'd be way too high overhead.
>
> That's my opinion as well.  It'd be a subtransaction for *each*
> operator/function call we need to simplify, which seems completely
> disastrous.
>
> > Given how often we've had a need to call functions while handling
> > errors, I do wonder if it'd be worthwhile and feasible to mark functions
> > as being safe to call without subtransactions, or mark them as not
> > erroring out (e.g. comparators would usually be safe).
>
> Yeah.  I was wondering whether the existing "leakproof" marking would
> be adequate for this purpose.  It's a little stronger than what we
> need, but the pain-in-the-rear factor for adding YA function property
> is high enough that I'm inclined to just use it anyway.
>
> We do have to assume that "leakproof" includes "cannot throw any
> input-dependent error", but it seems to me that that's true.
>

I am afraid of a performance impact.

A lot of people expect constant folding everywhere now, and I can imagine
a query like

SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye')  END FROM ...

Now, it is optimized well, but with the proposed patch, this query can be
slow.

We should introduce planner-safe markings for some common functions, or
maybe better explain the behaviour, the costs, and the benefits.  I don't
think this issue is too common.

Regards

Pavel



> regards, tom lane
>
>
>


Re: Making CASE error handling less surprising

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-23 15:43:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
> >> Would it be feasible to set up an exception handler when constant-
> >> folding cases that might not be reached, and leave the expression
> >> unfolded only if an error was thrown, or does that have too much
> >> overhead to be worthwhile?
> 
> > That'd require using a subtransaction for expression
> > simplification. That'd be way too high overhead.
> 
> That's my opinion as well.  It'd be a subtransaction for *each*
> operator/function call we need to simplify, which seems completely
> disastrous.

I guess we could optimize it to be one subtransaction by having error
recovery redo the simplification with a parameter that prevents doing
simplification within CASE etc. Still too unattractive performance-wise
to consider imo.


> > Given how often we've had a need to call functions while handling
> > errors, I do wonder if it'd be worthwhile and feasible to mark functions
> > as being safe to call without subtransactions, or mark them as not
> > erroring out (e.g. comparators would usually be safe).
> 
> Yeah.  I was wondering whether the existing "leakproof" marking would
> be adequate for this purpose.  It's a little stronger than what we
> need, but the pain-in-the-rear factor for adding YA function property
> is high enough that I'm inclined to just use it anyway.

Hm, I didn't consider that. Good idea.


> We do have to assume that "leakproof" includes "cannot throw any
> input-dependent error", but it seems to me that that's true.

A quick look through the list seems to confirm that. There's errors like
in text_starts_with:

	if (mylocale && !mylocale->deterministic)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("nondeterministic collations are not supported for substring searches")));

but that's not a content dependent error, so I don't think it's problem.


So the idea would be to continue doing simplification like we do right
now for things outside a CASE, but to only call leakproof functions
within a CASE?

Is there any concern about having to do additional lookups for
leakproofness? It doesn't seem likely to me since we already need to do
lookups for the FmgrInfo?

Greetings,

Andres Freund




Re: Making CASE error handling less surprising

2020-07-23 Thread Pavel Stehule
On Thu, 23 Jul 2020 at 21:56, Pavel Stehule wrote:

>
>
> On Thu, 23 Jul 2020 at 21:43, Tom Lane wrote:
>
>> Andres Freund  writes:
>> > On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
>> >> Would it be feasible to set up an exception handler when constant-
>> >> folding cases that might not be reached, and leave the expression
>> >> unfolded only if an error was thrown, or does that have too much
>> >> overhead to be worthwhile?
>>
>> > That'd require using a subtransaction for expression
>> > simplification. That'd be way too high overhead.
>>
>> That's my opinion as well.  It'd be a subtransaction for *each*
>> operator/function call we need to simplify, which seems completely
>> disastrous.
>>
>> > Given how often we've had a need to call functions while handling
>> > errors, I do wonder if it'd be worthwhile and feasible to mark functions
>> > as being safe to call without subtransactions, or mark them as not
>> > erroring out (e.g. comparators would usually be safe).
>>
>> Yeah.  I was wondering whether the existing "leakproof" marking would
>> be adequate for this purpose.  It's a little stronger than what we
>> need, but the pain-in-the-rear factor for adding YA function property
>> is high enough that I'm inclined to just use it anyway.
>>
>> We do have to assume that "leakproof" includes "cannot throw any
>> input-dependent error", but it seems to me that that's true.
>>
>
> I am afraid of a performance impact.
>
> lot of people expects constant folding everywhere now and I can imagine
> query like
>
> SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye')  END FROM ...
>
> Now, it is optimized well, but with the proposed patch, this query can be
> slow.
>
> We should introduce planner safe functions for some usual functions, or
> maybe better explain the behaviour, the costs, and benefits.  I don't think
> this issue is too common.
>

What about a different approach? We can introduce a function:

create or replace function volatile_expr(anyelement) returns anyelement as
$$ begin return $1; end $$ language plpgsql;

and this can be used as a constant folding optimization fence.

select case col when 1 then volatile_expr(1/$1) else $1 end;

I don't think people have a problem with this behaviour - the problem is
an unexpected behaviour change between major releases without a really
illustrative explanation in the documentation.

Regards

Pavel


> Regards
>
> Pavel
>
>
>
>> regards, tom lane
>>
>>
>>


Re: Making CASE error handling less surprising

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-23 21:56:26 +0200, Pavel Stehule wrote:
> I am afraid of a performance impact.
> 
> lot of people expects constant folding everywhere now and I can imagine
> query like
> 
> SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye')  END FROM ...
> 
> Now, it is optimized well, but with the proposed patch, this query can be
> slow.

I'd be more concerned about things like conditional expressions that
involve both columns and non-comparison operations on constants, where
right now we'd simplify the constant part of the expression, but after
this we wouldn't at all.

Is there an argument for continuing to simplify expressions within CASE
when they involve only "true" constants, even with non-leakproof
functions, but simplifying "pseudo" constants like parameters only with
leakproof functions?  I.e. CASE WHEN ... THEN 1 / 0 would still raise an
error during simplification, but CASE WHEN ... THEN 1 / $1 wouldn't,
because $1 is not a real constant (even if PARAM_FLAG_CONST).
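
A hypothetical repro sketch (invented names; this assumes the custom-plan
path, which substitutes parameter values marked PARAM_FLAG_CONST):

CREATE TABLE tab (x int);
PREPARE p(int) AS
  SELECT CASE WHEN x > 0 THEN 1 / $1 ELSE 0 END FROM tab;
-- today a custom plan substitutes $1 = 0 and constant-folding evaluates
-- 1/0 at plan time; with the idea above, 1 / $1 would stay unfolded:
EXECUTE p(0);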

It doesn't seem like it'd be too hard to implement that, but it'd
probably be fairly bulky because we'd need to track more state across
recursive expression_tree_mutator() calls.

Greetings,

Andres Freund




Re: Making CASE error handling less surprising

2020-07-23 Thread Tom Lane
Andres Freund  writes:
> Is there any concern about having to do additional lookups for
> leakproofness? It doesn't seem likely to me since we already need to do
> lookups for the FmgrInfo?

No, we could easily fix it so that one syscache lookup gets both
the provolatile and proleakproof markings.

regards, tom lane




Re: Making CASE error handling less surprising

2020-07-23 Thread Tom Lane
Andres Freund  writes:
> Is there an argument to continue simplifying expressions within case
> when only involving "true" constants even with not leakproof functions,
> but only simplify "pseudo" constants like parameters with leakproof
> functions?  I.e CASE WHEN ... THEN 1 / 0 would still raise an error
> during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
> is not a real constant (even if PARAM_FLAG_CONST).

Hmm, interesting idea.  That might fix all the practical cases in plpgsql,
but it wouldn't do anything to make the behavior more explainable.  Not
sure if we care about that though.

If we go this way I'd be inclined to do this instead of, not in addition
to, what I originally proposed.  Not sure if that was how you envisioned
it, but I think this is probably sufficient for its purpose and we would
not need any additional lobotomization of const-simplification.

> It doesn't seem like it'd be too hard to implement that, but that it'd
> probably be fairly bulky because we'd need to track more state across
> recursive expression_tree_mutator() calls.

It wouldn't be any harder than what I posted upthread; it would
just be a different flag getting passed down in the context struct
and getting tested in a different place.

regards, tom lane




Re: Making CASE error handling less surprising

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-23 16:34:25 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Is there an argument to continue simplifying expressions within case
> > when only involving "true" constants even with not leakproof functions,
> > but only simplify "pseudo" constants like parameters with leakproof
> > functions?  I.e CASE WHEN ... THEN 1 / 0 would still raise an error
> > during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
> > is not a real constant (even if PARAM_FLAG_CONST).
> 
> Hmm, interesting idea.  That might fix all the practical cases in plpgsql,
> but it wouldn't do anything to make the behavior more explainable.  Not
> sure if we care about that though.

I've probably done too much compiler stuff, but to me it doesn't seem
too hard to understand that purely constant expressions may get
evaluated unconditionally even when inside a CASE, but everything else
won't. The fact that we sometimes optimize params to be essentially
constants isn't really exposed to users, so shouldn't be confusing.


> If we go this way I'd be inclined to do this instead of, not in addition
> to, what I originally proposed.  Not sure if that was how you envisioned
> it, but I think this is probably sufficient for its purpose and we would
> not need any additional lobotomization of const-simplification.

Yea, I would assume that we'd not need anything else. I've not thought
about the subquery case yet, so perhaps it'd be desirable to do
something additional there.


> > It doesn't seem like it'd be too hard to implement that, but that it'd
> > probably be fairly bulky because we'd need to track more state across
> > recursive expression_tree_mutator() calls.
> 
> It wouldn't be any harder than what I posted upthread; it would
> just be a different flag getting passed down in the context struct
> and getting tested in a different place.

Cool.

Greetings,

Andres Freund




Re: Making CASE error handling less surprising

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-23 13:42:08 -0700, Andres Freund wrote:
> On 2020-07-23 16:34:25 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > It doesn't seem like it'd be too hard to implement that, but that it'd
> > > probably be fairly bulky because we'd need to track more state across
> > > recursive expression_tree_mutator() calls.
> > 
> > It wouldn't be any harder than what I posted upthread; it would
> > just be a different flag getting passed down in the context struct
> > and getting tested in a different place.
> 
> Cool.

Hm. Would SQL function inlining be a problem? It looks like that just
substitutes parameters before calling eval_const_expressions_mutator(),
so we'd not know not to evaluate such "pseudo constants".  And that'd
probably be confusing, especially because it's not exactly obvious when
inlining happens.

Greetings,

Andres Freund




Re: Making CASE error handling less surprising

2020-07-23 Thread Tom Lane
Andres Freund  writes:
> Hm. Would SQL function inlining be a problem? It looks like that just
> substitutes parameters. Before calling
> eval_const_expressions_mutator(). So we'd not know not to evaluate such
> "pseudo constants".  And that'd probably be confusing, especially
> because it's not exactly obvious when inlining happens.

Hm, interesting question.  I think it might be all right without any
further hacking, because the parameters we care about substituting
would have been handled (or not) before inlining.  But the interactions
would be ticklish, and surely worthy of a test case or three.

regards, tom lane




Re: Making CASE error handling less surprising

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-23 16:56:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Hm. Would SQL function inlining be a problem? It looks like that just
> > substitutes parameters. Before calling
> > eval_const_expressions_mutator(). So we'd not know not to evaluate such
> > "pseudo constants".  And that'd probably be confusing, especially
> > because it's not exactly obvious when inlining happens.
> 
> Hm, interesting question.  I think it might be all right without any
> further hacking, because the parameters we care about substituting
> would have been handled (or not) before inlining.  But the interactions
> would be ticklish, and surely worthy of a test case or three.

I'm a bit worried about a case like:

CREATE FUNCTION yell(int, int)
RETURNS int
IMMUTABLE
LANGUAGE SQL AS $$
   SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
$$;

EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

I don't think the parameters here would have been handled before
inlining, right?

Greetings,

Andres Freund




Re: heap_abort_speculative() sets xmin to Invalid* without HEAP_XMIN_INVALID

2020-07-23 Thread Alvaro Herrera
On 2020-Jul-23, Andres Freund wrote:

> I think we should change heap_abort_speculative() to set
> HEAP_XMIN_INVALID in master.

+1

> But we can't really do anything about
> existing tuples without it - therefore we will have to forever take care
> about encountering that combination :(.
> 
> Perhaps we should instead or additionally make
> HeapTupleHeaderXminInvalid() explicitly check for InvalidTransactionId?

+1 for doing it as an additional fix, with a fat comment somewhere
explaining where such tuples would come from.

Additionally, but perhaps not very usefully, maybe we could have a
mechanism to inject such tuples so that code can be hardened against the
condition.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk

2020-07-23 Thread Peter Geoghegan
On Fri, Jul 17, 2020 at 5:13 PM Jeff Davis  wrote:
> The patch itself looks reasonable to me. I don't see a lot of obvious
> dangers, but perhaps someone would like to take a closer look at the
> planner changes as you suggest.

Attached is v3 of the hash_mem_multiplier patch series, which now has
a preparatory patch that removes hashagg_avoid_disk_plan. What do you
think of this approach, Jeff?

It seems as if removing hashagg_avoid_disk_plan will necessitate
removing various old bits of planner.c that were concerned with
avoiding hash aggs that spill (the bits that hashagg_avoid_disk_plan
skips in the common case where it's turned off). This makes v3-0001-*
a bit trickier than I imagined it would have to be. At least it lowers
the footprint of the hash_mem_multiplier code added by v3-0002-*
(compared to the last version of the patch).

I find the partial group paths stuff added to planner.c by commit
4f15e5d09de rather confusing (that commit was preparatory work for the
main feature commit e2f1eb0e). Hopefully the
hash_mem_multiplier-removal patch didn't get anything wrong in this
area. Perhaps Robert can comment on this as the committer of record
for partition-wise grouping/aggregation.

I would like to commit this patch series by next week, and close out
the two relevant open items. Separately, I suspect that we'll also
need to update the cost model for hash aggs that spill, but that now
seems like a totally unrelated matter. I'm waiting to hear back from
Tomas about that. Tomas?

Thanks
--
Peter Geoghegan


v3-0001-Remove-hashagg_avoid_disk_plan-GUC.patch
Description: Binary data


v3-0002-Add-hash_mem_multiplier-GUC.patch
Description: Binary data


Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Soumyadeep Chakraborty
On Thu, Jul 23, 2020 at 3:57 AM Amul Sul  wrote:

> Well, once we've initiated the change to a read-only state, we probably
> want to always either finish that change or go back to read-write, even if
> the process that initiated the change is interrupted. Leaving the system in
> a half-way-in-between state long term seems bad. Maybe we would have put
> some background process, but choose the checkpointer in charge of making
> the state change and to avoid the new background process to keep the first
> version patch simple.  The checkpointer isn't likely to get killed, but if
> it does, it will be relaunched and the new one can clean things up. On the
> other hand, I agree making the checkpointer responsible for more than one
> thing might not be a good idea but I don't think the postmaster should do
> the work that any background process can do.

+1 for doing it in a background process rather than in the backend
itself (we can't risk doing it in a backend, since a backend can crash
and, unlike a background process, won't be restarted to clean things up).

As my co-worker pointed out to me, doing the work in the postmaster is a
very bad idea as we don't want delays in serving connection requests on
account of the barrier that comes with this patch.

I would like to see this responsibility in a separate auxiliary process
but I guess having it in the checkpointer isn't the end of the world.

Regards,
Soumyadeep (VMware)




Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Soumyadeep Chakraborty
On Thu, Jun 18, 2020 at 7:54 AM Robert Haas  wrote:
> I think we'd want the FIRST write operation to be the end-of-recovery
> checkpoint, before the system is fully read-write. And then after that
> completes you could do other things.

I can't see why this is necessary from a correctness or performance
point of view. Maybe I'm missing something.

In case it is necessary, the patch set does not wait for the checkpoint to
complete before marking the system as read-write. Refer:

/* Set final state by clearing in-progress flag bit */
if (SetWALProhibitState(wal_state & ~(WALPROHIBIT_TRANSITION_IN_PROGRESS)))
{
  if ((wal_state & WALPROHIBIT_STATE_READ_ONLY) != 0)
ereport(LOG, (errmsg("system is now read only")));
  else
  {
/* Request checkpoint */
RequestCheckpoint(CHECKPOINT_IMMEDIATE);
ereport(LOG, (errmsg("system is now read write")));
  }
}

We should RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT) before
we SetWALProhibitState() and do the ereport(), if we have a read-write
state change request.

Also, we currently request this checkpoint even if there was no startup
recovery and we don't set CHECKPOINT_END_OF_RECOVERY in the case where
the read-write request does follow a startup recovery.
So it should really be:
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
CHECKPOINT_END_OF_RECOVERY);
We would need to convey that an end-of-recovery-checkpoint is pending in
shmem somehow (and only if one such checkpoint is pending, should we do
it as a part of the read-write request handling).
Maybe we can set CHECKPOINT_END_OF_RECOVERY in ckpt_flags where we do:
/*
 * Skip end-of-recovery checkpoint if the system is in WAL prohibited state.
 */
and then check for that.

Some minor comments about the code (some of them probably doesn't
warrant immediate attention, but for the record...):

1. There are some places where we can use a local variable to store the
result of RelationNeedsWAL() to avoid repeated calls to it. E.g.
brin_doupdate()

2. Similarly, we can also capture the calls to GetWALProhibitState() in
a local variable where applicable. E.g. inside WALProhibitRequest().

3. Some of the functions that were added such as GetWALProhibitState(),
IsWALProhibited() etc could be declared static inline.

4. IsWALProhibited(): Shouldn't it really be:
bool
IsWALProhibited(void)
{
  uint32 walProhibitState = GetWALProhibitState();
  return (walProhibitState & WALPROHIBIT_STATE_READ_ONLY) != 0
&& (walProhibitState & WALPROHIBIT_TRANSITION_IN_PROGRESS) == 0;
}

5. I think the comments:
/* Must be performing an INSERT or UPDATE, so we'll have an XID */
and
/* Can reach here from VACUUM, so need not have an XID */
can be internalized in the function/macro comment header.

6. Typo: ConditionVariable readonly_cv; /* signaled when ckpt_started
advances */
We need to update the comment here.

Regards,
Soumyadeep (VMware)




Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2020-07-23 Thread Andres Freund
Hi,

On 2020-07-01 15:40:38 +, Zidenberg, Tsahi wrote:
> Outline-atomics is a gcc compilation flag that adds runtime detection
> of weather or not the cpu supports atomic instructions. CPUs that
> don't support atomic instructions will use the old
> load-exclusive/store-exclusive instructions. If a different
> compilation flag defined an architecture that unconditionally supports
> atomic instructions (e.g. -march=armv8.2), the outline-atomic flag
> will have no effect.

Sounds attractive.


> The patch was tested to improve pgbench simple-update by 10% and
> sysbench write-only by 3% on a 64-core armv8.2 machine (AWS
> m6g.16xlarge). Select-only and read-only benchmarks were not
> significantly affected, and neither was performance on a 16-core
> armv8.0 machine that does not support atomic instructions (AWS
> a1.4xlarge).

What does "not significantly affected" exactly mean? Could you post the
raw numbers?  I'm a bit concerned that the additional conditional
branches on platforms without non ll/sc atomics could hurt noticably.

I'm surprised that read-only didn't benefit - with ll/sc that ought to
have pretty high contention on a few lwlocks.

Could you post the numbers?

Greetings,

Andres Freund




Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Andres Freund
Hi,

> From f0188a48723b1ae7372bcc6a344ed7868fdc40fb Mon Sep 17 00:00:00 2001
> From: Amul Sul 
> Date: Fri, 27 Mar 2020 05:05:38 -0400
> Subject: [PATCH v3 2/6] Add alter system read only/write syntax
> 
> Note that syntax doesn't have any implementation.
> ---
>  src/backend/nodes/copyfuncs.c| 12 
>  src/backend/nodes/equalfuncs.c   |  9 +
>  src/backend/parser/gram.y| 13 +
>  src/backend/tcop/utility.c   | 20 
>  src/bin/psql/tab-complete.c  |  6 --
>  src/include/nodes/nodes.h|  1 +
>  src/include/nodes/parsenodes.h   | 10 ++
>  src/tools/pgindent/typedefs.list |  1 +
>  8 files changed, 70 insertions(+), 2 deletions(-)

Shouldn't there be at outfuncs support as well? Perhaps we even need
readfuncs, not immediately sure.


> From 2c5db7db70d4cebebf574fbc47db7fbf7c440be1 Mon Sep 17 00:00:00 2001
> From: Amul Sul 
> Date: Fri, 19 Jun 2020 06:29:36 -0400
> Subject: [PATCH v3 3/6] Implement ALTER SYSTEM READ ONLY using global barrier.
> 
> Implementation:
> 
>  1. When a user tried to change server state to WAL-Prohibited using
> ALTER SYSTEM READ ONLY command; AlterSystemSetWALProhibitState() will emit
> PROCSIGNAL_BARRIER_WAL_PROHIBIT_STATE_CHANGE barrier and will wait until
> the barrier has been absorbed by all the backends.
> 
>  2. When a backend receives the WAL-Prohibited barrier, at that moment if
> it is already in a transaction and the transaction already assigned XID,
> then the backend will be killed by throwing FATAL (XXX: need more
> discussion on this)

I think we should consider introducing XACTFATAL or such, guaranteeing
the transaction gets aborted, without requiring a FATAL. This has been
needed for enough cases that it's worthwhile.


There are several cases where we WAL log without having an xid
assigned. E.g. when HOT pruning during syscache lookups or such. Are
there any cases where the check for being in recovery is followed by a
CHECK_FOR_INTERRUPTS, before the WAL logging is done?


>  3. Otherwise, if that backend running transaction which yet to get XID
> assigned we don't need to do anything special, simply call
> ResetLocalXLogInsertAllowed() so that any future WAL insert in will check
> XLogInsertAllowed() first which set ready only state appropriately.
> 
>  4. A new transaction (from existing or new backend) starts as a read-only
> transaction.

Why do we need 4)? And doesn't that have the potential to be
unnecessarily problematic if the server is subsequently brought out of
the read-only state again?


>  5. Auxiliary processes like autovacuum launcher, background writer,
> checkpointer and  walwriter will don't do anything in WAL-Prohibited
> server state until someone wakes us up. E.g. a backend might later on
> request us to put the system back to read-write.

Hm. It's not at all clear to me why bgwriter and walwriter shouldn't do
anything in this state. bgwriter for example is even running entirely
normally in a hot standby node?


>  6. At shutdown in WAL-Prohibited mode, we'll skip shutdown checkpoint
> and xlog rotation. Starting up again will perform crash recovery(XXX:
> need some discussion on this as well)
> 
>  7. ALTER SYSTEM READ ONLY/WRITE is restricted on standby server.
> 
>  8. Only super user can toggle WAL-Prohibit state.
> 
>  9. Add system_is_read_only GUC show the system state -- will true when system
> is wal prohibited or in recovery.



> +/*
> + * AlterSystemSetWALProhibitState
> + *
> + * Execute ALTER SYSTEM READ { ONLY | WRITE } statement.
> + */
> +void
> +AlterSystemSetWALProhibitState(AlterSystemWALProhibitState *stmt)
> +{
> + if (!superuser())
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +  errmsg("must be superuser to execute ALTER SYSTEM command")));

ISTM we should rather do this in a GRANTable manner. We've worked
substantially towards that in the last few years.


>  
> + /*
> +  * WALProhibited indicates if we have stopped allowing WAL writes.
> +  * Protected by info_lck.
> +  */
> + boolWALProhibited;
> +
>   /*
>	 * SharedHotStandbyActive indicates if we allow hot standby queries to be
>	 * run.  Protected by info_lck.
> @@ -7962,6 +7969,25 @@ StartupXLOG(void)
>   RequestCheckpoint(CHECKPOINT_FORCE);
>  }
>  
> +void
> +MakeReadOnlyXLOG(void)
> +{
> + SpinLockAcquire(&XLogCtl->info_lck);
> + XLogCtl->WALProhibited = true;
> + SpinLockRelease(&XLogCtl->info_lck);
> +}
> +
> +/*
> + * Is the system still in WAL prohibited state?
> + */
> +bool
> +IsWALProhibited(void)
> +{
> + volatile XLogCtlData *xlogctl = XLogCtl;
> +
> + return xlogctl->WALProhibited;
> +}

What is this kind of locking achieving? It doesn't protect against
concurrent ALTER SYSTEM SET READ ONLY or such?


> + /

Re: Default setting for enable_hashagg_disk

2020-07-23 Thread Peter Geoghegan
On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
 wrote:
> So let me share some fresh I/O statistics collected on the current code
> using iosnoop. I've done the tests on two different machines using the
> "aggregate part" of TPC-H Q17, i.e. essentially this:
>
>  SELECT * FROM (
> SELECT
> l_partkey AS agg_partkey,
> 0.2 * avg(l_quantity) AS avg_quantity
> FROM lineitem GROUP BY l_partkey OFFSET 10
>  ) part_agg;
>
> The OFFSET is there just to ensure we don't need to send anything to
> the client, etc.

Thanks for testing this.

> So sort writes ~3.4GB of data, give or take. But hashagg/master writes
> almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
> original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
> much more than the 3.4GB of data written by sort (which has to spill
> everything, while hashagg only spills rows not covered by the groups
> that fit into work_mem).

What I find when I run your query (with my own TPC-H DB that is
smaller than what you used here -- 59,986,052 lineitem tuples) is that
the sort required about 7x more memory than the hash agg to do
everything in memory: 4,384,711KB for the quicksort vs 630,801KB peak
hash agg memory usage. I'd be surprised if the ratio was very
different for you -- but can you check?

I think that there is something pathological about this spill
behavior, because it sounds like the precise opposite of what you
might expect when you make a rough extrapolation of what disk I/O will
be based on the memory used in no-spill cases (as reported by EXPLAIN
ANALYZE).

> What I find really surprising is the costing - despite writing about
> twice as much data, the hashagg cost is estimated to be much lower than
> the sort. For example on the i5 machine, the hashagg cost is ~10M, while
> sort cost is almost 42M. Despite using almost twice as much disk. And
> the costing is exactly the same for master and the CP_SMALL_TLIST.

That does make it sound like the costs of the hash agg aren't being
represented. I suppose it isn't clear if this is a costing issue
because it isn't clear if the execution time performance itself is
pathological or is instead something that must be accepted as the cost
of spilling the hash agg in a general kind of way.

-- 
Peter Geoghegan




Re: Making CASE error handling less surprising

2020-07-23 Thread Tom Lane
Andres Freund  writes:
> I'm a bit worried about a case like:

> CREATE FUNCTION yell(int, int)
> RETURNS int
> IMMUTABLE
> LANGUAGE SQL AS $$
>SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
> $$;

> EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

> I don't think the parameters here would have been handled before
> inlining, right?

Ah, I see what you mean.  Yeah, that throws an error today, and it
still would with the patch I was envisioning (attached), because
inlining does Param substitution in a different way.  I'm not
sure that we could realistically fix the inlining case with this
sort of approach.
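
(To spell out the difference: inline_function() splices the call's
actual argument expressions into the body, via
substitute_actual_parameters(), before const-simplification walks the
inlined result, so what the mutator sees is already

    CASE WHEN g.i != 0 THEN 17 / 0 ELSE NULL END

and the zero is a plain Const rather than a Param -- a
Param-substitution mode has nothing to act on there.)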

I think this bears out the comment I made before that this approach
still leaves us with a very complicated behavior.  Maybe we should
stick with the previous approach, possibly supplemented with a
leakproofness exception.

regards, tom lane

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index e04b144072..4ff7a69908 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -61,6 +61,13 @@ typedef struct
 	AggClauseCosts *costs;
 } get_agg_clause_costs_context;
 
+typedef enum
+{
+	ece_param_never,			/* never inject values for Params */
+	ece_param_const,			/* inject values for Params if marked CONST */
+	ece_param_always			/* always inject values for Params */
+} ece_p_mode;
+
 typedef struct
 {
 	ParamListInfo boundParams;
@@ -68,6 +75,7 @@ typedef struct
 	List	   *active_fns;
 	Node	   *case_val;
 	bool		estimate;
+	ece_p_mode	param_mode;
 } eval_const_expressions_context;
 
 typedef struct
@@ -2264,6 +2272,7 @@ eval_const_expressions(PlannerInfo *root, Node *node)
 	context.active_fns = NIL;	/* nothing being recursively simplified */
 	context.case_val = NULL;	/* no CASE being examined */
 	context.estimate = false;	/* safe transformations only */
+	context.param_mode = ece_param_const;	/* inject only CONST Params */
 	return eval_const_expressions_mutator(node, &context);
 }
 
@@ -2280,8 +2289,11 @@ eval_const_expressions(PlannerInfo *root, Node *node)
  *	  available by the caller of planner(), even if the Param isn't marked
  *	  constant.  This effectively means that we plan using the first supplied
  *	  value of the Param.
- * 2. Fold stable, as well as immutable, functions to constants.
+ * 2. Fold stable, as well as immutable, functions to constants.  The risk
+ *	  that the result might change from planning time to execution time is
+ *	  worth taking, as we otherwise couldn't get an estimate at all.
  * 3. Reduce PlaceHolderVar nodes to their contained expressions.
+ * 4. Ignore domain constraints, assuming that CoerceToDomain will succeed.
  *
  */
 Node *
@@ -2295,6 +2307,7 @@ estimate_expression_value(PlannerInfo *root, Node *node)
 	context.active_fns = NIL;	/* nothing being recursively simplified */
 	context.case_val = NULL;	/* no CASE being examined */
 	context.estimate = true;	/* unsafe transformations OK */
+	context.param_mode = ece_param_always;	/* inject all Param values */
 	return eval_const_expressions_mutator(node, &context);
 }
 
@@ -2372,8 +2385,22 @@ eval_const_expressions_mutator(Node *node,
 		prm->ptype == param->paramtype)
 	{
 		/* OK to substitute parameter value? */
-		if (context->estimate ||
-			(prm->pflags & PARAM_FLAG_CONST))
+		bool		subst = false;
+
+		switch (context->param_mode)
+		{
+			case ece_param_never:
+				/* subst is already false */
+				break;
+			case ece_param_const:
+				subst = (prm->pflags & PARAM_FLAG_CONST) != 0;
+				break;
+			case ece_param_always:
+				subst = true;
+				break;
+		}
+
+		if (subst)
 		{
 			/*
 			 * Return a Const representing the param value.
@@ -2973,10 +3000,26 @@ eval_const_expressions_mutator(Node *node,
  * expression when executing the CASE, since any contained
  * CaseTestExprs that might have referred to it will have been
  * replaced by the constant.
+ *
+ * An additional consideration is that the user might be
+ * expecting the CASE to prevent run-time errors, as in
+ *		CASE ... THEN 1/$1 ELSE ...
+ * If a constant value of zero is available for $1, we would
+ * end up trying to simplify the division, thus drawing a
+ * divide-by-zero error even if this CASE result would not be
+ * reached at runtime.  This is problematic because it can
+ * occur even if the user has not written anything as silly as
+ * a constant "1/0" expression: substitution of values for
+ * Params is standard in plpgsql, for instance.  To ameliorate
+ * this problem without giving up const-simplification of CASE
+ * subexpressions entirely, we prevent substitution of Param
+ * values within test condition or result expressions that
+ * will not certainly be evaluated at run-time.
  *--
  */
 CaseExpr   *caseexp

Re: heap_abort_speculative() sets xmin to Invalid* without HEAP_XMIN_INVALID

2020-07-23 Thread Peter Geoghegan
On Thu, Jul 23, 2020 at 2:49 PM Alvaro Herrera  wrote:
> On 2020-Jul-23, Andres Freund wrote:
>
> > I think we should change heap_abort_speculative() to set
> > HEAP_XMIN_INVALID in master.
>
> +1

+1

> +1 for doing it as an additional fix, with a fat comment somewhere
> explaining where such tuples would come from.

There could be an opportunity to put this on a formal footing by doing
something in the amcheck heap checker patch.

-- 
Peter Geoghegan




Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-23 Thread Amul Sul
On Fri, Jul 24, 2020 at 6:28 AM Soumyadeep Chakraborty <soumyadeep2...@gmail.com> wrote:
>
> On Thu, Jun 18, 2020 at 7:54 AM Robert Haas  wrote:
> > I think we'd want the FIRST write operation to be the end-of-recovery
> > checkpoint, before the system is fully read-write. And then after that
> > completes you could do other things.
>
> I can't see why this is necessary from a correctness or performance
> point of view. Maybe I'm missing something.
>
> In case it is necessary, the patch set does not wait for the checkpoint to
> complete before marking the system as read-write. Refer:
>
> /* Set final state by clearing in-progress flag bit */
> if (SetWALProhibitState(wal_state & ~(WALPROHIBIT_TRANSITION_IN_PROGRESS)))
> {
>     if ((wal_state & WALPROHIBIT_STATE_READ_ONLY) != 0)
>         ereport(LOG, (errmsg("system is now read only")));
>     else
>     {
>         /* Request checkpoint */
>         RequestCheckpoint(CHECKPOINT_IMMEDIATE);
>         ereport(LOG, (errmsg("system is now read write")));
>     }
> }
>
> We should RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT) before
> we SetWALProhibitState() and do the ereport(), if we have a read-write
> state change request.
>
+1, I too have the same question.

FWIW, I don't think we can request CHECKPOINT_WAIT here; I think it
would deadlock -- the checkpointer process waiting for itself.
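
A minimal illustration of that hazard, assuming this code can run
inside the checkpointer itself (AmCheckpointerProcess() is the existing
miscadmin.h test):

    if (AmCheckpointerProcess())
    {
        /*
         * We *are* the checkpointer: CHECKPOINT_WAIT would block us
         * waiting for a checkpoint that only we can perform.
         */
        RequestCheckpoint(CHECKPOINT_IMMEDIATE);
    }
    else
        RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT);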

> Also, we currently request this checkpoint even if there was no startup
> recovery and we don't set CHECKPOINT_END_OF_RECOVERY in the case where
> the read-write request does follow a startup recovery.
> So it should really be:
> RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
> CHECKPOINT_END_OF_RECOVERY);
> We would need to convey that an end-of-recovery-checkpoint is pending in
> shmem somehow (and only if one such checkpoint is pending, should we do
> it as a part of the read-write request handling).
> Maybe we can set CHECKPOINT_END_OF_RECOVERY in ckpt_flags where we do:
> /*
>  * Skip end-of-recovery checkpoint if the system is in WAL prohibited state.
>  */
> and then check for that.
>
Yep, we need some indication that the end-of-recovery checkpoint was
skipped at startup, but I haven't added that since I wasn't sure
whether we really need CHECKPOINT_END_OF_RECOVERY, given the previous
concern.
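
If we do go that way, a rough sketch of the shmem indication -- the
endOfRecoveryCkptPending field is invented here, not in the patch:

    int     ckpt_flags = CHECKPOINT_IMMEDIATE;

    SpinLockAcquire(&XLogCtl->info_lck);
    if (XLogCtl->endOfRecoveryCkptPending)  /* hypothetical field */
    {
        ckpt_flags |= CHECKPOINT_END_OF_RECOVERY;
        XLogCtl->endOfRecoveryCkptPending = false;
    }
    SpinLockRelease(&XLogCtl->info_lck);

    RequestCheckpoint(ckpt_flags);

The field would be set where the end-of-recovery checkpoint is skipped,
and consumed by the first read-write request.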

> Some minor comments about the code (some of them probably don't
> warrant immediate attention, but for the record...):
>
> 1. There are some places where we can use a local variable to store the
> result of RelationNeedsWAL() to avoid repeated calls to it. E.g.
> brin_doupdate()
>
Ok.
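
For illustration, the shape of that change -- a sketch only, with
hypothetical helper names standing in for the actual WAL-logging code:

    bool        needs_wal = RelationNeedsWAL(idxrel);  /* evaluate once */

    if (needs_wal)
        log_samepage_update();      /* hypothetical */

    /* ... more work ... */

    if (needs_wal)
        log_insert_and_revmap();    /* hypothetical */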

> 2. Similarly, we can also capture the calls to GetWALProhibitState() in
> a local variable where applicable. E.g. inside WALProhibitRequest().
>
I don't think so.

> 3. Some of the functions that were added such as GetWALProhibitState(),
> IsWALProhibited() etc could be declared static inline.
>
IsWALProhibited() can be static, but not GetWALProhibitState(), since
it needs to be accessible from other files.

> 4. IsWALProhibited(): Shouldn't it really be:
> bool
> IsWALProhibited(void)
> {
>   uint32 walProhibitState = GetWALProhibitState();
>   return (walProhibitState & WALPROHIBIT_STATE_READ_ONLY) != 0
> && (walProhibitState & WALPROHIBIT_TRANSITION_IN_PROGRESS) == 0;
> }
>
I think the current one is better: it allows read-write transactions
from an existing backend that has absorbed the barrier, or from a new
backend, while we are changing state to read-write, on the assumption
that we never fall back.

> 5. I think the comments:
> /* Must be performing an INSERT or UPDATE, so we'll have an XID */
> and
> /* Can reach here from VACUUM, so need not have an XID */
> can be internalized in the function/macro comment header.
>
Ok.

> 6. Typo: ConditionVariable readonly_cv; /* signaled when ckpt_started
> advances */
> We need to update the comment here.
>
Ok.

Will try to address all the above review comments in the next version
along with Andres' concern/suggestion. Thanks again for your time.

Regards,
Amul