Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 08/01/23 04:06, David Rowley wrote:


On Sun, 8 Jan 2023 at 05:45, Ankit Kumar Pandey  wrote:

Attached patch with test cases.



I can look at this in a bit more detail if you find a way to fix the
case you mentioned earlier. i.e, push the sort down to the deepest
WindowAgg that has pathkeys contained in the query's ORDER BY
pathkeys.



EXPLAIN (COSTS OFF)
SELECT empno,
  depname,
  min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
  sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
                            QUERY PLAN
------------------------------------------------------------------
 Incremental Sort
   Sort Key: depname, empno, enroll_date
   Presorted Key: depname, empno
   ->  WindowAgg
         ->  WindowAgg
               ->  Sort
                     Sort Key: depname, empno
                     ->  Seq Scan on empsalary
(8 rows)

You'll also need to pay attention to how the has_runcondition is set.
If you start pushing before looking at all the WindowClauses then you
won't know if some later WindowClause has a runCondition. 


Yes, this should be the main culprit.



Adding an additional backwards foreach loop should allow you to do all the
required prechecks and find the index of the WindowClause which you
should start pushing from.


This should do the trick. Thanks for the heads-up, I will update the patch
with the suggested changes and required fixes.


Regards,

Ankit





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 08/01/23 03:56, David Rowley wrote:

> (your email client still seems broken)

I am looking at this again; I will be changing clients from here onward.


You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys then apply the optimisation from that point in the main
foreach loop.  Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run > 1
time, then you should probably have a boolean variable that's set
before the loop which saves if we're going to try to apply the
optimisation.  That'll save from having to check things like if the
query has a LIMIT clause multiple times.


Thanks, this should do the trick.


a) looks like the best plan to me.  What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.


Yes, you are right, not in this case. I actually mentioned the wrong case here;
the real problematic case is:

EXPLAIN (COSTS OFF)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
   sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
                               QUERY PLAN
------------------------------------------------------------------------
 Incremental Sort
   Sort Key: depname, empno, enroll_date
   Presorted Key: depname, empno
   ->  WindowAgg
         ->  WindowAgg
               ->  Incremental Sort
                     Sort Key: depname, empno
                     Presorted Key: depname
                     ->  Index Scan using depname_idx on empsalary
(9 rows)

Here, it could have sorted on depname, empno, enroll_date.

Again, as I mentioned before, this is an implementation issue. We shouldn't be
skipping the optimization if pre-sorted keys are present.

--
Regards,
Ankit Kumar Pandey





Re: [BUG] Logical replica crash if there was an error in a function.

2023-01-07 Thread Anton A. Melnikov

Thanks for your remarks.

On 07.01.2023 15:27, vignesh C wrote:


Few suggestions:
1) There is a warning:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);

  "my" variable $result masks earlier declaration in same scope at
t/100_bugs.pl line 400.

You can change:
my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
to
$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");


The reason is that the patch fell behind the master.
Fixed in v4 together with rebasing on current master.


2) Now that the crash is fixed, you could change it to a better message:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);



I tried to make this comment clearer.

Best wishes for the new year!


--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 16beb574708e0e2e993eb2fa1b093be97b8e53cf
Author: Anton A. Melnikov 
Date:   Sun Dec 11 06:26:03 2022 +0300

Add test for syntax error in the function in a logical replication
worker. See dea834938.

diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 036576acab..b241a7d498 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -69,6 +69,51 @@ $node_publisher->wait_for_catchup('sub1');
 
 pass('index predicates do not cause crash');
 
+# https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru
+
+# The bug was that when a syntax error occurred in a SQL-language or PL/pgSQL-language
+# CREATE FUNCTION or DO command executed in a logical replication worker,
+# we'd suffer a null pointer dereference or assertion failure.
+
+$node_subscriber->safe_psql('postgres', q{
+	create or replace procedure rebuild_test(
+	) as
+	$body$
+	declare
+		l_code  text:= E'create or replace function public.test_selector(
+	) returns setof public.tab1 as
+	\$body\$
+		select * from  error_name
+	\$body\$ language sql;';
+	begin
+		execute l_code;
+		perform public.test_selector();
+	end
+	$body$ language plpgsql;
+	create or replace function test_trg()
+	returns trigger as
+	$body$
+		declare
+		begin
+			call public.rebuild_test();
+			return NULL;
+		end
+	$body$ language plpgsql;
+	create trigger test_trigger after insert on tab1 for each row
+execute function test_trg();
+	alter table tab1 enable replica trigger test_trigger;
+});
+
+# This command will cause a crash on the replica if that bug hasn't been fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+	"ERROR:  relation \"error_name\" does not exist at character"
+);
+
+ok($result,
+	"ERROR: Logical decoding doesn't fail on function error");
+
 # We'll re-use these nodes below, so drop their replication state.
 $node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub1");
 $node_publisher->safe_psql('postgres', "DROP PUBLICATION pub1");
@@ -352,7 +397,7 @@ $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub_sch');
 
 # Subscriber's sch1.t1 should receive the row inserted into the new sch1.t1,
 # but not the row inserted into the old sch1.t1 post-rename.
-my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
 is( $result, qq(1
 2), 'check data in subscriber sch1.t1 after schema rename');
 


Re: drop postmaster symlink

2023-01-07 Thread Karl O. Pinc
On Sat, 7 Jan 2023 19:56:08 -0600
"Karl O. Pinc"  wrote:

> On Sat, 07 Jan 2023 18:38:25 -0500
> Tom Lane  wrote:
> 
> > "Karl O. Pinc"  writes:  
> > > This is a review of Peter's 2 patches.  I see only 1 small
> > > problem. ...

> > Hmm ... I thought this patch was about getting rid of the
> > admittedly-obsolete installed symlink. ...

> No.  It's about getting rid of the symlink. 

The only way I could think of to review a patch
that removes something is to report all the places
I looked where a reference to the symlink might be.
Then report what I found at each place I looked and
whether there is, or might be, a problem.

That way the committer knows where I didn't look if there's
more that needs removing.

Apologies that this was not clear.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: pgsql: Delay commit status checks until freezing executes.

2023-01-07 Thread Andres Freund
Hi,

On 2023-01-07 15:41:29 -0800, Peter Geoghegan wrote:
> Do we need to do anything about this to the "pg_xact and pg_subtrans"
> section of the transam README?

Probably a good idea, although it doesn't neatly fit right now.


> Also, does amcheck's get_xid_status() need a reference to these rules?

Don't think so? What made you ask?


> FWIW, I found an existing comment about this rule in the call to
> TransactionIdAbortTree() from RecordTransactionAbort() -- which took
> me quite a while to find. So you might have been remembering that
> comment before.

Possible, my memory is vague enough that it's hard to be sure...

Greetings,

Andres Freund




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-07 Thread Andres Freund
Hi,

On 2023-01-07 16:29:23 -0800, Andres Freund wrote:
> It's probably not too hard to fix specifically in this one place - we could
> just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
> it strikes me as a somewhat larger issue for the 64bit xid infrastructure. I
> suspect this might not be the only place running into problems with such
> "before the universe" xids.

I haven't found other problematic places in HEAD, but did end up finding a less
serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify
that with vacuum_defer_cleanup_age set, GetFullRecentGlobalXmin() returns
values that look likely to cause problems. It's "just" used in gist, luckily.

It's hard to find places that do this kind of arithmetic; we traditionally
haven't had a helper for it, so it's open-coded in various ways.

	xidStopLimit = xidWrapLimit - 300;
	if (xidStopLimit < FirstNormalTransactionId)
		xidStopLimit -= FirstNormalTransactionId;

and oddly:

	xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;
	if (xidVacLimit < FirstNormalTransactionId)
		xidVacLimit += FirstNormalTransactionId;

or (in < 14):

	RecentGlobalXmin = globalxmin - vacuum_defer_cleanup_age;
	if (!TransactionIdIsNormal(RecentGlobalXmin))
		RecentGlobalXmin = FirstNormalTransactionId;


The currently existing places I found, other than the ones in procarray.c,
luckily don't seem to convert the xids to 64bit xids.


> For a bit I was thinking that we should introduce the notion that a
> FullTransactionId can be from the past. Specifically that when the upper 32bit
> are all set, we treat the lower 32bit as a value from before xid 0 using the
> normal 32bit xid arithmetic.  But it sucks to add overhead for that
> everywhere.
> 
> It might be a bit more palatable to designate one individual value,
> e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> before the start of the universe an xid points to...

On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
hacked up a patch that converts various fxid functions to inline functions
with such assertions, and it indeed quickly catches the problem this thread
reported, close to the source of the use.
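
To sketch the idea (mirroring FullTransactionIdFromEpochAndXid() from
transam.h, with the extra assertion; an illustration of the approach, not the
actual patch):

static inline FullTransactionId
FullTransactionIdFromEpochAndXidChecked(uint32 epoch, TransactionId xid)
{
	FullTransactionId result;

	/*
	 * Epoch 0xFFFFFFFF is reserved as "invalid"; tripping this assertion
	 * means somebody computed a "before the start of the universe" fxid.
	 */
	Assert(epoch != PG_UINT32_MAX);

	result.value = ((uint64) epoch) << 32 | xid;

	return result;
}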

One issue with that is that it'd reduce what can be input for the xid8
type. But it's hard to believe that'd be a real issue?


It's quite unfortunate that we don't have a test for vacuum_defer_cleanup_age
yet :(.

Greetings,

Andres Freund




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-07 Thread Amit Kapila
On Sat, Jan 7, 2023 at 2:25 PM Dilip Kumar  wrote:
>

Today, I was analyzing this patch w.r.t recent commit c6e1f62e2c and
found that pa_set_xact_state() should set the latch (wake up) for the
leader worker as the leader could be waiting in
pa_wait_for_xact_state(). What do you think? But otherwise, it should
be okay w.r.t DDLs because this patch allows the leader worker to
restart logical replication for subscription parameter change which
will in turn stop/restart parallel workers if required.

-- 
With Regards,
Amit Kapila.




Re: drop postmaster symlink

2023-01-07 Thread Karl O. Pinc
On Sat, 7 Jan 2023 19:33:38 -0500
Joe Conway  wrote:

> On 1/7/23 18:38, Tom Lane wrote:
> > "Karl O. Pinc"  writes:  
> >> This is a review of Peter's 2 patches.  I see only 1 small
> >> problem.  

The small problem is a reference to a deleted file.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: drop postmaster symlink

2023-01-07 Thread Karl O. Pinc
On Sat, 07 Jan 2023 18:38:25 -0500
Tom Lane  wrote:

> "Karl O. Pinc"  writes:
> > This is a review of Peter's 2 patches.  I see only 1 small problem.
> >  
> 
> > Looking at the documentation, a "postmaster" in the glossary is
> > defined as the controlling process.  This works; it needs to be
> > called something.  There is still a postmaster.pid (etc.) in the
> > data directory.  
> 
> > The word "postmaster" (case insensitive) shows up 84 times in the
> > documentation.  I looked at all of these.
> 
> Hmm ... I thought this patch was about getting rid of the
> admittedly-obsolete installed symlink.  If it's trying to get rid of
> the "postmaster" terminology for our parent process, I'm very strongly
> against that, either as regards to code or documentation.

No.  It's about getting rid of the symlink.  I was grepping around
to look for use of the symlink, started with the glossary just
to be sure, and saw that "postmaster" is consistently (I think)
used to refer to the controlling, parent, process.  And wrote
down what I was doing and what I found as I went along.  And then
sent out my notes.

Sorry for the confusion.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-07 Thread Andres Freund
Hi,

On 2023-01-07 21:06:06 +0300, Michail Nikolaev wrote:
> 2) It is not an issue at table creation time. Issue is reproducible if
> vacuum_defer_cleanup_age set after table preparation.
> 
> 3) To reproduce the issue, vacuum_defer_cleanup_age should flip xid
> over zero (be >= txid_current()).
> And it is stable So, for example - unable to reproduce with 733
> value, but 734 gives error each time.
> Just a single additional txid_current() (after data is filled) fixes a
> crash... It looks like the first SELECT FOR UPDATE + UPDATE silently
> poisons everything somehow.
> You could use such PSQL script:

FWIW, the concrete value for vacuum_defer_cleanup_age is not crucial to
encounter the problem. It needs to be a value that, when compared to the xid
that did the "INSERT INTO something_is_wrong_here", results in a value <= 0.

Setting vacuum_defer_cleanup_age to a value much larger than the xid allows
the crash to be encountered repeatedly.

Greetings,

Andres Freund




Re: drop postmaster symlink

2023-01-07 Thread Joe Conway

On 1/7/23 18:38, Tom Lane wrote:

"Karl O. Pinc"  writes:

This is a review of Peter's 2 patches.  I see only 1 small problem.



Looking at the documentation, a "postmaster" in the glossary is
defined as the controlling process.  This works; it needs to be called
something.  There is still a postmaster.pid (etc.) in the data
directory.



The word "postmaster" (case insensitive) shows up 84 times in the
documentation.  I looked at all of these.  


Hmm ... I thought this patch was about getting rid of the
admittedly-obsolete installed symlink.


That was my understanding too.


If it's trying to get rid of the "postmaster" terminology for our
parent process, I'm very strongly against that, either as regards to
code or documentation.


+1

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-07 Thread Andres Freund
Hi,

Thomas, CCing you because of the 64bit xid representation aspect.


On 2023-01-06 00:39:44 +0300, Michail Nikolaev wrote:
> I apologize for the direct ping, but I think your snapshot scalability
> work in PG14 could be related to the issue.

Good call!


> The TransactionIdRetreatedBy implementation looks correct... But with
> txid_current=212195 I see errors like "could not access status of
> transaction 58643736"...
> So, maybe vacuum_defer_cleanup_age just highlights some special case
> (something with "previous" xids on the left side of zero?)

I think the bug is close to TransactionIdRetreatedBy(). Arguably in
FullXidRelativeTo(). Or a more fundamental data representation issue with
64bit xids.

To explain, here's a trace to the bottom of GetSnapshotData() leading to the
problem:

In the case I'm looking at here we end up with 720:
oldestfxid = FullXidRelativeTo(latest_completed, oldestxid);
and xmin is 255271, both correct.

Then in TransactionIdRetreatedBy:

	/* apply vacuum_defer_cleanup_age */
	def_vis_xid_data =
		TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age);

things start to be iffy. Because we retreat by vacuum_defer_cleanup_age, which
was set to txid_current() in scheme.sql, and the xmin above is that xid,
TransactionIdRetreatedBy() first ends up with 0. It then backtracks further to
the highest 32bit xid (i.e. 4294967295). So far so good.

We could obviously end up with values further in the past as well, if
vacuum_defer_cleanup_age were larger.


Things start to seriously go off the rails when we convert that 32bit xid to
64 bit with:

	def_vis_fxid = FullXidRelativeTo(latest_completed, def_vis_xid);

which returns {value = 18446744073709551615}, i.e. 0 - 1 in 64bit.


However, as 64bit xids are not supposed to wrap around, we're in trouble -
it's an xid *very* far into the future, allowing things to be pruned that
shouldn't be, because everything is below that.


I don't quite know how to best fix this. 4294967295 is the correct result for
TransactionIdRetreatedBy() in this case, and it'd not cause problems for
FullXidRelativeTo() if we actually had wrapped around the xid counter before
(since then we'd just represent it as a fxid "of the proper epoch").

Basically, in contrast to 32bit xids, 64bit xids cannot represent xids from
before the start of the universe, whereas with the modulo arithmetic of 32bit
that's not a real problem.
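
To make that concrete, a simplified sketch of the arithmetic (names follow
transam.h / procarray.c, but this is an illustration, not the actual code):

static FullTransactionId
retreat_and_widen(FullTransactionId latest_completed,
				  TransactionId xmin, uint32 defer_age)
{
	TransactionId xid = xmin - defer_age;

	/* as in TransactionIdRetreatedBy(): step back over the special XIDs */
	while (xid < FirstNormalTransactionId)
		xid--;

	/*
	 * As in FullXidRelativeTo(): widen using the signed 32bit distance to
	 * latest_completed.  With xmin == defer_age and latest_completed still
	 * in epoch 0, xid is 4294967295 here and the result is 2^64 - 1.
	 */
	return FullTransactionIdFromU64(U64FromFullTransactionId(latest_completed)
									+ (int32) (xid - XidFromFullTransactionId(latest_completed)));
}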


It's probably not too hard to fix specifically in this one place - we could
just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
it strikes me as a somewhat larger issue for the 64bit xid infrastructure. I
suspect this might not be the only place running into problems with such
"before the universe" xids.


For a bit I was thinking that we should introduce the notion that a
FullTransactionId can be from the past. Specifically that when the upper 32bit
are all set, we treat the lower 32bit as a value from before xid 0 using the
normal 32bit xid arithmetic.  But it sucks to add overhead for that
everywhere.

It might be a bit more palatable to designate one individual value,
e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
before the start of the universe an xid points to...


FullXidRelativeTo() did assert that the input 32bit xid is in a reasonable
range, but unfortunately didn't do a similar check for the 64bit case.


Greetings,

Andres Freund




Re: pgsql: Delay commit status checks until freezing executes.

2023-01-07 Thread Peter Geoghegan
On Sat, Jan 7, 2023 at 1:47 PM Andres Freund  wrote:
> > What do you think of the attached patch, which revises comments over
> > TransactionIdDidAbort, and adds something about it to the top of
> > heapam_visbility.c?
>
> Mostly looks good to me.  I think it'd be good to add a reference to the
> heapam_visbility.c? comment to the top of transam.c (or move it).

Makes sense.

> I think it's currently very likely to be true, but I'd weaken the "never" a
> bit nonetheless. I think it'd also be good to point to what to do instead. How
> about:
>   Note that TransactionIdDidAbort() returns true only for explicitly aborted
>   transactions, as transactions implicitly aborted due to a crash will
>   commonly still appear to be in-progress in the clog. Most of the time
>   TransactionIdDidCommit(), with a preceding TransactionIdIsInProgress()
>   check, should be used instead of TransactionIdDidAbort().

That does seem better.

Do we need to do anything about this to the "pg_xact and pg_subtrans"
section of the transam README? Also, does amcheck's get_xid_status()
need a reference to these rules?

FWIW, I found an existing comment about this rule in the call to
TransactionIdAbortTree() from RecordTransactionAbort() -- which took
me quite a while to find. So you might have been remembering that
comment before.

-- 
Peter Geoghegan




Re: drop postmaster symlink

2023-01-07 Thread Tom Lane
"Karl O. Pinc"  writes:
> This is a review of Peter's 2 patches.  I see only 1 small problem.

> Looking at the documentation, a "postmaster" in the glossary is
> defined as the controlling process.  This works; it needs to be called
> something.  There is still a postmaster.pid (etc.) in the data
> directory.

> The word "postmaster" (case insensitive) shows up 84 times in the
> documentation.  I looked at all of these.  

Hmm ... I thought this patch was about getting rid of the
admittedly-obsolete installed symlink.  If it's trying to get rid of
the "postmaster" terminology for our parent process, I'm very strongly
against that, either as regards to code or documentation.

regards, tom lane




FYI: 2022-10 thorntail failures from coreutils FICLONE

2023-01-07 Thread Noah Misch
thorntail failed some recovery tests in 2022-10:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-11-02%2004%3A25%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-31%2013%3A32%3A42
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-29%2017%3A48%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-24%2013%3A48%3A16
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-24%2010%3A08%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-21%2000%3A58%3A14
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-16%2000%3A08%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-15%2020%3A48%3A18
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-14%2020%3A13%3A35
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-14%2006%3A58%3A15

thorntail has long seen fsync failures, due to a driver bug[1].  On
2022-09-28, its OS updated coreutils from 8.32-4.1 to 9.1-1.  That brought in
"cp" use of the FICLONE ioctl.  FICLONE internally syncs its source file,
reporting EIO if that fails.  A bug[2] in "cp" allowed it to silently make a
defective copy instead of reporting that EIO.  Since the recovery suite
archive_command uses "cp", these test failures emerged.  The kernel may
change[3] to make such userspace bugs harder to add.

For thorntail, my workaround was to replace "cp" with a wrapper doing 'exec
/usr/bin/cp --reflink=never "$@"'.  I might eventually propose the ability to
disable FICLONE calls in PostgreSQL code.  So far, those calls (in pg_upgrade)
have not caused thorntail failures.

[1] https://postgr.es/m/flat/20210508001418.ga3076...@rfd.leadboat.com
[2] 
https://github.com/coreutils/coreutils/commit/f6c93f334ef5dbc5c68c299785565ec7b9ba5180
[3] https://lore.kernel.org/linux-xfs/20221108172436.ga3613...@rfd.leadboat.com




Re: drop postmaster symlink

2023-01-07 Thread Karl O. Pinc
Hello,

This is a review of Peter's 2 patches.  I see only 1 small problem.

   +++

Looking at the documentation, a "postmaster" in the glossary is
defined as the controlling process.  This works; it needs to be called
something.  There is still a postmaster.pid (etc.) in the data
directory.

The word "postmaster" (case insensitive) shows up 84 times in the
documentation.  I looked at all of these.  

I see a possible problem at line 1,412 of runtime.sgml, the "Linux
Memory Overcommit" section.  It talks about the postmaster's startup
script invoking the postmaster.  It might, possibly, be better to say
that "postgres" is invoked, that being the name of the invoked program.
There's a similar usage at line 1,416. On the other hand, the existing
text makes it quite clear what is going on and there's a lot to be said
for consistently using the word "postmaster".  I mention this only in
case somebody deems it significant.  Perhaps there's a way to use
markup, identifying "postgres" as a program with a name in the file
system, to make things more clear.  Most likely, nobody will care.

In doc/src/sgml/ref/allfiles.sgml at line 222 there is an ENTITY
defined which references the deleted postmaster.sgml file.  Since I
did a maintainer-clean, and the patch deletes the postmaster.sgml
file, and I don't see any references to the entity in the docs, I
believe that this line should be removed.  (In other words, I don't
think this file is automatically maintained.)

After applying the patches the documentation seems to build just fine.

I have not run contrib/start-scripts/{freebsd,linux}, but the patch
looks fine and the result of applying the patch looks fine, and the
patch is a one-line simple replacement of "postmaster" with "postgres"
so I expect no problems.

The modification to src/port/path.c is in a comment, so it will surely
not affect anything at runtime.  And the changed comment looks right
to me.

There's thousands of lines of comments in the code where "postmaster"
(case insensitive) shows up after applying the patches.  I've not
reviewed any of these.  Or looked for odd variable names, or
references in the code to the postmaster binary - by name, etc.  I'm
not sure where it would make sense to look for such problems.

Aside from the "allfiles.sgml" problem, I see no reason why these 2
patches should not be applied.  As mentioned by others, I don't have
strong feelings about getting rid of the postmaster symlink.  But I
did pick this patch to review because I remember, once upon a time,
being slightly confused by a "postmaster" in the process list.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Using WaitEventSet in the postmaster

2023-01-07 Thread Andres Freund
Hi,

On 2023-01-07 18:08:11 +1300, Thomas Munro wrote:
> On Sat, Jan 7, 2023 at 12:25 PM Andres Freund  wrote:
> > On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:
> > > 3.  Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
> > > SIGINT?  If you send all of these extremely rapidly, it's
> > > indeterminate which one will be seen by handle_shutdown_request().
> >
> > That doesn't seem optimal. I'm mostly worried that we can end up 
> > downgrading a
> > shutdown request.
> 
> I was contemplating whether I needed to do some more push-ups to
> prefer the first delivered signal (instead of the last), but you're
> saying that it would be enough to prefer the fastest shutdown type, in
> cases where more than one signal was handled between server loops.
> WFM.

I don't see any need for such an order requirement - in case of receiving a
"less severe" shutdown request first, we'd process the more severe one soon
after. There's nothing to be gained by trying to follow the order of the
incoming signals.

Afaict that's also the behaviour today. pmdie() has blocks like this:

		case SIGTERM:

			/*
			 * Smart Shutdown:
			 *
			 * Wait for children to end their work, then shut down.
			 */
			if (Shutdown >= SmartShutdown)
				break;



I briefly wondered about deduplicating that code, now that we know the
requested mode ahead of the switch. So it could be something like:

	/* don't interrupt an already in-progress shutdown in a more "severe" mode */
	if (mode < Shutdown)
		return;

but that's again probably something for later.


> > > 5.  Is the signal mask being correctly handled during forking?  I
> > > think so: I decided to push the masking logic directly into the
> > > routine that forks, to make it easy to verify that all paths set the
> > > mask the way we want.
> >
> > Hm. If I understand correctly, you used sigprocmask() directly (vs
> > PG_SETMASK()) in fork_process() because you want the old mask? But why do we
> > restore the prior mask, instead of just using PG_SETMASK(); as we
> > still do in a bunch of places in the postmaster?
> 
> It makes zero difference in practice but I think it's a nicer way to
> code it because it doesn't make an unnecessary assumption about what
> the signal mask was on entry.

Heh, to me doing something slightly different than in other places actually
seems to make it a bit less nice :). There's also some benefit in the
certainty of knowing what the mask will be.  But it doesn't matter much.


> > > 6.  Is the naming and style OK?  Better ideas welcome, but basically I
> > > tried to avoid all unnecessary refactoring and changes, so no real
> > > logic moves around, and the changes are pretty close to "mechanical".
> > > One bikeshed decision was what to call the {handle,process}_XXX
> > > functions and associated flags.
> >
> > I wonder if it'd be good to have a _pm_ in the name.
> 
> I dunno about this one, it's all static stuff in a file called
> postmaster.c and one (now) already has pm in it (see below).

I guess stuff like signal handlers and their state somehow seems more global
to me than their C linkage type suggests. Hence the desire to be a bit more
"namespaced" in their naming. I do find it somewhat annoying when reasonably
important global variables aren't uniquely named when using a debugger...

But again, this isn't anything that should hold up the patch.


> > Is there any reason to use MAXLISTEN here? We know how many sockets we're
> > listening on by this point, I think? No idea if the overhead matters 
> > anywhere,
> > but ...
> 
> Fixed.

I was thinking of determining the number once, in PostmasterMain(). But that's
perhaps better done as a separate change... WFM.


> > Hm. This is preexisting behaviour, but now it seems somewhat odd that we 
> > might
> > end up happily forking a backend for each socket without checking signals
> > inbetween.  Forking might take a while, and if a signal arrived since the
> > WaitEventSetWait() we'll not react to it.
> 
> Right, so if you have 64 server sockets and they all have clients
> waiting on their listen queue, we'll service one connection from each
> before getting back to checking for pmsignals or shutdown, and that's
> unchanged in this patch.  (FWIW I also noticed that problem while
> experimenting with the idea that you could handle multiple clients in
> one go on OSes that report the listen queue depth size along with
> WL_SOCKET_ACCEPT events, but didn't pursue it...)
> 
> I guess we could check every time through the nevents loop.  I may
> look into that in a later patch, but I prefer to keep the same policy
> in this patch.

Makes sense. This was mainly me trying to make sure we're not changing subtle
stuff accidentally (and trying to understand how things work currently, as a
prerequisite).


> > >  static void
> > >  

Re: Infinite Interval

2023-01-07 Thread Joseph Koshakow
On Sat, Jan 7, 2023 at 3:05 PM Joseph Koshakow  wrote:
>
> On Sat, Jan 7, 2023 at 3:04 PM Joseph Koshakow  wrote:
> >
> > I think this patch is just about ready for review, except for the
> > following two questions:
> >   1. Should finite checks on intervals only look at months or all three
> >   fields?
> >   2. Should we make the error messages for adding/subtracting infinite
> >   values more generic or leave them as is?
> >
> > My opinions are
> >   1. We should only look at months.
> >   2. We should make the errors more generic.
> >
> > Anyone else have any thoughts?

Here's a patch with the more generic error messages.

- Joe
From 6ed93bc20db57cea2d692e9288d97b66f4a526dc Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 17 Dec 2022 14:21:26 -0500
Subject: [PATCH] This is WIP.

TODOs
1. Should we just use the months field to test for infinity?

Ashutosh Bapat and Joe Koshakow and Jian He
---
 doc/src/sgml/datatype.sgml |   2 +-
 doc/src/sgml/func.sgml |   5 +-
 src/backend/utils/adt/date.c   |  20 ++
 src/backend/utils/adt/datetime.c   |  14 +-
 src/backend/utils/adt/timestamp.c  | 448 
 src/include/datatype/timestamp.h   |  21 ++
 src/test/regress/expected/horology.out |   6 +-
 src/test/regress/expected/interval.out | 466 +++--
 src/test/regress/sql/horology.sql  |   6 +-
 src/test/regress/sql/interval.sql  | 130 ++-
 10 files changed, 1002 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index fdffba4442..2bcf959f70 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2316,7 +2316,7 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
  
  
   infinity
-  date, timestamp
+  date, timestamp, interval
   later than all other time stamps
  
  
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3bf8d021c3..7ddf76da4a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9369,7 +9369,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
  boolean
 
 
- Test for finite interval (currently always true)
+ Test for finite interval (not +/-infinity)
 
 
  isfinite(interval '4 hours')
@@ -10256,7 +10256,8 @@ SELECT EXTRACT(YEAR FROM TIMESTAMP '2001-02-16 20:38:40');
  When the input value is +/-Infinity, extract returns
  +/-Infinity for monotonically-increasing fields (epoch,
  julian, year, isoyear,
- decade, century, and millennium).
+ decade, century, and millennium
+ for all types and hour and day just for interval).
  For other fields, NULL is returned.  PostgreSQL
  versions before 9.6 returned zero for all cases of infinite input.
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 99171d9c92..8334b9053f 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -2067,6 +2067,11 @@ time_pl_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeADT		result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot add infinite interval to time")));
+
 	result = time + span->time;
 	result -= result / USECS_PER_DAY * USECS_PER_DAY;
 	if (result < INT64CONST(0))
@@ -2085,6 +2090,11 @@ time_mi_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeADT		result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot subtract infinite interval from time")));
+
 	result = time - span->time;
 	result -= result / USECS_PER_DAY * USECS_PER_DAY;
 	if (result < INT64CONST(0))
@@ -2599,6 +2609,11 @@ timetz_pl_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeTzADT  *result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot add infinite interval to time")));
+
 	result = (TimeTzADT *) palloc(sizeof(TimeTzADT));
 
 	result->time = time->time + span->time;
@@ -2621,6 +2636,11 @@ timetz_mi_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeTzADT  *result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot subtract infinite interval from time")));
+
 	result = (TimeTzADT *) palloc(sizeof(TimeTzADT));
 
 	result->time = time->time - span->time;
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d166613895..4192e7a74b 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -70,7 +70,7 @@ static bool DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t,
   const char *abbr, pg_tz *tzp,
   int *offset, int 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
On Sun, 8 Jan 2023 at 05:45, Ankit Kumar Pandey  wrote:
> Attached patch with test cases.

I can look at this in a bit more detail if you find a way to fix the
case you mentioned earlier. i.e, push the sort down to the deepest
WindowAgg that has pathkeys contained in the query's ORDER BY
pathkeys.

EXPLAIN (COSTS OFF)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
   sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
                              QUERY PLAN
------------------------------------------------------------------
 Incremental Sort
   Sort Key: depname, empno, enroll_date
   Presorted Key: depname, empno
   ->  WindowAgg
         ->  WindowAgg
               ->  Sort
                     Sort Key: depname, empno
                     ->  Seq Scan on empsalary
(8 rows)

You'll also need to pay attention to how the has_runcondition is set.
If you start pushing before looking at all the WindowClauses then you
won't know if some later WindowClause has a runCondition. Adding an
additional backwards foreach loop should allow you to do all the
required prechecks and find the index of the WindowClause which you
should start pushing from.
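
For illustration, a rough sketch of that backwards pass (variable and function
names are assumptions based on create_one_window_path(), not code from the
patch):

	bool		has_runcondition = false;
	int			push_from = -1;
	ListCell   *lc;

	/* precheck: a runCondition on any WindowClause disables the optimisation */
	foreach(lc, activeWindows)
	{
		WindowClause *wc = (WindowClause *) lfirst(lc);

		if (wc->runCondition != NIL)
			has_runcondition = true;
	}

	if (!has_runcondition)
	{
		/*
		 * Walk backwards to find the deepest WindowClause whose pathkeys are
		 * contained in the query's ORDER BY pathkeys.
		 */
		for (int i = list_length(activeWindows) - 1; i >= 0; i--)
		{
			WindowClause *wc = (WindowClause *) list_nth(activeWindows, i);
			List	   *window_pathkeys = make_pathkeys_for_window(root, wc,
													root->processed_tlist);

			if (!pathkeys_contained_in(window_pathkeys, root->sort_pathkeys))
				break;			/* pushing deeper would just be re-sorted */
			push_from = i;
		}
	}

The main foreach loop can then apply the optimisation from index push_from
onwards (and skip it entirely when push_from is still -1).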

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
(your email client still seems broken)

On Sun, 8 Jan 2023 at 05:27, Ankit Kumar Pandey  wrote:
>
>
> While writing test cases, I found that the optimization does not happen for
> case #1
>
> (which is a prime candidate for such an operation), like
>
> EXPLAIN (COSTS OFF)
> SELECT empno,
> depname,
> min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
> sum(salary) OVER (PARTITION BY depname) depsalary
> FROM empsalary
> ORDER BY depname, empno, enroll_date
>
> This happens because of the mutual exclusiveness of two operands (when the
> number of window functions > 1), viz.
>
> is_sorted and last activeWindow, in the condition:
>
> ( !is_sorted && lnext(activeWindows, l) == NULL)
>
> For the 2nd-last window function, is_sorted is false and path keys get added.
>
> In the next run (for the last window function), is_sorted becomes true and
> the whole optimization part is skipped.
>
> Note: the major issue is that if I remove is_sorted from the condition, even
> though path keys are added, it still does not perform the optimization and
> works the same as in the master/unoptimized case.
>
> Perhaps adding path keys at the last window function is not doing the trick?
> Maybe we need to add pathkeys to all window functions which are a subset of
> the query's ORDER BY, irrespective of being last or not?

You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys then apply the optimisation from that point in the main
foreach loop.  Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run > 1
time, then you should probably have a boolean variable that's set
before the loop which saves if we're going to try to apply the
optimisation.  That'll save from having to check things like if the
query has a LIMIT clause multiple times.

> Case #2:
>
> For presorted columns, eg
>
> CREATE INDEX depname_idx ON empsalary(depname);
> SET enable_seqscan=0;
> EXPLAIN (COSTS OFF)
> SELECT empno,
>  min(salary) OVER (PARTITION BY depname) depminsalary
> FROM empsalary
> ORDER BY depname, empno;
>
> Is this correct plan:
>
> a)
>
>                        QUERY PLAN
> ---------------------------------------------------------
>  Incremental Sort
>    Sort Key: depname, empno
>    Presorted Key: depname
>    ->  WindowAgg
>          ->  Index Scan using depname_idx on empsalary
> (5 rows)
>
> or this:
>
> b) (Akin to Optimized version)
>
>                        QUERY PLAN
> ---------------------------------------------------------
>  WindowAgg
>    ->  Incremental Sort
>          Sort Key: depname, empno
>          Presorted Key: depname
>          ->  Index Scan using depname_idx on empsalary
> (5 rows)
>
> Patched version does (a) because of is_sorted condition.

a) looks like the best plan to me.  What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.

David




Re: delayed initialization in worktable scan

2023-01-07 Thread Tom Lane
Andres Freund  writes:
> Basically the issue is that in queries with two CTEs we can, at least
> currently, end up with a WorkTable scan on a CTE we've not yet initialized,
> due to the WorkTable scan of one CTE appearing in the other. Thus
> ExecInitRecursiveUnion() hasn't yet set up the param we use in
> nodeWorktablescan.c to find the tuplestore and the type of the tuples it
> contains.

> I don't think this is a huge issue, but it has surprised me multiple times, so I'd
> like to expand the comment. At least for me it's hard to get from "corner
> cases" to one worktable scan appearing in another CTE and to mutually recursive
> CTEs.

Sure.  I think I wrote the comment the way I did because I hadn't done
enough analysis to be sure that mutually recursive CTEs was the only
way to trigger it.  But as long as you say "for example" or words to
that effect, I don't have a problem with giving an example case here.

> I did go down the rabbit hole of trying to avoid this issue because it "feels
> wrong" to not know the return type of an executor node during initialization.
> ...
> I'm not sure it's worth changing this. Or whether that'd be the right 
> approach.

I wouldn't bother unless we find a compelling reason to need the info
earlier.

regards, tom lane




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
On Sun, 8 Jan 2023 at 01:52, Ankit Kumar Pandey  wrote:
> I am just wondering though, why can we not do top-N sort
> in optimized version if we include limit clause? Is top-N sort is
> limited to non strict sorting or
> cases last operation before limit is sort? .

Maybe the sort bound can be pushed down. You'd need to adjust
ExecSetTupleBound() so that it pushes the bound through
WindowAggState.

David




Re: pgsql: Delay commit status checks until freezing executes.

2023-01-07 Thread Andres Freund
Hi,

On 2023-01-06 11:16:00 -0800, Peter Geoghegan wrote:
> On Tue, Jan 3, 2023 at 10:52 PM Peter Geoghegan  wrote:
> > > And it'd make sense to have
> > > the explanation of why TransactionIdDidAbort() isn't the same as
> > > !TransactionIdDidCommit(), even for !TransactionIdIsInProgress() xacts, 
> > > near
> > > the explanation for doing TransactionIdIsInProgress() first.
> >
> > I think that we should definitely have a comment directly over
> > TransactionIdDidAbort(). Though I wouldn't mind reorganizing these
> > other comments, or making the comment over TransactionIdDidAbort()
> > mostly just point to the other comments.
> 
> What do you think of the attached patch, which revises comments over
> TransactionIdDidAbort, and adds something about it to the top of
> heapam_visbility.c?

Mostly looks good to me.  I think it'd be good to add a reference to the
heapam_visbility.c? comment to the top of transam.c (or move it).


>   *   Assumes transaction identifier is valid and exists in clog.
> + *
> + *   Not all transactions that must be treated as aborted will be
> + *   explicitly marked as such in clog.  Transactions that were in
> + *   progress during a crash are never reported as aborted by us.
>   */
>  bool /* true if given 
> transaction aborted */
>  TransactionIdDidAbort(TransactionId transactionId)

I think it's currently very likely to be true, but I'd weaken the "never" a
bit nonetheless. I think it'd also be good to point to what to do instead. How
about:
  Note that TransactionIdDidAbort() returns true only for explicitly aborted
  transactions, as transactions implicitly aborted due to a crash will
  commonly still appear to be in-progress in the clog. Most of the time
  TransactionIdDidCommit(), with a preceding TransactionIdIsInProgress()
  check, should be used instead of TransactionIdDidAbort().
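
For illustration, that idiom as a standalone sketch (the enum and function
name here are made up, not from the tree):

typedef enum
{
	XACT_OUTCOME_IN_PROGRESS,
	XACT_OUTCOME_COMMITTED,
	XACT_OUTCOME_ABORTED
} XactOutcome;

static XactOutcome
xact_outcome(TransactionId xid)
{
	/* must be checked first, per the ordering rules discussed above */
	if (TransactionIdIsInProgress(xid))
		return XACT_OUTCOME_IN_PROGRESS;

	if (TransactionIdDidCommit(xid))
		return XACT_OUTCOME_COMMITTED;

	/*
	 * Neither in progress nor committed: treat as aborted.  This also covers
	 * transactions that crashed without ever writing an ABORT record to the
	 * clog, which TransactionIdDidAbort() would not report as aborted.
	 */
	return XACT_OUTCOME_ABORTED;
}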

Greetings,

Andres Freund




delayed initialization in worktable scan

2023-01-07 Thread Andres Freund
Hi,

while looking at fixing [1], I again came across the fact that we don't
initialize the projection etc during ExecInitWorkTableScan(), but do so during
the first call to ExecWorkTableScan().

This is explained with the following comment:

/*
 * On the first call, find the ancestor RecursiveUnion's state via the
 * Param slot reserved for it.  (We can't do this during node init 
because
 * there are corner cases where we'll get the init call before the
 * RecursiveUnion does.)
 */

I remember being confused about this before. I think I dug out the relevant
commit back then 0a7abcd4c983 [2], but didn't end up finding the relevant
thread. This time I did: [3].


Basically the issue is that in queries with two CTEs we can, at least
currently, end up with a WorkTable scan on a CTE we've not yet initialized,
due to the WorkTable scan of one CTE appearing in the other. Thus
ExecInitRecursiveUnion() hasn't yet set up the param we use in
nodeWorktablescan.c to find the tuplestore and the type of the tuples it
contains.
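
For context, the deferred lookup itself is roughly this (a paraphrased sketch
of the pattern, slightly simplified):

	if (node->rustate == NULL)
	{
		WorkTableScan *plan = (WorkTableScan *) node->ss.ps.plan;
		EState	   *estate = node->ss.ps.state;
		ParamExecData *param = &(estate->es_param_exec_vals[plan->wtParam]);

		/* the RecursiveUnion should have filled in this param by now */
		node->rustate = castNode(RecursiveUnionState,
								 DatumGetPointer(param->value));

		/* ... only now can the scan slot and projection be set up ... */
	}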

I don't think this is a huge issue, but it has surprised me multiple times, so I'd
like to expand the comment. At least for me it's hard to get from "corner
cases" to one worktable scan appearing in another CTE and to mutually recursive
CTEs.


I did go down the rabbit hole of trying to avoid this issue because it "feels
wrong" to not know the return type of an executor node during initialization.
The easiest approach I could see was to add the "scan type" to WorkTableScan
(vs the target list, which includes the projection). Most of the other scan
nodes get the scan type from the underlying relation etc, and thus don't need
it in the plan node ([4]). That way the projection can be built normally
during ExecInitWorkTableScan(), but we still need to defer the lookup of
node->rustate. But that bothers me a lot less.

I'm not sure it's worth changing this. Or whether that'd be the right approach.


I'm also wondering if Tom's first instinct from back then making this an error
would have been the right call. But that ship has sailed.


To be clear, this "issue" is largely independent of [1] / not a blocker
whatsoever. Partially I wrote this to have an email to find the next time I
encounter this.

Greetings,

Andres Freund

[1] https://postgr.es/m/17737-55a063e3e6c41b4f%40postgresql.org
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a7abcd4c983
[3] https://postgr.es/m/87zllbxngg.fsf%40oxford.xeocode.com
[4] I don't particularly like that, we spend a lot of time converting memory
inefficient target lists into tupledescs during executor initialization,
even though we rely on the tuple types not to be materially different
anyway. But that's a separate issue.




Re: How to define template types in PostgreSQL

2023-01-07 Thread Nikita Malakhov
Hi!

I'd suggest creating an API that defines a general function set with
variable input, and calling an implementation determined by the input type?

On Sat, Jan 7, 2023 at 12:32 PM Esteban Zimanyi 
wrote:

> Dear all
>
> MobilityDB (https://github.com/MobilityDB/MobilityDB) defines at the C
> level four template types: Set, Span, SpanSet, and Temporal. The type Set
> is akin to PostgreSQL's ArrayType restricted to one dimension, but enforces
> the constraint that sets do not have duplicates, the types Span and SpanSet
> are akin to PostgreSQL's RangeType and MultirangeType but enforce the
> constraints that span types are of fixed length and that empty spans and
> infinite bounds are not allowed, and the type Temporal is used to
> manipulate time-varying values.
>
> These template types need to be instantiated at the SQL level with base
> types (int, bigint, float, timestamptz, text, ...) and because of this,
> MobilityDB needs to define numerous SQL functions that all call the same
> function in C. Taking as example the Set type, we need to define, e.g.,
>
> CREATE FUNCTION intset_eq(intset, intset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_eq' ...
> CREATE FUNCTION bigintset_eq(bigintset, bigintset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_eq' ...
> CREATE FUNCTION floatset_eq(floatset, floatset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_eq' ...
> CREATE FUNCTION textset_eq(textset, textset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_eq' ...
> ...
>
> CREATE FUNCTION intset_ne(intset, intset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_ne' ...
> CREATE FUNCTION bigintset_ne(bigintset, bigintset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_ne' ...
> CREATE FUNCTION floatset_ne(floatset, floatset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_ne' ...
> CREATE FUNCTION textset_ne(textset, textset) RETURNS bool AS
> 'MODULE_PATHNAME', 'Set_ne' ...
> ...
>
> In the case of arrays, ranges, and multiranges, PostgreSQL avoids this
> redundancy using pseudo-types such as anyarray, anyrange, anymultirange, ...
>
> Is there a possibility that we can also define pseudo types such as
> anyset, anyspan, anyspanset, anytemporal,  ?
>
> This will considerably reduce the number of SQL functions to define.
> Currently, given the high number of functions in MobilityDB, creating the
> extension takes a long time ...
>
> Regards
>
> Esteban
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Infinite Interval

2023-01-07 Thread Joseph Koshakow
On Sat, Jan 7, 2023 at 3:04 PM Joseph Koshakow  wrote:
>
> On Thu, Jan 5, 2023 at 11:30 PM jian he  wrote:
> >
> >
> >
> > On Fri, Jan 6, 2023 at 6:54 AM Joseph Koshakow  wrote:
> >>
> >> Looks like some of the error messages have changed and we
> >> have some issues with parsing "+infinity" after rebasing.
> >
> >
> > There is a commit 2ceea5adb02603ef52579b568ca2c5aebed87358
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ceea5adb02603ef52579b568ca2c5aebed87358
> > if you pull this commit then you can do select interval '+infinity', even 
> > though I don't know why.
>
> It turns out that I was just misreading the error. The test was
> expecting us to fail on "+infinity" but we succeeded. I just removed
> that test case.
>
> >> pgindent. Looks like some of the error messages have changed
>
> The conditions for checking valid addition/subtraction between infinite
> values were missing some cases which explains the change in error
> messages. I've updated the logic and removed duplicate checks.
>
> I removed the extract/date_part tests since they were duplicated in a
> test above. I also converted the DO command tests to using SQL with
> joins so it more closely matches the existing tests.
>
> I've updated the extract/date_part logic for infinite intervals. Fields
> that are monotonically increasing should return +/-infinity and all
> others should return NULL. For Intervals, the fields are the same as
> timestamps plus the hour and day fields since those don't overflow into
> the next highest field.
>
> I think this patch is just about ready for review, except for the
> following two questions:
>   1. Should finite checks on intervals only look at months or all three
>   fields?
>   2. Should we make the error messages for adding/subtracting infinite
>   values more generic or leave them as is?
>
> My opinions are
>   1. We should only look at months.
>   2. We should make the errors more generic.
>
> Anyone else have any thoughts?
>
> - Joe

Oops I forgot the actual patch. Please see attached.
From 4ea7c98d47dcbff1313a5013572cc79839e4417e Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 17 Dec 2022 14:21:26 -0500
Subject: [PATCH] This is WIP.

TODOs
1. Should we just use the months field to test for infinity?
2. Should the error messages for adding different sign infinities be "interval out of range"?

Ashutosh Bapat and Joe Koshakow and Jian He
---
 doc/src/sgml/datatype.sgml |   2 +-
 doc/src/sgml/func.sgml |   5 +-
 src/backend/utils/adt/date.c   |  20 ++
 src/backend/utils/adt/datetime.c   |  14 +-
 src/backend/utils/adt/timestamp.c  | 448 
 src/include/datatype/timestamp.h   |  21 ++
 src/test/regress/expected/horology.out |   6 +-
 src/test/regress/expected/interval.out | 466 +++--
 src/test/regress/sql/horology.sql  |   6 +-
 src/test/regress/sql/interval.sql  | 130 ++-
 10 files changed, 1002 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index fdffba4442..2bcf959f70 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2316,7 +2316,7 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
  
  
   infinity
-  date, timestamp
+  date, timestamp, interval
   later than all other time stamps
  
  
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3bf8d021c3..7ddf76da4a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9369,7 +9369,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
  boolean
 
 
- Test for finite interval (currently always true)
+ Test for finite interval (not +/-infinity)
 
 
  isfinite(interval '4 hours')
@@ -10256,7 +10256,8 @@ SELECT EXTRACT(YEAR FROM TIMESTAMP '2001-02-16 20:38:40');
  When the input value is +/-Infinity, extract returns
  +/-Infinity for monotonically-increasing fields (epoch,
  julian, year, isoyear,
- decade, century, and millennium).
+ decade, century, and millennium
+ for all types and hour and day just for interval).
  For other fields, NULL is returned.  PostgreSQL
  versions before 9.6 returned zero for all cases of infinite input.
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 99171d9c92..8334b9053f 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -2067,6 +2067,11 @@ time_pl_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeADT		result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot add infinite interval to time")));
+
 	result = time + span->time;
 	result -= result / USECS_PER_DAY * USECS_PER_DAY;
 	if (result < INT64CONST(0))
@@ -2085,6 +2090,11 @@ 

Re: Infinite Interval

2023-01-07 Thread Joseph Koshakow
On Thu, Jan 5, 2023 at 11:30 PM jian he  wrote:
>
>
>
> On Fri, Jan 6, 2023 at 6:54 AM Joseph Koshakow  wrote:
>>
>> Looks like some of the error messages have changed and we
>> have some issues with parsing "+infinity" after rebasing.
>
>
> There is a commit 2ceea5adb02603ef52579b568ca2c5aebed87358
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ceea5adb02603ef52579b568ca2c5aebed87358
> if you pull this commit then you can do select interval '+infinity', even 
> though I don't know why.

It turns out that I was just misreading the error. The test was
expecting us to fail on "+infinity" but we succeeded. I just removed
that test case.

>> pgindent. Looks like some of the error messages have changed

The conditions for checking valid addition/subtraction between infinite
values were missing some cases which explains the change in error
messages. I've updated the logic and removed duplicate checks.

I removed the extract/date_part tests since they were duplicated in a
test above. I also converted the DO command tests to using SQL with
joins so it more closely matches the existing tests.

I've updated the extract/date_part logic for infinite intervals. Fields
that are monotonically increasing should return +/-infinity and all
others should return NULL. For Intervals, the fields are the same as
timestamps plus the hour and day fields since those don't overflow into
the next highest field.
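
For illustration, the field split described above amounts to roughly this
(the helper name is hypothetical, not from the patch):

static bool
interval_field_is_monotonic(int field)
{
	switch (field)
	{
		case DTK_EPOCH:
		case DTK_MILLENNIUM:
		case DTK_CENTURY:
		case DTK_DECADE:
		case DTK_YEAR:
		case DTK_DAY:
		case DTK_HOUR:
			/* monotonically increasing: extract() returns +/-Infinity */
			return true;
		default:
			/* everything else returns NULL for infinite intervals */
			return false;
	}
}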

I think this patch is just about ready for review, except for the
following two questions:
  1. Should finite checks on intervals only look at months or all three
  fields?
  2. Should we make the error messages for adding/subtracting infinite
  values more generic or leave them as is?

My opinions are
  1. We should only look at months.
  2. We should make the errors more generic.

Anyone else have any thoughts?
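
For concreteness, question 1 boils down to roughly the following two options
(macro names and sentinel values here are illustrative assumptions, not lifted
from the patch):

/* Option 1: the months field alone carries the infinity sentinel */
#define INTERVAL_NOT_FINITE_MONTHS_ONLY(i) \
	((i)->month == PG_INT32_MAX || (i)->month == PG_INT32_MIN)

/* Option 2: all three fields must carry the sentinel */
#define INTERVAL_NOT_FINITE_ALL_FIELDS(i) \
	(((i)->month == PG_INT32_MAX && (i)->day == PG_INT32_MAX && (i)->time == PG_INT64_MAX) || \
	 ((i)->month == PG_INT32_MIN && (i)->day == PG_INT32_MIN && (i)->time == PG_INT64_MIN))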

- Joe




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-07 Thread Tom Lane
=?UTF-8?B?w5ZuZGVyIEthbGFjxLE=?=  writes:
> Attached v22.

I took a very brief look through this.  I'm not too pleased with
this whole line of development TBH.  It seems to me that the core
design of execReplication.c and related code is "let's build our
own half-baked executor and much-less-than-half-baked planner,
because XXX".  (I'm not too sure what XXX was, really, but apparently
somebody managed to convince people that that is a sane and
maintainable design.)  Now this patch has decided that it *will*
use the real planner, or at least portions of it in some cases.
If we're going to do that ISTM we ought to replace all the existing
not-really-a-planner logic, but this has not done that; instead
we have a large net addition to the already very duplicative
replication code, with weird restrictions because it doesn't want
to make changes to the half-baked executor.

I think we should either live within the constraints set by this
overarching design, or else nuke execReplication.c from orbit and
start using the real planner and executor.  Perhaps the foreign
key enforcement mechanisms could be a model --- although if you
don't want to buy into using SPI as well, you probably should look
at Amit L's work at [1].

Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
sanely defined in the first place?  It looks to me that
RelationFindReplTupleSeq assumes without proof that there is a unique
full-tuple match, but that is only reasonable to assume if there is at
least one unique index (and maybe not even then, if nulls are involved).
If there is a unique index, why can't that be chosen as replica identity?
If there isn't, what semantics are we actually providing?

What I'm thinking about is that maybe REPLICA IDENTITY FULL should be
defined as "the subscriber can pick any unique index to match on,
and is allowed to fail if the table has none".  Or if "fail" is a bridge
too far for you, we could fall back to the existing seqscan logic.
But thumbing through the existing indexes to find a non-expression unique
index wouldn't require invoking the full planner.  Any candidate index
would result in a plan estimated to fetch just one row, so there aren't
likely to be serious cost differences.
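
For illustration only, the kind of catalog lookup that would suffice might
look roughly like this (the table name is hypothetical):

    -- Rough sketch: unique, valid, non-partial, non-expression indexes on a
    -- hypothetical subscriber table; any of them could be used for matching.
    SELECT indexrelid::regclass
    FROM pg_index
    WHERE indrelid = 'some_subscriber_table'::regclass
      AND indisunique AND indisvalid
      AND indexprs IS NULL AND indpred IS NULL;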

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=phkmodk1rywxf...@mail.gmail.com




Re: [RFC] Add jit deform_counter

2023-01-07 Thread Pavel Stehule
On Sat, 7 Jan 2023 at 16:48, Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Fri, Jan 06, 2023 at 09:42:09AM +0100, Pavel Stehule wrote:
> > The explain part is working, the part of pg_stat_statements doesn't
> >
> > set jit_above_cost to 10;
> > set jit_optimize_above_cost to 10;
> > set jit_inline_above_cost to 10;
> >
> > (2023-01-06 09:08:59) postgres=# explain analyze select
> > count(length(prosrc) > 0) from pg_proc;
> >
> >                                                    QUERY PLAN
> > ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
> >  Aggregate  (cost=154.10..154.11 rows=1 width=8) (actual time=132.320..132.321 rows=1 loops=1)
> >    ->  Seq Scan on pg_proc  (cost=0.00..129.63 rows=3263 width=16) (actual time=0.013..0.301 rows=3266 loops=1)
> >  Planning Time: 0.070 ms
> >  JIT:
> >    Functions: 3
> >    Options: Inlining true, Optimization true, Expressions true, Deforming true
> >    Timing: Generation 0.597 ms, Deforming 0.407 ms, Inlining 8.943 ms, Optimization 79.403 ms, Emission 43.091 ms, Total 132.034 ms
> >  Execution Time: 132.986 ms
> > (8 rows)
> >
> > I see the result of deforming in explain analyze, but related values in
> > pg_stat_statements are 0.
>
> I'm not sure why, but pgss jit metrics are always nulls for explain
> analyze queries. I noticed this with surprise myself when I was recently
> reviewing the lazy jit patch, but I haven't yet figured out what the
> reason is. Anyway, without "explain analyze" you'll get correct deforming
> numbers in pgss.
>

It was really strange, because I tested the queries without EXPLAIN ANALYZE
too, and new columns were always zero on my comp. Other jit columns were
filled.  But I didn't do a deeper investigation.



> > Minimally, the values are assigned in wrong order
> >
> > +   if (api_version >= PGSS_V1_11)
> > +   {
> > +   values[i++] = Float8GetDatumFast(tmp.jit_deform_time);
> > +   values[i++] = Int64GetDatumFast(tmp.jit_deform_count);
> > +   }
>
> (facepalm) Yep, will fix the order.
>
> > After reading the doc, I am confused what this metric means
> >
> > + 
> > +  
> > +   jit_deform_count bigint
> > +  
> > +  
> > +   Number of times tuples have been deformed
> > +  
> > + 
> > +
> > + 
> > +  
> > +   jit_deform_time double
> > precision
> > +  
> > +  
> > +   Total time spent by the statement on deforming tuples, in
> > milliseconds
> > +  
> > + 
> >
> > It is not clear that these times and these numbers are related just to the
> > compilation of the deforming process, not to the deforming itself.
>
> Good point, I need to formulate this more clearly.
>


Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-07 Thread Michail Nikolaev
Hello.

The few things I have got so far:

1) It is not required to order by random() to reproduce the issue - it
could be done using queries like:

  BEGIN;
  SELECT omg.*
 FROM something_is_wrong_here AS omg
 ORDER BY value -- change is here
 LIMIT 1
 FOR UPDATE
  \gset


  UPDATE something_is_wrong_here SET value = :value + 1 WHERE id = :id;
  COMMIT;

But for some reason it is harder to reproduce without random in my
case (typically need to wait for about a minute with 100 connections).

2) It is not an issue at table creation time. The issue is reproducible if
vacuum_defer_cleanup_age is set after the table preparation.

3) To reproduce the issue, vacuum_defer_cleanup_age should flip xid
over zero (be >= txid_current()).
And it is stable. So, for example, I am unable to reproduce it with the
value 733, but 734 gives the error each time.
Just a single additional txid_current() (after the data is filled) fixes
the crash... It looks like the first SELECT FOR UPDATE + UPDATE silently
poisons everything somehow.
You could use such PSQL script:

  DROP TABLE IF EXISTS something_is_wrong_here;

  CREATE TABLE something_is_wrong_here (id bigserial PRIMARY KEY,
value numeric(15,4) DEFAULT 0 NOT NULL);

  INSERT INTO something_is_wrong_here (value) (SELECT 1 from
generate_series(0, 100));

  SELECT txid_current() \gset

  SELECT :txid_current + 1 as txid \gset

  ALTER SYSTEM SET vacuum_defer_cleanup_age to :txid;SELECT
pg_reload_conf();

I have attached some scripts if someone goes to reproduce.

Best regards,
Michail.


reproduce.sh
Description: application/shellscript


reproduce.bench
Description: Binary data


scheme.sql
Description: application/sql


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey


On 07/01/23 21:57, Ankit Kumar Pandey wrote:

On 07/01/23 09:58, David Rowley wrote:
>
> The attached patch has no tests added. It's going to need some of
> those.

While writing test cases, I found that the optimization does not happen for
case #1 (which is a prime candidate for such an operation), like

EXPLAIN (COSTS OFF)
SELECT empno,
     depname,
     min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
     sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date

This happens because of the mutual exclusiveness of two operands (when the
number of window functions is > 1), namely

is_sorted and last activeWindow in the condition:

( !is_sorted && lnext(activeWindows, l) == NULL)

For the 2nd-last window function, is_sorted is false and pathkeys get added.

In the next run (for the last window function), is_sorted becomes true and
the whole optimization part is skipped.

Note: the major issue is that if I remove is_sorted from the condition, even
though pathkeys are added, it still does not perform the optimization and
works the same as in the master/unoptimized case.

Perhaps adding pathkeys at the last window function is not doing the trick?
Maybe we need to add pathkeys to all window functions whose pathkeys are a
subset of the query's ORDER BY, irrespective of whether they are last or not?


Case #2:

For presorted columns, eg

CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
      min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;

Is this correct plan:

a)

    QUERY PLAN
---
   Incremental Sort
     Sort Key: depname, empno
     Presorted Key: depname
     ->  WindowAgg
   ->  Index Scan using depname_idx on empsalary
(5 rows)

or this:

b) (Akin to Optimized version)

    QUERY PLAN
---
   WindowAgg
     ->  Incremental Sort
   Sort Key: depname, empno
   Presorted Key: depname
   ->  Index Scan using depname_idx on empsalary
(5 rows)

Patched version does (a) because of is_sorted condition.

If we remove both is_sorted and lnext(activeWindows, l) == NULL conditions,

we get correct results in these two cases.



Attached patch with test cases.

For case #2, the test cases still use (a) as the expected output, which I
don't think is right and we should revisit. Other than that, the only
failing case is due to the issue mentioned in case #1.


Thanks


--
Regards,
Ankit Kumar Pandey
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..781916f219 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,7 @@ create_one_window_path(PlannerInfo *root,
 	PathTarget *window_target;
 	ListCell   *l;
 	List	   *topqual = NIL;
+	bool		has_runcondition = false;
 
 	/*
 	 * Since each window clause could require a different sort order, we stack
@@ -4457,6 +4458,42 @@ create_one_window_path(PlannerInfo *root,
 path->pathkeys,
 &presorted_keys);
 
+		has_runcondition |= (wc->runCondition != NIL);
+
+		/*
+		 * If not sorted already and the query has an ORDER BY clause, then we
+		 * try to push the entire ORDER BY below the final WindowAgg.  We
+		 * don't do this when the query has a DISTINCT clause as the planner
+		 * might opt to do a Hash Aggregate and spoil our efforts here.  We
+		 * also can't do this when the ORDER BY contains any WindowFuncs as we
+		 * obviously can't sort something that's yet to be evaluated.  We also
+		 * don't bother with this when there is a WindowClauses with a
+		 * runCondition as those can reduce the number of rows to sort in the
+		 * ORDER BY and we don't want add complexity to the WindowAgg sort if
+		 * the final ORDER BY sort is going to have far fewer rows to sort.
+		 * For the same reason, don't bother if the query has a LIMIT clause.
+		 */
+		if (!is_sorted && lnext(activeWindows, l) == NULL &&
+			root->parse->distinctClause == NIL &&
+			root->parse->sortClause != NIL &&
+			!has_runcondition && root->parse->limitCount == NULL &&
+			!contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+	  root->processed_tlist)))
+		{
+			List	   *orderby_pathkeys;
+
+			orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+			 root->parse->sortClause,
+			 root->processed_tlist);
+
+			if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+window_pathkeys = orderby_pathkeys;
+
+			is_sorted = pathkeys_count_contained_in(window_pathkeys,
+	path->pathkeys,
+	&presorted_keys);
+		}
+
 		/* Sort if necessary */
 		if (!is_sorted)
 		{
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..2b8b5c4533 100644
--- a/src/test/regress/expected/window.out
+++ 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 07/01/23 09:58, David Rowley wrote:


The attached patch has no tests added. It's going to need some of
those.


While writing test cases, I found that the optimization does not happen for
case #1 (which is a prime candidate for such an operation), like

EXPLAIN (COSTS OFF)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
   sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date

This happens because of the mutual exclusiveness of two operands (when the
number of window functions is > 1), namely


is_sorted and last activeWindow in the condition:

( !is_sorted && lnext(activeWindows, l) == NULL)

For the 2nd-last window function, is_sorted is false and pathkeys get added.

In the next run (for the last window function), is_sorted becomes true and
the whole optimization part is skipped.

Note: the major issue is that if I remove is_sorted from the condition, even
though pathkeys are added, it still does not perform the optimization and
works the same as in the master/unoptimized case.


Perhaps adding pathkeys at the last window function is not doing the trick?
Maybe we need to add pathkeys to all window functions whose pathkeys are a
subset of the query's ORDER BY, irrespective of whether they are last or not?



Case #2:

For presorted columns, eg

CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
    min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;

Is this correct plan:

a)

  QUERY PLAN
---
 Incremental Sort
   Sort Key: depname, empno
   Presorted Key: depname
   ->  WindowAgg
 ->  Index Scan using depname_idx on empsalary
(5 rows)

or this:

b) (Akin to Optimized version)

  QUERY PLAN
---
 WindowAgg
   ->  Incremental Sort
 Sort Key: depname, empno
 Presorted Key: depname
 ->  Index Scan using depname_idx on empsalary
(5 rows)

Patched version does (a) because of is_sorted condition.

If we remove both is_sorted and lnext(activeWindows, l) == NULL conditions,

we get correct results in these two cases.


--
Regards,
Ankit Kumar Pandey





Re: [RFC] Add jit deform_counter

2023-01-07 Thread Dmitry Dolgov
> On Fri, Jan 06, 2023 at 09:42:09AM +0100, Pavel Stehule wrote:
> The explain part is working, the part of pg_stat_statements doesn't
>
> set jit_above_cost to 10;
> set jit_optimize_above_cost to 10;
> set jit_inline_above_cost to 10;
>
> (2023-01-06 09:08:59) postgres=# explain analyze select
> count(length(prosrc) > 0) from pg_proc;
>                                                    QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
>  Aggregate  (cost=154.10..154.11 rows=1 width=8) (actual time=132.320..132.321 rows=1 loops=1)
>    ->  Seq Scan on pg_proc  (cost=0.00..129.63 rows=3263 width=16) (actual time=0.013..0.301 rows=3266 loops=1)
>  Planning Time: 0.070 ms
>  JIT:
>    Functions: 3
>    Options: Inlining true, Optimization true, Expressions true, Deforming true
>    Timing: Generation 0.597 ms, Deforming 0.407 ms, Inlining 8.943 ms, Optimization 79.403 ms, Emission 43.091 ms, Total 132.034 ms
>  Execution Time: 132.986 ms
> (8 rows)
>
> I see the result of deforming in explain analyze, but related values in
> pg_stat_statements are 0.

I'm not sure why, but pgss jit metrics are always nulls for explain
analyze queries. I noticed this with surprise myself when I was recently
reviewing the lazy jit patch, but I haven't yet figured out what the
reason is. Anyway, without "explain analyze" you'll get correct deforming
numbers in pgss.

> Minimally, the values are assigned in wrong order
>
> +   if (api_version >= PGSS_V1_11)
> +   {
> +   values[i++] = Float8GetDatumFast(tmp.jit_deform_time);
> +   values[i++] = Int64GetDatumFast(tmp.jit_deform_count);
> +   }

(facepalm) Yep, will fix the order.

> After reading the doc, I am confused what this metric means
>
> + 
> +  
> +   jit_deform_count bigint
> +  
> +  
> +   Number of times tuples have been deformed
> +  
> + 
> +
> + 
> +  
> +   jit_deform_time double
> precision
> +  
> +  
> +   Total time spent by the statement on deforming tuples, in
> milliseconds
> +  
> + 
>
> It is not clear that these times and these numbers are related just to the
> compilation of the deforming process, not to the deforming itself.

Good point, I need to formulate this more clearly.




Re: [PATCH] Add function to_oct

2023-01-07 Thread Tom Lane
vignesh C  writes:
> Few suggestions
> 1) We could use to_oct instead of to_oct32 as we don't have multiple
> implementations for to_oct

That seems (a) shortsighted and (b) inconsistent with the naming
pattern used for to_hex, so I doubt it'd be an improvement.

regards, tom lane




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-07 Thread Tom Lane
Tomas Vondra  writes:
> However, maybe views are not the best / most common example to think
> about. I'd imagine it's much more common to reference a regular table,
> but the table gets truncated / populated quickly, and/or the autovacuum
> workers are busy so it takes time to update reltuples. But in this case
> it's also quite simple to fix the correlation by just ordering by ctid
> (which I guess we might do based on the relkind).

> There's a minor issue with partitioned tables, with foreign partitions
> pointing to remote views. This is kinda broken, because the sample size
> for individual relations is determined based on relpages. But that's 0
> for views, so these partitions get ignored when building the sample. But
> that's a pre-existing issue.

I wonder if we should stop consulting reltuples directly at all,
and instead do

"EXPLAIN SELECT * FROM remote_table"

to get the remote planner's estimate of the number of rows involved.
Even for a plain table, that's likely to be a better number.
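
(Illustrative only, with made-up numbers, but the estimate would simply be
read off output of roughly this shape:)

    EXPLAIN SELECT * FROM remote_table;
                              QUERY PLAN
    ----------------------------------------------------------------
     Seq Scan on remote_table  (cost=0.00..35.50 rows=2550 width=4)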

regards, tom lane




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-07 Thread Tomas Vondra
On 1/6/23 23:41, Tomas Vondra wrote:
> On 1/6/23 17:58, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> The one difference is that I realized the relkind check does not
>>> actually say we can't do sampling - it just means we can't use
>>> TABLESAMPLE to do it. We could still use "random()" ...
>>
>>> Furthermore, I don't think we should silently disable sampling when the
>>> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
>>> the table - IMHO it's less surprising to just fail in that case.
>>
>> Agreed on both points.  This patch looks good to me.
>>
> 
> Good, I'll get this committed. The "ORDER BY random()" idea is a possible
> improvement, and can be discussed on its own.
> 

Pushed.

>>> Of course, all relkinds that don't support TABLESAMPLE currently have
>>> reltuples value that will disable sampling anyway (e.g. views have -1).
>>> So we won't actually fallback to random() anyway, because we can't
>>> calculate the sample fraction.
>>> That's a bit annoying for foreign tables pointing at a view, which is a
>>> more likely use case than table pointing at a sequence.
>>
>> Right, that's a case worth being concerned about.
>>
>>> But I realized we could actually still do "random()" sampling:
>>> SELECT * FROM t ORDER BY random() LIMIT $X;
>>
>> Hmm, interesting idea, but it would totally bollix our correlation
>> estimates.  Not sure that those are worth anything for remote views,
>> but still...
> 
> But isn't that an issue that we already have? I mean, if we do ANALYZE
> on a foreign table pointing to a view, we fetch all the results. But if
> the view does not have a well-defined ORDER BY, a trivial plan change
> may change the order of results and thus the correlation.
> 
> Actually, how is a correlation even defined for a view?
> 
> It's true this "ORDER BY random()" thing would make it less stable, as
> it would change the correlation on every run. Although - the calculated
> correlation is actually quite stable, because it's guaranteed to be
> pretty close to 0 because we make the order random.
> 
> Maybe that's actually better than the current state where it depends on
> the plan? Why not to look at the relkind and just set correlation to 0.0
> in such cases?
> 
> But if we want to restore that current behavior (where it depends on the
> actual query plan), we could do something like this:
> 
>SELECT * FROM the_remote_view ORDER BY row_number() over ();
> 
> But yeah, this makes the remote sampling more expensive. Probably still
> a win because of network costs, but not great.
> 

I've been thinking about this a bit more, and I'll consider submitting a
patch for the next CF. IMHO it's probably better to accept correlation
being 0 in these cases - it's more representative of what we know about
the view / plan output (or rather lack of knowledge).

However, maybe views are not the best / most common example to think
about. I'd imagine it's much more common to reference a regular table,
but the table gets truncated / populated quickly, and/or the autovacuum
workers are busy so it takes time to update reltuples. But in this case
it's also quite simple to fix the correlation by just ordering by ctid
(which I guess we might do based on the relkind).

There's a minor issue with partitioned tables, with foreign partitions
pointing to remote views. This is kinda broken, because the sample size
for individual relations is determined based on relpages. But that's 0
for views, so these partitions get ignored when building the sample. But
that's a pre-existing issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-01-07 Thread Dean Rasheed
On Thu, 5 Jan 2023 at 13:21, Dean Rasheed  wrote:
>
> On Thu, 5 Jan 2023 at 11:03, Alvaro Herrera  wrote:
> >
> > > + /* Join type required */
> > > + if (left_join && right_join)
> > > + qry->mergeJoinType = JOIN_FULL;
> > > + else if (left_join)
> > > + qry->mergeJoinType = JOIN_LEFT;
> > > + else if (right_join)
> > > + qry->mergeJoinType = JOIN_RIGHT;
> > > + else
> > > + qry->mergeJoinType = JOIN_INNER;
> >
> > One of the review comments that MERGE got initially was that parse
> > analysis was not a place to "do query optimization", in the sense that
> > the original code was making a decision whether to make an outer or
> > inner join based on the set of WHEN clauses that appear in the command.
> > That's how we ended up with transform_MERGE_to_join and
> > mergeUseOuterJoin instead.  This new code is certainly not the same, but
> > it makes me a bit unconfortable.  Maybe it's OK, though.
> >
>
> Yeah I agree, it's a bit ugly. Perhaps a better solution would be to
> do away with that field entirely and just make the decision in
> transform_MERGE_to_join() by examining the action list again.
>

Attached is an updated patch taking that approach, allowing
mergeUseOuterJoin to be removed from the Query node, which I think is
probably a good thing.

Aside from that, it includes a few additional comment updates in the
executor that I'd missed, and psql tab completion support.
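
For reference, the extended syntax reads like this (a sketch only; table and
column names are made up):

    MERGE INTO target t
    USING source s ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET val = s.val
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (id, val) VALUES (s.id, s.val);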

Regards,
Dean
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
new file mode 100644
index b87ad5c..1482ede
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -396,8 +396,8 @@
 originally matched appears later in the list of actions.
 On the other hand, if the row is concurrently updated or deleted so
 that the join condition fails, then MERGE will
-evaluate the condition's NOT MATCHED actions next,
-and execute the first one that succeeds.
+evaluate the condition's NOT MATCHED [BY TARGET]
+actions next, and execute the first one that succeeds.
 If MERGE attempts an INSERT
 and a unique index is present and a duplicate row is concurrently
 inserted, then a uniqueness violation error is raised;
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index 0995fe0..8ef121a
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -33,7 +33,8 @@ USING dat
 and when_clause is:
 
 { WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
-  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
+  WHEN NOT MATCHED BY SOURCE [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
+  WHEN NOT MATCHED [BY TARGET] [ AND condition ] THEN { merge_insert | DO NOTHING } }
 
 and merge_insert is:
 
@@ -70,7 +71,9 @@ DELETE
from data_source to
target_table_name
producing zero or more candidate change rows.  For each candidate change
-   row, the status of MATCHED or NOT MATCHED
+   row, the status of MATCHED,
+   NOT MATCHED BY SOURCE,
+   or NOT MATCHED [BY TARGET]
is set just once, after which WHEN clauses are evaluated
in the order specified.  For each candidate change row, the first clause to
evaluate as true is executed.  No more than one WHEN
@@ -226,16 +229,37 @@ DELETE
   At least one WHEN clause is required.
  
  
+  The WHEN clause may specify WHEN MATCHED,
+  WHEN NOT MATCHED BY SOURCE, or
+  WHEN NOT MATCHED [BY TARGET].
+  Note that the SQL standard only defines
+  WHEN MATCHED and WHEN NOT MATCHED
+  (which is defined to mean no matching target row).
+  WHEN NOT MATCHED BY SOURCE is an extension to the
+  SQL standard, as is the option to append
+  BY TARGET to WHEN NOT MATCHED, to
+  make its meaning more explicit.
+ 
+ 
   If the WHEN clause specifies WHEN MATCHED
-  and the candidate change row matches a row in the
+  and the candidate change row matches a row in the source to a row in the
   target_table_name,
   the WHEN clause is executed if the
   condition is
   absent or it evaluates to true.
  
  
-  Conversely, if the WHEN clause specifies
-  WHEN NOT MATCHED
+  If the WHEN clause specifies
+  WHEN NOT MATCHED BY SOURCE and the candidate change
+  row represents a row in the
+  target_table_name that does
+  not match a source row, the WHEN clause is executed
+  if the condition is
+  absent or it evaluates to true.
+ 
+ 
+  If the WHEN clause specifies
+  WHEN NOT MATCHED [BY TARGET]
   and the candidate change row does not match a row in the
   target_table_name,
   the WHEN clause is executed if the
@@ -257,7 +281,10 @@ DELETE
  
   A condition on a WHEN MATCHED clause can refer to columns
   in both the source and the target relations. A condition on a
-  WHEN NOT MATCHED clause can only 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 07/01/23 17:28, David Rowley wrote:

Your email client seems to be adding additional vertical space to your
emails. I've removed the additional newlines in the quotes. Are you
able to fix the client so it does not do that?

I have adjusted my mail client, hope it is better now?

On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey  wrote:
>
> On 07/01/23 09:58, David Rowley wrote:
> > You also don't seem to be considering the fact that the query might
> > have a DISTINCT clause.
>
> Major reason for this was that I am not exactly aware of what distinct
> clause means (especially in
>
> context of window functions) and how it is different from other
> sortClauses (partition, order by, group).

I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT.
If you look in the grouping_planner() function, you'll see that
create_distinct_paths() is called between create_window_paths() and
create_ordered_paths().


Yes just saw this and got what you meant.



> Yes, this is a fair point. Multiple sort is actually beneficial in cases
> like this, perhaps limits clause and runCondition should be no op too?

I'm not sure what you mean by "no op". Do you mean not apply the optimization?

Yes, no op = no optimization. Sorry I didn't mention it before.

> > I think the patch should also be using pathkeys_contained_in() and
> > Lists of pathkeys rather than concatenating lists of SortGroupClauses
> > together.  That should allow things to work correctly when a given
> > pathkey has become redundant due to either duplication or a Const in
> > the Eclass.
>
> Make sense, I actually duplicated that logic from
> make_pathkeys_for_window. We should make this changes there as well because
> if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
> (weird example but you get the idea), it leads to duplicates in
> window_sortclauses.

It won't lead to duplicate pathkeys. Look in
make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does.
Notice that it checks if the pathkey already exists in the list before
appending.


Okay, I see this; pathkey_is_redundant is much smarter as well.

Replacing list_concat_copy with list_concat_unique in 
make_pathkeys_for_window


won't be of much benefit.


> Agree with runConditions part but for limit clause, row reduction happens
> at the last, so whether we use patched version or master version,
> none of sorts would benefit/degrade from that, right?

Maybe you're right. Just be aware that a sort done for a query with an
ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to
store the top-n tuples and that can significantly reduce the memory
required for sorting perhaps resulting in the sort fitting in memory
rather than spilling out to disk.

You might want to test this by having the leading sort column as an
INT, and then the 2nd one as a long text column of maybe around two
kilobytes. Make all the leading column values the same so that the
comparison for the text column is always performed. Make the LIMIT
small compared to the total number of rows; that should test the worst
case.  Check the performance with and without the limitCount != NULL
part of the patch that disables the optimization for LIMIT.


I checked this. For limit <<< total number of rows, a top-N sort was
performed, but when I changed the limit to a higher value (or no limit), a
quicksort was performed. The top-N sort was twice as fast.

Also, testing the (first) patch version vs. master, the top-N sort was twice
as fast there as well (outputs mentioned below).

Current patch version (with limit excluded for optimizations)

explain (analyze ,costs off) select count(*) over (order by id) from tt 
order by id, name limit 1;

    QUERY PLAN
---
 Limit (actual time=1.718..1.719 rows=1 loops=1)
   ->  Incremental Sort (actual time=1.717..1.717 rows=1 loops=1)
 Sort Key: id, name
 Presorted Key: id
 Full-sort Groups: 1  Sort Method: top-N heapsort  Average 
Memory: 25kB  Peak Memory: 25kB

 ->  WindowAgg (actual time=0.028..0.036 rows=6 loops=1)
   ->  Sort (actual time=0.017..0.018 rows=6 loops=1)
 Sort Key: id
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on tt (actual time=0.011..0.012 
rows=6 loops=1)

 Planning Time: 0.069 ms
 Execution Time: 1.799 ms

Earlier patch(which included limit clause for optimizations)

explain (analyze ,costs off) select count(*) over (order by id) from tt 
order by id, name limit 1;

 QUERY PLAN
------------------------------------------------------------
 Limit (actual time=3.766..3.767 rows=1 loops=1)
   ->  WindowAgg (actual time=3.764..3.765 rows=1 loops=1)
 ->  Sort (actual time=3.749..3.750 rows=6 loops=1)
   Sort Key: id, name
   Sort Method: quicksort 

Re: [BUG] Logical replica crash if there was an error in a function.

2023-01-07 Thread vignesh C
On Sun, 11 Dec 2022 at 09:21, Anton A. Melnikov  wrote:
>
>
> On 07.12.2022 21:03, Andres Freund wrote:
>
> >
> > This CF entry causes tests to fail on all platforms:
> > https://cirrus-ci.com/build/5755408111894528
> >
> > E.g.
> > https://api.cirrus-ci.com/v1/artifact/task/5298457144459264/testrun/build/testrun/subscription/100_bugs/log/regress_log_100_bugs
> >
> >  Begin standard error
> > psql::1: NOTICE:  dropped replication slot "sub1" on publisher
> >  End standard error
> > timed out waiting for match: ERROR:  relation "error_name" does not exist 
> > at character at /tmp/cirrus-ci-build/src/test/subscription/t/100_bugs.pl 
> > line 115.
> >
> > Greetings,
> >
> > Andres Freund
>
> Thank you for reminding!
>
> There was a conflict when applying v2 on current master.
> Rebased v3 is attached.

Few suggestions:
1) There is a warning:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);

 "my" variable $result masks earlier declaration in same scope at
t/100_bugs.pl line 400.

You can change:
my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
to
$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");

2) Now that the crash is fixed, you could change it to a better message:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);

Regards,
Vignesh




Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE

2023-01-07 Thread vignesh C
On Fri, 6 Jan 2023 at 15:33, Dean Rasheed  wrote:
>
> On Fri, 6 Jan 2023 at 02:38, vignesh C  wrote:
> >
> > On Thu, 5 Jan 2023 at 18:22, Dean Rasheed  wrote:
> > >
> > > That leads to the attached, which barring objections, I'll push shortly.
> >
> > The changes look good to me.
> >
>
> Pushed.

Thanks for pushing this.

Regards,
Vignesh




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
Your email client seems to be adding additional vertical space to your
emails. I've removed the additional newlines in the quotes. Are you
able to fix the client so it does not do that?

On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey  wrote:
>
> On 07/01/23 09:58, David Rowley wrote:
> > You also don't seem to be considering the fact that the query might
> > have a DISTINCT clause.
>
> Major reason for this was that I am not exactly aware of what distinct
> clause means (especially in
>
> context of window functions) and how it is different from other
> sortClauses (partition, order by, group).

I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT.
If you look in the grouping_planner() function, you'll see that
create_distinct_paths() is called between create_window_paths() and
create_ordered_paths().

> Yes, this is a fair point. Multiple sort is actually beneficial in cases
> like this, perhaps limits clause and runCondition should be no op too?

I'm not sure what you mean by "no op". Do you mean not apply the optimization?

> > I think the patch should also be using pathkeys_contained_in() and
> > Lists of pathkeys rather than concatenating lists of SortGroupClauses
> > together.  That should allow things to work correctly when a given
> > pathkey has become redundant due to either duplication or a Const in
> > the Eclass.
>
> Make sense, I actually duplicated that logic from
> make_pathkeys_for_window. We should make this changes there as well because
> if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
> (weird example but you get the idea), it leads to duplicates in
> window_sortclauses.

It won't lead to duplicate pathkeys. Look in
make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does.
Notice that it checks if the pathkey already exists in the list before
appending.

> Agree with runConditions part but for limit clause, row reduction happens
> at the last, so whether we use patched version or master version,
> none of sorts would benefit/degrade from that, right?

Maybe you're right. Just be aware that a sort done for a query with an
ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to
store the top-n tuples and that can significantly reduce the memory
required for sorting perhaps resulting in the sort fitting in memory
rather than spilling out to disk.

You might want to test this by having the leading sort column as an
INT, and then the 2nd one as a long text column of maybe around two
kilobytes. Make all the leading column values the same so that the
comparison for the text column is always performed. Make the LIMIT
small compared to the total number of rows; that should test the worst
case.  Check the performance with and without the limitCount != NULL
part of the patch that disables the optimization for LIMIT.
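
Something like this would set up that worst case (a sketch only, with
made-up names):

    -- Constant leading column, ~2kB text second column, LIMIT much smaller
    -- than the row count.
    CREATE TABLE t_sort (a int, b text);
    INSERT INTO t_sort
        SELECT 1, repeat('x', 2048) || i FROM generate_series(1, 100000) i;
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT count(*) OVER (ORDER BY a) FROM t_sort ORDER BY a, b LIMIT 10;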

> Is it okay
> to add comments in test cases? I don't see it much on existing cases
> so kind of reluctant to add but it makes intentions much more clear.

I think tests should always have a comment to state what they're
testing. Not many people seem to do that, unfortunately. The problem
with not stating what the test is testing is that if, for example, the
test is checking that the EXPLAIN output is showing a Sort, what if at
some point in the future someone adjusts some costing code and the
plan changes to an Index Scan.  If there's no comment to state that
we're looking for a Sort plan, then the author of the patch that's
adjusting the costs might just think it's ok to change the expected
plan to an Index Scan.   I've seen this problem occur even when the
comments *do* exist. There's just about no hope of such a test
continuing to do what it's meant to if the comments don't exist.

David




Re: add \dpS to psql

2023-01-07 Thread Dean Rasheed
On Sat, 7 Jan 2023 at 00:36, Nathan Bossart  wrote:
>
> On Fri, Jan 06, 2023 at 06:52:33PM +, Dean Rasheed wrote:
> >
> > So I think we should use the same SQL clauses as every other psql
> > command that supports "S", namely:
> >
> > if (!showSystem && !pattern)
> > appendPQExpBufferStr(&buf, "  AND n.nspname <> 'pg_catalog'\n"
> >  "  AND n.nspname <> 
> > 'information_schema'\n");
>
> Good catch.  I should have noticed this.  The deleted comment mentions that
> the system/temp tables normally aren't very interesting from a permissions
> perspective, so perhaps there is an argument for always excluding temp
> tables without a pattern.  After all, \dp always excludes indexes and TOAST
> tables.  However, it looks like \dt includes temp tables, and I didn't see
> any other meta-commands that excluded them.
>

It might be true that temp tables aren't usually interesting from a
permissions point of view, but it's not hard to imagine situations
where interesting things do happen. It's also probably the case that
most users won't have many temp tables, so I don't think including
them by default will be particularly intrusive.

Also, from a user perspective, I think it would be something of a POLA
violation for \dp[S] and \dt[S] to include different sets of tables,
though I appreciate that we do that now. There's nothing in the docs
to indicate that that's the case.

Anyway, I've pushed the v2 patch as-is. If anyone feels strongly
enough that we should change its behaviour for temp tables, then we
can still discuss that.

Regards,
Dean




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 07/01/23 07:59, David Rowley wrote:

On Thu, 5 Jan 2023 at 04:11, Ankit Kumar Pandey  wrote:

Attaching test cases for this (+ small change in doc).

Tested this in one of WIP branch where I had modified
select_active_windows and it failed

as expected.

Please let me know if something can be improved in this.

Thanks for writing that.

I had a look over the patch and ended up making some adjustments to
the tests. Looking back at 728202b63, I think any tests we add here
should be kept alongside the tests added by that commit rather than
tacked on to the end of the test file.  It also makes sense to me just
to use the same table as the original tests. I also thought the
comment in select_active_windows should be in the sort comparator
function instead. I think that's a more likely place to capture the
attention of anyone making modifications.

Thanks, I will look it through.

I've now pushed the adjusted patch.


I can't seem to find the updated patch in the attachment; could you please
forward the patch again?

Thanks.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey

Thanks for looking into this.

On 07/01/23 09:58, David Rowley wrote:

On Sat, 7 Jan 2023 at 02:11, Ankit Kumar Pandey  wrote:

On 05/01/23 12:53, David Rowley wrote:

We *can* reuse Sorts where a more strict or equivalent sort order is
available.  The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY.  One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.

I attempted this in attached patch.

I had a quick look at this and it's going to need some additional code
to ensure there are no WindowFuncs in the ORDER BY clause.  We can't
sort on those before we evaluate them.

Okay I will add this check.

I also don't think there's any point in adding the additional pathkeys
when the input path is already presorted.

It would be better to leave this case alone and just do the
incremental sort afterwards.


So this will be a no-operation case as well.


You also don't seem to be considering the fact that the query might
have a DISTINCT clause.


The major reason for this was that I am not exactly aware of what the
DISTINCT clause means (especially in the context of window functions) and
how it is different from the other sort clauses (partition, order by,
group).

The comments in parsenodes.h didn't help.


   That's evaluated between window functions and
the order by. It would be fairly useless to do a more strict sort when
the sort order is going to be obliterated by a Hash Aggregate. Perhaps
we can just not do this when the query has a DISTINCT clause.

On the other hand, there are also a few reasons why we shouldn't do
this. I mentioned the WindowClause runConditions earlier here.

The patched version produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
   QUERY PLAN
--
  WindowAgg (actual time=0.488..0.497 rows=9 loops=1)
Run Condition: (row_number() OVER (?) < 10)
->  Sort (actual time=0.466..0.468 rows=10 loops=1)
  Sort Key: pg_class.oid, pg_class.relname
  Sort Method: quicksort  Memory: 67kB
  ->  Seq Scan on pg_class (actual time=0.028..0.170 rows=420 loops=1)
  Planning Time: 0.214 ms
  Execution Time: 0.581 ms
(8 rows)

Whereas master produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
QUERY PLAN
------------------------------------------------------------
  Incremental Sort (actual time=0.506..0.508 rows=9 loops=1)
Sort Key: pg_class.oid, pg_class.relname
Presorted Key: pg_class.oid
Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 26kB
Peak Memory: 26kB
->  WindowAgg (actual time=0.475..0.483 rows=9 loops=1)
  Run Condition: (row_number() OVER (?) < 10)
  ->  Sort (actual time=0.461..0.461 rows=10 loops=1)
Sort Key: pg_class.oid
Sort Method: quicksort  Memory: 67kB
->  Seq Scan on pg_class (actual time=0.022..0.178
rows=420 loops=1)
  Planning Time: 0.245 ms
  Execution Time: 0.594 ms
(12 rows)

(slightly bad example since oid is unique but...)

It's not too clear to me that the patched version is a better plan.
The bottom level sort, which sorts 420 rows has a more complex
comparison to do. Imagine the 2nd column is a long text string. That
would make the sort much more expensive.  The top-level sort has far
fewer rows to sort due to the runCondition filtering out anything that
does not match rn1 < 10. The same can be said for a query with a LIMIT
clause.


Yes, this is a fair point. Multiple sorts are actually beneficial in cases
like this; perhaps the LIMIT clause and runCondition should be no-ops too?


I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together.  That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.


Makes sense; I actually duplicated that logic from make_pathkeys_for_window.
We should make this change there as well because if we have
SELECT rank() OVER (PARTITION BY a ORDER BY a) (a weird example, but you get
the idea), it leads to duplicates in window_sortclauses.



Also, since I seem to be only be able to think of these cases properly
by actually trying them, I ended up with the attached patch.  I opted
to not do the optimisation when there are runConditions or a LIMIT
clause.  

Re: [PATCH] Add function to_oct

2023-01-07 Thread vignesh C
On Thu, 22 Dec 2022 at 23:11, Eric Radman  wrote:
>
> On Thu, Dec 22, 2022 at 10:08:17AM +, Dag Lem wrote:
> >
> > The calculation of quotient and remainder can be replaced by less costly 
> > masking and shifting.
> >
> > Defining
> >
> > #define OCT_DIGIT_BITS 3
> > #define OCT_DIGIT_BITMASK 0x7
> >
> > the content of the loop can be replaced by
> >
> >   *--ptr = digits[value & OCT_DIGIT_BITMASK];
> >   value >>= OCT_DIGIT_BITS;
> >
> > Also, the check for ptr > buf in the while loop can be removed. The
> > check is superfluous, since buf cannot possibly be exhausted by a 32
> > bit integer (the maximum octal number being 37777777777).
>
> I integrated these suggestions in the attached -v2 patch and tested
> range of values manually.
>
> Also picked an OID > 8000 as suggested by unused_oids.

Few suggestions
1) We could use to_oct instead of to_oct32 as we don't have multiple
implementations for to_oct
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 98d90d9338..fde0b24563 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3687,6 +3687,9 @@
 { oid => '2090', descr => 'convert int8 number to hex',
   proname => 'to_hex', prorettype => 'text', proargtypes => 'int8',
   prosrc => 'to_hex64' },
+{ oid => '8335', descr => 'convert int4 number to oct',
+  proname => 'to_oct', prorettype => 'text', proargtypes => 'int4',
+  prosrc => 'to_oct32' },

2) Similarly we could change this to "to_oct"
+/*
+ * Convert an int32 to a string containing a base 8 (oct) representation of
+ * the number.
+ */
+Datum
+to_oct32(PG_FUNCTION_ARGS)
+{
+   uint32  value = (uint32) PG_GETARG_INT32(0);

3) I'm not sure if AS "77777777" is required, but I also noticed it
is done similarly in to_hex too:
+--
+-- test to_oct
+--
+select to_oct(256*256*256 - 1) AS "77777777";
+ 77777777 
+----------
+ 77777777
+(1 row)

4) You could add a commit message for the patch

Regards,
Vignesh




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-07 Thread Michael Paquier
On Sat, Jan 07, 2023 at 07:37:49AM +0100, Peter Eisentraut wrote:
> The generation script already has a way to identify location fields, by ($t
> eq 'int' && $f =~ 'location$'), so you could use that as well.

I recall that some of the nodes may need renames to map with this
choice.  That could be just one patch on top of the actual feature.

> I'm concerned about the large number of additional field annotations this
> adds.  We have been careful so far to document the use of each attribute,
> e.g., *why* does a field not need to be copied etc.  This patch adds dozens
> and dozens of annotations without any explanation at all.  Now, the code
> this replaces also has no documentation, but maybe this is the time to add
> some.

In most cases, the addition of the node marker would be enough to
self-explain why they are included, but there is a trend for a lot of
the nodes when it comes to collations and typmods where we don't want
to see these in the jumbling calculations.  I'll look at providing
more info for all that.  (Note: I'm out for now.)
--
Michael


signature.asc
Description: PGP signature


Re: Notify downstream to discard the streamed transaction which was aborted due to crash.

2023-01-07 Thread Amit Kapila
On Fri, Jan 6, 2023 at 11:18 AM Dilip Kumar  wrote:
>
> >
> > To fix it, One idea is to send a stream abort message when we are cleaning 
> > up
> > crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here 
> > is
> > a tiny patch which changes the same. I have confirmed that the bug is fixed 
> > and
> > all regression tests pass. I didn't add a testcase because we need to make 
> > sure
> > the crash happens before all the WAL logged transactions data are decoded 
> > which
> > doesn't seem easy to write a stable test for this.
> >
> > Thoughts ?
>
> Fix looks good to me.  Thanks for working on this.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Allow DISTINCT to use Incremental Sort

2023-01-07 Thread David Rowley
While working on the regression tests added in a14a58329, I noticed
that DISTINCT does not make use of Incremental Sort.  It'll only ever
do full sorts on the cheapest input path or make use of a path that's
already got the required pathkeys.  Also, I see that
create_final_distinct_paths() is a little quirky and if the cheapest
input path happens to be sorted, it'll add_path() the same path twice,
which seems like a bit of a waste of effort. That could happen if say
enable_seqscan is off or if a Merge Join is the cheapest join method.

Additionally, the parallel DISTINCT code looks like it should also get
the same treatment.  I see that I'd coded this to only add a unique
path atop of a presorted path and it never considers sorting the
cheapest partial path.  I've adjusted that in the attached and also
made it consider incremental sorting any path with presorted keys.

Please see the attached patch.
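
As a simple illustration (hypothetical, since the chosen plan depends on
costs), a query like this could now get an Incremental Sort on a partially
presorted path instead of a full sort of the cheapest path:

    -- An index on (a) presorts the input, so DISTINCT a, b could use an
    -- Incremental Sort with "Presorted Key: a" rather than a full sort.
    CREATE TABLE t (a int, b int);
    CREATE INDEX ON t (a);
    EXPLAIN (COSTS OFF) SELECT DISTINCT a, b FROM t;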

David
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 000d757bdd..c908944071 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4654,22 +4654,63 @@ create_partial_distinct_paths(PlannerInfo *root, 
RelOptInfo *input_rel,

  cheapest_partial_path->rows,

  NULL, NULL);
 
-   /* first try adding unique paths atop of sorted paths */
+   /*
+* First try sorting the cheapest path and incrementally sorting any 
paths
+* with presorted keys and put a unique paths atop of those.
+*/
if (grouping_is_sortable(parse->distinctClause))
{
foreach(lc, input_rel->partial_pathlist)
{
-   Path   *path = (Path *) lfirst(lc);
+   Path   *input_path = (Path *) lfirst(lc);
+   Path   *sorted_path;
+   boolis_sorted;
+   int presorted_keys;
 
-   if (pathkeys_contained_in(root->distinct_pathkeys, 
path->pathkeys))
+   is_sorted = 
pathkeys_count_contained_in(root->distinct_pathkeys,
+   
input_path->pathkeys,
+   
&presorted_keys);
+
+   if (is_sorted)
+   sorted_path = input_path;
+   else
{
-   add_partial_path(partial_distinct_rel, (Path *)
-
create_upper_unique_path(root,
-   
  partial_distinct_rel,
-   
  path,
-   
  list_length(root->distinct_pathkeys),
-   
  numDistinctRows));
+   /*
+* Try at least sorting the cheapest path and 
also try
+* incrementally sorting any path which is 
partially sorted
+* already (no need to deal with paths which 
have presorted keys
+* when incremental sort is disabled unless 
it's the cheapest
+* partial path).
+*/
+   if (input_path != cheapest_partial_path &&
+   (presorted_keys == 0 || 
!enable_incremental_sort))
+   continue;
+
+   /*
+* We've no need to consider both a sort and 
incremental sort.
+* We'll just do a sort if there are no 
presorted keys and an
+* incremental sort when there are presorted 
keys.
+*/
+   if (presorted_keys == 0 || 
!enable_incremental_sort)
+   sorted_path = (Path *) 
create_sort_path(root,
+   
partial_distinct_rel,
+   
input_path,
+

How to define template types in PostgreSQL

2023-01-07 Thread Esteban Zimanyi
Dear all

MobilityDB (https://github.com/MobilityDB/MobilityDB) defines at the C
level four template types: Set, Span, SpanSet, and Temporal. The type Set
is akin to PostgreSQL's ArrayType restricted to one dimension, but enforces
the constraint that sets do not have duplicates, the types Span and SpanSet
are akin to PostgreSQL's RangeType and MultirangeType but enforce the
constraints that span types are of fixed length and that empty spans and
infinite bounds are not allowed, and the type Temporal is used to
manipulate time-varying values.

These template types need to be instantiated at the SQL level with base
types (int, bigint, float, timestamptz, text, ...) and because of this,
MobilityDB needs to define numerous SQL functions that all call the same
function in C. Taking as example the Set type, we need to define, e.g.,

CREATE FUNCTION intset_eq(intset, intset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
CREATE FUNCTION bigintset_eq(bigintset, bigintset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
CREATE FUNCTION floatset_eq(floatset, floatset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
CREATE FUNCTION textset_eq(textset, textset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_eq' ...
...

CREATE FUNCTION intset_ne(intset, intset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
CREATE FUNCTION bigintset_ne(bigintset, bigintset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
CREATE FUNCTION floatset_ne(floatset, floatset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
CREATE FUNCTION textset_ne(textset, textset) RETURNS bool AS
'MODULE_PATHNAME', 'Set_ne' ...
...

In the case of arrays, ranges, and multiranges, PostgreSQL avoids this
redundancy using pseudo-types such as anyarray, anyrange, anymultirange, ...

Is there a possibility that we can also define pseudo types such as anyset,
anyspan, anyspanset, anytemporal,  ?

This will considerably reduce the number of SQL functions to define.
Currently, given the high number of functions in MobilityDB, creating the
extension takes a long time.
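
For example, with a hypothetical anyset pseudo-type (analogous to anyarray
and anyrange), a single declaration could replace the whole per-type family
above:

    -- Hypothetical: anyset does not exist today; this is the kind of single
    -- declaration that would replace intset_eq, bigintset_eq, floatset_eq, ...
    CREATE FUNCTION set_eq(anyset, anyset) RETURNS bool
    AS 'MODULE_PATHNAME', 'Set_eq'
    LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;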

Regards

Esteban


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-07 Thread Dilip Kumar
On Sat, Jan 7, 2023 at 11:13 AM houzj.f...@fujitsu.com
 wrote:
>
> On Saturday, January 7, 2023 12:50 PM Dilip Kumar 
> >
> > On Fri, Jan 6, 2023 at 3:38 PM houzj.f...@fujitsu.com 
> > 
> > wrote:
> > >
> >
> > Looks good, but I feel in pa_process_spooled_messages_if_required()
> > function after getting the filestate the first check should be if 
> > (filestate==
> > FS_EMPTY) return false.  I mean why to process through all the states if it 
> > is
> > empty and we can directly exit.  It is not a big deal so if you prefer the 
> > way it is
> > then I have no objection to it.
>
> I think your suggestion looks good, I have adjusted the code.
> I also rebase the patch set due to the recent commit c6e1f6.
> And here is the new version patch set.
>

LGTM

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com