buildfarm instance bichir stuck

2021-04-06 Thread Robins Tharakan
Hi,

Bichir has been stuck for the past month and has been unable to run the
regression tests since 6a2a70a02018d6362f9841cc2f499cc45405e86b.

It is interesting that the commit is a month old and probably no other
client has complained since, but digging in, I can see that bichir hasn't
even been able to start the regression tests since that commit went in.

Note that Bichir is running on WSL1 (not WSL2) - i.e. Windows Subsystem for
Linux inside Windows 10 - and so isn't really a production use case. The only
run that actually got submitted to Buildfarm was from a few days back when
I killed it after a long wait - see [1].

Since yesterday I have another run that's again stuck on CREATE DATABASE
(see output below). Although pstack not working may be a limitation of the
architecture / installation (unsure), a gdb backtrace shows it is stuck in
poll().

Tracing commits, it seems that commit
6a2a70a02018d6362f9841cc2f499cc45405e86b broke things, and I can confirm
that 'make check' works if I roll back to the preceding commit
( 83709a0d5a46559db016c50ded1a95fd3b0d3be6 ).
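
For what it's worth, here is a quick standalone check (nothing to do with the
PostgreSQL sources; just my own sketch) to see whether the kernel supports the
signalfd(2)-registered-in-epoll combination that the new latch code relies on:

/* gcc -o sigfd_check sigfd_check.c && ./sigfd_check */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/epoll.h>

int
main(void)
{
    sigset_t mask;
    struct epoll_event ev;
    int sfd, epfd;

    /* Block SIGURG so it is only delivered through the signalfd */
    sigemptyset(&mask);
    sigaddset(&mask, SIGURG);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
    { perror("sigprocmask"); return 1; }

    sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    if (sfd < 0)
    { perror("signalfd"); return 1; }   /* one plausible failure mode on WSL1 */

    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
    { perror("epoll_create1"); return 1; }

    ev.events = EPOLLIN;
    ev.data.fd = sfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev) < 0)
    { perror("epoll_ctl"); return 1; }

    /* Send ourselves SIGURG and see whether epoll reports the signalfd */
    kill(getpid(), SIGURG);
    if (epoll_wait(epfd, &ev, 1, 1000) == 1)
        printf("OK: signalfd readiness reported via epoll\n");
    else
        printf("PROBLEM: no event within 1s; signalfd+epoll looks broken here\n");
    return 0;
}

On a regular Linux kernel this prints the OK line immediately; if WSL1 fails at
the signalfd() call or never reports the event, that would be consistent with
the hang above.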

Not sure if many would agree, but two things stood out here:
1) The buildfarm never got the message that a commit broke an instance.
Ideally I'd have expected the buildfarm to have an optimistic timeout that
could have helped - e.g. right now the CREATE DATABASE has been stuck for
18 hours.

2) bichir is clearly not a production use case (it takes 5 hours to complete
a HEAD run!), so let me know if this change is intentional (I guess I'll
stop maintaining it if so), but I thought I'd still put this out in case
it interests someone.

-
thanks
robins

Reference:
1) Last run that I had to kill -
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&dt=2021-03-31%2012%3A00%3A05

#
The current run has been running since yesterday.


postgres@WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$
tail -2 lastcommand.log
running on port 5678 with PID 8715
== creating database "regression" ==


postgres@WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ date
Wed Apr  7 12:48:26 AEST 2021


postgres@WSLv1:/opt/postgres/bf/v11/buildroot/HEAD/bichir.lastrun-logs$ ls
-la
total 840
drwxrwxr-x 1 postgres postgres   4096 Apr  6 09:00 .
drwxrwxr-x 1 postgres postgres   4096 Apr  6 08:55 ..
-rw-rw-r-- 1 postgres postgres   1358 Apr  6 08:55 SCM-checkout.log
-rw-rw-r-- 1 postgres postgres  91546 Apr  6 08:56 configure.log
-rw-rw-r-- 1 postgres postgres 40 Apr  6 08:55 githead.log
-rw-rw-r-- 1 postgres postgres   2890 Apr  6 09:01 lastcommand.log
-rw-rw-r-- 1 postgres postgres 712306 Apr  6 09:00 make.log


root@WSLv1:~# pstack 8729
8729: psql -X -c CREATE DATABASE "regression" TEMPLATE=template0
LC_COLLATE='C' LC_CTYPE='C' postgres
pstack: Bad address
failed to read target.


root@WSLv1:~# gdb -batch -ex bt -p 8729
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x7f41a8ea4c84 in __GI___poll (fds=fds@entry=0x7fffe13d7be8,
nfds=nfds@entry=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29  ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
#0  0x7f41a8ea4c84 in __GI___poll (fds=fds@entry=0x7fffe13d7be8,
nfds=nfds@entry=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x7f41a9bc8eb1 in poll (__timeout=, __nfds=1,
__fds=0x7fffe13d7be8) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  pqSocketPoll (end_time=-1, forWrite=0, forRead=1, sock=)
at fe-misc.c:1133
#3  pqSocketCheck (conn=0x7fffd979a0b0, forRead=1, forWrite=0, end_time=-1)
at fe-misc.c:1075
#4  0x7f41a9bc8ff0 in pqWaitTimed (forRead=,
forWrite=, conn=0x7fffd979a0b0, finish_time=)
at fe-misc.c:1007
#5  0x7f41a9bc5ac9 in PQgetResult (conn=0x7fffd979a0b0) at
fe-exec.c:1963
#6  0x7f41a9bc5ea3 in PQexecFinish (conn=0x7fffd979a0b0) at
fe-exec.c:2306
#7  0x7f41a9bc5ef2 in PQexec (conn=,
query=query@entry=0x7fffd9799f70
"CREATE DATABASE \"regression\" TEMPLATE=template0 LC_COLLATE='C'
LC_CTYPE='C'") at fe-exec.c:2148
#8  0x7f41aa21e7a0 in SendQuery (query=0x7fffd9799f70 "CREATE DATABASE
\"regression\" TEMPLATE=template0 LC_COLLATE='C' LC_CTYPE='C'") at
common.c:1303
#9  0x7f41aa2160a6 in main (argc=, argv=)
at startup.c:369



#



Here we can see that 83709a0d5a46559db016c50ded1a95fd3b0d3be6 goes past
'CREATE DATABASE'
===
robins@WSLv1:~/proj/postgres/postgres$ git checkout
83709a0d5a46559db016c50ded1a95fd3b0d3be6
Previous HEAD position was 6a2a70a020 Use signalfd(2) for epoll latches.
HEAD is now at 83709a0d5a Use SIGURG rather than SIGUSR1 for latches.

robins@WSLv1:~/proj/postgres/postgres$ cd src/test/regress/

robins@WSLv1:~/proj/postgres/postgres/src/test/regress$ make -j4
NO_LOCALE=1 check
make -C ../../../src/backend generated-headers
rm -rf ./testtablespace
make[1]: Entering directory

RE: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread osumi.takami...@fujitsu.com
On Wednesday, April 7, 2021 7:50 AM Fujii Masao  
wrote:
> On 2021/04/07 5:01, Tom Lane wrote:
> > Fujii Masao  writes:
> >> Thanks! So I pushed the patch.
> >
> > Seems like the test case added by this commit is not working on
> > Windows.
> 
> Thanks for the report! My analysis is [1], and I pushed the proposed patch.
> 
> [1]
> https://postgr.es/m/3cc3909d-f779-7a74-c201-f1f7f62c7...@oss.nttdata.com
Fujii-san, Tom,
thank you so much for your quick analysis and modification.
I appreciate those.


Best Regards,
Takamichi Osumi



Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?

2021-04-06 Thread Amit Kapila
On Mon, Apr 5, 2021 at 5:45 PM Euler Taveira  wrote:
>
> On Mon, Apr 5, 2021, at 4:06 AM, Amit Kapila wrote:
>
> I have made few minor changes in the attached. (a) Initialize the
> streaming message callback API, (b) update docs to reflect that XID
> can be sent for streaming of in-progress transactions, I see that the
> same information needs to be updated for a few other protocol message
> but we can do that as a separate patch (c) slightly tweaked the commit
> messages
>
> Good catch. I completely forgot the streaming of in progress transactions. I
> agree that the documentation for transaction should be added as a separate
> patch since the scope is beyond this feature.
>

I have pushed this work and updated the CF entry accordingly.
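
For anyone catching up on what this adds, the callback being wired into
pgoutput has the following general shape in an output plugin. This is only an
illustrative sketch of the API (the plugin, the tag byte and the formatting
are made up here), not the committed pgoutput code:

#include "postgres.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "replication/reorderbuffer.h"

PG_MODULE_MAGIC;

extern void _PG_output_plugin_init(OutputPluginCallbacks *cb);

static void
sketch_message_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                  XLogRecPtr message_lsn, bool transactional,
                  const char *prefix, Size message_size, const char *message)
{
    /*
     * Forward a logical message (emitted with pg_logical_emit_message())
     * to the downstream consumer in whatever wire format the plugin uses.
     */
    OutputPluginPrepareWrite(ctx, true);
    appendStringInfoChar(ctx->out, 'M');        /* made-up tag byte */
    appendStringInfoString(ctx->out, prefix);
    appendBinaryStringInfo(ctx->out, message, (int) message_size);
    OutputPluginWrite(ctx, true);
}

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
    /* a real plugin must also set begin_cb, change_cb and commit_cb */
    cb->message_cb = sketch_message_cb;
}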

-- 
With Regards,
Amit Kapila.




Re: multi-install PostgresNode fails with older postgres versions

2021-04-06 Thread Mark Dilger


> On Apr 3, 2021, at 11:01 AM, Andrew Dunstan  wrote:
> 
> I've had a look at the first of these patches. I think it's generally
> ok, but:
> 
> 
> -TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
> +TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust',
> +$self->at_least_version("9.3") ? '-N' : (),
>  @{ $params{extra} });
> 
> 
> I'd rather do this in two steps to make it clearer.

Changed.

> I still think just doing pg_ctl -w unconditionally would be simpler.

Changed.

> Prior to 9.3 "unix_socket_directories" was spelled
> "unix_socket_directory". We should just set a variable appropriately and
> use it. That should make the changes around that a whole lot simpler.
> (c.f. buildfarm code)

Ah, good to know.  Changed.


Other changes:

The v1 patch supported postgres versions back to 8.4, but v2 pushes that back 
to 8.1.

The version of PostgresNode currently committed relies on IPC::Run in a way 
that is subtly wrong.  The first time IPC::Run::run(X, ...) is called, it uses 
the PATH as it exists at that time, resolves the path for X, and caches it.  
Subsequent calls to IPC::Run::run(X, ...) use the cached path, without 
respecting changes to $ENV{PATH}.  In practice, this means that:

  use PostgresNode;

  my $a = PostgresNode->get_new_node('a', install_path => '/my/install/8.4');
  my $b = PostgresNode->get_new_node('b', install_path => '/my/install/9.0');

  $a->safe_psql(...)   # <=== Resolves and caches 'psql' as 
/my/install/8.4/bin/psql

  $b->safe_psql(...)   # <=== Executes /my/install/8.4/bin/psql, not 
/my/install/9.0/bin/psql as one might expect

PostgresNode::safe_psql() and PostgresNode::psql() both suffer from this, and 
similarly PostgresNode::pg_recvlogical_upto() because the path to 
pg_recvlogical gets cached.  Calls to initdb and pg_ctl do not appear to suffer 
this problem, as they are ultimately handled by perl's system() call, not by 
IPC::Run::run.

Since postgres commands work fairly similarly from one release to another, this 
can cause subtle and hard to diagnose bugs in regression tests.  The fix in 
v2-0001 works for me, as demonstrated by v2-0002, but whether the fix in the 
attached v2 patch set gets used or not, I think something needs to be done to 
fix this.



v2-0001-Extending-PostgresNode-cross-version-functionalit.patch
Description: Binary data


v2-0002-Adding-modules-test_cross_version.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Autovacuum on partitioned table (autoanalyze)

2021-04-06 Thread Alvaro Herrera
On 2021-Apr-07, yuzuko wrote:

> I'm working on fixing the patch according to the comments.
> I'll send it as soon as I can.

Thanks, I've been giving it a look too.

> I've been thinking about traditional inheritance, I realized that we
> need additional
> handling to support them because unlike declarative partitioning,
> parents may have
> some rows in the case of traditional inheritance as Alvaro mentioned.
> So I think we should support only declarative partitioning in this
> patch for now,
> but what do you think?

Yeah, not fixable at present I think.

> I'm not sure but if we can solve this matter at low cost by using the
> shared memory stats patch, should we wait for the patch?

Let's do that for 15.

-- 
Álvaro Herrera                39°49'30"S 73°17'W
"The problem with the future is that it keeps turning into the present"
(Hobbes)




Re: TRUNCATE on foreign table

2021-04-06 Thread Kazutaka Onishi
> One minor thing - I think "mixtured" is not the
> correct word in "+-- partition table mixtured by table and foreign
> table". How about something like "+-- partitioned table with both
> local and foreign table as partitions"?

Sure. I've fixed this.

> The v15 patch basically looks good to me. I have no more comments.
Thank you for checking this many times.

> CF entry https://commitfest.postgresql.org/32/2972/ still says it's
> "waiting on author", do you want to change it to "needs review" if you
> have no open points left so that others can take a look at it?
Yes, please.

2021年4月7日(水) 10:15 Bharath Rupireddy :
>
> On Tue, Apr 6, 2021 at 10:15 PM Kazutaka Onishi  wrote:
> > I've checked v15 patch with "make check-world" and confirmed this passed.
>
> Thanks for the patch. One minor thing - I think "mixtured" is not the
> correct word in "+-- partition table mixtured by table and foreign
> table". How about something like "+-- partitioned table with both
> local and foreign table as partitions"?
>
> The v15 patch basically looks good to me. I have no more comments.
>
> CF entry https://commitfest.postgresql.org/32/2972/ still says it's
> "waiting on author", do you want to change it to "needs review" if you
> have no open points left so that others can take a look at it?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com


pgsql14-truncate-on-foreign-table.v16.patch
Description: Binary data


Re: New IndexAM API controlling index vacuum strategies

2021-04-06 Thread Peter Geoghegan
On Tue, Apr 6, 2021 at 7:05 AM Matthias van de Meent
 wrote:
> If you have updated patches, I'll try to check them this evening (CEST).

Here is v11, which is not too different from v10 as far as the
truncation stuff goes.

Masahiko should take a look at the last patch again. I renamed the
GUCs to reflect the fact that we do everything possible to advance
relfrozenxid in the case where the fail safe mechanism kicks in -- not
just skipping index vacuuming. It also incorporates your most recent
round of feedback.

Thanks
-- 
Peter Geoghegan


v11-0001-Truncate-line-pointer-array-during-VACUUM.patch
Description: Binary data


v11-0002-Bypass-index-vacuuming-in-some-cases.patch
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2021-04-06 Thread yuzuko
Hello,

Thank you for reviewing.
I'm working on fixing the patch according to the comments.
I'll send it as soon as I can.

> On 2021-04-06 16:56:49 -0400, Alvaro Herrera wrote:
> > I think there is a good reason to treat them the same: pgstat does not
> > have a provision to keep stats both of the table with children, and the
> > table without children.  It can only have one of those.  For
> > partitioning that doesn't matter: since the table-without-children
> > doesn't have anything on its own (no scans, no tuples, no nothing) then
> > we can just use the entry to store the table-with-children data.  But
> > for the inheritance case, the parent can have its own tuples and counts
> > its own scans and so on; so if we change things, we'll overwrite the
> > stats.  Maybe in the long-term we should allow pgstat to differentiate
> > those cases, but that seems not in scope for this patch.
>
> FWIW, I think it shouldn't be too hard to do that once the shared memory
> stats patch goes in (not 14, unfortunately). The hardest part will be to
> avoid exploding the number of interface functions, but I think we can
> figure out a way to deal with that.
>
Thinking about traditional inheritance, I realized that we need additional
handling to support it because, unlike declarative partitioning, parents may
have rows of their own in the traditional inheritance case, as Alvaro
mentioned.  So I think we should support only declarative partitioning in
this patch for now, but what do you think?  I'm not sure, but if we can
solve this matter at low cost by using the shared memory stats patch, should
we wait for that patch?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: shared-memory based stats collector

2021-04-06 Thread Kyotaro Horiguchi
At Tue, 6 Apr 2021 09:32:16 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2021-04-05 02:29:14 -0700, Andres Freund wrote:
..
> I'm inclined to push patches
> [PATCH v60 05/17] pgstat: split bgwriter and checkpointer messages.
> [PATCH v60 06/17] pgstat: Split out relation stats handling from 
> AtEO[Sub]Xact_PgStat() etc.
> [PATCH v60 09/17] pgstat: xact level cleanups / consolidation.
> [PATCH v60 10/17] pgstat: Split different types of stats into separate files.
> [PATCH v60 12/17] pgstat: reorder file pgstat.c / pgstat.h contents.

FWIW..

05 is a straight forward code-rearrange and reasonable to apply.

06 is same as above and it seems to make things cleaner.

09 mainly adds ensure_tabtat_xact_level() to remove repeated code
  blocks a straight-forward way. I wonder if
  pgstat_xact_stack_level_get() might better be
  pgstat_get_xact_stack_level(), but I'm fine with the name in the
  patch.

10 I found that the kind in "pgstat_kind" meant the placeholder for
  specific types.  It looks good to separate them into smaller pieces.
  It is also a simple rearrangement of code.

> pgstat.c is very long, and it's hard to find an order that makes sense
> and is likely to be maintained over time. Splitting the different

  I deeply agree with "hard to find an order that makes sense".

12 I'm not sure how it looks after this patch (I failed to apply 09 at
  my hand.), but it is also a simple rearrangement of code blocks.

> to v14. They're just moving things around, so are fairly low risk. But
> they're going to be a pain to maintain. And I think 10 and 12 make
> pgstat.c a lot easier to understand.

I think that pgstat.c doesn't get frequent back-patching.  It seems to
me that at least 10 looks good.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-04-06 Thread Japin Li


On Tue, 06 Apr 2021 at 17:56, Peter Eisentraut 
 wrote:
> On 06.04.21 07:24, Japin Li wrote:
 I think this patch is about ready to commit, but please provide a final
 version in good time.
>>> I took the liberty to address all the review comments and provide a v9
>>> patch on top of Japin's v8 patch-set.
>>>
 (Also, please combine your patches into a single patch.)
>>> Done.
>>>
>>> Attaching v9 patch, please review it.
>>>
>> Sorry for the late reply! Thanks for your updating the new patch, and it 
>> looks
>> good to me.
>
> Committed.
>
> Note that you can use syntax like "ADD|DROP|SET" in the tab completion
> code.  I have simplified some of your code like that.
>
> I also deduplicated the documentation additions a bit.

Thanks!

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: policies with security definer option for allowing inline optimization

2021-04-06 Thread Noah Misch
On Tue, Apr 06, 2021 at 01:16:16PM -0700, Dan Lynch wrote:
> Final question: do you think using procedures vs writing inline queries for
> RLS quals/checks has a big difference in performance (assuming functions
> are sql)?

If the function meets the criteria for inlining (see inline_function()),
there's negligible performance difference.  Otherwise, the performance
difference may be large.




Re: Allow batched insert during cross-partition updates

2021-04-06 Thread Amit Langote
On Tue, Apr 6, 2021 at 10:52 PM Bharath Rupireddy
 wrote:
>
> On Tue, Apr 6, 2021 at 6:37 PM Amit Langote  wrote:
> > > 3) will the cmin in the output always be the same?
> > > +SELECT cmin, * FROM batch_cp_upd_test3;
> >
> > TBH, I am not so sure.  Maybe it's not a good idea to rely on cmin
> > after all.  I've rewritten the tests to use a different method of
> > determining if a single or multiple insert commands were used in
> > moving rows into foreign partitions.
>
> Thanks! It looks good!

Thanks for checking.  I'll mark this as RfC.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Proposal: Save user's original authenticated identity for logging

2021-04-06 Thread Michael Paquier
On Tue, Apr 06, 2021 at 06:31:16PM +, Jacob Champion wrote:
> On Tue, 2021-04-06 at 14:15 +0900, Michael Paquier wrote:
> > Hmm.  You are making a good point here, but is that really the best
> > thing we can do?  We lose the context of the authentication type being
> > done with this implementation, and the client would know that it did a
> > re-authentication even if the logdetail goes only to the backend's
> > logs.  Wouldn't it be better, for instance, to generate a LOG message
> > in this code path, switch to STATUS_ERROR to let auth_failed()
> > generate the FATAL message?  set_authn_id() could just return a
> > boolean to tell if it was OK with the change in authn_id or not. 
> 
> My concern there is that we already know the code is wrong in this
> (hypothetical future) case, and then we'd be relying on that wrong code
> to correctly bubble up an error status. I think that, once you hit this
> code path, the program flow should be interrupted immediately -- do not
> pass Go, collect $200, or let the bad implementation continue to do
> more damage.

Sounds fair to me.

> I agree that losing the context is not ideal. To avoid that, I thought
> it might be nice to add errbacktrace() to the ereport() call -- but
> since the functions we're interested in are static, the backtrace
> doesn't help. (I should check to see whether libbacktrace is better in
> this situation. Later.)

Perhaps, but that does not seem strongly necessary to me either here.

> If that's a major concern, we could call auth_failed() directly from
> this code. But that means that the auth_failed() logic must not give
> them more ammunition, in this hypothetical scenario where the authn
> system is already messed up. Obscuring the failure mode helps buy
> people time to update Postgres, which definitely has value, but it
> won't prevent any actual exploit by the time we get to this check. A
> tricky trade-off.

Nah.  I don't like much a solution that involves calling auth_failed()
in more code paths than now.

>> This requires a perltidy run from what I can see, but that's no big
>> deal.
> 
> Is that done per-patch? It looks like there's a large amount of
> untidied code in src/test in general, and in the files being touched.

Committers take care of that usually, but if you can do it that
helps :)

From what I can see, most of the indent diffs are coming from the
tests added with the addition of the log_(un)like parameters.  See 
pgindent's README for all the details related to the version of
perltidy, for example.  The thing is that some previous patches may
not have been indented, causing extra diffs to appear that are
unrelated to a patch.  Usually that's easy enough to fix on a
per-file basis.

Anyway, using a FATAL in this code path is fine by me in the end, so I
have applied the patch.  Let's see now what the buildfarm thinks about
it.
--
Michael


signature.asc
Description: PGP signature


Re: TRUNCATE on foreign table

2021-04-06 Thread Bharath Rupireddy
On Tue, Apr 6, 2021 at 10:15 PM Kazutaka Onishi  wrote:
> I've checked v15 patch with "make check-world" and confirmed this passed.

Thanks for the patch. One minor thing - I think "mixtured" is not the
correct word in "+-- partition table mixtured by table and foreign
table". How about something like "+-- partitioned table with both
local and foreign table as partitions"?

The v15 patch basically looks good to me. I have no more comments.

CF entry https://commitfest.postgresql.org/32/2972/ still says it's
"waiting on author", do you want to change it to "needs review" if you
have no open points left so that others can take a look at it?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Can we remove extra memset in BloomInitPage, GinInitPage and SpGistInitPage when we have it in PageInit?

2021-04-06 Thread Bharath Rupireddy
On Wed, Apr 7, 2021 at 12:07 AM Mahendra Singh Thalor
 wrote:
> +++ b/src/backend/storage/page/bufpage.c
> @@ -51,7 +51,7 @@ PageInit(Page page, Size pageSize, Size specialSize)
>  /* Make sure all fields of page are zero, as well as unused space */
>  MemSet(p, 0, pageSize);
>
> -p->pd_flags = 0;
> +/* p->pd_flags = 0;done by above MemSet */
>
> I think for readability we can keep the old code here, or else remove
> the newly added comment.

Setting p->pd_flags = 0; is redundant after memsetting the page to zeros.
Also, see the existing code for pd_prune_xid; I've done the same for
pd_flags. I think it's okay with /* p->pd_flags = 0; done by above
MemSet */.

> Apart from this, all other changes looks good to me.

Thanks for taking a look at the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

2021-04-06 Thread James Coleman
On Mon, Apr 5, 2021 at 11:58 PM David Rowley  wrote:
>
> On Sat, 20 Mar 2021 at 09:41, James Coleman  wrote:
> > I've attached a cleaned up patch. Last CF it was listed in is
> > https://commitfest.postgresql.org/29/2542/ -- what's the appropriate
> > step to take here given it's an already existing patch, but not yet
> > moved into recent CFs?
>
> I had a look at this patch.  I like the idea of using a simplehash.h
> hash table to hash the constant values so that repeat lookups can be
> performed much more quickly, however, I'm a bit concerned that there
> are quite a few places in the code where we often just execute a
> ScalarArrayOpExpr once and I'm a bit worried that we'll slow down
> expression evaluation of those cases.
>
> The two cases that I have in mind are:
>
> 1. eval_const_expressions() where we use the executor to evaluate the
> ScalarArrayOpExpr to see if the result is Const.
> 2. CHECK constraints with IN clauses and single-row INSERTs.

This is a good point I hadn't considered; now that you mention it, I
think another case would be expression evaluation in pl/pgsql.
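
(A quick aside for anyone who hasn't used the simplehash.h mentioned above: it
is a macro-templated hash table that gets instantiated per use site, roughly
as in the sketch below. The prefix, entry type, and the int4-specific hash and
equality functions here are made-up placeholders, not what the patch does.)

#include "postgres.h"
#include "common/hashfn.h"

typedef struct SaopHashEntry
{
    Datum   key;        /* one constant from the IN () list */
    char    status;     /* entry state, required by simplehash.h */
} SaopHashEntry;

#define SH_PREFIX saophash
#define SH_ELEMENT_TYPE SaopHashEntry
#define SH_KEY_TYPE Datum
#define SH_KEY key
#define SH_HASH_KEY(tb, key) murmurhash32(DatumGetInt32(key))
#define SH_EQUAL(tb, a, b) (DatumGetInt32(a) == DatumGetInt32(b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

/*
 * The table would be built once over the constant array and then probed
 * per input row, along the lines of:
 *
 *     saophash_hash *tab = saophash_create(CurrentMemoryContext, nitems, NULL);
 *     saophash_insert(tab, constvalue, &found);      (for each array element)
 *     if (saophash_lookup(tab, scanvalue) != NULL)   (per-row probe: a match)
 */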

> I tried to benchmark both of these but I'm struggling to get stable
> enough performance for #2, even with fsync=off.  Sometimes I'm getting
> results 2.5x slower than other runs.
>
> For benchmarking #1 I'm also not too sure I'm getting stable enough
> results for them to mean anything.
>
> I was running:
>
> create table a (a int);
>
> bench.sql: explain select * from a where a
> in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
>
> drowley@amd3990x:~$ pgbench -n -T 60 -j 64 -c 64 -f bench.sql -P 10 postgres
> Master (6d41dd045):
> tps = 992586.991045 (without initial connection time)
> tps = 987964.990483 (without initial connection time)
> tps = 994309.670918 (without initial connection time)
>
> Master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> tps = 956022.557626 (without initial connection time)
> tps = 963043.352823 (without initial connection time)
> tps = 968582.600100 (without initial connection time)
>
> This puts the patched version about 3% slower. I'm not sure how much
> of that is changes in the binary and noise and how much is the
> needless hashtable build done for eval_const_expressions().
>
> I wondered if we should make it the query planner's job of deciding if
> the ScalarArrayOpExpr should be hashed or not.  I ended up with the
> attached rough-cut patch that introduces HashedScalarArrayOpExpr and
> has the query planner decide if it's going to replace
> ScalarArrayOpExpr with these HashedScalarArrayOpExpr during
> preprocess_expression().  I do think that we might want to consider
> being a bit selective about when we do these replacements.  It seems
> likely that we'd want to do this for EXPRKIND_QUAL and maybe
> EXPRKIND_TARGET, but I imagine that converting ScalarArrayOpExpr to
> HashedScalarArrayOpExpr for EXPRKIND_VALUES would be a waste of time
> since those will just be executed once.

In theory we might want to cost them differently as well, though I'm
slightly hesitant to do so at this point to avoid causing plan changes
(I'm not sure how we would balance that concern with the potential
that the best plan isn't chosen).

> I tried the same above test with the
> v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch plus the
> attached rough-cut patch and got:
>
> master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> + v5-0002-Rough-cut-patch-for-HashedScalarArrayOpExpr.patch
> tps = 1167969.983173 (without initial connection time)
> tps = 1199636.793314 (without initial connection time)
> tps = 1190690.939963 (without initial connection time)
>
> I can't really explain why this became faster. I was expecting it just
> to reduce that slowdown of the v4 patch a little. I don't really see
> any reason why it would become faster.  It's almost 20% faster which
> seems like too much to just be fluctuations in code alignment in the
> binary.

I'm not at a place where I can do good perf testing right now (just on
my laptop for the moment), unfortunately, so I can't confirm one way
or the other.

> The attached patch is still missing the required changes to
> llvmjit_expr.c. I think that was also missing from the original patch
> too, however.

Ah, I didn't realize that needed to be changed as well. I'll take a
look at that.

> Also, I added HashedScalarArrayOpExpr to plannodes.h.  All other Expr
> type nodes are in primnodes.h. However, I put HashedScalarArrayOpExpr
> in plannodes.h because the parser does not generate this and it's not
> going to be stored in the catalogue files anywhere. I'm not so sure
> inventing a new Expr type node that only can be generated by the
> planner is a good thing to do.

I don't know what the positives and negatives are of this.

> Anyway, wondering what you think of the idea of allowing the planner
> to choose if it's going to hash or not?

In general I think it's very reasonable. I kinda wonder if
HashedScalarArrayOpExpr 

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-04-06 Thread Andy Fan
>
> However the thing becomes complex with the below 2 cases.
>
> 1. SELECT * FROM t INNER JOIN j on t.nullable = q.b;
> We know t.b will be not null **finally**. But the current plan may
> something
> like this:
>
> QUERY PLAN
> --
>  Merge Join
>Merge Cond: (t.nullable = j.something)
>->  Sort
>  Sort Key: t.nullable
>  ->  Seq Scan on t
>->  Sort
>  Sort Key: j.something
>  ->  Seq Scan on j
> (8 rows)
>
> which means the Path "Seq Scan on t" still contains some null values. At
> least,
> we should not assume t.nullable is "not nullable" the base relation stage.
>
> 2. SELECT t.a FROM j LEFT JOIN t ON t.b = t.a;
> Even the t.a is not null by definition, but it may have null **finally**
> due to
> the outer join.
>

The above 2 cases have been addressed by defining the notnullattrs on
every RelOptInfo, and maintaining them on every join. However,  per offline
discussion with David, IIUC, there is one more case to think about.

CREATE TABLE t (a INT, b INT);
SELECT * FROM t WHERE a = 1 and b = 2;

We know b is not null after we evaluate the qual b = 2, but it may still be
nullable when only a = 1 has been evaluated.

I prefer not to handle that case specially, and instead define the semantics
of notnullattrs as being valid only after all the quals on the RelOptInfo
have been evaluated.



> It would be good to agree on the correct representation for Vars that
>> cannot produce NULLs so that we don't shut the door on classes of
>> optimisation that require something other than what you need for your
>> case.
>>
>>
> Looks we have to maintain not null on the general RelOptInfo level rather
> than Base
> RelOptInfo.  But I don't want to teach Var about the notnull so far.  The
> reasons are: 1).
> We need to maintain the Planner version and Parser version due to the VIEW
> case.
> 2). We have to ignore the extra part  for equal(Var, Var) . 3). Var is
> usually shared among
> different RelOptInfo. which means we have to maintain different copies for
> this purpose IIUC.
>
> I assume we want to know if a Var is nullable with a function like.
> is_var_notnullable(Var *var,  Relids relids).  If so, we can define the
> data as below:
>
> struct RelOptInfo {
>
> Bitmapset** notnullattrs;
> ..
> };
>
> After this we can implement the function as:
>

/*
 * is_var_notnullable
 *   Check whether the var is known to be not-null for a given RelOptInfo
 * after all the quals on it have been evaluated.
 *
 * var is the var to check; relids identifies the RelOptInfo
 * we will check against.
 */
bool
is_var_notnullable(Var* var, Relids relids)
{
  RelOptInfo *rel = find_rel_by_relids(relids);
  return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Do you think this is a reasonable solution?


>
bool
> is_var_notnullable(Var* var, Relids relids)
> {
>   RelOptInfo *rel = find_rel_by_relids(relids);
>   return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
> }
>
> Probably we can do some hacks to reduce the notnullattrs' memory usage
> overhead.
>
> --
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: ModifyTable overheads in generic plans

2021-04-06 Thread Andres Freund
Hi,

On 2021-04-06 19:24:11 -0400, Tom Lane wrote:
> I also could not get excited about postponing initialization of RETURNING
> or WITH CHECK OPTIONS expressions.  I grant that that can be helpful
> when those features are used, but I doubt that RETURNING is used that
> heavily, and WITH CHECK OPTIONS is surely next door to nonexistent
> in performance-critical queries.

FWIW, there's a number of ORMs etc that use it on every insert (there's
not really a better way to get the serial when you also want to do
pipelining).
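
As a concrete illustration of that pattern, a client using PG 14's libpq
pipeline mode might do something like the sketch below (the table, column
names and batch size are made up):

/* batched INSERT ... RETURNING over libpq pipeline mode */
#include <stdio.h>
#include <libpq-fe.h>

static void
insert_batch(PGconn *conn)
{
    const char *names[] = {"alice", "bob", "carol"};

    PQenterPipelineMode(conn);
    for (int i = 0; i < 3; i++)
    {
        const char *values[1] = {names[i]};

        /* queue the insert; RETURNING id hands back the generated serial */
        PQsendQueryParams(conn,
                          "INSERT INTO accounts (name) VALUES ($1) RETURNING id",
                          1, NULL, values, NULL, NULL, 0);
    }
    PQpipelineSync(conn);       /* flush and mark the end of the batch */

    /* one result set per queued statement, each followed by a NULL */
    for (int i = 0; i < 3; i++)
    {
        PGresult   *res = PQgetResult(conn);

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("%s -> id %s\n", names[i], PQgetvalue(res, 0, 0));
        PQclear(res);
        PQclear(PQgetResult(conn)); /* NULL terminator for this statement */
    }
    PQclear(PQgetResult(conn));     /* PGRES_PIPELINE_SYNC result */
    PQexitPipelineMode(conn);
}

(A real client would check every return value and, for larger batches, switch
the connection to non-blocking mode so it doesn't block on full send buffers.)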

> nparts   TPS
> 0        12152
> 10       8672
> 100      2753
> 1000     314
> 
> and after the two patches I just pushed, it looks like:
> 
> 0        12105
> 10       9928
> 100      5433
> 1000     938
> 
> So while there's certainly work left to do, that's not bad for
> some low-hanging-fruit grabbing.

Nice. 3x at the upper end is pretty good.

Greetings,

Andres Freund




Re: Autovacuum on partitioned table (autoanalyze)

2021-04-06 Thread Andres Freund
Hi,

On 2021-04-06 16:56:49 -0400, Alvaro Herrera wrote:
> I think there is a good reason to treat them the same: pgstat does not
> have a provision to keep stats both of the table with children, and the
> table without children.  It can only have one of those.  For
> partitioning that doesn't matter: since the table-without-children
> doesn't have anything on its own (no scans, no tuples, no nothing) then
> we can just use the entry to store the table-with-children data.  But
> for the inheritance case, the parent can have its own tuples and counts
> its own scans and so on; so if we change things, we'll overwrite the
> stats.  Maybe in the long-term we should allow pgstat to differentiate
> those cases, but that seems not in scope for this patch.

FWIW, I think it shouldn't be too hard to do that once the shared memory
stats patch goes in (not 14, unfortunately). The hardest part will be to
avoid exploding the number of interface functions, but I think we can
figure out a way to deal with that.

Greetings,

Andres Freund




Re: Remove page-read callback from XLogReaderState.

2021-04-06 Thread Alvaro Herrera
On 2021-Apr-07, Thomas Munro wrote:

> On Wed, Apr 7, 2021 at 11:18 AM Alvaro Herrera  
> wrote:
> > BTW it's funny that after these patches, "xlogreader" no longer reads
> > anything.  It's more an "xlog interpreter" -- the piece of code that
> > splits individual WAL records from a stream of WAL bytes that's caller's
> > responsibility to obtain somehow.  But (and, again, I haven't read this
> > patch recently) it still offers pieces that support a reader, in
> > addition to its main interface as the interpreter.  Maybe it's not a
> > totally stupid idea to split it in even more different files.
> 
> Yeah, I like "decoder", and it's already called that in some places...

Yeah, that works ...

-- 
Álvaro Herrera                39°49'30"S 73°17'W
"The least effective workers are systematically moved to the place where
they can do the least damage: management."  (The Dilbert Principle)




Re: Remove page-read callback from XLogReaderState.

2021-04-06 Thread Thomas Munro
On Wed, Apr 7, 2021 at 11:18 AM Alvaro Herrera  wrote:
> BTW it's funny that after these patches, "xlogreader" no longer reads
> anything.  It's more an "xlog interpreter" -- the piece of code that
> splits individual WAL records from a stream of WAL bytes that's caller's
> responsibility to obtain somehow.  But (and, again, I haven't read this
> patch recently) it still offers pieces that support a reader, in
> addition to its main interface as the interpreter.  Maybe it's not a
> totally stupid idea to split it in even more different files.

Yeah, I like "decoder", and it's already called that in some places...




Re: ModifyTable overheads in generic plans

2021-04-06 Thread Tom Lane
Amit Langote  writes:
> On Mon, Apr 5, 2021 at 1:43 AM Tom Lane  wrote:
>> OK.  Do you want to pull out the bits of the patch that we can still
>> do without postponing BeginDirectModify?

> I ended up with the attached, whereby ExecInitResultRelation() is now
> performed for all relations before calling ExecInitNode() on the
> subplan.  As mentioned, I moved other per-result-rel initializations
> into the same loop that does ExecInitResultRelation(), while moving
> code related to some initializations into initialize-on-first-access
> accessor functions for the concerned fields.  I chose to do that for
> ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew.

I pushed the parts of this that I thought were safe and productive.

The business about moving the subplan tree initialization to after
calling FDWs' BeginForeignModify functions seems to me to be a
nonstarter.  Existing FDWs are going to expect their scan initializations
to have been done first.  I'm surprised that postgres_fdw seemed to
need only a one-line fix; it could have been far worse.  The amount of
trouble that could cause is absolutely not worth it to remove one loop
over the result relations.

I also could not get excited about postponing initialization of RETURNING
or WITH CHECK OPTIONS expressions.  I grant that that can be helpful
when those features are used, but I doubt that RETURNING is used that
heavily, and WITH CHECK OPTIONS is surely next door to nonexistent
in performance-critical queries.  If the feature isn't used, the cost
of the existing code is about zero.  So I couldn't see that it was worth
the amount of code thrashing and risk of new bugs involved.  The bit you
noted about EXPLAIN missing a subplan is pretty scary in this connection;
I'm not at all sure that that's just cosmetic.

(Having said that, I'm wondering if there are bugs in these cases for
cross-partition updates that target a previously-not-used partition.
So we might have things to fix anyway.)

Anyway, looking at the test case you posted at the very top of this
thread, I was getting this with HEAD on Friday:

nparts   TPS
0        12152
10       8672
100      2753
1000     314

and after the two patches I just pushed, it looks like:

0        12105
10       9928
100      5433
1000     938

So while there's certainly work left to do, that's not bad for
some low-hanging-fruit grabbing.

regards, tom lane




Re: Remove page-read callback from XLogReaderState.

2021-04-06 Thread Alvaro Herrera
On 2021-Apr-07, Thomas Munro wrote:

> I wonder if it would be better to have the client code access these
> values through functions (even if they just access the variables in a
> static inline function), to create a bit more separation?  Something
> like XLogReaderGetWanted(_lsn, _wanted), and then
> XLogReaderSetAvailable(state, 42)?  Just an idea.

I think more opacity is good in this area, generally speaking.  There
are way too many globals, and they interact in nontrivial ways across
the codebase.  Just look at the recent ThisTimeLineID disaster.  I
don't have this patch sufficiently paged-in to say that bytes_wanted/
bytes_available is precisely the thing we need, but if it makes for a
cleaner interface, I'm for it.  This module keeps some state inside
itself, while other parts of the state live in its users; that's not good,
and any cleanup on that is welcome.

BTW it's funny that after these patches, "xlogreader" no longer reads
anything.  It's more an "xlog interpreter" -- the piece of code that
splits individual WAL records from a stream of WAL bytes that's caller's
responsibility to obtain somehow.  But (and, again, I haven't read this
patch recently) it still offers pieces that support a reader, in
addition to its main interface as the interpreter.  Maybe it's not a
totally stupid idea to split it in even more different files.

-- 
Álvaro Herrera                39°49'30"S 73°17'W




Re: Remove page-read callback from XLogReaderState.

2021-04-06 Thread Andres Freund
Hi,

On 2021-04-07 05:09:53 +1200, Thomas Munro wrote:
> From 560cdfa444a3b05a0e6b8054f3cfeadf56e059fc Mon Sep 17 00:00:00 2001
> From: Kyotaro Horiguchi 
> Date: Thu, 5 Sep 2019 20:21:55 +0900
> Subject: [PATCH v18 1/3] Move callback-call from ReadPageInternal to
>  XLogReadRecord.
> 
> The current WAL record reader reads page data using a call back
> function.  Redesign the interface so that it asks the caller for more
> data when required.  This model works better for proposed projects that
> encryption, prefetching and other new features that would require
> extending the callback interface for each case.
> 
> As the first step of that change, this patch moves the page reader
> function out of ReadPageInternal(), then the remaining tasks of the
> function are taken over by the new function XLogNeedData().

> -static int
> +static bool
>  XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int 
> reqLen,
>XLogRecPtr targetRecPtr, char *readBuf)
>  {
> @@ -12170,7 +12169,8 @@ retry:
>   readLen = 0;
>   readSource = XLOG_FROM_ANY;
>  
> - return -1;
> + xlogreader->readLen = -1;
> + return false;
>   }
>   }

It seems a bit weird to assign to XLogReaderState->readLen inside the
callbacks. I first thought it was just transient state, but it's not.
I think it'd be good to wrap the xlogreader->readLen assignment in an
inline function. That way we can add more asserts etc. over time.
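
Something along these lines, say (the name and the check are hypothetical,
just to illustrate the shape):

static inline void
XLogReaderSetInputLen(XLogReaderState *state, int32 len)
{
    /* -1 signals a read failure; anything else must be a byte count */
    Assert(len >= -1);
    state->readLen = len;
}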



> -/* pg_waldump's XLogReaderRoutine->page_read callback */
> +/*
> + * pg_waldump's WAL page reader, also used as page_read callback for
> + * XLogFindNextRecord
> + */
>  static bool
> -WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
> - XLogRecPtr targetPtr, char *readBuff)
> +WALDumpReadPage(XLogReaderState *state, void *priv)
>  {
> - XLogDumpPrivate *private = state->private_data;
> + XLogRecPtr  targetPagePtr = state->readPagePtr;
> + int reqLen= state->readLen;
> + char   *readBuff  = state->readBuf;
> + XLogDumpPrivate *private  = (XLogDumpPrivate *) priv;

It seems weird to pass a void *priv to a function that now doesn't at
all need the type punning anymore.

Greetings,

Andres Freund




Re: Remove page-read callback from XLogReaderState.

2021-04-06 Thread Thomas Munro
On Wed, Apr 7, 2021 at 5:09 AM Thomas Munro  wrote:
> 0001 + 0002 get rid of the callback interface and replace it with a
> state machine, making it the client's problem to supply data when it
> returns XLREAD_NEED_DATA.  I found this interface nicer to work with,
> for my WAL decoding buffer patch (CF 2410), and I understand that the
> encryption patch set can also benefit from it.  Certainly when I
> rebased my project on this patch set, I prefered the result.

+state->readLen = pageHeaderSize;

This variable is used for the XLogPageReader to say how much data it
wants, but also for the caller to indicate how much data is loaded.
Wouldn't it be better to split this into two variables: bytesWanted
and bytesAvailable?  (I admit that I spent a whole afternoon debugging
after confusing myself about that, when rebasing my WAL readahead
patch recently).

I wonder if it would be better to have the client code access these
values through functions (even if they just access the variables in a
static inline function), to create a bit more separation?  Something
like XLogReaderGetWanted(_lsn, _wanted), and then
XLogReaderSetAvailable(state, 42)?  Just an idea.




Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread Fujii Masao




On 2021/04/07 5:01, Tom Lane wrote:

Fujii Masao  writes:

Thanks! So I pushed the patch.


Seems like the test case added by this commit is not
working on Windows.


Thanks for the report! My analysis is [1], and I pushed the proposed patch.

[1]
https://postgr.es/m/3cc3909d-f779-7a74-c201-f1f7f62c7...@oss.nttdata.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread Fujii Masao




On 2021/04/07 3:03, Fujii Masao wrote:



On 2021/04/06 23:01, Fujii Masao wrote:



On 2021/04/06 20:44, David Steele wrote:

On 4/6/21 7:13 AM, Fujii Masao wrote:


On 2021/04/06 15:59, osumi.takami...@fujitsu.com wrote:

I just wanted to write why the error was introduced,
but it was not necessary.
We should remove and fix the first part of the sentence.


So the consensus is almost the same as the latest patch?
If they are not so different, I'm thinking of pushing the latest version first.
Then we can improve the docs if required.


+1


Thanks! So I pushed the patch.


The buildfarm members "drongo" and "fairywren" reported that the regression
test (024_archive_recovery.pl) that commit 9de9294b0c added failed with the
following error. ISTM the cause of this failure is that the test calls
$node->init() without "allows_streaming => 1", which means no pg_hba.conf
entry is added for TCP/IP connections from pg_basebackup. So I think that
the attached patch needs to be applied.


Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: policies with security definer option for allowing inline optimization

2021-04-06 Thread Dan Lynch
>
>
> > I suppose if the possibility exists that this could happen, perhaps using
> > RLS for selects is not quite "production ready"?
>
> I would not draw that conclusion.
>
>
This is great to hear! I'm betting a lot on RLS and have been investing a
lot into it.


> > Or perhaps if the RLS
> > qual/check is written well-enough, then maybe the performance hit
> wouldn't
> > be noticed?
>
> Yes.
>

Amazing to hear. Sounds like the path I'm on is good to go and will only
improve over time :)

Final question: do you think using procedures vs writing inline queries for
RLS quals/checks has a big difference in performance (assuming functions
are sql)?

Appreciate your info here!


Re: pgbench - add pseudo-random permutation function

2021-04-06 Thread Fabien COELHO



Hello Dean,


Pushed.

I decided not to go with the "joke" randu64() function, but instead
used getrand() directly. This at least removes any *direct* use of
doubles in permute() (though of course they're still used indirectly),
and means that if getrand() is improved in the future, permute() will
benefit too.


Good idea, at least it is hidden and in one place.

Thanks for the push!

--
Fabien.




Re: psql - add SHOW_ALL_RESULTS option

2021-04-06 Thread Fabien COELHO


Hello Peter,

My 0.02€: I tested this updated version and do not have any comment on this 
version. From my point of view it could be committed. I would not bother to 
separate the test style adjustments.


Committed.  The last thing I fixed was the diff in the copy2.out regression 
test.  The order of the notices with respect to the error messages was wrong. 
I fixed that by switching back to the regular notice processor during COPY 
handling.


Btw., not sure if that was mentioned before, but a cool use of this is to 
\watch multiple queries at once, like


select current_date \; select current_time \watch


Indeed, that was one of the things I tested on the patch. I'm wondering 
whether the documentation should point this out explicitly.


Anyway Thanks for the push!

--
Fabien.

Re: Autovacuum on partitioned table (autoanalyze)

2021-04-06 Thread Alvaro Herrera
On 2021-Apr-04, Tomas Vondra wrote:

> 1) I still don't understand why inheritance and declarative partitioning
> are treated differently. Seems unnecessary nad surprising, but maybe
> there's a good reason?

I think there is a good reason to treat them the same: pgstat does not
have a provision to keep stats both of the table with children, and the
table without children.  It can only have one of those.  For
partitioning that doesn't matter: since the table-without-children
doesn't have anything on its own (no scans, no tuples, no nothing) then
we can just use the entry to store the table-with-children data.  But
for the inheritance case, the parent can have its own tuples and counts
its own scans and so on; so if we change things, we'll overwrite the
stats.  Maybe in the long-term we should allow pgstat to differentiate
those cases, but that seems not in scope for this patch.

I'm working on the code to fix the other issues.

-- 
Álvaro Herrera                39°49'30"S 73°17'W




Re: Parallel Full Hash Join

2021-04-06 Thread Zhihong Yu
On Tue, Apr 6, 2021 at 11:59 AM Melanie Plageman 
wrote:

> On Fri, Apr 2, 2021 at 3:06 PM Zhihong Yu  wrote:
> >
> > Hi,
> > For v6-0003-Parallel-Hash-Full-Right-Outer-Join.patch
> >
> > +* current_chunk_idx: index in current HashMemoryChunk
> >
> > The above comment seems to be better fit for
> ExecScanHashTableForUnmatched(), instead of
> ExecParallelPrepHashTableForUnmatched.
> > I wonder where current_chunk_idx should belong (considering the above
> comment and what the code does).
> >
> > +   while (hashtable->current_chunk_idx <
> hashtable->current_chunk->used)
> > ...
> > +   next = hashtable->current_chunk->next.unshared;
> > +   hashtable->current_chunk = next;
> > +   hashtable->current_chunk_idx = 0;
> >
> > Each time we advance to the next chunk, current_chunk_idx is reset. It
> seems current_chunk_idx can be placed inside chunk.
> > Maybe the consideration is that, with the current formation we save
> space by putting current_chunk_idx field at a higher level.
> > If that is the case, a comment should be added.
> >
>
> Thank you for the review. I think that moving the current_chunk_idx into
> the HashMemoryChunk would probably take up too much space.
>
> Other places that we loop through the tuples in the chunk, we are able
> to just keep a local idx, like here in
> ExecParallelHashIncreaseNumBuckets():
>
> case PHJ_GROW_BUCKETS_REINSERTING:
> ...
> while ((chunk = ExecParallelHashPopChunkQueue(hashtable,
> &chunk_s)))
> {
> size_t idx = 0;
>
> while (idx < chunk->used)
>
> but, since we cannot do that while also emitting tuples, I thought,
> let's just stash the index in the hashtable for use in serial hash join
> and the batch accessor for parallel hash join. A comment to this effect
> sounds good to me.
>

From the way HashJoinTable is used, I don't have a better idea w.r.t. the
location of current_chunk_idx.
It is not worth introducing another level of mapping between HashJoinTable
and the chunk index.

So the current formation is fine, with an additional comment
in ParallelHashJoinBatchAccessor (the current comment doesn't explicitly
mention current_chunk_idx).

Cheers


Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread Tom Lane
Fujii Masao  writes:
> Thanks! So I pushed the patch.

Seems like the test case added by this commit is not
working on Windows.

regards, tom lane




Re: PostgreSQL log query's result size

2021-04-06 Thread Justin Pryzby
On Tue, Apr 06, 2021 at 02:03:03PM -0500, Hellmuth Vargas wrote:
> Could you tell me whether, just as the log can be configured to record the
> duration of slow queries, it can also record the size of a query's result?
> Especially for queries that return a lot of data.

I think you can get what you want with auto_explain.
https://www.postgresql.org/docs/current/auto-explain.html

You can set:
auto_explain.log_analyze

And then the "width" and "rows" are logged:
 Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.004 rows=1 
loops=1)

PS, you should first ask on the pgsql-general list, rather than this
development list.

-- 
Justin




PostgreSQL log query's result size

2021-04-06 Thread Hellmuth Vargas
Hello

Excuse me in advance for my English, I'm improving :-)

Could you tell me whether, just as the log can be configured to record the
duration of slow queries, it can also record the size of a query's result?
Especially for queries that return a lot of data.


-- 
Cordialmente,

Ing. Hellmuth I. Vargas S.


Re: Parallel Full Hash Join

2021-04-06 Thread Melanie Plageman
On Fri, Apr 2, 2021 at 3:06 PM Zhihong Yu  wrote:
>
> Hi,
> For v6-0003-Parallel-Hash-Full-Right-Outer-Join.patch
>
> +* current_chunk_idx: index in current HashMemoryChunk
>
> The above comment seems to be better fit for ExecScanHashTableForUnmatched(), 
> instead of ExecParallelPrepHashTableForUnmatched.
> I wonder where current_chunk_idx should belong (considering the above comment 
> and what the code does).
>
> +   while (hashtable->current_chunk_idx < hashtable->current_chunk->used)
> ...
> +   next = hashtable->current_chunk->next.unshared;
> +   hashtable->current_chunk = next;
> +   hashtable->current_chunk_idx = 0;
>
> Each time we advance to the next chunk, current_chunk_idx is reset. It seems 
> current_chunk_idx can be placed inside chunk.
> Maybe the consideration is that, with the current formation we save space by 
> putting current_chunk_idx field at a higher level.
> If that is the case, a comment should be added.
>

Thank you for the review. I think that moving the current_chunk_idx into
the HashMemoryChunk would probably take up too much space.

Other places that we loop through the tuples in the chunk, we are able
to just keep a local idx, like here in
ExecParallelHashIncreaseNumBuckets():

case PHJ_GROW_BUCKETS_REINSERTING:
...
while ((chunk = ExecParallelHashPopChunkQueue(hashtable, &chunk_s)))
{
size_t idx = 0;

while (idx < chunk->used)

but, since we cannot do that while also emitting tuples, I thought,
let's just stash the index in the hashtable for use in serial hash join
and the batch accessor for parallel hash join. A comment to this effect
sounds good to me.
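
For reference, the index-based walk over a chunk being discussed looks roughly
like this (simplified; the wrapper function is just for illustration, while the
macros are the existing ones from executor/hashjoin.h):

#include "postgres.h"
#include "access/htup_details.h"
#include "executor/hashjoin.h"

static void
walk_chunk(HashMemoryChunk chunk)
{
    size_t      idx = 0;

    while (idx < chunk->used)
    {
        HashJoinTuple hashTuple = (HashJoinTuple) (HASH_CHUNK_DATA(chunk) + idx);
        MinimalTuple tuple = HJTUPLE_MINTUPLE(hashTuple);

        /* ... look at the tuple (or emit it as unmatched) here ... */

        idx += MAXALIGN(HJTUPLE_OVERHEAD + tuple->t_len);
    }
}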




Re: Can we remove extra memset in BloomInitPage, GinInitPage and SpGistInitPage when we have it in PageInit?

2021-04-06 Thread Mahendra Singh Thalor
On Tue, 6 Apr 2021 at 19:14, Bharath Rupireddy
 wrote:
>
> On Tue, Apr 6, 2021 at 6:09 PM Michael Paquier  wrote:
> >
> > On Mon, Mar 22, 2021 at 10:58:17AM +0530, Mahendra Singh Thalor wrote:
> > > Your changes look fine to me and I am also not getting any failure. I
> > > think we should back-patch all the branches.
> > >
> > > Patch is applying to all the branches(till v95) and there is no failure.
> >
> > Er, no.  This is just some duplicated code with no extra effect.  I
> > have no objection to simplify a bit the whole on readability and
> > consistency grounds (will do so tomorrow), including the removal of
> > the commented-out memset call in gistinitpage, but this is not
> > something that should be backpatched.
>
> +1 to not backporting this patch because it's not a bug, nor even a
> critical issue. Having said that, removing these unnecessary memsets
> would not only be better for readability and consistency but also
> avoid a few extra function call costs (although minimal) when adding
> new index pages.
>
> Please find the v3 patch that removed the commented-out memset call in
> gistinitpage.

Thanks Bharath for the updated patch.

+++ b/src/backend/storage/page/bufpage.c
@@ -51,7 +51,7 @@ PageInit(Page page, Size pageSize, Size specialSize)
 /* Make sure all fields of page are zero, as well as unused space */
 MemSet(p, 0, pageSize);

-p->pd_flags = 0;
+/* p->pd_flags = 0;done by above MemSet */

I think for readability we can keep the old code here, or else remove
the newly added comment.

Apart from this, all other changes looks good to me.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Proposal: Save user's original authenticated identity for logging

2021-04-06 Thread Jacob Champion
On Tue, 2021-04-06 at 14:15 +0900, Michael Paquier wrote:
> On Mon, Apr 05, 2021 at 04:40:41PM +, Jacob Champion wrote:
> > This loses the test fixes I made in my v19 [1]; some of the tests on
> > HEAD aren't testing anything anymore. I've put those fixups into 0001,
> > attached.
> 
> Argh, Thanks.  The part about not checking after the error output when
> the connection should pass is wanted to be more consistent with the
> other test suites.  So I have removed this part and applied the rest
> of 0001.

I assumed Tom added those checks to catch a particular failure mode for
the GSS encryption case. (I guess Tom would know for sure.)

> > An assertion seems like the wrong way to go; in the event that a future
> > code path accidentally performs a duplicated authentication, the FATAL
> > will just kill off an attacker's connection, while an assertion will
> > DoS the server.
> 
> Hmm.  You are making a good point here, but is that really the best
> thing we can do?  We lose the context of the authentication type being
> done with this implementation, and the client would know that it did a
> re-authentication even if the logdetail goes only to the backend's
> logs.  Wouldn't it be better, for instance, to generate a LOG message
> in this code path, switch to STATUS_ERROR to let auth_failed()
> generate the FATAL message?  set_authn_id() could just return a
> boolean to tell if it was OK with the change in authn_id or not. 

My concern there is that we already know the code is wrong in this
(hypothetical future) case, and then we'd be relying on that wrong code
to correctly bubble up an error status. I think that, once you hit this
code path, the program flow should be interrupted immediately -- do not
pass Go, collect $200, or let the bad implementation continue to do
more damage.

I agree that losing the context is not ideal. To avoid that, I thought
it might be nice to add errbacktrace() to the ereport() call -- but
since the functions we're interested in are static, the backtrace
doesn't help. (I should check to see whether libbacktrace is better in
this situation. Later.)

As for the client knowing: an active attacker is probably going to know
that they're triggering the reauthentication anyway. So the primary
disadvantage I see is that a more passive attacker could scan for some
vulnerability by looking for that error message.

If that's a major concern, we could call auth_failed() directly from
this code. But that means that the auth_failed() logic must not give
them more ammunition, in this hypothetical scenario where the authn
system is already messed up. Obscuring the failure mode helps buy
people time to update Postgres, which definitely has value, but it
won't prevent any actual exploit by the time we get to this check. A
tricky trade-off.

> > v21 attached, which is just a rebase of my original v19.
> 
> This requires a perltidy run from what I can see, but that's no big
> deal.

Is that done per-patch? It looks like there's a large amount of
untidied code in src/test in general, and in the files being touched.

> +   my (@log_like, @log_unlike);
> +   if (defined($params{log_like}))
> +   {
> +   @log_like = @{ delete $params{log_like} };
> +   }
> +   if (defined($params{log_unlike}))
> +   {
> +   @log_unlike = @{ delete $params{log_unlike} };
> +   }
> There is no need for that?  This removal was done as %params was
> passed down directly as-is to PostgresNode::psql, but that's not the
> case anymore.

Fixed in v22, thanks.

--Jacob
From 731112be1a29e873b96f982311c7bc35b6d864ec Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Thu, 1 Apr 2021 16:01:27 -0700
Subject: [PATCH v22] Log authenticated identity from all auth backends

The "authenticated identity" is the string used by an auth method to
identify a particular user. In many common cases this is the same as the
Postgres username, but for some third-party auth methods, the identifier
in use may be shortened or otherwise translated (e.g. through pg_ident
mappings) before the server stores it.

To help DBAs see who has actually interacted with the system, store the
original identity when authentication succeeds, and expose it via the
log_connections setting. The log entries look something like this
example (where a local user named "pchampion" is connecting to the
database as the "admin" user):

LOG:  connection received: host=[local]
LOG:  connection authenticated: identity="pchampion" method=peer (/data/pg_hba.conf:88)
LOG:  connection authorized: user=admin database=postgres application_name=psql

port->authn_id is set according to the auth method:

bsd: the Postgres username (which is the local username)
cert: the client's Subject DN
gss: the user principal
ident: the remote username
ldap: the final bind DN
pam: the Postgres username (which is the PAM username)
password (and all pw-challenge methods): the Postgres username
peer: the peer's pw_name
radius: the Postgres 

Re: GSoC 2021 - Student looking for a mentor - Magzum Assanbayev

2021-04-06 Thread Ilaria
Hello Magzum,

Thank you for the email! I am really glad you are interested in working with 
Postgres. 

Have you tried looking at the project ideas? You can find them here: 
https://wiki.postgresql.org/wiki/GSoC_2021

If you have any preference, you are more than welcome to ask questions and 
clarifications in this mailing list. On the other side, if you have any 
specific idea about a project you could do which isn’t listed there, we can 
discuss it. However, it is a bit late to come up with projects now, since the 
deadline for applications is approaching, so I would recommend you try one 
of the proposed ones.

Ilaria

> Am 06.04.2021 um 19:59 schrieb Magzum Assanbayev 
> :
> 
> 
> Dear Sirs,
> 
> My name is Magzum Assanbayev, I am a Master Student at KIMEP University in 
> Kazakhstan, expected to graduate in Spring 2022.
> 
> Having done some research into your organization, I have concluded that my 
> current skill set might be suitable to your needs.
> 
> Out of what I can offer, I have been practicing data analytics for 2.5 years 
> at PwC Competency Centre in Audit and Assurance department dealing with audit 
> automation with an Alteryx data analytics software. The software allows 
> seamless big data manipulation and output, and has its own community sharing 
> the ideas among others, please see link: 
> https://community.alteryx.com/?category.id=external
> 
> As a track record, I can state that the workflows developed under my 
> supervision have cut significant hours of repetitive work for audit teams, 
> ranging from 5-20% per audit engagement with positive user feedback. The 
> range of work done varies from automating mathematical accuracy check of 
> consolidation reports to disclosure recompilation as well as journal entry 
> analysis based on predefined audit criteria. The workflows developed used 
> Alteryx macros and RegEx to solve the problems in question.
> 
> The software is popular among largest corporate brands in the world which 
> proves its value for cost and actuality in a competitive market of data 
> analytics software. The evident users of the software are Big 4 audit 
> companies, Google itself, Coca-Cola, Deutsche Bank, etc.
> 
> In addition, I have an entry-level acquaintance with Python and Excel VBA, 
> having completed 'Crash Course on Python' and 'Excel/VBA for Creative Problem 
> Solving, Part 1' courses on Coursera (see certificates attached). 
> 
> Please note that both courses were completed during the busy season under 
> heavy audit workload while being employed full-time at PwC. My eagerness and 
> ability to learn are backed up by my Bachelor GPA of 4.31/4.33 at KIMEP, where 
> I studied Finance and Accounting. I was also awarded scholarships and 
> stipends for academic achievement for several years.
> 
> If this letter has caught your eye and made you interested, I am happy to 
> brainstorm any potential projects that we can do together during the upcoming 
> summer!
> 
> Please let me know by replying to this email.
> 
> Thank you!
> 
> [attachment: ... certificate.pdf]


Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread Fujii Masao



On 2021/04/06 23:01, Fujii Masao wrote:



On 2021/04/06 20:44, David Steele wrote:

On 4/6/21 7:13 AM, Fujii Masao wrote:


On 2021/04/06 15:59, osumi.takami...@fujitsu.com wrote:

I just wanted to write why the error was introduced,
but it was not necessary.
We should remove and fix the first part of the sentence.


So the consensus is almost the same as the latest patch?
If they are not so different, I'm thinking of pushing the latest version at first.
Then we can improve the docs if required.


+1


Thanks! So I pushed the patch.


The buildfarm members "drongo" and "fairywren" reported that the regression test 
(024_archive_recovery.pl) commit 9de9294b0c added failed with the following error. ISTM the cause of this failure 
is that the test calls $node->init() without "allows_streaming => 1" and which doesn't add 
pg_hba.conf entry for TCP/IP connection from pg_basebackup. So I think that the attached patch needs to be applied.

pg_basebackup: error: connection to server at "127.0.0.1", port 52316 failed: FATAL:  no 
pg_hba.conf entry for replication connection from host "127.0.0.1", user "pgrunner", no 
encryption
Bail out!  system pg_basebackup failed

I guess that only "drongo" and "fairywren" reported this issue because they are 
running on Windows and pg_basebackup uses TCP/IP connection. OTOH I guess that other buildfarm 
members running on non-Windows use Unix-domain for pg_basebackup and pg_hba.conf entry for that 
exists by default. So ISTM they reported no such issue.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/test/recovery/t/024_archive_recovery.pl 
b/src/test/recovery/t/024_archive_recovery.pl
index c139db2ede..2d8d59454e 100644
--- a/src/test/recovery/t/024_archive_recovery.pl
+++ b/src/test/recovery/t/024_archive_recovery.pl
@@ -9,7 +9,7 @@ use Time::HiRes qw(usleep);
 # Initialize and start node with wal_level = replica and WAL archiving
 # enabled.
 my $node = get_new_node('orig');
-$node->init(has_archiving => 1);
+$node->init(has_archiving => 1, allows_streaming => 1);
 my $replica_config = q[
 wal_level = replica
 archive_mode = on


Re: [GSoC 2021 Proposal] Develop Performance Farm Benchmarks and Website

2021-04-06 Thread Ilaria
Just a quick heads up - this is being followed up by myself in private messages 
:) 

If anyone has any other inputs for the proposal, feel free to share!

Ilaria

> Am 06.04.2021 um 19:59 schrieb YoungHwan Joo :
> 
> 
> Hello, team PostgreSQL!
> 
> I am YoungHwan Joo, a student who is interested in GSoC 2021.
> I was drawn to the project "Develop Performance Farm Benchmarks and Website".
> 
> I send you the first draft of my proposal as an attachment.
> Please give me any feedback or review.
> 
> I am in the Postgres Slack channel #gsoc2021-students.
> 
> Thank you.
> 
> Regards,
> YoungHwan
> 
> [attachment: ... Website - YoungHwan Joo.pdf]




Re: Minimal logical decoding on standbys

2021-04-06 Thread Andres Freund
Hi,

On 2021-04-06 14:30:29 +0200, Drouvot, Bertrand wrote:
> From 827295f74aff9c627ee722f541a6c7cc6d4133cf Mon Sep 17 00:00:00 2001
> From: bdrouvotAWS 
> Date: Tue, 6 Apr 2021 11:59:23 +
> Subject: [PATCH v15 1/5] Allow logical decoding on standby.
> 
> Allow a logical slot to be created on standby. Restrict its usage
> or its creation if wal_level on the primary is less than logical.
> During slot creation, its restart_lsn is set to the last replayed
> LSN. Effectively, a logical slot creation on standby waits for an
> xl_running_xact record to arrive from the primary. Conflicting slots
> are handled in subsequent commits.
>
> Andres Freund and Amit Khandekar.

I think more people have worked on this by now...

Does this strike you as an accurate description?

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas

> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -119,23 +119,22 @@ CheckLogicalDecodingRequirements(void)
>   
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>errmsg("logical decoding requires a database 
> connection")));
>  
> - /* 
> -  * TODO: We got to change that someday soon...
> -  *
> -  * There's basically three things missing to allow this:
> -  * 1) We need to be able to correctly and quickly identify the timeline 
> a
> -  *LSN belongs to
> -  * 2) We need to force hot_standby_feedback to be enabled at all times 
> so
> -  *the primary cannot remove rows we need.
> -  * 3) support dropping replication slots referring to a database, in
> -  *dbase_redo. There can't be any active ones due to HS recovery
> -  *conflicts, so that should be relatively easy.
> -  * 
> -  */
>   if (RecoveryInProgress())
> - ereport(ERROR,
> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -  errmsg("logical decoding cannot be used while 
> in recovery")));

Maybe I am just missing something right now, and maybe I'm being a bit
overly pedantic, but I don't immediately see how 0001 is correct without
0002 and 0003? I think it'd be better to first introduce the conflict
information, then check for conflicts, and only after that allow
decoding on standbys?


> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index 6f8810e149..6a21cba362 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -5080,6 +5080,17 @@ LocalProcessControlFile(bool reset)
>   ReadControlFile();
>  }
>  
> +/*
> + * Get the wal_level from the control file. For a standby, this value should 
> be
> + * considered as its active wal_level, because it may be different from what
> + * was originally configured on standby.
> + */
> +WalLevel
> +GetActiveWalLevel(void)
> +{
> + return ControlFile->wal_level;
> +}
> +

This strikes me as error-prone - there's nothing in the function name
indicating that this should mainly (only?) be used during recovery...


> + if (SlotIsPhysical(slot))
> + restart_lsn = GetRedoRecPtr();
> + else if (RecoveryInProgress())
> + {
> + restart_lsn = GetXLogReplayRecPtr(NULL);
> + /*
> +  * Replay pointer may point one past the end of the 
> record. If that
> +  * is a XLOG page boundary, it will not be a valid LSN 
> for the
> +  * start of a record, so bump it up past the page 
> header.
> +  */
> + if (!XRecOffIsValid(restart_lsn))
> + {
> + if (restart_lsn % XLOG_BLCKSZ != 0)
> + elog(ERROR, "invalid replay pointer");
> +
> + /* For the first page of a segment file, it's a 
> long header */
> + if (XLogSegmentOffset(restart_lsn, 
> wal_segment_size) == 0)
> + restart_lsn += SizeOfXLogLongPHD;
> + else
> + restart_lsn += SizeOfXLogShortPHD;
> + }
> + }

This seems like a layering violation to me. I don't think stuff like
this should be outside of xlog[reader].c, and definitely not in
ReplicationSlotReserveWal().

Relevant discussion (which totally escaped my mind):
https://postgr.es/m/CAJ3gD9csOr0LoYoMK9NnfBk0RZmvHXcJAFWFd2EuL%3DNOfz7PVA%40mail.gmail.com


> + else
> + restart_lsn = GetXLogInsertRecPtr();
> +
> + SpinLockAcquire(&slot->mutex);
> + slot->data.restart_lsn = restart_lsn;
> + SpinLockRelease(&slot->mutex);
> +
>   if (!RecoveryInProgress() 

Re: Force lookahead in COPY FROM parsing

2021-04-06 Thread Heikki Linnakangas

On 02/04/2021 20:21, John Naylor wrote:
I have nothing further, so it's RFC. The patch is pretty simple compared 
to the earlier ones, but is it worth running the fuzzer again as added 
insurance?


Good idea. I did that, and indeed it revealed bugs. If the client sent 
just a single byte in one CopyData message, we only loaded that one byte 
into the buffer, instead of the full 4 bytes needed for lookahead. 
Attached is a new version that fixes that.


Unfortunately, that's not the end of it. Consider the byte sequence 
"\." appearing at the end of the input. We 
should detect the end-of-copy marker \. and stop reading without 
complaining about the garbage after the end-of-copy marker. That doesn't 
work if we force 4 bytes of lookahead; the invalid byte sequence fits in 
the lookahead window, so we will try to convert it.


I'm sure that can be fixed, for example by adding special handling for 
the last few bytes of the input. But it needs some more thinking, this 
patch isn't quite ready to be committed yet :-(.


- Heikki
>From e011702cdd2b6ce86687870db06b88fb8daf5d62 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 6 Apr 2021 20:48:19 +0300
Subject: [PATCH v4 1/1] Simplify COPY FROM parsing by forcing lookahead.

Now that we don't support the old-style COPY protocol anymore, we can
force four bytes of lookahead like was suggested in an existing comment,
and simplify the loop in CopyReadLineText().

FIXME: This fails with sequence "\.". We
should detect the end-of-copy marker \. and stop reading without
complaining about the garbage after the end-of-copy marker. That
doesn't work if we force 4 bytes of lookahead; the invalid byte
sequence fits in the lookahead window, so we will try to convert it.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/89627a2a-c123-a8aa-267e-5d168feb73dd%40iki.fi
---
 src/backend/commands/copyfromparse.c | 112 +--
 1 file changed, 36 insertions(+), 76 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0813424768..6b140531ed 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -90,21 +90,6 @@
  * empty statements.  See http://www.cit.gu.edu.au/~anthony/info/C/C.macros.
  */
 
-/*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
- */
-#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
-if (1) \
-{ \
-	if (input_buf_ptr + (extralen) >= copy_buf_len && !hit_eof) \
-	{ \
-		input_buf_ptr = prev_raw_ptr; /* undo fetch */ \
-		need_data = true; \
-		continue; \
-	} \
-} else ((void) 0)
-
 /* This consumes the remainder of the buffer and breaks */
 #define IF_NEED_REFILL_AND_EOF_BREAK(extralen) \
 if (1) \
@@ -159,7 +144,7 @@ static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
-		int minread, int maxread);
+		int maxread);
 static inline bool CopyGetInt32(CopyFromState cstate, int32 *val);
 static inline bool CopyGetInt16(CopyFromState cstate, int16 *val);
 static void CopyLoadInputBuf(CopyFromState cstate);
@@ -230,9 +215,8 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
 /*
  * CopyGetData reads data from the source (file or frontend)
  *
- * We attempt to read at least minread, and at most maxread, bytes from
- * the source.  The actual number of bytes read is returned; if this is
- * less than minread, EOF was detected.
+ * We attempt to read at most maxread bytes from the source.  The actual
+ * number of bytes read is returned; if this is 0, EOF was detected.
  *
  * Note: when copying from the frontend, we expect a proper EOF mark per
  * protocol; if the frontend simply drops the connection, we raise error.
@@ -241,7 +225,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
  * NB: no data conversion is applied here.
  */
 static int
-CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
+CopyGetData(CopyFromState cstate, void *databuf, int maxread)
 {
 	int			bytesread = 0;
 
@@ -257,7 +241,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 cstate->raw_reached_eof = true;
 			break;
 		case COPY_FRONTEND:
-			while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
+			while (maxread > 0 && bytesread == 0 && !cstate->raw_reached_eof)
 			{
 int			avail;
 
@@ -321,7 +305,9 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 			}
 			break;
 		case COPY_CALLBACK:
-			bytesread = cstate->data_source_cb(databuf, minread, maxread);
+			bytesread = cstate->data_source_cb(databuf, 1, maxread);
+			if (bytesread == 0)
+cstate->raw_reached_eof = true;
 			break;
 	}
 
@@ -605,7 +591,7 @@ CopyLoadRawBuf(CopyFromState cstate)
 
 	/* Load more data */
 	inbytes = CopyGetData(cstate, cstate->raw_buf + 

Re: GSoc Applicant

2021-04-06 Thread Mark Wong
Hello,

On Mon, Apr 05, 2021 at 11:46:36PM +0200, Mohamed Mansour wrote:
> Greetings,
> 
> I'm Mohamed Mansour, a Data Engineer at IBM and a Master's degree student
> in the Computer Engineering Department - Faculty of Engineering - Cairo
> University.
> 
> I would like to apply to Google Summer of Code to work on the following
> project:
> 
> Database Load Stress Benchmark
> 
> Kindly find my CV attached, and tell me if there is a place for me related
> to this project, or if you see another project that fits me better; then I
> will build the proposal as soon as possible.

I don't see anything in your CV that suggests you couldn't be successful
in this project, but we'd like you to put together a proposal for the
projects you are interested in.

If this is the project that is most interesting to you, then please
go ahead and submit a draft for the mentors to review, and we will offer
our comments/suggestions on your proposal.

Regards,
Mark




Re: postgres_fdw: IMPORT FOREIGN SCHEMA ... LIMIT TO (partition)

2021-04-06 Thread Fujii Masao




On 2021/04/06 22:02, Michael Paquier wrote:

On Tue, Apr 06, 2021 at 02:39:54PM +0200, Matthias van de Meent wrote:

On Tue, 6 Apr 2021 at 14:29, Fujii Masao  wrote:

Attached is the updated version of the patch. Barring any objection,
I'm thinking to commit this.


Sorry for the late reply.  The approach to use LIMIT TO for this
purpose looks sensible from here, and I agree that it can have its
uses.  So what you have here LGTM.


Pushed. Thanks all!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove page-read callback from XLogReaderState.

2021-04-06 Thread Thomas Munro
On Wed, Mar 31, 2021 at 7:17 PM Kyotaro Horiguchi
 wrote:
> At Wed, 31 Mar 2021 10:00:02 +1300, Thomas Munro  
> wrote in
> > On Thu, Mar 4, 2021 at 3:29 PM Kyotaro Horiguchi
> >  wrote:
> > > A recent commit about LSN_FORMAT_ARGS conflicted with this.
> > > Just rebased.
> >
> > FYI I've been looking at this, and I think it's a very nice
> > improvement.  I'll post some review comments and a rebase shortly.
>
> Thanks for looking at this!

I rebased and pgindent-ed the first three patches and did some
testing.  I think it looks pretty good, though I still need to check
the code coverage when running the recovery tests.  There are three
compiler warnings from GCC when not using --enable-cassert, including
uninitialized variables: pageHeader and targetPagePtr.  It looks like
they could be silenced as follows, or maybe you see a better way?

-   XLogPageHeader pageHeader;
+   XLogPageHeader pageHeader = NULL;
uint32  pageHeaderSize;
-   XLogRecPtr  targetPagePtr;
+   XLogRecPtr  targetPagePtr = InvalidXLogRecPtr;

To summarise the patches:

0001 + 0002 get rid of the callback interface and replace it with a
state machine, making it the client's problem to supply data when it
returns XLREAD_NEED_DATA.  I found this interface nicer to work with,
for my WAL decoding buffer patch (CF 2410), and I understand that the
encryption patch set can also benefit from it.  Certainly when I
rebased my project on this patch set, I prefered the result.

0003 is nice global variable cleanup.

I haven't looked at 0004.
From 560cdfa444a3b05a0e6b8054f3cfeadf56e059fc Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 5 Sep 2019 20:21:55 +0900
Subject: [PATCH v18 1/3] Move callback-call from ReadPageInternal to
 XLogReadRecord.

The current WAL record reader reads page data using a callback
function.  Redesign the interface so that it asks the caller for more
data when required.  This model works better for proposed projects such as
encryption, prefetching and other new features that would require
extending the callback interface for each case.

As the first step of that change, this patch moves the page reader
function out of ReadPageInternal(), then the remaining tasks of the
function are taken over by the new function XLogNeedData().

Author: Kyotaro HORIGUCHI 
Author: Heikki Linnakangas 
Reviewed-by: Antonin Houska 
Reviewed-by: Alvaro Herrera 
Reviewed-by: Takashi Menjo 
Reviewed-by: Thomas Munro 
Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
---
 src/backend/access/transam/xlog.c   |  16 +-
 src/backend/access/transam/xlogreader.c | 277 ++--
 src/backend/access/transam/xlogutils.c  |  12 +-
 src/backend/replication/walsender.c |  10 +-
 src/bin/pg_rewind/parsexlog.c   |  21 +-
 src/bin/pg_waldump/pg_waldump.c |   8 +-
 src/include/access/xlogreader.h |  33 +--
 src/include/access/xlogutils.h  |   2 +-
 8 files changed, 224 insertions(+), 155 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f8810e149..449e17d2fa 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -920,7 +920,7 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 		 XLogSource source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source);
-static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
+static bool	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool fetching_ckpt, XLogRecPtr tliRecPtr);
@@ -4375,7 +4375,6 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 	XLogRecord *record;
 	XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
 
-	/* Pass through parameters to XLogPageRead */
 	private->fetching_ckpt = fetching_ckpt;
 	private->emode = emode;
 	private->randAccess = (xlogreader->ReadRecPtr == InvalidXLogRecPtr);
@@ -12111,7 +12110,7 @@ CancelBackup(void)
  * XLogPageRead() to try fetching the record from another source, or to
  * sleep and retry.
  */
-static int
+static bool
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 			 XLogRecPtr targetRecPtr, char *readBuf)
 {
@@ -12170,7 +12169,8 @@ retry:
 			readLen = 0;
 			readSource = XLOG_FROM_ANY;
 
-			return -1;
+			xlogreader->readLen = -1;
+			return false;
 		}
 	}
 
@@ -12265,7 +12265,8 @@ retry:
 		goto next_record_is_invalid;
 	}
 
-	return readLen;
+	xlogreader->readLen = readLen;
+	return true;
 
 next_record_is_invalid:
 	lastSourceFailed = true;

Re: TRUNCATE on foreign table

2021-04-06 Thread Kazutaka Onishi
I've attached v15.

> I still feel that the above bunch of code is a duplicate of what the
> do_sql_command function already has. I would recommend that we just
> make that function non-static (it's easy to do), keep the
> declaration in postgres_fdw.h, and use it in
> postgresExecForeignTruncate.

I've tried this on v15.

> Another minor comment:
> We could move +ForeignServer  *serv = NULL; within foreach (lc,
> frels_list), because it's not being used outside.

I've moved it.

> cfbot failure on v14 - https://cirrus-ci.com/task/4772360931770368.
> Looks like it is not related to this patch, please re-confirm it.

I've checked v15 patch with "make check-world" and confirmed this passed.



2021年4月6日(火) 23:25 Bharath Rupireddy :
>
> On Tue, Apr 6, 2021 at 5:36 PM Kazutaka Onishi  wrote:
> >
> > Thank you for checking v13, and here is v14 patch.
>
> cfbot failure on v14 - https://cirrus-ci.com/task/4772360931770368.
> Looks like it is not related to this patch, please re-confirm it.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com


pgsql14-truncate-on-foreign-table.v15.patch
Description: Binary data


Re: shared-memory based stats collector

2021-04-06 Thread Andres Freund
Hi,

On 2021-04-05 02:29:14 -0700, Andres Freund wrote:
> I've spent most of the last 2 1/2 weeks on this now. Unfortunately I think
> that, while it has gotten a lot closer, it's still about a week's worth of
> work away from being committable.
> 
> My main concerns are:
> 
> - Test Coverage:
> 
>   I've added a fair bit of tests, but it's still pretty bad. There were a lot
>   of easy-to-hit bugs in earlier versions that nevertheless passed the test
>   just fine.
> 
>   Due to the addition of pg_stat_force_next_flush(), and that there's no need
>   to wait for the stats collector to write out files, it's now a lot more
>   realistic to have proper testing of a lot of the pgstat.c code.
> 
> - Architectural Review
> 
>   I rejiggered the patchset pretty significantly, and I think it needs more
>   review than I see as realistic in the next two days. In particular I don't
>   think
> 
> - Performance Testing
> 
>   I did a small amount, but given that this touches just about every query
>   etc, I think that's not enough. My changes unfortunately are substantial
>   enough to invalidate Horiguchi-san's earlier tests.
> 
> - Currently there's a corner case in which function (but not table!)  stats
>   for a dropped function may not be removed. That possibly is not too bad,
> 
> - Too many FIXMEs still open
> 
> 
> It is quite disappointing to not have the patch go into v14 :(. But I just
> don't quite see the path right now. But maybe I am just too tired right now,
> and it'll look better tomorrow (err today, in a few hours).
> [...]
> I now changed the split so that there is
> utils/activity/pgstat_{database,functions,global,relation}.c
> 
> For me that makes the code a lot more readable. Before this change I found it
> really hard to know where code should best be put etc, or where to find
> code. I found it to be pretty nice to work with after the new split.

I'm inclined to push patches
[PATCH v60 05/17] pgstat: split bgwriter and checkpointer messages.
[PATCH v60 06/17] pgstat: Split out relation stats handling from 
AtEO[Sub]Xact_PgStat() etc.
[PATCH v60 09/17] pgstat: xact level cleanups / consolidation.
[PATCH v60 10/17] pgstat: Split different types of stats into separate files.
[PATCH v60 12/17] pgstat: reorder file pgstat.c / pgstat.h contents.

to v14. They're just moving things around, so are fairly low risk. But
they're going to be a pain to maintain. And I think 10 and 12 make
pgstat.c a lot easier to understand.

Greetings,

Andres Freund




Re: subtransaction performance regression [kind of] due to snapshot caching

2021-04-06 Thread Andres Freund
Hi,

On 2021-04-05 21:35:21 -0700, Andres Freund wrote:
> See the attached fix. I did include a test that verifies that the
> kill_prior_tuples optimization actually prevents the index from growing,
> when subtransactions are involved. I think it should be stable, even
> with concurrent activity. But I'd welcome a look.

Pushed that now, after trying and failing to make the test spuriously
fail due to concurrent activity.

Greetings,

Andres Freund




GSoc Applicant

2021-04-06 Thread Mohamed Mansour
Greetings,

I'm Mohamed Mansour, a Data Engineer at IBM and a Master's degree student
in the Computer Engineering Department - Faculty of Engineering - Cairo
University.

I would like to apply to Google Summer of Code to work on the following
project:

Database Load Stress Benchmark

Kindly find my CV attached, and tell me if there is a place for me related
to this project, or if you see another project that fits me better; then I
will build the proposal as soon as possible.

Thanks in advance



Eng. Mohamed Mansour
(EG+) 20 112 003 3329
(EG+) 20 106 352 6328
Data Engineer


Re: [EXTERNAL] Any objection to documenting pg_sequence_last_value()?

2021-04-06 Thread James Coleman
On Tue, Mar 30, 2021 at 4:37 AM Hanefi Onaldi
 wrote:
>
> Hi All,
>
> I recently used pg_sequence_last_value() when working on a feature in an 
> extension, and it would have been easier for me if there were some 
> documentation for this function.
>
> I'd like to help document this function if there are no objections.
>
> Best,
> Hanefi
>
> -Original Message-
> From: James Coleman 
> Sent: 6 Ağustos 2020 Perşembe 16:14
> To: pgsql-hackers 
> Subject: [EXTERNAL] Any objection to documenting pg_sequence_last_value()?
>
> The function pg_sequence_last_value() was added to underlie the pg_sequences 
> view, and it's the only way I'm aware of from userspace to directly get the 
> last value of a sequence globally (i.e., not within the current session like 
> currval()/lastval()). Obviously you can join to the pg_sequences view, but 
> that's sometimes unnecessarily cumbersome since it doesn't expose the relid 
> of the sequence.
>
> When that function got added it apparently wasn't added to the docs, though 
> I'm not sure if that was intentional or not.
>
> Does anyone have any objections to documenting
> pg_sequence_last_value() in the sequence manipulation functions doc page?
>
> James
>
>

Given there's been no objection, I think it'd be worth submitting a
patch (and I'd be happy to review if you're willing to author one).

James




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2021-04-06 Thread Justin Pryzby
On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote:
> On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote:
> > * I noticed that you did s/stat/lstat/.  That's fine on Unix systems,
> > but it won't have any effect on Windows systems (cf bed90759f),
> > which means that we'll have to document a platform-specific behavioral
> > difference.  Do we want to go there?
> >
> > Maybe this patch needs to wait on somebody fixing our lack of real lstat() 
> > on Windows.
> 
> I think only the "top" patches depend on lstat (for the "type" column and
> recursion, to avoid loops).  The initial patches are independently useful, and
> resolve the original issue of hiding tmpdirs.  I've rebased and re-arranged 
> the
> patches to reflect this.

I said that, but then failed to attach the re-arranged patches.
Now I also renumbered OIDs following best practice.

The first handful of patches address the original issue, and I think could be
"ready":

$ git log --oneline origin..pg-ls-dir-new |tac
... Document historic behavior of links to directories..
... Add tests on pg_ls_dir before changing it
... Add pg_ls_dir_metadata to list a dir with file metadata..
... pg_ls_tmpdir to show directories and "isdir" argument..
... pg_ls_*dir to show directories and "isdir" column..

These others are optional:
... pg_ls_logdir to ignore error if initial/top dir is missing..
... pg_ls_*dir to return all the metadata from pg_stat_file..

..and these maybe requires more work for lstat on windows:
... pg_stat_file and pg_ls_dir_* to use lstat()..
... pg_ls_*/pg_stat_file to show file *type*..
... Preserve pg_stat_file() isdir..
... Add recursion option in pg_ls_dir_files..

-- 
Justin
>From c8a7cbd4ebbe903a8f373d637543b86b0ffa6dfd Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v27 01/11] Document historic behavior of links to
 directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c6a45d9e55..977b75a531 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26990,7 +26990,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 Returns a record containing the file's size, last access time stamp,
 last modification time stamp, last file status change time stamp (Unix
 platforms only), file creation time stamp (Windows only), and a flag
-indicating if it is a directory.
+indicating if it is a directory (or a symbolic link to a directory).


 This function is restricted to superusers by default, but other users
-- 
2.17.0

>From c571e40d0938d897500fbd6fb7a88cb52139da92 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 17 Mar 2020 13:16:24 -0500
Subject: [PATCH v27 02/11] Add tests on pg_ls_dir before changing it

---
 src/test/regress/expected/misc_functions.out | 24 
 src/test/regress/sql/misc_functions.sql  |  8 +++
 2 files changed, 32 insertions(+)

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index e845042d38..ea0fc48dbd 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -214,6 +214,30 @@ select count(*) > 0 from
  t
 (1 row)
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+ name 
+--
+ .
+(1 row)
+
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+ name 
+--
+(0 rows)
+
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+ pg_ls_dir 
+---
+(0 rows)
+
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+ERROR:  could not open directory "does not exist": No such file or directory
+-- Check that expected columns are present
+select * from pg_stat_file('.') limit 0;
+ size | access | modification | change | creation | isdir 
+--++--++--+---
+(0 rows)
+
 --
 -- Test adding a support function to a subject function
 --
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index a398349afc..eb6ac12ab4 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -69,6 +69,14 @@ select count(*) > 0 from
where spcname = 'pg_default') pts
   join pg_database db on pts.pts = db.oid;
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+
+-- Check that expected columns are present

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Julien Rouhaud
On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> On 2021-Apr-06, Nitin Jadhav wrote:
> 
> > I have reviewed the code. Here are a few minor comments.
> > 
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> > 
> > The above two conditions can be clubbed together in a single condition.
> 
> I wonder if it wouldn't make more sense to put the assignment *after* we
> have checked the second condition.

All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, I think we should keep this code
consistent.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Alvaro Herrera
On 2021-Apr-06, Nitin Jadhav wrote:

> I have reviewed the code. Here are a few minor comments.
> 
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
> 
> The above two conditions can be clubbed together in a single condition.

I wonder if it wouldn't make more sense to put the assignment *after* we
have checked the second condition.

-- 
Álvaro Herrera   Valdivia, Chile




Re: psql - add SHOW_ALL_RESULTS option

2021-04-06 Thread Peter Eisentraut

On 14.03.21 10:54, Fabien COELHO wrote:


Hello Peter,


This reply was two months ago, and nothing has happened, so I have
marked the patch as RwF.


Given the ongoing work on returning multiple result sets from stored 
procedures[0], I went to dust off this patch.


Based on the feedback, I put back the titular SHOW_ALL_RESULTS option, 
but set the default to on.  I fixed the test failure in 
013_crash_restart.pl.  I also trimmed back the test changes a bit so 
that the resulting test output changes are visible better.  (We could 
make those stylistic changes separately in another patch.)  I'll put 
this back into the commitfest for another look.


Thanks a lot for the fixes and pushing it forward!

My 0.02€: I tested this updated version and do not have any comment on 
this version. From my point of view it could be committed. I would not 
bother to separate the test style adjustments.


Committed.  The last thing I fixed was the diff in the copy2.out 
regression test.  The order of the notices with respect to the error 
messages was wrong.  I fixed that by switching back to the regular 
notice processor during COPY handling.


Btw., not sure if that was mentioned before, but a cool use of this is 
to \watch multiple queries at once, like


select current_date \; select current_time \watch




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Julien Rouhaud
On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:
> 
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
> 
> The above two conditions can be clubbed together in a single condition.

Right, I just kept it separate as the comment is only relevant for the 2nd
test.  I'm fine with merging both if needed.
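
For reference, a minimal sketch of the merged check (names taken from the
quoted patch):

	void
	pgstat_report_queryid(uint64 queryId, bool force)
	{
		volatile PgBackendStatus *beentry = MyBEEntry;

		/*
		 * Bail out if there is no backend status entry, or if
		 * track_activities is disabled (in which case st_queryid should
		 * already have been reset).
		 */
		if (!beentry || !pgstat_track_activities)
			return;

		/* ... rest of the function unchanged ... */
	}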

> 2.
> +/* --
> + * pgstat_get_my_queryid() -
> + *
> + * Return current backend's query identifier.
> + */
> +uint64
> +pgstat_get_my_queryid(void)
> +{
> + if (!MyBEEntry)
> + return 0;
> +
> + return MyBEEntry->st_queryid;
> +}
> 
> Is it safe to directly read the data from MyBEEntry without
> calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
> ref pgstat_get_backend_current_activity() for more information. Kindly let
> me know if I am wrong.

This field is only written by a backend for its own entry.
pg_stat_get_activity already has the required protection, so the remaining
calls that read this field shouldn't risk reading torn values from a
concurrent write, even on platforms where the write isn't atomic, because
the read happens in the same backend that originally wrote it.  Skipping the
loop avoids some overhead when retrieving the queryid, but if people think
it's worth having the loop (or a comment explaining why there's no loop) I'm
also fine with it.
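
For comparison, a sketch of what the guarded read would look like if the
changecount protocol were used here; the macro names are the ones mentioned
above, but the exact loop shape is an assumption:

	uint64
	pgstat_get_my_queryid(void)
	{
		volatile PgBackendStatus *beentry = MyBEEntry;
		uint64		queryid;
		int			before_changecount;
		int			after_changecount;

		if (!beentry)
			return 0;

		/* Re-read until we observe a consistent snapshot of the entry. */
		do
		{
			pgstat_begin_read_activity(beentry, before_changecount);
			queryid = beentry->st_queryid;
			pgstat_end_read_activity(beentry, after_changecount);
		} while (!pgstat_read_activity_complete(before_changecount,
												after_changecount));

		return queryid;
	}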




Re: ALTER TABLE ADD COLUMN fast default

2021-04-06 Thread Tom Lane
Andrew Gierth  writes:
> This gitlab ticket refers to the same incident:
> https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6047
> (which actually contains a new relevant fact that hadn't been mentioned
> in the IRC discussion, which is that the problem affected multiple
> tables, not just one.)

Hmm.  If you assume that the problem is either a corrupted index on
pg_attrdef, or some block(s) of pg_attrdef got zeroed out, then it
would be entirely unsurprising for multiple tables to be affected.

I've pushed the v2 patch to HEAD.  Not sure if we want to consider
back-patching.

regards, tom lane




[PATCH] PREPARE TRANSACTION unexpected behavior with TEMP TABLE

2021-04-06 Thread Himanshu Upadhyaya
Hi,

While working on one of the issues, I noticed the unexpected behavior below
with "PREPARE TRANSACTION".

We are getting this unexpected behavior with PREPARE TRANSACTION when it is
mixed with temporary objects. Please consider the below setup and SQL block.

Set max_prepared_transactions to 1 (or any non-zero value); this is to
enable “prepare transaction”.

Now please try to run the below set of statements.
[BLOCK-1:]
postgres=# create temp table fullname (first text, last text);
CREATE TABLE
postgres=# BEGIN;
BEGIN
postgres=*# create function longname(fullname) returns text language sql
postgres-*# as $$select $1.first || ' ' || $1.last$$;
CREATE FUNCTION
postgres=*# prepare transaction 'mytran';
ERROR:  cannot PREPARE a transaction that has operated on temporary objects

Above error is expected.

The problem appears if we again try to create the same function and run
“PREPARE TRANSACTION”, as below.

[BLOCK-2:]
postgres=# BEGIN;
BEGIN
postgres=*# create function longname(fullname) returns text language sql
as $$select $1.first || ' ' || $1.last$$;
CREATE FUNCTION
postgres=*# PREPARE transaction 'mytran';
PREPARE TRANSACTION

Now, this time it succeeds, and we do not get the above error ("ERROR:  cannot
PREPARE a transaction that has operated on temporary objects"), the way
we did with BLOCK-1.

This happens because MyXactFlags is set in the relation_open function
call, and here relation_open is called from load_typcache_tupdesc;
but the second run of “create function…” in BLOCK-2 above does not call
load_typcache_tupdesc because of the condition below (typentry->tupDesc
== NULL) in lookup_type_cache().

/*
 * If it's a composite type (row type), get tupdesc if requested
 */
if ((flags & TYPECACHE_TUPDESC) &&
typentry->tupDesc == NULL &&
typentry->typtype == TYPTYPE_COMPOSITE)
{
load_typcache_tupdesc(typentry);
}

We set typentry->tupDesc to a non-NULL value (and populate it with the proper
tuple descriptor in the cache) during our first call to “create function…”
in BLOCK-1.
We have logic in xact.c::PrepareTransaction() to simply error out if
we have accessed any temporary object in the current transaction, but
because of the above-described issue of not setting
XACT_FLAGS_ACCESSEDTEMPNAMESPACE in MyXactFlags, the second run of “create
function…” works and PREPARE TRANSACTION succeeds (but it should fail).

Please find attached the proposed patch to FIX this issue.

Thoughts?

Thanks,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 6e2a6bdfb77870213d051f31b2b6683ccea45ab2 Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya 
Date: Mon, 5 Apr 2021 21:04:56 +0530
Subject: [PATCH v1] PREPARE TRANSACTION must fail when mixed with temporary
 object.

---
 src/backend/utils/cache/typcache.c | 30 --
 src/test/regress/expected/temp.out | 14 ++
 src/test/regress/sql/temp.sql  | 14 ++
 3 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 4915ef5934..36fac6150d 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -813,7 +813,6 @@ lookup_type_cache(Oid type_id, int flags)
 	 * If it's a composite type (row type), get tupdesc if requested
 	 */
 	if ((flags & TYPECACHE_TUPDESC) &&
-		typentry->tupDesc == NULL &&
 		typentry->typtype == TYPTYPE_COMPOSITE)
 	{
 		load_typcache_tupdesc(typentry);
@@ -881,21 +880,25 @@ load_typcache_tupdesc(TypeCacheEntry *typentry)
 
 	/*
 	 * Link to the tupdesc and increment its refcount (we assert it's a
-	 * refcounted descriptor).  We don't use IncrTupleDescRefCount() for this,
-	 * because the reference mustn't be entered in the current resource owner;
+	 * refcounted descriptor), do this only if it was not populated previously.
+	 * We don't use IncrTupleDescRefCount() for this, because the reference
+	 * mustn't be entered in the current resource owner;
 	 * it can outlive the current query.
 	 */
-	typentry->tupDesc = RelationGetDescr(rel);
+	if (typentry->tupDesc == NULL)
+	{
+		typentry->tupDesc = RelationGetDescr(rel);
 
-	Assert(typentry->tupDesc->tdrefcount > 0);
-	typentry->tupDesc->tdrefcount++;
+		Assert(typentry->tupDesc->tdrefcount > 0);
+		typentry->tupDesc->tdrefcount++;
 
-	/*
-	 * In future, we could take some pains to not change tupDesc_identifier if
-	 * the tupdesc didn't really change; but for now it's not worth it.
-	 */
-	typentry->tupDesc_identifier = ++tupledesc_id_counter;
+		/*
+		 * In future, we could take some pains to not change tupDesc_identifier if
+		 * the tupdesc didn't really change; but for now it's not worth it.
+		 */
+		typentry->tupDesc_identifier = ++tupledesc_id_counter;
 
+	}
 	relation_close(rel, AccessShareLock);
 }
 
@@ -1529,9 +1532,8 @@ cache_record_field_properties(TypeCacheEntry 

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Nitin Jadhav
I have reviewed the code. Here are a few minor comments.

1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;

The above two conditions can be clubbed together in a single condition.

2.
+/* --
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+ if (!MyBEEntry)
+ return 0;
+
+ return MyBEEntry->st_queryid;
+}

Is it safe to directly read the data from MyBEEntry without
calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
ref pgstat_get_backend_current_activity() for more information. Kindly let
me know if I am wrong.

Thanks and Regards,
Nitin Jadhav

On Mon, Apr 5, 2021 at 10:46 PM Bruce Momjian  wrote:

> On Sun, Apr  4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:
> > On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> > > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > > >
> > > > OK, I am happy with your design decisions, thanks.
> > >
> > > Thanks!  While double checking I noticed that I failed to remove a
> (now)
> > > useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
> > > attaching v22 to fix that, no other change.
> >
> > There was a conflict since e1025044c (Split backend status and progress
> related
> > functionality out of pgstat.c).
> >
> > Attached v23 is a rebase against current HEAD, and I also added a few
> > UINT64CONST() macro usage for consistency.
>
> Thanks.  I struggled with merging the statistics collection changes into
> my cluster file encryption branches because my patch made changes to
> code that moved to another C file.
>
> I plan to apply this tomorrow.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>
>
>


Re: TRUNCATE on foreign table

2021-04-06 Thread Bharath Rupireddy
On Tue, Apr 6, 2021 at 5:36 PM Kazutaka Onishi  wrote:
>
> Thank you for checking v13, and here is v14 patch.

cfbot failure on v14 - https://cirrus-ci.com/task/4772360931770368.
Looks like it is not related to this patch, please re-confirm it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: TRUNCATE on foreign table

2021-04-06 Thread Bharath Rupireddy
On Tue, Apr 6, 2021 at 5:36 PM Kazutaka Onishi  wrote:
>
> Thank you for checking v13, and here is v14 patch.
>
> > 1) Are we using all of these macros? I see that we are setting them
> > but we only use TRUNCATE_REL_CONTEXT_ONLY. If not used, can we remove
> > them?
>
> These may be needed for foreign data wrappers other than postgres_fdw.

I'm not sure about this, but if it's discussed upthread and agreed
upon, I'm fine with it.

> > 4) I have a basic question: If I have a truncate statement with a mix
> > of local and foreign tables, IIUC, the patch is dividing up a single
> > truncate statement into two truncate local tables, truncate foreign
> > tables. Is this transaction safe at all?
>
> According to this discussion, we can revert the tables on both the local
> and the remote server.
> https://www.postgresql.org/message-id/CAOP8fzbuJ5GdKa%2B%3DGtizbqFtO2xsQbn4mVjjzunmsNVJMChSMQ%40mail.gmail.com

On giving this more thought, it looks like we are safe, i.e. the local
truncation will get reverted. If an error occurs on foreign
table truncation, control in the local server goes to
pgfdw_report_error, which generates an error in the local server, which
aborts the local transaction, and so the local table truncations
get reverted.

+/* run remote query */
+if (!PQsendQuery(conn, sql.data))
+pgfdw_report_error(ERROR, NULL, conn, false, sql.data);
+
+res = pgfdw_get_result(conn, sql.data);
+
+if (PQresultStatus(res) != PGRES_COMMAND_OK)
+pgfdw_report_error(ERROR, res, conn, true, sql.data);

I still feel that the above bunch of code is a duplicate of what the
do_sql_command function already has. I would recommend that we just
make that function non-static (it's easy to do), keep the
declaration in postgres_fdw.h, and use it in
postgresExecForeignTruncate.
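
A rough sketch of what that would look like (the signature below assumes the
existing static definition in postgres_fdw's connection.c):

	/* in postgres_fdw.h, after dropping "static" from the definition */
	extern void do_sql_command(PGconn *conn, const char *sql);

	/* in postgresExecForeignTruncate(), replacing the duplicated block */
	do_sql_command(conn, sql.data);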

Another minor comment:
We could move +ForeignServer  *serv = NULL; within foreach (lc,
frels_list), because it's not being used outside.

The v14 patch mostly looks good to me other than the above comments.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: New IndexAM API controlling index vacuum strategies

2021-04-06 Thread Matthias van de Meent
On Tue, 6 Apr 2021 at 05:13, Peter Geoghegan  wrote:
>
> On Mon, Apr 5, 2021 at 2:44 PM Matthias van de Meent
>  wrote:
> > This assertion is false, and should be a guarding if-statement. HOT
> > redirect pointers are not updated if the tuple they're pointing to is
> > vacuumed (i.e. when it was never committed) so this nextoffnum might
> > in a correctly working system point past maxoff.
>
> I will need to go through this in detail soon.
>
> > > Line pointer truncation doesn't happen during pruning, as it did in
> > > Matthias' original patch. In this revised version, line pointer
> > > truncation occurs during the second phase of VACUUM. There are several
> > > reasons to prefer this approach. It seems both safer and more useful
> > > that way (compared to the approach of doing line pointer truncation
> > > during pruning). It also makes intuitive sense to do it this way, at
> > > least to me -- the second pass over the heap is supposed to be for
> > > "freeing" LP_DEAD line pointers.
> >
> > Good catch for running a line pointer truncating pass at the second
> > pass over the heap in VACUUM, but I believe that it is also very
> > useful for pruning. Line pointer bloat due to excessive HOT chains
> > cannot be undone until the 2nd run of VACUUM happens with this patch,
> > which is a missed chance for all non-vacuum pruning.
>
> Maybe - I have my doubts about it having much value outside of the
> more extreme cases. But let's assume that I'm wrong about that, for
> the sake of argument.
>
> The current plan is to no longer require a super-exclusive lock inside
> lazy_vacuum_heap_page(), which means that we can no longer safely call
> PageRepairFragmentation() at that point. This will mean that
> PageRepairFragmentation() is 100% owned by pruning. And so the
> question of whether or not line pointer truncation should also happen
> in PageRepairFragmentation() to cover pruning is (or will be) a
> totally separate question to the question of how
> lazy_vacuum_heap_page() does it. Nothing stops you from independently
> pursuing that as a project for Postgres 15.

Ah, then I misunderstood your intentions when you mentioned including
a modified version of my patch. In which case, I agree that improving
HOT pruning is indeed out of scope.

> > What difference is there between opportunistically pruned HOT line
> > pointers, and VACUUMed line pointers?
>
> The fact that they are physically identical to each other isn't
> everything. The "life cycle" of an affected page is crucially
> important.
>
> I find that there is a lot of value in thinking about how things look
> at the page level moment to moment, and even over hours and days.
> Usually with a sample workload and table in mind. I already mentioned
> the new_order table from TPC-C, which is characterized by continual
> churn from more-or-less even amounts of range deletes and bulk inserts
> over time. That seems to be the kind of workload where you see big
> problems with line pointer bloat. Because there is constant churn of
> unrelated logical rows (it's not a bunch of UPDATEs).
>
> It's possible for very small effects to aggregate into large and
> significant effects -- I know this from my experience with indexing.
> Plus the FSM is probably not very smart about fragmentation, which
> makes it even more complicated. And so it's easy to be wrong if you
> predict that some seemingly insignificant extra intervention couldn't
> possibly help. For that reason, I don't want to predict that you're
> wrong now. It's just a question of time, and of priorities.
>
> > Truncating during pruning has
> > the benefit of keeping the LP array short where possible, and seeing
> > that truncating the LP array allows for more applied
> > PD_HAS_FREE_LINES-optimization, I fail to see why you wouldn't want to
> > truncate the LP array whenever clearing up space.
>
> Truncating the line pointer array is not an intrinsic good. I hesitate
> to do it during pruning in the absence of clear evidence that it's
> independently useful. Pruning is a very performance sensitive
> operation. Much more so than VACUUM's second heap pass.
>
> > Other than those questions, some comments on the other patches:
> >
> > 0002:
> > > +appendStringInfo(, _("There were %lld dead item identifiers.\n"),
> > > + (long long) vacrel->lpdead_item_pages);
> >
> > I presume this should use vacrel->lpdead_items?.
>
> It should have been, but as it happens I have decided to not do this
> at all in 0002-*. Better to not report on LP_UNUSED *or* LP_DEAD items
> at this point of VACUUM VERBOSE output.
>
> > 0003:
> > > + * ...  Aborted transactions
> > > + * have tuples that we can treat as DEAD without caring about where there
> > > + * tuple header XIDs ...
> >
> > This should be '... where their tuple header XIDs...'
>
> Will fix.
>
> > > +retry:
> > > +
> > > ...
> > > +res = HeapTupleSatisfiesVacuum(, vacrel->OldestXmin, buf);
> > > +
> > > +if (unlikely(res == 

Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread Fujii Masao




On 2021/04/06 20:44, David Steele wrote:

On 4/6/21 7:13 AM, Fujii Masao wrote:


On 2021/04/06 15:59, osumi.takami...@fujitsu.com wrote:

I just wanted to write why the error was introduced,
but it was not necessary.
We should remove and fix the first part of the sentence.


So the consensus is almost the same as the latest patch?
If they are not so different, I'm thinking of pushing the latest version at first.
Then we can improve the docs if required.


+1


Thanks! So I pushed the patch.


On 2021/04/06 20:48, osumi.takami...@fujitsu.com wrote:

Yes, please. What I suggested is almost the same as your idea.


Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Allow batched insert during cross-partition updates

2021-04-06 Thread Bharath Rupireddy
On Tue, Apr 6, 2021 at 6:37 PM Amit Langote  wrote:
> > 3) will the cmin in the output always be the same?
> > +SELECT cmin, * FROM batch_cp_upd_test3;
>
> TBH, I am not so sure.  Maybe it's not a good idea to rely on cmin
> after all.  I've rewritten the tests to use a different method of
> determining if a single or multiple insert commands were used in
> moving rows into foreign partitions.

Thanks! It looks good!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Can we remove extra memset in BloomInitPage, GinInitPage and SpGistInitPage when we have it in PageInit?

2021-04-06 Thread Bharath Rupireddy
On Tue, Apr 6, 2021 at 6:09 PM Michael Paquier  wrote:
>
> On Mon, Mar 22, 2021 at 10:58:17AM +0530, Mahendra Singh Thalor wrote:
> > Your changes look fine to me and I am also not getting any failure. I
> > think we should back-patch to all the branches.
> >
> > The patch applies to all the branches (till v9.5) and there is no failure.
>
> Er, no.  This is just some duplicated code with no extra effect.  I
> have no objection to simplifying the whole thing a bit on readability and
> consistency grounds (will do so tomorrow), including the removal of
> the commented-out memset call in gistinitpage, but this is not
> something that should be backpatched.

+1 to not backporting this patch because it's not a bug or even a
critical issue. Having said that, removing these unnecessary memsets
would not only be better for readability and consistency but would also
avoid a few extra function call costs (although minimal) while adding
new index pages.

Please find the v3 patch that removed the commented-out memset call in
gistinitpage.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From fe18cdc930933415c02f6eda82fbb1646831ba0a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 6 Apr 2021 19:06:51 +0530
Subject: [PATCH v3] Remove extra memset in
 BloomInitPage,GinInitPage,SpGistInitPage

We are memset-ting the special space of the page that's already set to
zeros by PageInit in BloomInitPage, GinInitPage, SpGistInitPage
and initCachedPage. We have already removed the memset after
PageInit in gistinitpage (see the comment there). These calls look
redundant, so this patch gets rid of the extra memset calls.

While on it, this patch also removes MAXALIGN(sizeof(SpGistPageOpaqueData))
in SpGistInitPage because PageInit will align the specialSize
anyway. This change is in line with other places (such as
BloomInitPage, brin_page_init, GinInitPage, gistinitpage,
_hash_pageinit and so on) where we just pass the size of the special
space data structure.

While on it, this patch also removes an unnecessary assignment of
pd_flags to 0 in PageInit, as that is already done by MemSet.
---
 contrib/bloom/blinsert.c | 1 -
 contrib/bloom/blutils.c  | 1 -
 src/backend/access/gin/ginutil.c | 1 -
 src/backend/access/gist/gistutil.c   | 2 --
 src/backend/access/spgist/spgutils.c | 3 +--
 src/backend/storage/page/bufpage.c   | 2 +-
 6 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index d37ceef753..c34a640d1c 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -63,7 +63,6 @@ flushCachedPage(Relation index, BloomBuildState *buildstate)
 static void
 initCachedPage(BloomBuildState *buildstate)
 {
-	memset(buildstate->data.data, 0, BLCKSZ);
 	BloomInitPage(buildstate->data.data, 0);
 	buildstate->count = 0;
 }
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 1e505b1da5..754de008d4 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -411,7 +411,6 @@ BloomInitPage(Page page, uint16 flags)
 	PageInit(page, BLCKSZ, sizeof(BloomPageOpaqueData));
 
 	opaque = BloomPageGetOpaque(page);
-	memset(opaque, 0, sizeof(BloomPageOpaqueData));
 	opaque->flags = flags;
 	opaque->bloom_page_id = BLOOM_PAGE_ID;
 }
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 6b9b04cf42..cdd626ff0a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -348,7 +348,6 @@ GinInitPage(Page page, uint32 f, Size pageSize)
 	PageInit(page, pageSize, sizeof(GinPageOpaqueData));
 
 	opaque = GinPageGetOpaque(page);
-	memset(opaque, 0, sizeof(GinPageOpaqueData));
 	opaque->flags = f;
 	opaque->rightlink = InvalidBlockNumber;
 }
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 1ff1bf816f..8dcd53c457 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -761,8 +761,6 @@ gistinitpage(Page page, uint32 f)
 	PageInit(page, pageSize, sizeof(GISTPageOpaqueData));
 
 	opaque = GistPageGetOpaque(page);
-	/* page was already zeroed by PageInit, so this is not needed: */
-	/* memset(&(opaque->nsn), 0, sizeof(GistNSN)); */
 	opaque->rightlink = InvalidBlockNumber;
 	opaque->flags = f;
 	opaque->gist_page_id = GIST_PAGE_ID;
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 72cbde7e0b..8d99c9b762 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -677,9 +677,8 @@ SpGistInitPage(Page page, uint16 f)
 {
 	SpGistPageOpaque opaque;
 
-	PageInit(page, BLCKSZ, MAXALIGN(sizeof(SpGistPageOpaqueData)));
+	PageInit(page, BLCKSZ, sizeof(SpGistPageOpaqueData));
 	opaque = SpGistPageGetOpaque(page);
-	memset(opaque, 0, sizeof(SpGistPageOpaqueData));
 	opaque->flags = f;
 	opaque->spgist_page_id = SPGIST_PAGE_ID;
 }
diff --git a/src/backend/storage/page/bufpage.c 

Re: [PATCH] New default role allowing to change per-role/database settings

2021-04-06 Thread Michael Banck
Hi,

Am Montag, den 05.04.2021, 14:33 -0400 schrieb Stephen Frost:
> * Michael Banck (michael.ba...@credativ.de) wrote:
> > Am Montag, den 08.03.2021, 20:54 +0500 schrieb Ibrar Ahmed:
> > > On Thu, Dec 31, 2020 at 6:16 PM Michael Banck  
> > > wrote:
> > > > in today's world, some DBAs have no superuser rights, but we can
> > > > delegate them additional powers like CREATEDB or the pg_monitor default
> > > > role etc. Usually, the DBA can also view the database logs, either via
> > > > shell access or some web interface.
> > > > 
> > > > One thing that I personally find lacking is that it is not possible to
> > > > change role-specific log settings (like log_statement = 'mod' for a
> > > > security sensitive role) without being SUPERUSER as their GUC context is
> > > > "superuser". This makes setup auditing much harder if there is no
> > > > SUPERUSER access, also pgaudit then only allows to configure object-
> > > > based auditing. Amazon RDS e.g. has patched Postgres to allow the
> > > > cluster owner/pseudo-superuser `rds_superuser' to change those log
> > > > settings that define what/when we log something, while keeping the
> > > > "where to log" entries locked down.
> > > > 
> > > > The attached patch introduces a new guc context "administrator" (better
> > > > names/bikeshedding for this welcome) that is meant as a middle ground
> > > > between "superuser" and "user". It also adds a new default role
> > > > "pg_change_role_settings" (better names also welcome) that can be
> > > > granted to DBAs so that they can change the "administrator"-context GUCs
> > > > on a per-role (or per-database) basis. Whether the latter should be
> > > > included is maybe debatable, but I added both on the basis that they are
> > > > the same "source".
> 
> If we're going to make the context be called 'administrator' then it
> would seem like we should make the predefined role be
> "pg_change_admin_settings"...?  Thoughts on that?

Well, I think the "administrator" guc context would be much less visible
than a "pg_change_admin_settings" predefined role. As a user, I would
assume such a predefined role would provide something more universal
than what is proposed here. That could still be the case later on, of
course.

Maybe somebody else has another idea on naming things?

> > > > The initial set of "administrator" GUCs are all current GUCs with
> > > > "superuser" context from these categories:
> > > > 
> > > > * Reporting and Logging / What to Log
> > > > * Reporting and Logging / When to Log
> > > > * Statistics / Query and Index Statistics Collector
> > > > * Statistics / Monitoring
> 
> Just to be clear- the predefined role being added here actually allows a
> user with this role to change both 'admin' and 'user' GUCs across all
> databases and across all non-superuser roles, right?  (A bit
> disappointed at the lack of any documentation included in this patch..
> presumably that would have made it clear).

About all databases, see below for your suggestion to limit it to the
intersection of this predefined role and the database owner.

Not exactly sure what you mean with "both 'admin' and 'user' GUCs", but
yeah, one could classify the above categories as such I guess. Other
SUSET-GUCs like the developer options are not touched, though.

I did not
include documentation in the initial patch because I wasn't sure whether
there was any interest in it (and admittedly, the commitfest deadline
came up, I think). I can work on that now.

> > > > Of course, further categories (or particular GUCs) could be added now or
> > > > in the future, e.g. RDS also patches the following GUCs in their v12
> > > > offering:
> > > > 
> > > > * temp_file_limit
> 
> Being able to set temp_file_limit certainly seems appropriate.
> 
> > > > * session_replication_role
> 
> I'm less sure about session_replication_role ...

I'll keep those two out for now.

> > > > RDS does *not* patch log_transaction_sample_rate from "Reporting and
> > > > Logging / When to Log", but that might be more of an oversight than a
> > > > security consideration, or does anybody see a big problem with that
> > > > (compared to the others in that set)?
> 
> I tend to agree that it should be included, I don't see any particular
> reason why that would be worse than, say, log_statement.

Ok.

[...]

> > This commit introduces `administrator' as a middle ground between 
> > `superuser'
> > and `user' for guc contexts.
> > 
> > Those settings initially include all previously `superuser'-context settings
> > from the 'Reporting and Logging' ('What/When to Log' but not 'Where to Log')
> > and both 'Statistics' categories. Further settings could be added in 
> > follow-up
> > commits.
> > 
> > Also, a new default role pg_change_role_settings is introduced.  This allows
> > administrators (that are not superusers) that have been granted this role to
> > change the above PGC_ADMINSET settings on a per-role/database basis like 
> > "ALTER
> > ROLE [...] SET log_statement".
> > 

Re: Allow batched insert during cross-partition updates

2021-04-06 Thread Amit Langote
On Tue, Apr 6, 2021 at 6:49 PM Bharath Rupireddy
 wrote:
> On Tue, Apr 6, 2021 at 3:08 PM Amit Langote  wrote:
> > Updated patch attached.  I had to adjust the test case a little bit to
> > account for the changes of 86dc90056d, something I failed to notice
> > yesterday.  Also, I expanded the test case added in postgres_fdw.sql a
> > bit to show the batching in action more explicitly.
>
> Some minor comments:

Thanks for the review.

> 1) don't we need order by clause for the selects in the tests added?
> +SELECT tableoid::regclass, * FROM batch_cp_upd_test;

Good point.  It wasn't necessary before, but it is after the test
expansion, so added.

> 3) will the cmin in the output always be the same?
> +SELECT cmin, * FROM batch_cp_upd_test3;

TBH, I am not so sure.  Maybe it's not a good idea to rely on cmin
after all.  I've rewritten the tests to use a different method of
determining if a single or multiple insert commands were used in
moving rows into foreign partitions.

--
Amit Langote
EDB: http://www.enterprisedb.com


v5-0001-Allow-batching-of-inserts-during-cross-partition-.patch
Description: Binary data


Re: postgres_fdw: IMPORT FOREIGN SCHEMA ... LIMIT TO (partition)

2021-04-06 Thread Michael Paquier
On Tue, Apr 06, 2021 at 02:39:54PM +0200, Matthias van de Meent wrote:
> On Tue, 6 Apr 2021 at 14:29, Fujii Masao  wrote:
>> Attached is the updated version of the patch. Barring any objection,
>> I'm thinking to commit this.

Sorry for the late reply.  The approach to use LIMIT TO for this
purpose looks sensible from here, and I agree that it can have its 
uses.  So what you have here LGTM.
--
Michael




Re: postgres_fdw: IMPORT FOREIGN SCHEMA ... LIMIT TO (partition)

2021-04-06 Thread Matthias van de Meent
On Tue, 6 Apr 2021 at 14:29, Fujii Masao  wrote:
>
> Thanks! So I applied all the changes that I suggested upthread to the patch.
> I also updated the comment as follows.
>
>  * Import table data for partitions only when they are 
> explicitly
> -* specified in LIMIT TO clause. Otherwise ignore them and
> -* only include the definitions of the root partitioned 
> tables to
> -* allow access to the complete remote data set locally in
> -* the schema imported.
> +* specified in LIMIT TO clause. Otherwise ignore them and 
> only
> +* include the definitions of the root partitioned tables to 
> allow
> +* access to the complete remote data set locally in the 
> schema
> +* imported.
>
> Attached is the updated version of the patch. Barring any objection,
> I'm thinking to commit this.

Thanks, this was on my to-do list for today, but you were faster.

No objections on my part, and thanks for picking this up.

With regards,

Matthias van de Meent




Re: Can we remove extra memset in BloomInitPage, GinInitPage and SpGistInitPage when we have it in PageInit?

2021-04-06 Thread Michael Paquier
On Mon, Mar 22, 2021 at 10:58:17AM +0530, Mahendra Singh Thalor wrote:
> Your changes look fine to me and I am also not getting any failure. I
> think we should back-patch to all the branches.
> 
> The patch applies to all the branches (till v9.5) and there is no failure.

Er, no.  This is just some duplicated code with no extra effect.  I
have no objection to simplifying the whole thing a bit on readability and
consistency grounds (will do so tomorrow), including the removal of
the commented-out memset call in gistinitpage, but this is not
something that should be backpatched.
--
Michael




Re: postgres_fdw: IMPORT FOREIGN SCHEMA ... LIMIT TO (partition)

2021-04-06 Thread Fujii Masao



On 2021/04/06 16:05, Amit Langote wrote:

On Tue, Apr 6, 2021 at 8:34 AM Fujii Masao  wrote:

For now I have no objection to this feature.

-IMPORT FOREIGN SCHEMA import_source EXCEPT (t1, "x 4", nonesuch)
+IMPORT FOREIGN SCHEMA import_source EXCEPT (t1, "x 4", nonesuch, t4_part)

Isn't it better to create also another partition like "t4_part2"?
If we do this, for example, the above test can confirm that both
partitions in EXCEPT and not in are excluded.

+All tables or foreign tables which are partitions of some other table
+are automatically excluded from 
+unless they are explicitly included in the LIMIT TO

IMO it's better to document that partitions are imported when they are
included in LIMIT TO, instead. What about the following?

  Tables or foreign tables which are partitions of some other table are
  imported only when they are explicitly specified in
  LIMIT TO clause.  Otherwise they are automatically
  excluded from .

+clause.  Since all data can be accessed through the partitioned table
+which is the root of the partitioning hierarchy, this approach should
+allow access to all the data without creating extra objects.

Now "this approach" in the above is not clear? What about replacing it with
something like "importing only partitioned tables"?


+1, that wording is better.


Thanks! So I applied all the changes that I suggested upthread to the patch.
I also updated the comment as follows.

 * Import table data for partitions only when they are 
explicitly
-* specified in LIMIT TO clause. Otherwise ignore them and
-* only include the definitions of the root partitioned tables 
to
-* allow access to the complete remote data set locally in
-* the schema imported.
+* specified in LIMIT TO clause. Otherwise ignore them and only
+* include the definitions of the root partitioned tables to 
allow
+* access to the complete remote data set locally in the schema
+* imported.

Attached is the updated version of the patch. Barring any objection,
I'm thinking to commit this.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 59e4e27ffb..9d472d2d3d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8228,6 +8228,8 @@ ALTER TABLE import_source."x 5" DROP COLUMN c1;
 CREATE TABLE import_source.t4 (c1 int) PARTITION BY RANGE (c1);
 CREATE TABLE import_source.t4_part PARTITION OF import_source.t4
   FOR VALUES FROM (1) TO (100);
+CREATE TABLE import_source.t4_part2 PARTITION OF import_source.t4
+  FOR VALUES FROM (100) TO (200);
 CREATE SCHEMA import_dest1;
 IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_dest1;
 \det+ import_dest1.*
@@ -8419,27 +8421,29 @@ FDW options: (schema_name 'import_source', table_name 
'x 5')
 
 -- Check LIMIT TO and EXCEPT
 CREATE SCHEMA import_dest4;
-IMPORT FOREIGN SCHEMA import_source LIMIT TO (t1, nonesuch)
+IMPORT FOREIGN SCHEMA import_source LIMIT TO (t1, nonesuch, t4_part)
   FROM SERVER loopback INTO import_dest4;
 \det+ import_dest4.*
- List of foreign tables
-Schema| Table |  Server  |  FDW options
   | Description 
---+---+--++-
- import_dest4 | t1| loopback | (schema_name 'import_source', table_name 
't1') | 
-(1 row)
+List of foreign tables
+Schema|  Table  |  Server  | FDW options   
  | Description 
+--+-+--+-+-
+ import_dest4 | t1  | loopback | (schema_name 'import_source', table_name 
't1')  | 
+ import_dest4 | t4_part | loopback | (schema_name 'import_source', table_name 
't4_part') | 
+(2 rows)
 
-IMPORT FOREIGN SCHEMA import_source EXCEPT (t1, "x 4", nonesuch)
+IMPORT FOREIGN SCHEMA import_source EXCEPT (t1, "x 4", nonesuch, t4_part)
   FROM SERVER loopback INTO import_dest4;
 \det+ import_dest4.*
- List of foreign tables
-Schema| Table |  Server  |   FDW options   
| Description 
---+---+--+-+-
- import_dest4 | t1| loopback | (schema_name 'import_source', table_name 
't1')  | 
- import_dest4 | t2| loopback | (schema_name 'import_source', table_name 
't2')  | 
- import_dest4 | t3| loopback | (schema_name 'import_source', table_name 
't3')  | 
- import_dest4 | t4| loopback | (schema_name 

Re: Reference Leak with type

2021-04-06 Thread Michael Paquier
On Tue, Apr 06, 2021 at 11:09:13AM +0530, Rohit Bhogate wrote:
>  I found the below reference leak on master.

Thanks for the report.  This is indeed a new problem as of HEAD,
coming from c9d52984 as far as I can see, and 13 does not support this
grammar.  From what I can see, there seems to be an issue with the
reference count of the TupleDesc here: your test case increments it two
times for this custom type in a portal, and tries to decrement it three
times, causing what looks like a leak.
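
As a rough illustration of the invariant involved, here is a hedged sketch
(the function below is made up for this example; only IncrTupleDescRefCount()
and DecrTupleDescRefCount() are the real tupdesc.h API): a reference-counted
TupleDesc must see exactly one decrement for every increment, and the report
above corresponds to that balance being broken.

/* Minimal sketch, assuming a refcounted TupleDesc obtained elsewhere. */
#include "postgres.h"
#include "access/tupdesc.h"

static void
use_tupdesc_balanced(TupleDesc tupdesc)
{
	IncrTupleDescRefCount(tupdesc);	/* pin for first use */
	IncrTupleDescRefCount(tupdesc);	/* pin for second use */

	/* ... work with the descriptor ... */

	DecrTupleDescRefCount(tupdesc);	/* release first pin */
	DecrTupleDescRefCount(tupdesc);	/* release second pin */
	/* One more decrement here would match the imbalance described above. */
}
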
--
Michael




Re: TRUNCATE on foreign table

2021-04-06 Thread Kazutaka Onishi
Thank you for checking v13, and here is v14 patch.

> 1) Are we using all of these macros? I see that we are setting them
> but we only use TRUNCATE_REL_CONTEXT_ONLY. If not used, can we remove
> them?

These may be needed for foreign data wrappers other than postgres_fdw.

> 2) Why is this change for? The existing comment properly says the
> behaviour i.e. all foreign tables are updatable by default.

This is just a mistake. I've fixed it.

> 3) In the docs, let's not combine updatable and truncatable together.
> Have a separate section for Truncatability Options, all
> the documentation related to it be under this new section.

Sure. I've added a new section.

> 4) I have a basic question: If I have a truncate statement with a mix
> of local and foreign tables, IIUC, the patch is dividing up a single
> truncate statement into two truncate local tables, truncate foreign
> tables. Is this transaction safe at all?

According to this discussion, we can revert the tables on both the local
and the remote server.
https://www.postgresql.org/message-id/CAOP8fzbuJ5GdKa%2B%3DGtizbqFtO2xsQbn4mVjjzunmsNVJMChSMQ%40mail.gmail.com

> 6) Again v13 has white space errors, please ensure to run git diff
> --check on every patch.

Umm..  I'm sure I've checked it on v13.
I've confirmed it on v14.

2021年4月6日(火) 13:33 Bharath Rupireddy :
>
> On Mon, Apr 5, 2021 at 8:47 PM Kazutaka Onishi  wrote:
> >
> > > Did you check that hash_destroy is not reachable when an error occurs
> > > on the remote server while executing truncate command?
> >
> > I've checked it and hash_destroy doesn't work on error.
> >
> > I just adding elog() to check this:
> > + elog(NOTICE,"destroyed");
> > + hash_destroy(ft_htab);
> >
> > Then I've checked by the test.
> >
> > + -- 'truncatable' option
> > + ALTER SERVER loopback OPTIONS (ADD truncatable 'false');
> > + TRUNCATE tru_ftable; -- error
> > + ERROR:  truncate on "tru_ftable" is prohibited
> > <-   hash_destroy doesn't work.
> > + ALTER FOREIGN TABLE tru_ftable OPTIONS (ADD truncatable 'true');
> > + TRUNCATE tru_ftable; -- accepted
> > + NOTICE:  destroyed <-  hash_destroy works.
> >
> > Of course, the elog() is not included in v13 patch.
>
> Few more comments on v13:
>
> 1) Are we using all of these macros? I see that we are setting them
> but we only use TRUNCATE_REL_CONTEXT_ONLY. If not used, can we remove
> them?
> +#define TRUNCATE_REL_CONTEXT_NORMAL   0x01
> +#define TRUNCATE_REL_CONTEXT_ONLY 0x02
> +#define TRUNCATE_REL_CONTEXT_CASCADING 0x04
>
> 2) Why is this change for? The existing comment properly says the
> behaviour i.e. all foreign tables are updatable by default.
> @@ -2216,7 +2223,7 @@ postgresIsForeignRelUpdatable(Relation rel)
>  ListCell   *lc;
>
>  /*
> - * By default, all postgres_fdw foreign tables are assumed updatable. 
> This
> + * By default, all postgres_fdw foreign tables are assumed NOT
> truncatable. This
>
> And the below comment is wrong, by default foreign tables are assumed
> truncatable.
> + * By default, all postgres_fdw foreign tables are NOT assumed
> truncatable. This
> + * can be overridden by a per-server setting, which in turn can be
> + * overridden by a per-table setting.
> + */
>
> 3) In the docs, let's not combine updatable and truncatable together.
> Have a separate section for Truncatability Options, all
> the documentation related to it be under this new section.
> 
>  By default all foreign tables using
> postgres_fdw are assumed
> -to be updatable.  This may be overridden using the following option:
> +to be updatable and truncatable.  This may be overridden using
> the following options:
> 
>
> 4) I have a basic question: If I have a truncate statement with a mix
> of local and foreign tables, IIUC, the patch is dividing up a single
> truncate statement into two truncate local tables, truncate foreign
> tables. Is this transaction safe at all?
> A better illustration: TRUNCATE local_rel1, local_rel2, local_rel3,
> foreign_rel1, foreign_rel2, foreign_rel3;
> Your patch executes TRUNCATE local_rel1, local_rel2, local_rel3; on
> local server and TRUNCATE foreign_rel1, foreign_rel2, foreign_rel3; on
> remote server. Am I right?
> Now the question is: if any failure occurs either in local server
> execution or in remote server execution, the other truncate command
> would succeed, right? Isn't this non-transactional, breaking the
> transactional guarantee of the truncation?
> Looks like the order of execution is - first local rel truncation and
> then foreign rel truncation, so what happens if foreign rel truncation
> fails? Can we revert the local rel truncation?
>
> 6) Again v13 has white space errors, please ensure to run git diff
> --check on every patch.
> bharath@ubuntu:~/workspace/postgres$ git apply
> /mnt/hgfs/Shared/pgsql14-truncate-on-foreign-table.v13.patch
> /mnt/hgfs/Shared/pgsql14-truncate-on-foreign-table.v13.patch:41:
> trailing whitespace.
> 

Re: Replication slot stats misgivings

2021-04-06 Thread vignesh C
On Tue, Apr 6, 2021 at 12:19 PM Amit Kapila  wrote:
>
> On Mon, Apr 5, 2021 at 8:51 PM vignesh C  wrote:
> >
>
> Few comments on the latest patches:
> Comments on 0001
> 
> 1.
> @@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
> TransactionId xid, bool create,
>   dlist_push_tail(>toplevel_by_lsn, >node);
>   AssertTXNLsnOrder(rb);
>   }
> +
> + rb->totalTxns++;
>   }
>   else
>   txn = NULL; /* not found and not asked to create */
> @@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
>   {
>   txn->size += sz;
>   rb->size += sz;
> + rb->totalBytes += sz;
>
> I think this will include the txns that are aborted and for which we
> don't send anything. It might be better to update these stats in
> ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
> have sent the data. We can probably use size/total_size in txn. We
> need to be careful to not double include the totaltxn or totalBytes
> for streaming xacts as we might process the same txn multiple times.
>
> 2.
> +Amount of decoded transactions data sent to the decoding output 
> plugin
> +while decoding the changes from WAL for this slot. This and 
> total_txns
> +for this slot can be used to gauge the total amount of data during
> +logical decoding.
>
> I think we can slightly modify the second line here: "This can be used
> to gauge the total amount of data sent during logical decoding.". Why
> do we need to include total_txns along with it?
>
> 0002
> --
> 3.
> +  -- we don't want to wait forever; loop will exit after 30 seconds
> +  FOR i IN 1 .. 5 LOOP
> +
> ...
> ...
> +
> +-- wait a little
> +perform pg_sleep_for('100 milliseconds');
>
> I think this loop needs to be executed 300 times instead of 5 times,
> if the above comments and code needs to do what is expected here?
>
>
> 4.
> +# Test to drop one of the subscribers and verify replication statistics data 
> is
> +# fine after publisher is restarted.
> +$node->safe_psql('postgres', "SELECT
> pg_drop_replication_slot('regression_slot4')");
> +
> +$node->stop;
> +$node->start;
> +
> +# Verify statistics data present in pg_stat_replication_slots are sane after
> +# publisher is restarted
> +$result = $node->safe_psql('postgres',
> + "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS 
> total_bytes
> + FROM pg_stat_replication_slots ORDER BY slot_name"
>
> Various comments in the 0002 refer to publisher/subscriber which is
> not what we are using here.
>
> 5.
> +# Create table.
> +$node->safe_psql('postgres',
> +"CREATE TABLE test_repl_stat(col1 int)");
> +$node->safe_psql('postgres',
> +"SELECT data FROM
> pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> +"SELECT data FROM
> pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> +"SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> +"SELECT data FROM
> pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> I think we can save the above calls to pg_logical_slot_get_changes if
> we create table before creating the slots in this test.
>
> 0003
> -
> 6. In the tests/code, publisher is used at multiple places. I think
> that is not required because this can happen via plugin as well.
> 7.
> + if (max_replication_slots == nReplSlotStats)
> + {
> + ereport(pgStatRunningInCollector ? LOG : WARNING,
> + (errmsg("skipping \"%s\" replication slot statistics as
> pg_stat_replication_slots does not have enough slots",
> + NameStr(replSlotStats[nReplSlotStats].slotname;
> + memset([nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
>
> Do we need memset here? Isn't this location past the max location?

Thanks for the comments, I will fix and post a patch for this soon.

Regards,
Vignesh




RE: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread osumi.takami...@fujitsu.com
On Tuesday, April 6, 2021 8:44 PM David Steele  wrote:
> On 4/6/21 7:13 AM, Fujii Masao wrote:
> >
> > On 2021/04/06 15:59, osumi.takami...@fujitsu.com wrote:
> >> I just wanted to write why the error was introduced, but it was not
> >> necessary.
> >> We should remove and fix the first part of the sentence.
> >
> > So the consensus is almost the same as the latest patch?
> > If they are not so different, I'm thinking to push the latest version
> > at first.
> > Then we can improve the docs if required.
> 
> +1
Yes, please. What I suggested is almost the same as your idea.
Thank you for your confirmation.


Best Regards,
Takamichi Osumi





Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread David Steele

On 4/6/21 7:13 AM, Fujii Masao wrote:


On 2021/04/06 15:59, osumi.takami...@fujitsu.com wrote:

I just wanted to write why the error was introduced,
but it was not necessary.
We should remove and fix the first part of the sentence.


So the consensus is almost the same as the latest patch?
If they are not so different, I'm thinking to push the latest version at 
first.

Then we can improve the docs if required.


+1

--
-David
da...@pgmasters.net




Re: pgbench - add pseudo-random permutation function

2021-04-06 Thread Dean Rasheed
On Mon, 5 Apr 2021 at 13:07, Fabien COELHO  wrote:
>
> Attached a v28 which I hope fixes the many above issues and stays with
> ints. The randu64 is still some kind of a joke, I artificially reduced the
> cost by calling jrand48 once and extending it to 64 bits, so it could give
> an idea of the cost endured if a 64-bit prng was used.
>
> Now you are the committer, you can do as you please, I'm just stating my
> (mathematical) opinions about using floating point computations for that.
> I think that apart from this point of principle/philosophy the permute
> performance and implementation are reasonable, and better than my initial
> version because it avoids int128 computations and the large prime number
> business.
>

Pushed.

I decided not to go with the "joke" randu64() function, but instead
used getrand() directly. This at least removes any *direct* use of
doubles in permute() (though of course they're still used indirectly),
and means that if getrand() is improved in the future, permute() will
benefit too.

Regards,
Dean




Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread Fujii Masao




On 2021/04/06 15:59, osumi.takami...@fujitsu.com wrote:

I just wanted to write why the error was introduced,
but it was not necessary.
We should remove and fix the first part of the sentence.


So the consensus is almost the same as the latest patch?
If they are not so different, I'm thinking to push the latest version at first.
Then we can improve the docs if required.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: UniqueKey on Partitioned table.

2021-04-06 Thread David Rowley
On Tue, 6 Apr 2021 at 22:31, Andy Fan  wrote:
> On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat  
> wrote:
>> I think the reason we add ECs for sort expressions is to use
>> transitive relationship. The EC may start with a single member but
>> later in the planning that member might find partners which are all
>> equivalent. Result ordered by one is also ordered by the other. The
>> same logic applies to UniqueKey as well, isn't it. In a result if a
>> set of columns makes a row unique, the set of columns represented by
>> the other EC member should be unique. Though a key will start as a
>> singleton it might EC partners later and thus thus unique key will
>> transition to all the members. With that logic UniqueKey should use
>> just ECs instead of bare expressions.
>
>
> TBH, I haven't thought about this too hard, but I think when we build the
> UniqueKey, all the ECs have been built already.  So can you think of a
> case where we start with an EC with a single member at the beginning and
> it gains more members later, for UniqueKey cases?

I don't really know if it matters which order things happen. We still
end up with a single EC containing {a,b} whether we process ORDER BY b
or WHERE a=b first.

In any case, the reason we want PathKeys to be ECs rather than just
Exprs is to allow cases such as the following to use an index to
perform the sort.

# create table ab (a int, b int);
# create index on ab(a);
# set enable_seqscan=0;
# explain select * from ab where a=b order by b;
 QUERY PLAN
-
 Index Scan using ab_a_idx on ab  (cost=0.15..83.70 rows=11 width=8)
   Filter: (a = b)
(2 rows)

Of course, we couldn't use this index to provide pre-sorted results if
"where a=b" hadn't been there.

David




Re: UniqueKey on Partitioned table.

2021-04-06 Thread Andy Fan
On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat 
wrote:

> > b).  How to present the element
> > in UniqueKey.  Prue EquivalenceClasses or Mix of Expr and
> EquivalenceClass as
> > we just talked about.
> I think the reason we add ECs for sort expressions is to use
> transitive relationship. The EC may start with a single member but
> later in the planning that member might find partners which are all
> equivalent. Result ordered by one is also ordered by the other. The
> same logic applies to UniqueKey as well, isn't it. In a result if a
> set of columns makes a row unique, the set of columns represented by
> the other EC member should be unique. Though a key will start as a
> singleton it might EC partners later and thus thus unique key will
> transition to all the members. With that logic UniqueKey should use
> just ECs instead of bare expressions.
>

TBH, I haven't thought about this too hard, but I think when we build the
UniqueKey, all the ECs have been built already.  So can you think of a
case where we start with an EC with a single member at the beginning and
it gains more members later, for UniqueKey cases?

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: Asynchronous Append on postgres_fdw nodes.

2021-04-06 Thread Etsuro Fujita
On Tue, Apr 6, 2021 at 5:45 PM Etsuro Fujita  wrote:
> Also, we should improve the code to avoid
> the consistency mentioned above,

Sorry, s/consistency/inconsistency/.

> I'll apply the patch.

Done.  Let's see if this works.

Best regards,
Etsuro Fujita




Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-04-06 Thread Peter Eisentraut

On 06.04.21 07:24, Japin Li wrote:

I think this patch is about ready to commit, but please provide a final
version in good time.

I took the liberty to address all the review comments and provide a v9
patch on top of Japin's v8 patch-set.


(Also, please combine your patches into a single patch.)

Done.

Attaching v9 patch, please review it.


Sorry for the late reply! Thanks for the updated patch, and it looks
good to me.


Committed.

Note that you can use syntax like "ADD|DROP|SET" in the tab completion 
code.  I have simplified some of your code like that.


I also deduplicated the documentation additions a bit.





Re: Allow batched insert during cross-partition updates

2021-04-06 Thread Bharath Rupireddy
On Tue, Apr 6, 2021 at 3:08 PM Amit Langote  wrote:
>
> On Mon, Apr 5, 2021 at 1:16 AM Zhihong Yu  wrote:
> >
> > Hi,
> > In the description:
> >
> > cross-partition update of partitioned tables can't use batching
> > because ExecInitRoutingInfo() which initializes the insert target
> >
> > 'which' should be dropped since 'because' should start a sentence.
> >
> > +-- Check that batched inserts also works for inserts made during
> >
> > inserts also works -> inserts also work
> >
> > +   Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
> > +  RELKIND_PARTITIONED_TABLE);
> >
> > The level of nested field accesses is quite deep. If the assertion fails, 
> > it would be hard to know which field is null.
> > Maybe use several assertions:
> >Assert(node->rootResultRelInfo)
> >Assert(node->rootResultRelInfo->ri_RelationDesc)
> >Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind == 
> > ...
>
> Thanks for taking a look at this.
>
> While I agree about having the 1st Assert you suggest, I don't think
> this code needs the 2nd one, because its failure would very likely be
> due to a problem in some totally unrelated code.
>
> Updated patch attached.  I had to adjust the test case a little bit to
> account for the changes of 86dc90056d, something I failed to notice
> yesterday.  Also, I expanded the test case added in postgres_fdw.sql a
> bit to show the batching in action more explicitly.

Some minor comments:
1) don't we need order by clause for the selects in the tests added?
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+SELECT cmin, * FROM batch_cp_upd_test1;

2) typo - it is "should" not "shoud"
+-- cmin shoud be different across rows, because each one would be inserted

3) will the cmin in the output always be the same?
+SELECT cmin, * FROM batch_cp_upd_test3;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Allow batched insert during cross-partition updates

2021-04-06 Thread Amit Langote
On Mon, Apr 5, 2021 at 1:16 AM Zhihong Yu  wrote:
>
> Hi,
> In the description:
>
> cross-partition update of partitioned tables can't use batching
> because ExecInitRoutingInfo() which initializes the insert target
>
> 'which' should be dropped since 'because' should start a sentence.
>
> +-- Check that batched inserts also works for inserts made during
>
> inserts also works -> inserts also work
>
> +   Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
> +  RELKIND_PARTITIONED_TABLE);
>
> The level of nested field accesses is quite deep. If the assertion fails, it 
> would be hard to know which field is null.
> Maybe use several assertions:
>Assert(node->rootResultRelInfo)
>Assert(node->rootResultRelInfo->ri_RelationDesc)
>Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind == ...

Thanks for taking a look at this.

While I agree about having the 1st Assert you suggest, I don't think
this code needs the 2nd one, because its failure would very likely be
due to a problem in some totally unrelated code.

Updated patch attached.  I had to adjust the test case a little bit to
account for the changes of 86dc90056d, something I failed to notice
yesterday.  Also, I expanded the test case added in postgres_fdw.sql a
bit to show the batching in action more explicitly.

--
Amit Langote
EDB: http://www.enterprisedb.com


v4-0001-Allow-batching-of-inserts-during-cross-partition-.patch
Description: Binary data


Re: Confusing behavior of psql's \e

2021-04-06 Thread Laurenz Albe
On Sat, 2021-04-03 at 17:43 -0400, Tom Lane wrote:
> Laurenz Albe  writes:
> > Attached is version 6.
> 
> Pushed with some mostly-cosmetic fiddling.
> 
> One thing I changed that wasn't cosmetic is that as you had it,
> the behavior of "\e file" varied depending on whether the query
> buffer had been empty, which surely seems like a bad idea.
> I made it do discard_on_quit always in that case.  I think there
> might be a case for discard_on_quit = false always, ie maybe
> the user wanted to load the file into the query buffer as-is.
> But it seems like a pretty weak case --- you'd be more likely
> to just use \i for that situation.

Thanks for that!

Yours,
Laurenz Albe





Re: Key management with tests

2021-04-06 Thread Neil Chen
Hi Bruce,

I went through these patches and executed the test script you added for the
KMS section, which looks all good.

One point looks like a bug - in patch 10, you changed the
location and use of *RelFileNodeSkippingWAL()*, but the modified code logic
seems different from the original when encryption is not enabled. After
applying this patch, it will still execute the set-LSN code path when
RelFileNodeSkippingWAL returns true and encryption is not enabled.



On Thu, Apr 1, 2021 at 2:47 PM Bruce Momjian  wrote:

> On Thu, Mar 11, 2021 at 10:31:28PM -0500, Bruce Momjian wrote:
> > I have made significant progress on the cluster file encryption feature
> so
> > it is time for me to post a new set of patches.
>
> Here is a rebase, to keep the cfbot green.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>

-- 
There is no royal road to learning.
HighGo Software Co.


Re: Asynchronous Append on postgres_fdw nodes.

2021-04-06 Thread Etsuro Fujita
On Tue, Apr 6, 2021 at 12:01 PM Kyotaro Horiguchi
 wrote:
> At Mon, 5 Apr 2021 17:15:47 +0900, Etsuro Fujita  
> wrote in
> > On Fri, Apr 2, 2021 at 12:09 AM Tom Lane  wrote:
> > > The buildfarm points out that this fails under valgrind.
> > > I easily reproduced it here:
> > >
> > > ==00:00:03:42.115 3410499== Syscall param epoll_wait(events) points to 
> > > unaddressable byte(s)
> > > ==00:00:03:42.115 3410499==at 0x58E926B: epoll_wait (epoll_wait.c:30)
> > > ==00:00:03:42.115 3410499==by 0x7FC903: WaitEventSetWaitBlock 
> > > (latch.c:1452)
> ...
> > The reason for this would be that epoll_wait() is called with
> > maxevents exceeding the size of the input event array in the test
> > case.  To fix, I adjusted the parameters to call the caller function
>
> # s/input/output/ event array? (occurred_events)

Sorry, my explanation was not enough.  I think I was in a hurry.  I
mean by "the input event array" the epoll_event array given to
epoll_wait() (i.e., the epoll_ret_events array).

> # I couldn't reproduce it, so sorry in advance if the following
> # discussion is totally bogus..

I produced this failure by running the following simple query in async
mode on a valgrind-enabled build:

select * from ft1 union all select * from ft2

where ft1 and ft2 are postgres_fdw foreign tables.  For this query, we
would call WaitEventSetWait() with nevents=16 in
ExecAppendAsyncEventWait() as EVENT_BUFFER_SIZE=16, and then
epoll_wait() with maxevents=16 in WaitEventSetWaitBlock(); but
maxevents would exceed the size of the input event array, as the array
size is three.  I think this inconsistency would cause the valgrind failure.
I'm not 100% sure about that, but the patch fixing this inconsistency
I posted fixed the failure in my environment.
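
To make the shape of the fix easier to see, here is a minimal standalone
sketch of the clamping idea (an illustration only, not the committed change;
the function name below is invented for this example): the value passed as
maxevents is capped at the size of the epoll_event array that epoll_wait()
is allowed to fill, so the kernel can never write past the end of the
output buffer.

#include <sys/epoll.h>

/*
 * Wait for events, but never ask the kernel for more events than the
 * output array can hold; 'nwanted' plays the role of the nevents=16 above,
 * and 'outsz' is the real size of 'out' (three in the failing case).
 */
static int
wait_events_clamped(int epfd, struct epoll_event *out, int outsz, int nwanted)
{
	int			maxevents = (nwanted < outsz) ? nwanted : outsz;

	return epoll_wait(epfd, out, maxevents, -1 /* block indefinitely */);
}
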

> I have nothing to say if it actually corrects the error, but the only
> restriction of maxevents is that it must be positive, and in any case
> epoll_wait returns no more than set->nevents events. So I'm a bit
> wondering if that's the reason. In the first place I'm wondering if
> valgrind is aware of that depth..

Yeah, the failure might actually be harmless, but anyway, we should
make the buildfarm green.  Also, we should improve the code to avoid
the consistency mentioned above, so I'll apply the patch.

Thanks for the comments!

Best regards,
Etsuro Fujita




Re: Idea: Avoid JOINs by using path expressions to follow FKs

2021-04-06 Thread Joel Jacobson
In a Hacker News discussion [2] on using foreign keys for joins,
the author of PostgREST, Steve Chavez, mentioned they are actually already
using this idea in PostgREST:

Steve Chavez wrote:
>The idea about using FK as a JOIN target is interesting.
>While developing a syntax for PostgREST resource embedding[1],
>I also reached the conclusion that FKs would be a convenient way to join tables
>(also suggested renaming them as you do here).
>
>IIRC, self joins are still an issue with FK joining.
>
>[1]: https://postgrest.org/en/v7.0.0/api.html#embedding-disambiguation

I think this idea looks very promising and fruitful.

Maybe we can think of some other existing/new operator which would be 
acceptable,
instead of using "->" (which is potentially in conflict with the SQL standard's 
"REF" thing)?

Not saying we should move forward on our own with this idea,
but if we can come up with a complete proposal,
maybe it can be presented as an idea to the SQL committee?

[2] https://news.ycombinator.com/item?id=26693705

/Joel

Re: A reloption for partitioned tables - parallel_workers

2021-04-06 Thread Amit Langote
On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe  wrote:
> On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:
> > On Fri, 19 Mar 2021 at 02:07, Amit Langote  wrote:
> > > Attached a new version rebased over c8f78b616, with the grouping
> > > relation partitioning enhancements as a separate patch 0001.  Sorry
> > > about the delay.
> >
> > I had a quick look at this and wondered if the partitioned table's
> > parallel workers shouldn't be limited to the sum of the parallel
> > workers of the Append's subpaths?
> >
> > It seems a bit weird to me that the following case requests 4 workers:
> >
> > # create table lp (a int) partition by list(a);
> > # create table lp1 partition of lp for values in(1);
> > # insert into lp select 1 from generate_series(1,1000) x;
> > # alter table lp1 set (parallel_workers = 2);
> > # alter table lp set (parallel_workers = 4);
> > # set max_parallel_workers_per_Gather = 8;
> > # explain select count(*) from lp;
> > QUERY PLAN
> > ---
> >  Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
> >->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
> >  Workers Planned: 4
> >  ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
> >->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57
> > rows=4166657 width=0)
> > (5 rows)
> >
> > I can see a good argument that there should only be 2 workers here.
>
> Good point, I agree.
>
> > If someone sets the partitioned table's parallel_workers high so that
> > they get a large number of workers when no partitions are pruned
> > during planning, do they really want the same number of workers in
> > queries where a large number of partitions are pruned?
> >
> > This problem gets a bit more complex in generic plans where the
> > planner can't prune anything but run-time pruning prunes many
> > partitions. I'm not so sure what to do about that, but the problem
> > does exist today to a lesser extent with the current method of
> > determining the append parallel workers.
>
> Also a good point.  That would require changing the actual number of
> parallel workers at execution time, but that is tricky.
> If we go with your suggestion above, we'd have to disambiguate if
> the number of workers is set because a partition is large enough
> to warrant a parallel scan (then it shouldn't be reduced if the executor
> prunes partitions) or if it is because of the number of partitions
> (then it should be reduced).

Maybe we really want a parallel_append_workers for partitioned tables,
instead of piggybacking on parallel_workers?

> I don't know if Seamus is still working on that; if not, we might
> mark it as "returned with feedback".

I have to agree given the time left.

> Perhaps Amit's patch 0001 should go in independently.

Perhaps, but maybe we should wait until something really needs that.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: postgres_fdw: IMPORT FOREIGN SCHEMA ... LIMIT TO (partition)

2021-04-06 Thread Amit Langote
On Tue, Apr 6, 2021 at 8:34 AM Fujii Masao  wrote:
> For now I have no objection to this feature.
>
> -IMPORT FOREIGN SCHEMA import_source EXCEPT (t1, "x 4", nonesuch)
> +IMPORT FOREIGN SCHEMA import_source EXCEPT (t1, "x 4", nonesuch, t4_part)
>
> Isn't it better to create also another partition like "t4_part2"?
> If we do this, for example, the above test can confirm that both
> partitions in EXCEPT and not in are excluded.
>
> +All tables or foreign tables which are partitions of some other table
> +are automatically excluded from 
> +unless they are explicitly included in the LIMIT TO
>
> IMO it's better to document that partitions are imported when they are
> included in LIMIT TO, instead. What about the following?
>
>  Tables or foreign tables which are partitions of some other table are
>  imported only when they are explicitly specified in
>  LIMIT TO clause.  Otherwise they are automatically
>  excluded from .
>
> +clause.  Since all data can be accessed through the partitioned table
> +which is the root of the partitioning hierarchy, this approach should
> +allow access to all the data without creating extra objects.
>
> Now "this approach" in the above is not clear? What about replacing it with
> something like "importing only partitioned tables"?

+1, that wording is better.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: postgres_fdw: IMPORT FOREIGN SCHEMA ... LIMIT TO (partition)

2021-04-06 Thread Amit Langote
Hi Matthias,

On Wed, Mar 24, 2021 at 9:23 PM Matthias van de Meent
 wrote:
> On Mon, 22 Mar 2021 at 21:16, Bernd Helmle  wrote:
> > The patch changes IMPORT FOREIGN SCHEMA to explicitly allow partition
> > child tables in the LIMIT TO clause of the IMPORT FOREIGN SCHEMA
> > command by relaxing the checks introduced with commit [1]. The reasons
> > behind [1] are discussed in [2].
>
> I should've included potentially interested parties earlier, but never
> too late. Stephen, Michael, Amit, would you have an opinion on lifting
> this restriction for the LIMIT TO clause, seeing your involvement in
> the implementation of removing partitions from IFS?

Sorry that I'm replying to this a bit late.

> > So the original behavior this patch wants to address was done
> > intentionally, so what needs to be discussed here is whether we want to
> > relax that a little. One argument for the original behavior since then
> > was that it is cleaner to just automatically import the parent, which
> > allows access to the children through the foreign table anyway and
> > excludes partition children when querying pg_class.
>
> Yes, but it should be noted that the main reason that was mentioned as
> a reason to exclude partitions is to not cause table catalog bloat,
> and I argue that this argument is not as solid in the case of the
> > explicitly named tables of the LIMIT TO clause. Unless the SQL standard
> > prescribes otherwise, I think allowing partitions in LIMIT TO clauses
> is an improvement overall.
>
> > I haven't seen demand for the implemented feature here myself, but i
> > could imagine use cases where just a single child or a set of child
> > tables are candidates. For example, i think it's possible that users
> > can query only specific childs and want them to have imported on
> > another foreign server.
>
> I myself have had this need, in that I've had to import some
partitions manually as a result of this limitation. IMPORT FOREIGN
> SCHEMA really is great when it works, but limitations like these are
> crippling for some more specific use cases (e.g. allowing
> long-duration read-only access to one partition in the partition tree
> while also allowing the partition layout of the parents to be
> modified).

FWIW, I agree that it would be nice to have this.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread osumi.takami...@fujitsu.com
On Tuesday, April 6, 2021 3:24 PM Kyotaro Horiguchi  
wrote:
> At Tue, 6 Apr 2021 04:11:35 +, "osumi.takami...@fujitsu.com"
>  wrote in
> > On Tuesday, April 6, 2021 9:41 AM Fujii Masao
> > 
> > > On 2021/04/05 23:54, osumi.takami...@fujitsu.com wrote:
> > > >> This makes me think that we should document this risk. Thought?
> > > > +1. We should note this risk when a user changes
> > > > wal_level from higher than minimal to minimal, to make the user
> > > > careful about such an operation.
> > >
> > > I removed the HINT message "or recover to the point in ..." and
> > > added the following note into the docs.
> > >
> > >  Note that changing wal_level to
> > >  minimal makes any base backups taken before
> > >  unavailable for archive recovery and standby server, which may
> > >  lead to database loss.
> > Thank you for updating the patch. Let's make the sentence more strict.
> >
> > My suggestion for this explanation is
> > "In order to prevent database corruption, changing wal_level to
> > minimal from higher level in the middle of WAL archiving requires
> > careful attention. It makes any base backups taken before the
> > operation unavailable for archive recovery and standby server. Also,
> > it may lead to whole database loss when archive recovery fails with an
> > error for that change.
> > Take a new base backup immediately after making wal_level back to higher
> level."
> 
> The first sentence looks somewhat nanny-ish.  The database is not
> corrupt at the time of this error. 
Yes. Excuse me for misleading sentence.
I just wanted to write why the error was introduced,
but it was not necessary.
We should remove and fix the first part of the sentence.

> We just lose updates after the last read
> segment at this point.  As Fujii-san said, we can continue recovering using
> crash recovery and we will end up with a corrupt database after that.
OK. Thank you for explanation.


> About the last sentence, I prefer more flat wording, such as "You need to take
> a new base backup..."
Either is fine to me.

> > Then, we can be consistent with our new hint message, "Use a backup
> > taken after setting wal_level to higher than minimal.".
> 
> > Is it better to add something similar to "Take an offline backup when
> > you stop the server and change the wal_level" around the end of this part as
> another option for safeguard, also?
> 
> Backup policy is completely a matter of DBAs. 
OK. No problem. No need to add it.

> If flipping wal_level alone
> easily causes corruption that makes the server unstartable... I think it is a bug.
> > For the performance technique part, what we need to explain is the same.
>
> Might be good, but in simpler wording.
Yeah, I agree.
 
> > Another minor thing I felt we need to do might be to add double quotes to
> wrap minimal in errhint.
> 
> Since the error about hot_standby has gone, either will do for me.
Thanks for sharing your thoughts.


Best Regards,
Takamichi Osumi





Re: Replication slot stats misgivings

2021-04-06 Thread Amit Kapila
On Mon, Apr 5, 2021 at 8:51 PM vignesh C  wrote:
>

Few comments on the latest patches:
Comments on 0001

1.
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
TransactionId xid, bool create,
  dlist_push_tail(>toplevel_by_lsn, >node);
  AssertTXNLsnOrder(rb);
  }
+
+ rb->totalTxns++;
  }
  else
  txn = NULL; /* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
  {
  txn->size += sz;
  rb->size += sz;
+ rb->totalBytes += sz;

I think this will include the txns that are aborted and for which we
don't send anything. It might be better to update these stats in
ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
have sent the data. We can probably use size/total_size in txn. We
need to be careful to not double include the totaltxn or totalBytes
for streaming xacts as we might process the same txn multiple times.

2.
+Amount of decoded transactions data sent to the decoding output plugin
+while decoding the changes from WAL for this slot. This and total_txns
+for this slot can be used to gauge the total amount of data during
+logical decoding.

I think we can slightly modify the second line here: "This can be used
to gauge the total amount of data sent during logical decoding.". Why
do we need to include total_txns along with it?

0002
--
3.
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 5 LOOP
+
...
...
+
+-- wait a little
+perform pg_sleep_for('100 milliseconds');

I think this loop needs to be executed 300 times instead of 5 times,
if the above comments and code needs to do what is expected here?


4.
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node->safe_psql('postgres', "SELECT
pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node->safe_psql('postgres',
+ "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+ FROM pg_stat_replication_slots ORDER BY slot_name"

Various comments in the 0002 refer to publisher/subscriber which is
not what we are using here.

5.
+# Create table.
+$node->safe_psql('postgres',
+"CREATE TABLE test_repl_stat(col1 int)");
+$node->safe_psql('postgres',
+"SELECT data FROM
pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+"SELECT data FROM
pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+"SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+"SELECT data FROM
pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

I think we can save the above calls to pg_logical_slot_get_changes if
we create the table before creating the slots in this test.

0003
-
6. In the tests/code, "publisher" is used in multiple places. I think
that is not required, because this can happen via a plugin as well.
7.
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));

Do we need the memset here? Isn't this location past the max location?
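
Purely as a self-contained illustration of the indexing concern (all
names below are hypothetical, nothing from PostgreSQL): assuming the
statistics array holds max_replication_slots entries, once the counter
equals that value the indexed element lies one past the end of the
array, so clearing it with memset would write out of bounds; simply
skipping seems sufficient.

#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 4				/* stand-in for max_replication_slots */

struct slot_stats
{
	char		name[64];
	long		total_txns;
};

int
main(void)
{
	struct slot_stats stats[MAX_SLOTS];
	int			nstats = MAX_SLOTS;	/* the array is already full */

	memset(stats, 0, sizeof(stats));	/* clearing the whole array is fine */

	if (nstats == MAX_SLOTS)
	{
		/*
		 * The last valid element is stats[MAX_SLOTS - 1], so
		 * stats[nstats] is one past the end and must not be written to;
		 * skipping is enough.
		 */
		printf("skipping: no free statistics entry\n");
		return 0;
	}

	/* only reached while nstats < MAX_SLOTS, i.e. stats[nstats] is valid */
	stats[nstats].total_txns = 0;
	return 0;
}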


-- 
With Regards,
Amit Kapila.




Re: Stronger safeguard for archive recovery not to miss data

2021-04-06 Thread Kyotaro Horiguchi
At Tue, 6 Apr 2021 04:11:35 +, "osumi.takami...@fujitsu.com" 
 wrote in 
> On Tuesday, April 6, 2021 9:41 AM Fujii Masao 
> > On 2021/04/05 23:54, osumi.takami...@fujitsu.com wrote:
> > >> This makes me think that we should document this risk. Thought?
> > > +1. We should call out the risk when the user changes wal_level
> > > from a level higher than minimal to minimal, so that the user is
> > > careful about that kind of operation.
> > 
> > I removed the HINT message "or recover to the point in ..." and added the
> > following note into the docs.
> > 
> >  Note that changing wal_level to
> >  minimal makes any base backups taken before
> >  unavailable for archive recovery and standby server, which may
> >  lead to database loss.
> Thank you for updating the patch. Let's make the sentence more strict.
> 
> My suggestion for this explanation is
> "In order to prevent database corruption, changing
> wal_level to minimal from higher level in the middle of
> WAL archiving requires careful attention. It makes any base backups
> taken before the operation unavailable for archive recovery
> and standby server. Also, it may lead to whole database loss when
> archive recovery fails with an error for that change.
> Take a new base backup immediately after making wal_level back to higher 
> level."

The first sentence sounds somewhat nanny-ish.  The database is not
corrupt at the time of this error. We just lose updates after the last
read segment at this point.  As Fujii-san said, we can continue
recovering using crash recovery, and we will end up with a corrupt
database after that.

About the last sentence, I prefer flatter wording, such as "You need
to take a new base backup..."

> Then, we can be consistent with our new hint message,
> "Use a backup taken after setting wal_level to higher than minimal.".
> 
> Is it better to add something similar to "Take an offline backup when you 
> stop the server
> and change the wal_level" around the end of this part as another option for 
> safeguard, also?

Backup policy is completely a matter of DBAs.  If flipping wal_level
alone readily causes corruption that prevents the server from starting,
I think it is a bug.

> For the performance technique part, what we need to explain is the same.

Might be good, but in simpler wording.

> Another minor thing we might need to do is to add double quotes around
> minimal in the errhint.

Since the error about hot_standby has gone, either will do for me.

> Other errhints do so when we use it in a sentence.
> 
> There are no additional comments from me!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: policies with security definer option for allowing inline optimization

2021-04-06 Thread Noah Misch
On Mon, Apr 05, 2021 at 07:51:46PM -0700, Dan Lynch wrote:
> > > I suppose if the
> > > get_group_ids_of_current_user() function is marked as STABLE, would the
> > > optimizer cache this value for every row in a SELECT that returned
> > > multiple rows?
> >
> > While there was a patch to implement caching, it never finished.  The
> > optimizer is allowed to, and sometimes does, choose plan shapes that reduce
> > the number of function calls.
> 
> So for multiple rows, it's possible that the same query could happen for
> each row? Even if it's clearly stable and only a read operation is
> happening?

Yes.  The caching patch thread gives some example queries:
https://postgr.es/m/flat/CABRT9RA-RomVS-yzQ2wUtZ%3Dm-eV61LcbrL1P1J3jydPStTfc6Q%40mail.gmail.com

> I suppose if the possibility exists that this could happen, perhaps using
> RLS for selects is not quite "production ready"?

I would not draw that conclusion.

> Or perhaps if the RLS
> qual/check is written well-enough, then maybe the performance hit wouldn't
> be noticed?

Yes.




RE: Table refer leak in logical replication

2021-04-06 Thread tanghy.f...@fujitsu.com
On Tuesday, April 6, 2021 2:25 PM Amit Langote  wrote:
>While updating the patch to do so, it occurred to me that maybe we
>could move the ExecInitResultRelation() call into
>create_estate_for_relation() too, in the spirit of removing
>duplicative code.  See attached updated patch.

Thanks for the fix. The code LGTM.
I just confirmed that the problem is solved after applying your patch.

Regards,
Tang