Re: missing GRANT on pg_subscription columns

2021-06-27 Thread vignesh C
On Mon, Jun 7, 2021 at 2:38 PM Amit Kapila  wrote:
>
> On Thu, Jun 3, 2021 at 10:39 PM Tom Lane  wrote:
> >
> > "Euler Taveira"  writes:
> > > I was checking the GRANT on pg_subscription and noticed that the command 
> > > is not
> > > correct. There is a comment that says "All columns of pg_subscription 
> > > except
> > > subconninfo are readable". However, there are columns that aren't 
> > > included: oid
> > > and subsynccommit. It seems an oversight in the commits 6f236e1eb8c and
> > > 887227a1cc8.
> >
> > Ugh.
> >
> > > There are monitoring tools and data collectors that aren't using a
> > > superuser to read catalog information (I usually recommend using 
> > > pg_monitor).
> > > Hence, you cannot join pg_subscription with relations such as
> > > pg_subscription_rel or pg_stat_subscription because column oid has no
> > > column-level privilege. I'm attaching a patch to fix it (indeed, 2 patches
> > > because of additional columns for v14). We should add instructions in the 
> > > minor
> > > version release notes too.
> >
> > I agree with fixing this in HEAD.  But given that this has been wrong
> > since v10 with zero previous complaints, I doubt that it is worth the
> > complication of trying to do something about it in the back branches.
> > Maybe we could just adjust the docs there, instead.
> >
>
> This sounds reasonable to me. Euler, can you provide the doc updates
> for back-branches?

The attached patch has the documentation changes for the back-branches. As
there is no specific reason for this, I have just mentioned
"Additionally normal users can't access columns oid and
subsynccommit." The same patch applies to all branches down to v10.
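
To make the failure mode concrete, here is the kind of monitoring join Euler
describes (a sketch, not taken from the patch), which errors out for a
non-superuser such as a pg_monitor member while pg_subscription.oid lacks a
column-level grant:

  SELECT s.subname, sr.srrelid::regclass, sr.srsubstate
    FROM pg_subscription s
    JOIN pg_subscription_rel sr ON sr.srsubid = s.oid;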

Regards,
Vignesh
From 37f9fc48baa2c233d16ee8ac1e8547680cd05b04 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 28 Jun 2021 10:06:58 +0530
Subject: [PATCH v1] Documentation for normal users not having permission for
 columns oid and subsynccommit in pg_subscription catalog table.

Documentation for normal users not having permission for columns oid
and subsynccommit in pg_subscription catalog table.
---
 doc/src/sgml/catalogs.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 4dff3f60a2..f6b5c2e562 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7457,7 +7457,9 @@ SCRAM-SHA-256$iteration count:
 
   
Access to the column subconninfo is revoked from
-   normal users, because it could contain plain-text passwords.
+   normal users, because it could contain plain-text passwords. Additionally
+   normal users can't access columns oid and
+   subsynccommit.
   
 
   
-- 
2.25.1



Re: Optionally automatically disable logical replication subscriptions on error

2021-06-27 Thread Masahiko Sawada
On Mon, Jun 21, 2021 at 11:26 AM Mark Dilger
 wrote:
>
>
>
> > On Jun 20, 2021, at 7:17 PM, Masahiko Sawada  wrote:
> >
> > I will submit the patch.
>
> Great, thanks!

I've submitted the patches on that thread[1]. There are three patches:
skipping the transaction on the subscriber side, reporting error
details in the errcontext, and reporting the error details to the
stats collector. Feedback is very welcome.

[1] 
https://www.postgresql.org/message-id/CAD21AoBU4jGEO6AXcykQ9y7tat0RrB5--8ZoJgfcj%2BLPs7nFZQ%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-06-27 Thread Masahiko Sawada
On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada  wrote:
>
> On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada  wrote:
> >
> > > Now, if this function is used by super
> > > users then we can probably trust that they provide the XIDs that we
> > > can trust to be skipped but OTOH making a restriction to allow these
> > > functions to be used by superusers might restrict the usage of this
> > > repair tool.
> >
> > If we specify the subscription id or name, maybe we can allow also the
> > owner of subscription to do that operation?
>
> Ah, the owner of the subscription must be superuser.

I've attached PoC patches.

The 0001 patch introduces the ability to skip transactions on the
subscriber side. We can specify an XID for the subscription with a command
like ALTER SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The
implementation seems straightforward except for setting the origin state.
After skipping a transaction we have to update the session origin state so
that, after a server crash or an apply worker restart, streaming resumes
from the transaction following the one we just skipped. Normally the origin
state is attached to the commit WAL record, but since we skip all changes
we don't write any WAL even though we call CommitTransaction() at the end
of the skipped transaction. So the patch instead sets the origin state in
the transaction that updates the pg_subscription system catalog to reset
the skip XID. I think this part needs discussion.

With the 0002 and 0003 patches, we report the error information in the
server logs and in a stats view, respectively. The 0002 patch adds an
errcontext to messages raised while applying changes:

ERROR:  duplicate key value violates unique constraint "hoge_pkey"
DETAIL:  Key (c)=(1) already exists.
CONTEXT:  during apply of "INSERT" for relation "public.hoge" in
transaction with xid 736 committs 2021-06-27 12:12:30.053887+09

The 0003 patch adds the pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens while applying
changes. We can check those errors as follows:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
 subname  | relid | action | xid | last_failure
----------+-------+--------+-----+-------------------------------
 test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.
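
Putting the pieces together, the intended workflow with these PoC patches
would be roughly as follows (syntax and values taken from the examples
above):

  SELECT subname, relid::regclass, action, xid, last_failure
    FROM pg_stat_logical_replication_error;

  ALTER SUBSCRIPTION test_sub SET SKIP TRANSACTION 736;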

Please note that these patches are meant to evaluate the concept we've
discussed so far; they don't include documentation updates yet.

Regards,

[1] 
https://www.postgresql.org/message-id/DB35438F-9356-4841-89A0-412709EBD3AB%40enterprisedb.com


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v1-0003-Add-pg_stat_logical_replication_error-statistics-.patch
Description: Binary data


v1-0002-Add-errcontext-to-errors-of-the-applying-logical-.patch
Description: Binary data


v1-0001-Add-ALTER-SUBSCRIPTION-SET-SKIP-TRANSACTION.patch
Description: Binary data


Re: Deadlock risk while inserting directly into partition?

2021-06-27 Thread Amit Langote
On Thu, Jun 24, 2021 at 7:27 AM David Rowley  wrote:
> On Wed, 23 Jun 2021 at 21:07, Amit Kapila  wrote:
> > I noticed that while inserting directly into a partition table we
> > compute the PartitionCheckExpr by traversing all the parent partitions
> > via 
> > ExecPartitionCheck()->RelationGetPartitionQual()->generate_partition_qual().
> > We take AccessShareLock on parent tables while generating qual.
> >
> > Now, on the other hand, while dropping constraint on a partitioned
> > table, we take the lock from parent to all the child tables.
> >
> > I think taking locks in opposite directions can lead to deadlock in
> > these operations.
>
> I wonder if it's possible to do any better here?  Surely when
> traversing from child to parent we must lock the child before checking
> what the parent relation is.

I remember there was a discussion where I proposed to document the
deadlock hazard that exists when performing DML directly on
partitions.  The proposal didn't get enough attention, perhaps because
it was in the middle of a long reply about other concerns:

https://www.postgresql.org/message-id/16db1458-67cf-4add-736e-31b053115e8e%40lab.ntt.co.jp

Maybe it would be a good idea to add a line or two to section 5.11, Table
Partitioning?
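
To illustrate, the kind of interleaving such a note could warn about looks
roughly like this (object names made up):

  -- Session 1: locks the leaf first, then its ancestors while building
  -- the partition qual for ExecPartitionCheck():
  INSERT INTO measurements_y2021 VALUES (1, now());

  -- Session 2: locks the parent first, then all of its children:
  ALTER TABLE measurements DROP CONSTRAINT measurements_ts_check;

  -- With unlucky timing each session waits for a lock the other already
  -- holds, and one of them is killed by the deadlock detector.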

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: PQconnectdb/PQerrorMessage changed behavior on master

2021-06-27 Thread Alexander Lakhin
27.06.2021 23:07, Tom Lane wrote:
>> While trying to use sqlsmith with postgres compiled from the master
>> branch, I've found that the PQerrorMessage() function now returns
>> non-informational but not empty error message after the successful
>> PQconnectdb() call.
> Yeah, see thread here:
>
> https://www.postgresql.org/message-id/20210506162651.GJ27406%40telsasoft.com
>
> sqlsmith is definitely Doing It Wrong there, but there's a
> reasonable question whether such coding is common.
Thanks for the info! I agree that sqlsmith's check is incorrect;
nonetheless, I was confused by the incomplete error message.

Best regards,
Alexander




Re: Deadlock risk while inserting directly into partition?

2021-06-27 Thread Amit Langote
On Fri, Jun 25, 2021 at 10:26 AM David Rowley  wrote:
> On Thu, 24 Jun 2021 at 12:32, David Rowley  wrote:
> > The overhead of taking these locks is pretty significant for
> > partitioned tables with lots of partitions where only 1 of them
> > survives run-time partition pruning.  That's really terrible for
> > people that want to PREPARE queries and just look up a single row from
> > a single partition.  That seems like a pretty big use case that we're
> > just terrible at today.
>
> I wonder, since we can't delay taking locks until after run-time
> pruning due to being unable to invalidate cached plans, maybe instead
> we could tag on any PartitionPruneInfo onto the PlannedStmt itself and
> do the init plan run-time prune run during AcquireExecutorLocks().

This is exactly what I was mulling over doing when working on [1] sometime
last year, after an off-list discussion with Robert (he suggested the idea
IIRC), though I never quite finished writing a patch.  I had planned to
revisit this topic ("locking overhead in generic plans") for v15, now that
we have *some* of the proposals mentioned in [1] committed to v14, so I can
look into this.

> A lock would need to be taken on each partitioned table before we
> prune for it. So if there was multi-level partitioning, we'd need to
> lock the partitioned table, do pruning for that partitioned table,
> then lock any sub-partitioned tables before doing pruning on those.
>
> I don't immediately see why it couldn't be made to work, it's just
> that it adds quite a lot of complexity to what's being done in
> AcquireExecutorLocks(), which today is a very simple function.

Yeah, AcquireExecutorLocks()'s current method of finding the set of
relations to lock is very simple -- just scan the range table
(PlannedStmt.rtable).  If we're to remove prunable leaf partitions
from that set, maybe we'd have to find a way to remove them from
PlannedStmt.rtable as part of running the "init" pruning, which we'd
have to do anyway, because perhaps the executor proper (mainly
InitPlan) should also see the shrunken version of the range table.
Not to mention the complexity of getting the "init" pruning itself to
run outside a full-blown executor context.

Anyway, do you agree with starting a thread to discuss possible
approaches to attack this?

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] 
https://www.postgresql.org/message-id/ca+hiwqg7zrubmmih3wpsbz4s0h2ehywrnxeducky5hr3fwz...@mail.gmail.com




Re: fdatasync performance problem with large number of DB files

2021-06-27 Thread Thomas Munro
On Tue, Jun 22, 2021 at 5:01 PM Thomas Munro  wrote:
> On Fri, Jun 18, 2021 at 1:11 PM Justin Pryzby  wrote:
> > Thomas, could you comment on this ?
>
> Sorry, I missed that.  It is initially a confusing proposal, but after
> trying it out (that is: making recovery_init_sync_method PGC_SIGHUP
> and testing a scenario where you want to make the next crash use it
> that way and without the change), I agree.  +1 from me.

... and pushed.
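
For anyone wanting to rely on this, the upshot is that the setting can now
be changed without a restart and is picked up by the next crash recovery,
e.g. (a sketch, not from the commit itself):

  ALTER SYSTEM SET recovery_init_sync_method = 'syncfs';
  SELECT pg_reload_conf();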




Re: Fix uninitialized copy_data var (src/backend/commands/subscriptioncmds.c)

2021-06-27 Thread Michael Paquier
On Mon, Jun 28, 2021 at 10:17:55AM +1000, Peter Smith wrote:
> IIUC for the case ALTER_SUBSCRIPTION_DROP_PUBLICATION it looks like
> the uninitialized copy_data local stack var would remain uninitialized
> (undefined) still at the time it is passed at
> AlterSubscription_refresh(sub, copy_data);

Yes, that's wrong.  AlterSubscription_refresh() would happily look at
this uninitialized value when performing a refresh with this command.
That's the only code path using parse_subscription_options() with this
pattern.  Applied on HEAD.
--
Michael


signature.asc
Description: PGP signature


Re: Farewell greeting

2021-06-27 Thread Tatsuro Yamada

Hi Tsunakawa-san,

On 2021/06/27 16:41, tsunakawa.ta...@fujitsu.com wrote:

> I'm moving to another company from July 1st.  I may possibly not be allowed to
> do open source activities there, so let me say goodbye once.  Thank you all for
> your help.  I really enjoyed learning PostgreSQL and participating in its
> development.
>
> It's a pity that I may not be able to be part of PostgreSQL's great history until
> it becomes the most popular database (in the DB-Engines ranking.)  However, if
> possible, I'd like to come back as just MauMau.
>
> I hope you all will succeed.



Good luck in your future career! :-D
See you at the Japanese PG developers' meetup.


Thanks,
Tatsuro Yamada





Re: Farewell greeting

2021-06-27 Thread Michael Paquier
On Sun, Jun 27, 2021 at 07:41:19AM +, tsunakawa.ta...@fujitsu.com wrote:
> It's a pity that I may not be able to be part of PostgreSQL's great
> history until it becomes the most popular database (in the
> DB-Engines ranking.)  However, if possible, I'd like to come back as
> just MauMau.

This is an open community, so there is nothing preventing you from coming
back.  Good luck in your next position, and all the best.
--
Michael


signature.asc
Description: PGP signature


Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...

2021-06-27 Thread Justin Pryzby
I've just realized that this VM has a strange storage configuration.

It's using LVM thin pools, which I don't use anywhere else.
Someone else set this up, and I think I've literally never used pools before.
Some time ago, the pool ran out of space, and I ran LVM repair on it.
It seems very possible that's the issue.
A latent problem might've been tickled by pg_upgrade --link.

That said, the relevant table is the active "alarms" table, and it would've
gotten plenty of DML with no issue for months running v13.

Feel free to dismiss this report if it seems dubious.

-- 
Justin




RE: Farewell greeting

2021-06-27 Thread kato-...@fujitsu.com
> I'm moving to another company from July 1st.  I may possibly not be allowed to
> do open source activities there, so let me say goodbye once.  Thank you all
> for your help.  I really enjoyed learning PostgreSQL and participating in its
> development.

Thank you for everything, Tsunakawa-san.
Good luck !!!

Regards
Sho Kato
-Original Message-
From: tsunakawa.ta...@fujitsu.com  
Sent: Sunday, June 27, 2021 4:41 PM
To: pgsql-hackers@lists.postgresql.org
Cc: MauMau 
Subject: Farewell greeting

Hello all,


I'm moving to another company from July 1st.  I may possibly not be allowed to 
do open source activities there, so let me say goodbye once.  Thank you all for 
your help.  I really enjoyed learning PostgreSQL and participating in its 
development.

It's a pity that I may not be able to be part of PostgreSQL's great history until 
it becomes the most popular database (in the DB-Engines ranking.)  However, if 
possible, I'd like to come back as just MauMau.

I hope you all will succeed.


Regards
Takayuki Tsunakawa







Re: What is "wraparound failure", really?

2021-06-27 Thread Thomas Munro
On Mon, Jun 28, 2021 at 8:36 AM Peter Geoghegan  wrote:
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more
> space..."

Just by the way, if we're updating this sentence, it continues
"because it must store..." but it should surely be "because they must
store...".




Re: Fix uninitialized copy_data var (src/backend/commands/subscriptioncmds.c)

2021-06-27 Thread Peter Smith
On Fri, Jun 25, 2021 at 11:55 PM Ranier Vilela  wrote:
>
>
> https://github.com/postgres/postgres/commit/3af10943ce21450e299b3915b9cad47cd90369e9
> fixes some issues with subscriptioncmds.c, but IMHO still lack this issue.
>

I have not tested this, and gcc gave no warnings about it, but just by
visual code inspection I do agree with you that this looks like a
problem, even in the latest code.

IIUC for the case ALTER_SUBSCRIPTION_DROP_PUBLICATION it looks like
the uninitialized copy_data local stack variable would still be
uninitialized (undefined) at the time it is passed to
AlterSubscription_refresh(sub, copy_data);
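
For context, that code path is reached by a command of this shape (the
subscription and publication names here are made up), since dropping a
publication triggers a refresh that consults copy_data:

  ALTER SUBSCRIPTION mysub DROP PUBLICATION mypub WITH (refresh = true);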

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: What is "wraparound failure", really?

2021-06-27 Thread Andrew Dunstan


On 6/27/21 4:36 PM, Peter Geoghegan wrote:
> The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal
> documentation -- just a basic description of how the GUCs work. I
> think that it certainly merits some discussion under "25.1. Routine
> Vacuuming" -- more specifically under "25.1.5. Preventing Transaction
> ID Wraparound Failures". One reason why this didn't happen in the
> original commit was that I just didn't know where to start with it.
> The docs in question have said this since 2006's commit 48188e16 first
> added autovacuum_freeze_max_age:
>
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more
> space..."
>
> This sentence seems completely unreasonable to me. It seems to just
> ignore the huge disadvantage of increasing autovacuum_freeze_max_age:
> the *risk* that the system will stop being able to allocate new XIDs
> because GetNewTransactionId() errors out with "database is not
> accepting commands to avoid wraparound data loss...". Sure, it's
> possible to take a lot of risk here without it ever blowing up in your
> face. And if it doesn't blow up then the downside really is zero. This
> is hardly a sensible way to talk about this important risk. Or any
> risk at all.
>
> At first I thought that the sentence was not just misguided -- it
> seemed downright bizarre. I thought that it was directly at odds with
> the title "Preventing Transaction ID Wraparound Failures". I thought
> that the whole point of this section was how not to have a wraparound
> failure (as I understand the term), and yet we seem to deliberately
> ignore the single most important practical aspect of making sure that
> that doesn't happen. But I now suspect that the basic definitions have
> been mixed up in a subtle but important way.
>
> What the documentation calls a "wraparound failure" seems to be rather
> different to what I thought that that meant. As I said, I thought that
> that meant the condition of being unable to get new transaction IDs
> (at least until the DBA runs VACUUM in single user mode). But the
> documentation in question seems to actually define it as "the
> condition of an old MVCC snapshot failing to see a version from the
> distant past, because somehow an XID wraparound suddenly makes it look
> as if it's in the distant future rather than in the past". It's
> actually talking about a subtly different thing, so the "sole
> disadvantage" sentence is not actually bizarre. It does still seem
> impractical and confusing, though.
>
> I strongly suspect that my interpretation of what "wraparound failure"
> means is actually the common one. Of course the system is never under
> any circumstances allowed to give totally wrong answers to queries, no
> matter what -- users should be able to take that much for granted.
> What users care about here is sensibly managing XIDs as a resource --
> preventing "XID exhaustion" while being conservative, but not
> ridiculously conservative. Could the documentation be completely
> misleading users here?
>
> I have two questions:
>
> 1. Do I have this right? Is there really confusion about what a
> "wraparound failure" means, or is the confusion mine alone?
>
> 2. How do I go about integrating discussion of the failsafe here?
> Anybody have thoughts on that?
>


AIUI, actual wraparound (i.e. an xid crossing the event horizon so it
appears to be in the future) is no longer possible. But it once was a
very real danger. Maybe the docs haven't quite caught up.


In practical terms, there is an awful lot of head room between the
default for autovacuum_freeze_max_age and any danger of major
anti-wraparound measures. Say you increase it to 1bn from the default
200m. That still leaves you ~1bn transactions of headroom.
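
A quick way to see how much of that headroom is actually in use (just a
monitoring sketch, not something from the thread; age() is measured in
transactions):

  SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database ORDER BY xid_age DESC;

  SHOW autovacuum_freeze_max_age;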


cheers


andrew




--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...

2021-06-27 Thread Justin Pryzby
On Sun, Jun 27, 2021 at 03:08:13PM -0700, Peter Geoghegan wrote:
> Can you please amcheck all of the indexes?

ts=# SELECT bt_index_check('child.alarms_null_alarm_clear_time_idx'::regclass);
ERROR:  item order invariant violated for index 
"alarms_null_alarm_clear_time_idx"
DETAIL:  Lower index tid=(1,77) (points to heap tid=(29,9)) higher index 
tid=(1,78) (points to heap tid=(29,9)) page lsn=80/4B9C69D0.

ts=# SELECT itemoffset, ctid, itemlen, nulls, vars, dead, htid FROM
bt_page_items('child.alarms_null_alarm_clear_time_idx', 1);
 itemoffset |  ctid  | itemlen | nulls | vars | dead |  htid
------------+--------+---------+-------+------+------+--------
...
         77 | (29,9) |      16 | t     | f    | f    | (29,9)
         78 | (29,9) |      16 | t     | f    | f    | (29,9)

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid,
t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM
heap_page_items(get_raw_page('child.alarms_null', 29));
 lp | lp_off | lp_flags | lp_len | t_xmin |  t_xmax  | t_field3 | t_ctid  | t_infomask2 | t_infomask | t_hoff |  t_bits  | t_oid
----+--------+----------+--------+--------+----------+----------+---------+-------------+------------+--------+----------+-------
  1 |   6680 |        1 |   1512 |  88669 | 27455486 |       44 | (29,1)  |        8225 |      10691 |     32 | 11101000 |
  2 |      6 |        2 |      0 |        |          |          |         |             |            |        |          |
  3 |   5168 |        1 |   1512 |  87374 | 27455479 |       37 | (29,3)  |        8225 |      10691 |     32 | 11101000 |
  4 |   4192 |        1 |    976 | 148104 | 27574887 |        0 | (29,4)  |        8225 |      10695 |     32 | 11101000 |
  5 |     10 |        2 |      0 |        |          |          |         |             |            |        |          |
  6 |   3216 |        1 |    976 | 148137 | 27574888 |        0 | (29,6)  |       40993 |      10695 |     32 | 11101000 |
  7 |      8 |        2 |      0 |        |          |          |         |             |            |        |          |
  8 |   2240 |        1 |    976 |  47388 | 27574858 |        7 | (29,8)  |       40993 |      10695 |     32 | 11101000 |
  9 |      0 |        3 |      0 |        |          |          |         |             |            |        |          |
 10 |   1264 |        1 |    976 | 148935 | 27574889 |        0 | (29,10) |       40993 |      10695 |     32 | 11101000 |
 11 |      0 |        3 |      0 |        |          |          |         |             |            |        |          |
 12 |      0 |        3 |      0 |        |          |          |         |             |            |        |          |
(12 rows)

(gdb) fr 4
#4  0x00509a14 in _bt_insertonpg (rel=rel@entry=0x7f6dfd3cd628, 
itup_key=itup_key@entry=0x2011b40, buf=15, cbuf=cbuf@entry=0, 
stack=stack@entry=0x2011bd8, itup=0x2011c00, itup@entry=0x200d608, itemsz=16, 
newitemoff=2, postingoff=62, split_only_page=split_only_page@entry=false) 
at nbtinsert.c:1174
1174in nbtinsert.c
(gdb) p page
$5 = 0x7f6de58e0e00 "\200"
(gdb) dump binary memory /tmp/dump_block.page page (page + 8192)

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid,
t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM
heap_page_items(pg_read_binary_file('/tmp/dump_block.page')) WHERE t_xmin IS
NOT NULL;
 lp  | lp_off | lp_flags | lp_len | t_xmin  |   t_xmax   | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
-----+--------+----------+--------+---------+------------+----------+--------+-------------+------------+--------+--------+-------
   1 |   8152 |        1 |     24 | 1048576 | 2685931521 |        0 | (0,0)  |           0 |        120 |      1 |        |
   2 |   7288 |        1 |    864 | 1048576 | 2740985997 |        0 | (0,0)  |           0 |          1 |      0 |        |
  67 |   6368 |        1 |    920 | 1048576 | 2744656022 |        0 | (0,0)  |          33 |          4 |      0 |        |
 137 |   5056 |        1 |   1312 | 1048576 | 2770346200 |        0 | (0,0)  |          69 |          6 |      0 |        |
 142 |   4608 |        1 |    448 | 1048576 | 2713722952 |        0 | (0,0)  |         107 |          4 |      0 |        |
(5 rows)

I didn't change the kernel here, nor on the previous bug report - it was going
to be my "next step", until I found the 

Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...

2021-06-27 Thread Peter Geoghegan
Did you also change the kernel on upgrade? I recall that that was a factor
on the other recent bug thread.

Peter Geoghegan
(Sent from my phone)


Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...

2021-06-27 Thread Peter Geoghegan
Would also be good to get a raw binary copy of the page image in question.
Hopefully the data isn't confidential. Same gdb procedure as before.

Thanks

Peter Geoghegan
(Sent from my phone)


Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...

2021-06-27 Thread Peter Geoghegan
Can you please amcheck all of the indexes?
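
A generic way to do that, as a sketch: assuming the amcheck extension is
installed, the following runs bt_index_check() over every btree index in
the database.

  CREATE EXTENSION IF NOT EXISTS amcheck;

  SELECT c.relname, bt_index_check(c.oid)
    FROM pg_class c
    JOIN pg_am am ON am.oid = c.relam
   WHERE c.relkind = 'i' AND am.amname = 'btree';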

Peter Geoghegan
(Sent from my phone)


pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...

2021-06-27 Thread Justin Pryzby
This is crashing repeatedly during insert/update immediately after upgrading an
instance to v14, from v13.3.  In case it matters, the cluster was originally
initdb at 13.2.

TRAP: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", Line: 
1062, PID: 28580)
postgres: telsasoft ts 127.0.0.1(52250) 
INSERT(ExceptionalCondition+0x8d)[0x967d1d]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(_bt_swap_posting+0x2cd)[0x507cdd]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x509a14]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(_bt_doinsert+0xcb7)[0x50d0b7]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(btinsert+0x52)[0x5130f2]
postgres: telsasoft ts 127.0.0.1(52250) 
INSERT(ExecInsertIndexTuples+0x231)[0x687b81]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x6b8718]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x6b9297]
postgres: telsasoft ts 127.0.0.1(52250) 
INSERT(standard_ExecutorRun+0x142)[0x688b32]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x82da8a]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x82e673]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x82e936]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(PortalRun+0x2eb)[0x82ec8b]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(PostgresMain+0x1f97)[0x82c777]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x48f71a]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(PostmasterMain+0x1138)[0x794c98]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(main+0x6f2)[0x491292]

< 2021-06-27 23:46:43.257 CAT  >DETAIL:  Failed process was running: INSERT 
INTO alarms(...

#3  0x00507cdd in _bt_swap_posting (newitem=newitem@entry=0x2011c00, 
oposting=oposting@entry=0x7f6de58e2a78, postingoff=postingoff@entry=62) at 
nbtdedup.c:1062
nhtids = 
replacepos = 0x2011dac ""
nposting = 0x2011c28
__func__ = "_bt_swap_posting"
#4  0x00509a14 in _bt_insertonpg (rel=rel@entry=0x7f6dfd3cd628, 
itup_key=itup_key@entry=0x2011b40, buf=15, cbuf=cbuf@entry=0, 
stack=stack@entry=0x2011bd8, itup=0x2011c00, itup@entry=0x200d608, itemsz=16, 
newitemoff=2, postingoff=62, split_only_page=split_only_page@entry=false) 
at nbtinsert.c:1174
itemid = 0x7f6de58e0e1c
page = 0x7f6de58e0e00 "\200"
opaque = 0x7f6de58e2df0
isleaf = true
isroot = false
isrightmost = false
isonly = false
oposting = 0x7f6de58e2a78
origitup = 
nposting = 0x0
__func__ = "_bt_insertonpg"
#5  0x0050d0b7 in _bt_doinsert (rel=rel@entry=0x7f6dfd3cd628, 
itup=itup@entry=0x200d608, checkUnique=checkUnique@entry=UNIQUE_CHECK_NO, 
indexUnchanged=indexUnchanged@entry=false, 
heapRel=heapRel@entry=0x7f6dfd48ba80) at nbtinsert.c:257
newitemoff = 0
is_unique = false
insertstate = {itup = 0x200d608, itemsz = 16, itup_key = 0x2011b40, buf 
= 15, bounds_valid = true, low = 2, stricthigh = 3, postingoff = 62}
itup_key = 
checkingunique = 
#6  0x005130f2 in btinsert (rel=0x7f6dfd3cd628, values=, 
isnull=, ht_ctid=0x212e250, heapRel=0x7f6dfd48ba80, 
checkUnique=UNIQUE_CHECK_NO, indexUnchanged=false, 
indexInfo=0x200d2e8) at nbtree.c:199
result = 
itup = 0x200d608
#7  0x00687b81 in ExecInsertIndexTuples 
(resultRelInfo=resultRelInfo@entry=0x200cd90, slot=slot@entry=0x212e220, 
estate=estate@entry=0x212c6b0, update=update@entry=false, 
noDupErr=noDupErr@entry=false, 
specConflict=specConflict@entry=0x0, 
arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:415


(gdb) p *newitem
$2 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 22}, ip_posid = 4}, t_info = 
32784}
(gdb) p *oposting
$3 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 16}, ip_posid = 8333}, t_info = 
41824}

I will save a copy of the data dir and see if reindexing helps.
Let me know if there's anything else I can provide.

-- 
Justin




What is "wraparound failure", really?

2021-06-27 Thread Peter Geoghegan
The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal
documentation -- just a basic description of how the GUCs work. I
think that it certainly merits some discussion under "25.1. Routine
Vacuuming" -- more specifically under "25.1.5. Preventing Transaction
ID Wraparound Failures". One reason why this didn't happen in the
original commit was that I just didn't know where to start with it.
The docs in question have said this since 2006's commit 48188e16 first
added autovacuum_freeze_max_age:

"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space..."

This sentence seems completely unreasonable to me. It seems to just
ignore the huge disadvantage of increasing autovacuum_freeze_max_age:
the *risk* that the system will stop being able to allocate new XIDs
because GetNewTransactionId() errors out with "database is not
accepting commands to avoid wraparound data loss...". Sure, it's
possible to take a lot of risk here without it ever blowing up in your
face. And if it doesn't blow up then the downside really is zero. This
is hardly a sensible way to talk about this important risk. Or any
risk at all.

At first I thought that the sentence was not just misguided -- it
seemed downright bizarre. I thought that it was directly at odds with
the title "Preventing Transaction ID Wraparound Failures". I thought
that the whole point of this section was how not to have a wraparound
failure (as I understand the term), and yet we seem to deliberately
ignore the single most important practical aspect of making sure that
that doesn't happen. But I now suspect that the basic definitions have
been mixed up in a subtle but important way.

What the documentation calls a "wraparound failure" seems to be rather
different to what I thought that that meant. As I said, I thought that
that meant the condition of being unable to get new transaction IDs
(at least until the DBA runs VACUUM in single user mode). But the
documentation in question seems to actually define it as "the
condition of an old MVCC snapshot failing to see a version from the
distant past, because somehow an XID wraparound suddenly makes it look
as if it's in the distant future rather than in the past". It's
actually talking about a subtly different thing, so the "sole
disadvantage" sentence is not actually bizarre. It does still seem
impractical and confusing, though.

I strongly suspect that my interpretation of what "wraparound failure"
means is actually the common one. Of course the system is never under
any circumstances allowed to give totally wrong answers to queries, no
matter what -- users should be able to take that much for granted.
What users care about here is sensibly managing XIDs as a resource --
preventing "XID exhaustion" while being conservative, but not
ridiculously conservative. Could the documentation be completely
misleading users here?

I have two questions:

1. Do I have this right? Is there really confusion about what a
"wraparound failure" means, or is the confusion mine alone?

2. How do I go about integrating discussion of the failsafe here?
Anybody have thoughts on that?

-- 
Peter Geoghegan




Re: Overflow hazard in pgbench

2021-06-27 Thread Tom Lane
I wrote:
> ... according to the C99
> spec this code is broken, because the compiler is allowed to assume
> that signed integer overflow doesn't happen, whereupon the second
> if-block is provably unreachable.  The failure still represents a gcc
> bug, because we're using -fwrapv which should disable that assumption.
> However, not all compilers have that switch, so it'd be better to code
> this in a spec-compliant way.

BTW, for grins I tried building today's HEAD without -fwrapv, using
gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC) 
which is the newest version I have at hand.  Not very surprisingly,
that reproduced the failure shown on moonjelly.  However, after adding
the patch I proposed, "make check-world" passed!  I was not expecting
that result; I supposed we still had lots of lurking assumptions of
traditional C overflow handling.

I'm not in any hurry to remove -fwrapv, because (a) this result doesn't
show that we have no such assumptions, only that they must be lurking
in darker, poorly-tested corners, and (b) I'm not aware of any reason
to think that removing -fwrapv would provide benefits worth taking any
risks for.  But we may be closer to being able to do without that
switch than I thought.

regards, tom lane




Re: PQconnectdb/PQerrorMessage changed behavior on master

2021-06-27 Thread Tom Lane
Alexander Lakhin  writes:
> While trying to use sqlsmith with postgres compiled from the master
> branch, I've found that the PQerrorMessage() function now returns
> non-informational but not empty error message after the successful
> PQconnectdb() call.

Yeah, see thread here:

https://www.postgresql.org/message-id/20210506162651.GJ27406%40telsasoft.com

sqlsmith is definitely Doing It Wrong there, but there's a
reasonable question whether such coding is common.

regards, tom lane




PQconnectdb/PQerrorMessage changed behavior on master

2021-06-27 Thread Alexander Lakhin
Hello,

While trying to use sqlsmith with postgres compiled from the master
branch, I've found that the PQerrorMessage() function now returns a
non-informational but non-empty error message after a successful
PQconnectdb() call.
    conn = PQconnectdb(conninfo.c_str());
    char *errmsg = PQerrorMessage(conn);
returns
'connection to server on socket "/tmp/ody8OuOaqV/.s.PGSQL.59860" failed: '

The affected sqlsmith code:
https://github.com/anse1/sqlsmith/blob/master/postgres.cc#L305

Best regards,
Alexander


Re: PITR promote bug: Checkpointer writes to older timeline

2021-06-27 Thread Tom Lane
I wrote:
> It sure looks like recovering a prepared
> transaction creates a transient state in which a new backend will
> compute a broken snapshot.

Oh, after further digging this is the same issue discussed here:

https://www.postgresql.org/message-id/flat/20210422203603.fdnh3fu2mmfp2iov%40alap3.anarazel.de

regards, tom lane




Re: PITR promote bug: Checkpointer writes to older timeline

2021-06-27 Thread Tom Lane
I wrote:
> Buildfarm member hornet just reported a failure in this test:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-06-27%2013%3A40%3A57
> It's not clear whether this is a problem with the test case or an
> actual server bug, but I'm leaning to the latter theory.  My gut
> feel is it's some problem in the "snapshot scalability" work.  It
> doesn't look the same as the known open issue, but maybe related?

Hmm, the plot thickens.  I scraped the buildfarm logs for similar-looking
assertion failures back to last August, when the snapshot scalability
patches went in.  The first such failure is not until 2021-03-24
(see attachment), and they all look to be triggered by
023_pitr_prepared_xact.pl.  It sure looks like recovering a prepared
transaction creates a transient state in which a new backend will
compute a broken snapshot.

regards, tom lane

    sysname     | branch |      snapshot       |     stage     | l
----------------+--------+---------------------+---------------+---
 calliphoridae | HEAD   | 2021-03-24 06:50:09 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: 
"/mnt/resource/andres/bf/calliphoridae/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/procarray.c",
 Line: 2463, PID: 3890215)
 francolin | HEAD   | 2021-03-29 16:21:58 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: 
"/mnt/resource/andres/bf/francolin/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/procarray.c",
 Line: 2463, PID: 1861665)
 moonjelly | HEAD   | 2021-04-01 15:25:38 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: "procarray.c", Line: 2463, PID: 2345153)
 francolin | HEAD   | 2021-04-07 12:30:08 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: 
"/mnt/resource/andres/bf/francolin/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/procarray.c",
 Line: 2468, PID: 3257637)
 fairywren | HEAD   | 2021-04-20 03:04:04 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: 
"C:/tools/msys64/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/procarray.c",
 Line: 2094, PID: 94824)
 mantid| HEAD   | 2021-04-25 10:07:06 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: "procarray.c", Line: 2094, PID: 2820886)
 thorntail | HEAD   | 2021-04-29 07:18:09 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: 
"/home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/procarray.c",
 Line: 2094, PID: 3099560)
 mantid| HEAD   | 2021-05-03 13:07:06 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: "procarray.c", Line: 2094, PID: 1163004)
 mantid| HEAD   | 2021-05-10 01:07:07 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: "procarray.c", Line: 2468, PID: 2812704)
 hornet| HEAD   | 2021-06-27 13:40:57 | recoveryCheck | TRAP: 
FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", 
File: "procarray.c", Line: 2492, PID: 11862234)
(10 rows)



Re: PITR promote bug: Checkpointer writes to older timeline

2021-06-27 Thread Tom Lane
Michael Paquier  writes:
> I have been working on that over the last couple of days, and applied
> a fix down to 10.  One thing that I did not like in the test was the
> use of compare() to check if the contents of the WAL segment before
> and after the timeline jump remained the same as this would have been
> unstable with any concurrent activity.  Instead, I have added a phase
> at the end of the test with an extra checkpoint and recovery triggered
> once, which is enough to reproduce the PANIC reported at the top of
> the thread.

Buildfarm member hornet just reported a failure in this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-06-27%2013%3A40%3A57

the critical bit being

2021-06-27 17:35:46.504 UTC [11862234:1] [unknown] LOG:  connection received: 
host=[local]
2021-06-27 17:35:46.505 UTC [18350260:12] LOG:  recovering prepared transaction 
734 from shared memory
TRAP: FailedAssertion("TransactionIdPrecedesOrEquals(TransactionXmin, 
RecentXmin)", File: "procarray.c", Line: 2492, PID: 11862234)
2021-06-27 17:35:46.511 UTC [14876838:4] LOG:  database system is ready to 
accept connections

It's not clear whether this is a problem with the test case or an
actual server bug, but I'm leaning to the latter theory.  My gut
feel is it's some problem in the "snapshot scalability" work.  It
doesn't look the same as the known open issue, but maybe related?

regards, tom lane




PoC: using sampling to estimate joins / complex conditions

2021-06-27 Thread Tomas Vondra
Hi,

Estimating joins is one of the significant gaps related to extended
statistics, and I've been regularly asked about what we might do about
that. This is an early experimental patch that I think might help us
with improving this, possibly even in PG15.

Note: I do not claim this is exactly how it should be implemented, but
it's probably sufficient to demonstrate the pros/cons of various
alternative approaches, etc.

In short, the patch samples the tables and uses those samples to
estimate selectivity for scans and joins. The samples are collected
during planning, which may be quite expensive - random I/O for each
query, etc. It'd be possible to build them during analyze, but that'd
require solving serialization, tweaking CREATE STATISTICS to handle join
queries, etc. I decided to keep the PoC simple.

It still uses CREATE STATISTICS with a new "sample" kind, instructing
the optimizer to use sampling when estimating clauses on the attributes.

A little example demonstrating what the patch does:

  create table t (a int, b int, c int);

  insert into t select mod(i,10), mod(i,20), mod(i,40)
from generate_series(1,1000) s(i);

  analyze t;

  -- estimate without any statistics / sampling
  explain analyze select * from t where a = 0 and b = 0 and c = 0;

 QUERY PLAN
  ---
   Seq Scan on t  (cost=0.00..229055.00 rows=1361 width=12)
  (actual time=0.025..761.571 rows=25 loops=1)
 Filter: ((a = 0) AND (b = 0) AND (c = 0))
 Rows Removed by Filter: 975
   Planning Time: 0.471 ms
   Execution Time: 901.182 ms
  (5 rows)

  -- enable sampling on those columns
  create statistics s (sample) on a, b, c from t;

  explain analyze select * from t where a = 0 and b = 0 and c = 0;

 QUERY PLAN
  ---
   Seq Scan on t  (cost=0.00..229055.00 rows=250390 width=12)
  (actual time=0.307..717.937 rows=25 loops=1)
 Filter: ((a = 0) AND (b = 0) AND (c = 0))
 Rows Removed by Filter: 975
   Planning Time: 194.528 ms
   Execution Time: 851.832 ms
  (5 rows)

Of course, in this case an MCV list would work well too, because there are
very few combinations in (a,b,c) - a sample works even when that's
not the case, and it has various other benefits (it can estimate almost any
expression, while MCV supports only a subset, etc.)
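
For comparison (not part of the patch), the existing MCV statistics kind on
the same columns would be created as:

  create statistics s_mcv (mcv) on a, b, c from t;
  analyze t;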

Now, let's look at a join between a fact and a dimension table:

  create table f (d1 int, d2 int, f1 int, f2 int, f3 int);

  create table d (d1 int, d2 int, d3 int, d4 int, d5 int,
  primary key (d1, d2));

  insert into d select i, i, mod(i,100), mod(i,100), mod(i,100)
from generate_series(0,999) s(i);

  insert into f select mod(i,1000), mod(i,1000), mod(i,100), mod(i,100),
   mod(i,100) from generate_series(1,100) s(i);

  analyze f, d;

  explain analyze select * from f join d using (d1,d2)
where f1 < 50 and f2 < 50 and d3 < 50 and d4 < 50;

QUERY PLAN
  --
   Hash Join  (cost=25.75..22717.01 rows=63 width=32)
  (actual time=3.197..861.899 rows=50 loops=1)
 Hash Cond: ((f.d1 = d.d1) AND (f.d2 = d.d2))
 ->  Seq Scan on f  (cost=0.00..21370.00 rows=251669 width=20)
(actual time=0.033..315.401 rows=50 loops=1)
   Filter: ((f1 < 50) AND (f2 < 50))
   Rows Removed by Filter: 50
 ->  Hash  (cost=22.00..22.00 rows=250 width=20)
   (actual time=3.139..3.141 rows=500 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 34kB
   ->  Seq Scan on d  (cost=0.00..22.00 rows=250 width=20)
(actual time=0.018..1.706 rows=500 loops=1)
 Filter: ((d3 < 50) AND (d4 < 50))
 Rows Removed by Filter: 500
   Planning Time: 0.806 ms
   Execution Time: 1099.229 ms
  (12 rows)

So, not great - an underestimate of that magnitude is likely to lead to
inefficient plans. And now with the samples enabled on both sides:

  create statistics s1 (sample) on d1, d2, f1, f2, f3 from f;
  create statistics s2 (sample) on d1, d2, d3, d4, d5 from d;

QUERY PLAN
  --
   Hash Join  (cost=29.50..24057.25 rows=503170 width=32)
  (actual time=0.630..837.483 rows=50 loops=1)
 Hash Cond: ((f.d1 = d.d1) AND (f.d2 = d.d2))
 ->  Seq Scan on f  (cost=0.00..21370.00 rows=503879 width=20)
(actual time=0.008..301.584 rows=50 loops=1)
   Filter: ((f1 < 50) AND (f2 < 50))
   Rows Removed by Filter: 50
 ->  Hash  (cost=22.00..22.00 rows=500 width=20)
   (actual time=0.616..0.618 rows=500 loops=1)
   Buckets: 1024  Batches: 1  Memory 

Re: pgsql: Fix pattern matching logic for logs in TAP tests of pgbench

2021-06-27 Thread Fabien COELHO




>> However, if slurp_file fails it raises an exception and aborts the
>> whole TAP unexpectedly, which is pretty unclean. So I'd suggest to
>> keep the eval, as attached. I tested it by changing the file name so
>> that the slurp fails.
>
> Seem quite unnecessary. We haven't found that to be an issue elsewhere
> in the code where slurp_file is used. And in the present case we know
> the file exists because we got its name from list_files().

Fine with me!

--
Fabien.




Re: pgsql: Fix pattern matching logic for logs in TAP tests of pgbench

2021-06-27 Thread Fabien COELHO




>> Seem quite unnecessary. We haven't found that to be an issue elsewhere
>> in the code where slurp_file is used. And in the present case we know
>> the file exists because we got its name from list_files().
>
> Agreed.  That's an exchange between a hard failure mid-test and a
> failure while letting the whole test run.  Here, we expect the test to
> find the log file all the time, so a hard failure does not sound like
> a bad thing to me either.

Ok, fine with me!

--
Fabien.




Re: pgsql: Fix pattern matching logic for logs in TAP tests of pgbench

2021-06-27 Thread Michael Paquier
On Sat, Jun 26, 2021 at 11:01:07AM -0400, Andrew Dunstan wrote:
> On 6/26/21 2:47 AM, Fabien COELHO wrote:
>> However, if slurp_file fails it raises an exception and aborts the
>> whole TAP unexpectedly, which is pretty unclean. So I'd suggest to
>> keep the eval, as attached. I tested it by changing the file name so
>> that the slurp fails.
> 
> Seem quite unnecessary. We haven't found that to be an issue elsewhere
> in the code where slurp_file is used. And in the present case we know
> the file exists because we got its name from list_files().

Agreed.  That's an exchange between a hard failure mid-test and a
failure while letting the whole test run.  Here, we expect the test to
find the log file all the time, so a hard failure does not sound like
a bad thing to me either.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Fix pattern matching logic for logs in TAP tests of pgbench

2021-06-27 Thread Andrew Dunstan


On 6/26/21 2:47 AM, Fabien COELHO wrote:
>
> Hello Andrew & Michaël,
>
> My 0.02€:
>
>> There's a whole lot wrong with this code. To start with, why is that
>> unchecked eval there.
>
> Yep. The idea was that other tests would go on being collected eg if
> the file is not found, but it should have been checked anyway.
>
>> And why is it reading in log files on its own instead of using
>> TestLib::slurp_file, which, among other things, normalizes line endings?
>
> Indeed.
>
> However, if slurp_file fails it raises an exception and aborts the
> whole TAP unexpectedly, which is pretty unclean. So I'd suggest to
> keep the eval, as attached. I tested it by changing the file name so
> that the slurp fails.


Seem quite unnecessary. We haven't found that to be an issue elsewhere
in the code where slurp_file is used. And in the present case we know
the file exists because we got its name from list_files().


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pgsql: Fix pattern matching logic for logs in TAP tests of pgbench

2021-06-27 Thread Fabien COELHO


Hello Andrew & Michaël,

My 0.02€:


There's a whole lot wrong with this code. To start with, why is that
unchecked eval there.


Yep. The idea was that other tests would go on being collected eg if the 
file is not found, but it should have been checked anyway.


And why is it reading in log files on its own instead of using 
TestLib::slurp_file, which, among other things, normalizes line endings?


Indeed.

However, if slurp_file fails it raises an exception and aborts the whole 
TAP unexpectedly, which is pretty unclean. So I'd suggest to keep the 
eval, as attached. I tested it by changing the file name so that the slurp 
fails.


There's a very good chance that this latter is the issue. It only 
affects msys which is why you didn't see an issue on MSVC. And also, why 
does it carefully unlink the log files so that any trace of what's gone 
wrong is deleted?



Based on the little I've seen this file needs a serious code review.


Probably: My very old perl expertise is fading away because I'm not using 
it much these days. Cannot say I miss it:-)


--
Fabien.

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 781cc08fb1..6a82a3494d 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1190,29 +1190,32 @@ sub check_pgbench_logs
 	ok(@logs == $nb, "number of log files");
 	ok(grep(/\/$prefix\.\d+(\.\d+)?$/, @logs) == $nb, "file name format");
 
-	my $log_number = 0;
 	for my $log (sort @logs)
 	{
-		# Check the contents of each log file.
-		my $contents_raw = slurp_file($log);
+		eval {
+			# Check the contents of each log file.
+			my $contents_raw = slurp_file($log);
 
-		my @contents = split(/\n/, $contents_raw);
-		my $clen = @contents;
-		ok( $min <= $clen && $clen <= $max,
-			"transaction count for $log ($clen)");
-		my $clen_match = grep(/$re/, @contents);
-		ok($clen_match == $clen, "transaction format for $prefix");
+			my @contents = split(/\n/, $contents_raw);
+			my $clen = @contents;
+			ok( $min <= $clen && $clen <= $max,
+"transaction count for $log ($clen)");
+			my $clen_match = grep(/$re/, @contents);
+			ok($clen_match == $clen, "transaction format for $prefix");
 
-		# Show more information if some logs don't match
-		# to help with debugging.
-		if ($clen_match != $clen)
-		{
-			foreach my $log (@contents)
+			# Show more information if some logs don't match
+			# to help with debugging.
+			if ($clen_match != $clen)
 			{
-print "# Log entry not matching: $log\n"
-  unless $log =~ /$re/;
+foreach my $log (@contents)
+{
+	print "# Log entry not matching: $log\n"
+	  unless $log =~ /$re/;
+}
 			}
-		}
+		};
+		# Tell if an exception was raised
+		ok(0, "Error while checking log file \"$log\": $@") if $@;
 	}
 	return;
 }


RE: [Patch] change the default value of shared_buffers in postgresql.conf.sample

2021-06-27 Thread zhangj...@fujitsu.com
> On the whole this seems pretty cosmetic so I'm inclined to leave it alone.  
> But if we were going to do anything I think that adjusting both initdb.c and 
> guc.c to use 128MB might be the most appropriate thing.

Thank you for your suggestions. I have updated both initdb.c and guc.c accordingly.
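
For reference, what a running server actually reports, as opposed to the
template string in the sample file, can be checked with (not part of the
patch):

  SHOW shared_buffers;
  SELECT name, boot_val, reset_val FROM pg_settings WHERE name = 'shared_buffers';

Here boot_val is the hard-wired default from guc.c, while the effective
setting comes from postgresql.conf as rewritten by initdb.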

Best Regards!
Zhangjie

-Original Message-
From: Tom Lane  
Sent: Thursday, June 24, 2021 11:50 PM
To: Zhang, Jie/张 杰 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: [Patch] change the default value of shared_buffers in 
postgresql.conf.sample

"zhangj...@fujitsu.com"  writes:
> In PostgreSQL 14, The default value of shared_buffers is 128MB, but in 
> postgresql.conf.sample, the default value of shared_buffers is 32MB.
> I think the following changes should be made.

> File: postgresql\src\backend\utils\misc\ postgresql.conf.sample 
> #shared_buffers = 32MB  =>  #shared_buffers = 128MB

As submitted, this patch breaks initdb, which is looking for the exact string 
"#shared_buffers = 32MB".

We could adjust that too of course, but I'm dubious first that any change is 
needed, and second that this is the right one:

1. Since initdb will replace that string, users will never see this entry as-is 
in live databases.  So is it worth doing anything?

2. The *actual*, hard-wired, default in guc.c is 1024 (8MB), not either of 
these numbers.  So maybe the sample file ought to use that instead.  Or maybe 
we should change that value too ... it's surely as obsolete as can be.

On the whole this seems pretty cosmetic so I'm inclined to leave it alone.  But 
if we were going to do anything I think that adjusting both initdb.c and guc.c 
to use 128MB might be the most appropriate thing.

regards, tom lane


postgresql.conf.sample_0625.patch
Description: postgresql.conf.sample_0625.patch


Overflow hazard in pgbench

2021-06-27 Thread Tom Lane
moonjelly just reported an interesting failure [1].  It seems that
with the latest bleeding-edge gcc, this code is misoptimized:

/* check random range */
if (imin > imax)
{
pg_log_error("empty range given to random");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
pg_log_error("random range is too large");
return false;
}

such that the second if-test doesn't fire.  Now, according to the C99
spec this code is broken, because the compiler is allowed to assume
that signed integer overflow doesn't happen, whereupon the second
if-block is provably unreachable.  The failure still represents a gcc
bug, because we're using -fwrapv which should disable that assumption.
However, not all compilers have that switch, so it'd be better to code
this in a spec-compliant way.  I suggest applying the attached in
branches that have the required functions.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2021-06-26%2007%3A03%3A17

regards, tom lane

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e61055b6b7..c4023bfa27 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2450,7 +2450,8 @@ evalStandardFunc(CState *st,
 		case PGBENCH_RANDOM_ZIPFIAN:
 			{
 int64		imin,
-			imax;
+			imax,
+			delta;
 
 Assert(nargs >= 2);
 
@@ -2464,7 +2465,8 @@ evalStandardFunc(CState *st,
 	pg_log_error("empty range given to random");
 	return false;
 }
-else if (imax - imin < 0 || (imax - imin) + 1 < 0)
+else if (pg_sub_s64_overflow(imax, imin, &delta) ||
+		 pg_add_s64_overflow(delta, 1, &delta))
 {
 	/* prevent int overflows in random functions */
 	pg_log_error("random range is too large");


Re: Farewell greeting

2021-06-27 Thread Peter Geoghegan
On Sun, Jun 27, 2021 at 12:41 AM tsunakawa.ta...@fujitsu.com
 wrote:
> I'm moving to another company from July 1st.  I may possibly not be allowed 
> to do open source activities there, so let me say goodbye once.

It's a pity that you may not be around anymore. I wish you all the best.

-- 
Peter Geoghegan




Re: Composite types as parameters

2021-06-27 Thread Tom Lane
Elijah Stone  writes:
> On Sat, 26 Jun 2021, Tom Lane wrote:
>> If it still doesn't work, please provide a more concrete example.

> Thanks, unfortunately adding the explicit cast doesn't help.  I've 
> attached a minimal runnable example.

So your problem is that you're explicitly saying that the input is
of generic-RECORD type.  You should let the server infer its type,
instead, which it can easily do from context in this example.
That is, pass zero as the type OID, or leave out the paramTypes
array altogether.  The example works for me with this change:

@@ -30,13 +30,13 @@
 
// error:
check(PQexecParams(c, "INSERT INTO tab VALUES($1, 8);",
-   1, &(Oid){RECORDOID}, &(const char*){recbuf},
+   1, &(Oid){0}, &(const char*){recbuf},
&(int){rec - recbuf}, &(int){1/*binary*/},
1/*binary result*/));
 
// error as well:
check(PQexecParams(c, "INSERT INTO tab VALUES($1::some_record, 8);",
-   1, &(Oid){RECORDOID}, &(const char*){recbuf},
+   1, &(Oid){0}, &(const char*){recbuf},
&(int){rec - recbuf}, &(int){1},
1));

In more complicated cases you might need to fetch the composite
type's actual OID and pass that.  But I'd go with the lazy approach
until forced to do differently.

> Is there a 
> way to discover the OID of a composite type?  And is the wire format the 
> same as for a generic record?

Same as for any other type: SELECT 'mytypename'::regtype::oid.
And yes.
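
For instance, with the composite type from your test program that would be:

  SELECT 'some_record'::regtype::oid;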

regards, tom lane




Re: Deparsing rewritten query

2021-06-27 Thread Julien Rouhaud
On Sun, Jun 27, 2021 at 11:14:05AM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> 
> > Agreed.  On the other hand, having such a function in core may ensure that
> > any significant change in those areas will keep an API to retrieve the
> > final query representation.
> 
> My point is precisely that I'm unwilling to make such a promise.
> 
> I do not buy that this capability is worth very much, given that
> we've gotten along fine without it for twenty-plus years.  If you
> want to have it as an internal, might-change-at-any-time API,
> that seems all right.

I'm totally fine with a might-change-at-any-time API, but not with a
might-disappear-at-any-time API.  If exposing get_query_def() could become
virtually useless in a few releases, with no hope of getting the required
in-core changes for retrieving the final query representation, then I agree
we can stop the discussion here.




Re: Deparsing rewritten query

2021-06-27 Thread Tom Lane
Julien Rouhaud  writes:
> On Sun, Jun 27, 2021 at 10:34:52AM -0400, Tom Lane wrote:
>> It's not very hard to imagine someday moving view
>> expansion into the planner on efficiency grounds, leaving the rewriter
>> handling only the rare uses of INSERT/UPDATE/DELETE rules.

> Agreed.  On the other hand, having such a function in core may ensure that any
> significant change in those areas will keep an API to retrieve the final query
> representation.

My point is precisely that I'm unwilling to make such a promise.

I do not buy that this capability is worth very much, given that
we've gotten along fine without it for twenty-plus years.  If you
want to have it as an internal, might-change-at-any-time API,
that seems all right.  If you're trying to lock it down as something
that will be there forevermore, you're likely to end up with nothing.

regards, tom lane




Re: Deparsing rewritten query

2021-06-27 Thread Julien Rouhaud
On Sun, Jun 27, 2021 at 10:34:52AM -0400, Tom Lane wrote:
> 
> I'm not really excited by this, as it seems like it's exposing internal
> decisions we could change someday; to wit, first that there is any such
> thing as a separate rewriting pass

Sure, but the fact that views make the query actually executed differ
significantly from the one that was written is not an internal decision.  In
my opinion, knowing what the final "real" query will be is still a valid
concern, whether we have a rewriting pass or not.

> and second that its output is
> interpretable as pure SQL.  (TBH, I'm not 100% sure that the second
> assumption is true even today, although I know there are ancient comments
> that claim that.)

I totally agree.  Note that there was at least one gotcha handled in this
patch: rewritten views didn't get an alias, which is mandatory for an SQL
query.

> It's not very hard to imagine someday moving view
> expansion into the planner on efficiency grounds, leaving the rewriter
> handling only the rare uses of INSERT/UPDATE/DELETE rules.

Agreed.  On the other hand, having such a function in core may ensure that any
significant change in those areas will keep an API to retrieve the final query
representation.

> If there's functions in ruleutils.c that we'd need to make public to
> let somebody write a debugging extension that does this kind of thing,
> I'd be happier with that approach than with creating a core-server SQL
> function for it.  There might be more than one use-case for the
> exposed bits.

It would mean exposing at least get_query_def().  I thought that exposing this
function was already suggested and refused, but I may be wrong.  Maybe other
people would like to have nearby functions exposed too.

Note that if we go this way, we would still need at least something like this
patch's chunk in rewriteHandler.c applied, as otherwise the vast majority of
rewritten and deparsed queries won't be valid.




code fork June 28th

2021-06-27 Thread Andrew Dunstan


I am planning on forking the tree so we can start adding developments
for Postgres 15 in the upcoming commitfest. This will be done tomorrow,
June 28, late morning US East coast time. I will be following the
procedures laid out in src/tools/RELEASE_CHANGES under the heading
"Starting a New Development Cycle".


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Deparsing rewritten query

2021-06-27 Thread Tom Lane
Julien Rouhaud  writes:
> As an alternative, maybe we could expose a simple SRF that would take care of
> rewriting the query and deparsing the resulting query tree(s)?

I'm not really excited by this, as it seems like it's exposing internal
decisions we could change someday; to wit, first that there is any such
thing as a separate rewriting pass, and second that its output is
interpretable as pure SQL.  (TBH, I'm not 100% sure that the second
assumption is true even today, although I know there are ancient comments
that claim that.)  It's not very hard to imagine someday moving view
expansion into the planner on efficiency grounds, leaving the rewriter
handling only the rare uses of INSERT/UPDATE/DELETE rules.

If there's functions in ruleutils.c that we'd need to make public to
let somebody write a debugging extension that does this kind of thing,
I'd be happier with that approach than with creating a core-server SQL
function for it.  There might be more than one use-case for the
exposed bits.

regards, tom lane




Re: Deparsing rewritten query

2021-06-27 Thread Julien Rouhaud
On Sun, Jun 27, 2021 at 10:06:00AM -0300, Ranier Vilela wrote:
> 
> 1. strcmp(sql, "") could not be replaced by sql[0] == '\0'?

It's a debugging leftover that I forgot to remove.  There's no point trying
to catch an empty string, as e.g. a single space would be exactly the same, and
both would be caught by the next (and correct) test.

> 3. initStringInfo() inside a loop, wouldn't it be exaggerated? each
> time call palloc0.

initStringInfo calls palloc, not palloc0.

It's unlikely to make any difference.  Rules have been strongly discouraged for
many years, and if you have enough of them to make a noticeable difference here,
you probably have bigger problems.  But no objection to resetting the StringInfo
instead.
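
For reference, a minimal sketch of that reuse pattern in backend code - the
function, list handling and names are only illustrative, not taken from the
patch:

#include "postgres.h"

#include "lib/stringinfo.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"

/* Illustrative only: deparse each rewritten query into one reused buffer. */
static void
deparse_rewritten_list(List *querytree_list)
{
    StringInfoData buf;
    ListCell   *lc;

    initStringInfo(&buf);          /* one palloc for the whole loop */
    foreach(lc, querytree_list)
    {
        Query      *query = lfirst_node(Query, lc);

        resetStringInfo(&buf);     /* reuse the allocation on each iteration */
        /* the patch's get_query_def(query, &buf, ...) call would go here */
        (void) query;
    }
    pfree(buf.data);
}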




Re: Deparsing rewritten query

2021-06-27 Thread Ranier Vilela
> Hi,
>
> I sometimes have to deal with queries referencing multiple and/or complex
> views. In such cases, it's quite troublesome to figure out what is the
> query
> really executed. Debug_print_rewritten isn't really useful for non-trivial
> queries, and manually doing the view expansion isn't great either.
>
> While not being ideal, I wouldn't mind using a custom extension for that
> but
> this isn't an option as get_query_def() is private and isn't likely to
> change.
>
> As an alternative, maybe we could expose a simple SRF that would take care
> of
> rewriting the query and deparsing the resulting query tree(s)?
>
> I'm attaching a POC patch for that, adding a new pg_get_query_def(text)
> SRF.
+1

If you don't mind, I made small corrections to your patch.
1. Couldn't strcmp(sql, "") be replaced by sql[0] == '\0'?
2. Shouldn't the error message be: "empty statement not allowed"?
3. Isn't initStringInfo() inside a loop excessive? Each iteration calls
palloc0.

regards,
Ranier Vilela


v1-0002-Add-pg_get_query_def-to-deparse-and-print-a-rewri.patch
Description: Binary data


Re: pglz compression performance, take two

2021-06-27 Thread Andrey Borodin


> 20 марта 2021 г., в 00:35, Mark Dilger  
> написал(а):
> 
> 
> 
>> On Jan 21, 2021, at 6:48 PM, Justin Pryzby  wrote:
>> 
>> @cfbot: rebased
>> <0001-Reorganize-pglz-compression-code.patch>
> 
> Review comments.

Thanks for the review, Mark!
And sorry for such a long delay; I've been trying to figure out a way to make
things less platform-dependent.
Here's what I've come up with.

We don't use pglz_read32() the way xxhash and lz4 do - we don't really need to
read a 4-byte value, we only need to compare 4 bytes at once.
So, essentially, we need to compare two implementations of a 4-byte comparison:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* variant A: word-at-a-time compare; assumes unaligned 4-byte loads are OK */
bool
cmp_a(const void *ptr1, const void *ptr2)
{
    return *(const uint32_t *) ptr1 == *(const uint32_t *) ptr2;
}

/* variant B: portable compare via memcmp() */
bool
cmp_b(const void *ptr1, const void *ptr2)
{
    return memcmp(ptr1, ptr2, 4) == 0;
}

Variant B is more portable. Inspecting it in Godbolt's compiler explorer, I
found that for GCC 7.1+ it generates assembly without a memcmp() call. For
x86-64 and ARM64 the assembly of cmp_b is identical to cmp_a's.
So I think maybe we could just stick with variant cmp_b instead of optimising
for ARMv6 and similar architectures like Arduino.

I've benchmarked the patch with "REINDEX TABLE pgbench_accounts" on a pgbench -i
database of scale 100. wal_compression was on, other settings were default.
Without the patch it takes ~11055.077 ms on my machine, with the patch
~9512.411 ms, a 14% speedup overall.

PFA v5.

Thanks!

Best regards, Andrey Borodin.



v5-0001-Reorganize-pglz-compression-code.patch
Description: Binary data


Re: Farewell greeting

2021-06-27 Thread Julien Rouhaud
On Sun, Jun 27, 2021 at 07:41:19AM +, tsunakawa.ta...@fujitsu.com wrote:
> 
> I'm moving to another company from July 1st.  I may possibly not be allowed
> to do open source activities there, so let me say goodbye once.  Thank you
> all for your help.  I really enjoyed learning PostgreSQL and participating in
> its development.
> 
> It's a pity that I may not be able to be part of PostgreSQL's great history
> until it becomes the most popular database (in the DB-Engines ranking).
> However, if possible, I'd like to come back as just MauMau.
> 
> I hope you all will succeed.

Wish you all the best!




Re: Farewell greeting

2021-06-27 Thread Tatsuo Ishii
> Hello all,
> 
> 
> I'm moving to another company from July 1st.  I may possibly not be allowed 
> to do open source activities there, so let me say goodbye once.  Thank you 
> all for your help.  I really enjoyed learning PostgreSQL and participating in 
> its development.
> 
> It's a pity that I may not be able to be part of PostgreSQL's great history
> until it becomes the most popular database (in the DB-Engines ranking).
> However, if possible, I'd like to come back as just MauMau.
> 
> I hope you all will succeed.
> 
> 
> Regards
> Takayuki Tsunakawa

Good luck, Tsunakawa-san!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Farewell greeting

2021-06-27 Thread tsunakawa.ta...@fujitsu.com
Hello all,


I'm moving to another company from July 1st.  I may possibly not be allowed to 
do open source activities there, so let me say goodbye once.  Thank you all for 
your help.  I really enjoyed learning PostgreSQL and participating in its 
development.

It's a pity that I may not be able to be part of PostgreSQL's great history
until it becomes the most popular database (in the DB-Engines ranking).
However, if possible, I'd like to come back as just MauMau.

I hope you all will succeed.


Regards
Takayuki Tsunakawa