Re: [HACKERS] Custom compression methods

2021-03-15 Thread Dilip Kumar
On Tue, Mar 16, 2021 at 11:18 AM Justin Pryzby  wrote:
>
> I'm a minor contributor now to a couple bits of this patch set, but I can
> answer a couple of these points.
>
> On Mon, Mar 15, 2021 at 03:58:35PM -0700, Andres Freund wrote:
> > Comments about 0003:
> > - why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be
> >   comparable to HIDE_TABLEAM?
>
> That was my idea and implementation.
> It's because until 3 weeks ago, the patchset supported a "pluggable compression
> API" like CREATE ACCESS METHOD (a suggestion from Alvaro to avoid making a new
> table and everything involved just for a few rows).  Now, the patch is limited
> to lz4, and the "pluggable compression API" isn't included in the latest
> patchsets.

Yeah, but now it also makes sense to hide the compression method to
avoid unrelated regression changes.  But I am okay with dropping it if
we think we should.

> > Comments about 0005:
> > - I'm personally not really convinced tracking the compression type in
> >   pg_attribute the way you do is really worth it.  Especially given
> >   that it's right now only about new rows anyway.  Seems like it'd be
> >   easier to just treat it as a default for new rows, and dispense with
> >   all the logic around mismatching compression types etc?
>
> I made the half-serious suggestion to make it a per-relation relopt.
> That would allow implementing pg_dump --no-toast-compression, to allow
> restoring a dump from a server with LZ4 tables to a server --without-lz4.
> Similar to --no-tablespaces.

I am not sure how good an idea it is to support a table-level option.
An attribute-level option makes sense to me in case we want to support
different compression methods for different data types.  Currently we
have only pglz and lz4, but even if we are not planning for custom
compression in the future, we could still add two more built-in
compression methods, so I still feel that an attribute-level option
makes more sense.
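
As a concrete illustration, a minimal sketch of the attribute-level option
(assuming the column-level COMPRESSION syntax and ALTER ... SET COMPRESSION
from the current patch set; the table and column names are made up):

    CREATE TABLE msgs (
        body    text COMPRESSION lz4,   -- large, compressible payload
        subject text COMPRESSION pglz   -- stick with the existing method
    );
    -- Only newly stored values pick up the new method; existing rows
    -- are left untouched.
    ALTER TABLE msgs ALTER COLUMN body SET COMPRESSION pglz;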

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-15 Thread Justin Pryzby
I'm a minor contributor now to a couple bits of this patch set, but I can
answer a couple of these points.

On Mon, Mar 15, 2021 at 03:58:35PM -0700, Andres Freund wrote:
> Comments about 0003:
> - why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be
>   comparable to HIDE_TABLEAM?

That was my idea and implementation.
It's because until 3 weeks ago, the patchset supported a "pluggable compression
API" like CREATE ACCESS METHOD (a suggestion from Alvaro to avoid making a new
table and everything involved just for a few rows).  Now, the patch is limited
to lz4, and the "pluggable compression API" isn't included in the latest
patchsets.

> Comments about 0005:
> - I'm personally not really convinced tracking the compression type in
>   pg_attribute the way you do is really worth it.  Especially given
>   that it's right now only about new rows anyway.  Seems like it'd be
>   easier to just treat it as a default for new rows, and dispense with
>   all the logic around mismatching compression types etc?

I made the half-serious suggestion to make it a per-relation relopt.
That would allow implementing pg_dump --no-toast-compression, to allow
restoring a dump from a server with LZ4 tables to a server --without-lz4.
Similar to --no-tablespaces.

That would also avoid adding a compression column in \d (which avoids the need
for HIDE_TOAST_COMPRESSION).

> > I'm open to being convinced that we don't need to do either of these
> > things, and that the cost of iterating over all varlenas in the tuple
> > is not so bad as to preclude doing things as you have them here. But,
> > I'm afraid it's going to be too expensive.
>
> I mean, I would just define several of those places away by not caring
> about tuples in a different compression format ending up in a
> table...

If I understand you right, this is because it's desirable to allow 1) migrating
existing data from pglz to lz4, and 2) moving away from lz4 again, if need be.

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-15 Thread Dilip Kumar
On Tue, Mar 16, 2021 at 4:28 AM Andres Freund  wrote:

Replying to some of the comments..

> - Is nodeModifyTable.c really the right place for the logic around
>   CompareCompressionMethodAndDecompress()?  And is doing it in every
>   place that does "user initiated" inserts really the right way? Why
>   isn't this done on the tuptoasting level?

I think if we do it at the tuptoasting level then it will be even
costlier, because in nodeModifyTable.c we will at least often get a
virtual tuple, e.g. if a user is directly inserting the tuple, whereas
once we go down to the tuptoasting level we will always get a HeapTuple
and will have to deform it in every case where the tupdesc has any
varlena, because we don't have any flag in the tuple header to tell us
whether there is any compressed data or not.  In the thread below[1] we
considered these two approaches and, basically, in unrelated paths like
pgbench we did not see any performance regression with either of them.

[1] 
https://www.postgresql.org/message-id/CAFiTN-vcbfy5ScKVUp16c1N_wzP0RL6EkPBAg_Jm3eDK0ftO5Q%40mail.gmail.com

> > I'm open to being convinced that we don't need to do either of these
> > things, and that the cost of iterating over all varlenas in the tuple
> > is not so bad as to preclude doing things as you have them here. But,
> > I'm afraid it's going to be too expensive.
>
> I mean, I would just define several of those places away by not caring
> about tuples in a different compression format ending up in a
> table...

I am just wondering why we don't need to process the case of a storage
change: I mean, if the target table's attribute has storage set to
external and some compressed data comes in from the source table, then
we will insert that compressed data as-is into the target attribute
without externalizing it.  Maybe that is done to avoid such performance
impacts?  Well, we could do the same for compression and just provide
some mechanism to recompress, maybe in VACUUM FULL/CLUSTER.
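
To make the mismatch scenario concrete, a minimal sketch (assuming the patch
set's column-level COMPRESSION syntax and the pg_column_compression() function;
table names are made up, and whether the target keeps or recompresses the
incoming datum is exactly the behaviour in question):

    CREATE TABLE src (t text COMPRESSION lz4);
    CREATE TABLE dst (t text COMPRESSION pglz);
    INSERT INTO src VALUES (repeat('abc', 100000));
    INSERT INTO dst SELECT t FROM src;
    -- 'pglz' here would mean the datum was recompressed on the way in,
    -- 'lz4' that the source's compressed datum was stored as-is.
    SELECT pg_column_compression(t) FROM dst;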

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Allow batched insert during cross-partition updates

2021-03-15 Thread Amit Langote
Hi Georgios,

On Fri, Mar 12, 2021 at 7:59 PM  wrote:
> On Friday, March 12, 2021 3:45 AM, Amit Langote  
> wrote:
> > On Thu, Mar 11, 2021 at 8:36 PM gkokola...@pm.me wrote:
> > > On Thursday, March 11, 2021 9:42 AM, Amit Langote amitlangot...@gmail.com 
> > > wrote:
> > > > What we do support however is moving rows from a local partition to a
> > > > remote partition and that involves performing an INSERT on the latter.
> > > > This patch is for teaching those INSERTs to use batched mode if
> > > > allowed, which is currently prohibited. So with this patch, if an
> > > > UPDATE moves 10 rows from a local partition to a remote partition,
> > > > then they will be inserted with a single INSERT command containing all
> > > > 10 rows, instead of 10 separate INSERT commands.
> > >
> > > So, if I understand correctly then in my previously attached repro I
> > > should have written instead:
> > >
> > > CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST ( 
> > > a );
> > > CREATE TABLE
> > > local_root_local_partition_1
> > > PARTITION OF
> > > local_root_remote_partitions FOR VALUES IN (1);
> > >
> > > CREATE FOREIGN TABLE
> > > local_root_remote_partition_2
> > > PARTITION OF
> > > local_root_remote_partitions FOR VALUES IN (2)
> > > SERVER
> > > remote_server
> > > OPTIONS (
> > > table_name 'remote_partition_2',
> > > batch_size '10'
> > > );
> > >
> > > INSERT INTO local_root_remote_partitions VALUES (1), (1);
> > > -- Everything should be on local_root_local_partition_1 and on the 
> > > same transaction
> > > SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM 
> > > local_root_remote_partitions;
> > >
> > > UPDATE local_root_remote_partitions SET a = 2;
> > > -- Everything should be on remote_partition_2 and on the same 
> > > transaction
> > > SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM 
> > > local_root_remote_partitions;
> > >
> > >
> > > I am guessing that I am still wrong because the UPDATE operation above will
> > > fail due to the restrictions imposed in postgresBeginForeignInsert regarding
> > > UPDATES.
> >
> > Yeah, for the move to work without hitting the restriction you
> > mention, you will need to write the UPDATE query such that
> > local_root_remote_partition_2 is not updated. For example, as
> > follows:
> >
> > UPDATE local_root_remote_partitions SET a = 2 WHERE a <> 2;
> >
> > With this query, the remote partition is not one of the result
> > relations to be updated, so is able to escape that restriction.
>
> Excellent. Thank you for the explanation and patience.
>
> > > Would it be too much to ask for the addition of a test case that will
> > > demonstrate the change of behaviour found in patch.
> >
> > Hmm, I don't think there's a way to display whether the INSERT done on
> > the remote partition as a part of an (tuple-moving) UPDATE used
> > batching or not. That's because that INSERT's state is hidden from
> > EXPLAIN. Maybe we should change EXPLAIN to make it show such hidden
> > INSERT's state (especially its batch size) under the original UPDATE
> > node, but I am not sure.
>
> Yeah, there does not seem to be a way for EXPLAIN to show that information
> with the current code.
>
> > By the way, the test case added by commit 927f453a94106 does exercise
> > the code added by this patch, but as I said in the last paragraph, we
> > can't verify that by inspecting EXPLAIN output.
>
> I never doubted that. However, there is a difference. The current patch
> changes the query to be executed in the remote from:
>
>INSERT INTO  VALUES ($1);
> to:
>INSERT INTO  VALUES ($1), ($2) ... ($n);
>
> When this patch gets in, it would be very helpful to know that subsequent
> code changes will not cause regressions. So I was wondering if there is
> a way to craft a test case that would break for the code in 927f453a94106
> yet succeed with the current patch.

The test case "works" both before and after the patch, with the
difference being in the form of the remote query.  It seems to me
though that you do get that.

> I attach version 2 of my small reproduction. I am under the impression that
> in this version, examining the value of cmin in the remote table should
> give an indication of whether the remote received multiple insert queries
> with a single value each, or a single insert query with multiple values.
>
> Or is this a wrong assumption of mine?

No, I think you have a good idea here.

I've adjusted that test case to confirm that the batching indeed works
by checking cmin of the moved rows, as you suggest.  Please check the
attached updated patch.
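
For reference, a sketch of that cmin-based check (the table name follows the
earlier repro and is an assumption; the test in the attached patch may differ):

    -- Run on the remote server after the tuple-moving UPDATE has committed.
    SELECT ctid, cmin, a FROM remote_partition_2;

If all moved rows show the same cmin, they arrived in a single multi-row
INSERT; distinct cmin values per row indicate the pre-patch
one-INSERT-per-row behaviour.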
--
Amit Langote
EDB: http://www.enterprisedb.com


v2-0001-Allow-batching-of-inserts-during-cross-partition-.patch
Description: Binary data


RE: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-15 Thread kuroda.hay...@fujitsu.com
Dear Ikeda-san

I think the idea is good.

I read the patch and other sources, and I found that process_startup_packet_die
also executes _exit(1).
I think they can be combined into one function and moved to interrupt.c, but
some important comments might be removed. What do you think?


Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: subscriptionCheck failures

2021-03-15 Thread osumi.takami...@fujitsu.com
Hello

On Tuesday, March 16, 2021 12:31 PM Amit Kapila  wrote:
> On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro
>  wrote:
> >
> > Hi,
> >
> > This seems to be a new low frequency failure, I didn't see it mentioned
> already:
Oh, this is the test I wrote and included as part of the commit ce0fdbfe
#   Failed test 'DROP SUBSCRIPTION during error can clean up the slots on the 
publisher'
#   at t/004_sync.pl line 171.
#  got: '1'
# expected: '0'
# Looks like you failed 1 test of 8.
I apologize for this and will check the reason.


Best Regards,
Takamichi Osumi



Re: subscriptionCheck failures

2021-03-15 Thread Amit Kapila
On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro  wrote:
>
> Hi,
>
> This seems to be a new low frequency failure, I didn't see it mentioned 
> already:
>

Thanks for reporting, I'll look into it.

-- 
With Regards,
Amit Kapila.




RE: libpq debug log

2021-03-15 Thread tsunakawa.ta...@fujitsu.com
I'm looking at the last file libpq-trace.c.  I'll continue the review after 
lunch.  Below are some comments so far.


(1)
-  Enables  tracing of the client/server communication to a debugging file stream.
+  Enables tracing of the client/server communication to a debugging file
+  stream.
+  Only available when protocol version 3 or higher is used.

The last line is unnecessary now that v2 is not supported.


(2)
@@ -113,6 +114,7 @@ install: all installdirs install-lib
$(INSTALL_DATA) $(srcdir)/libpq-fe.h '$(DESTDIR)$(includedir)'
$(INSTALL_DATA) $(srcdir)/libpq-events.h '$(DESTDIR)$(includedir)'
$(INSTALL_DATA) $(srcdir)/libpq-int.h '$(DESTDIR)$(includedir_internal)'
+   $(INSTALL_DATA) $(srcdir)/libpq-trace.h '$(DESTDIR)$(includedir_internal)'
$(INSTALL_DATA) $(srcdir)/pqexpbuffer.h '$(DESTDIR)$(includedir_internal)'
$(INSTALL_DATA) $(srcdir)/pg_service.conf.sample '$(DESTDIR)$(datadir)/pg_service.conf.sample'

Why is this necessary?


(3) libpq-trace.h
+#ifdef __cplusplus
+extern "C"
+{

This is unnecessary because pqTraceOutputMessage() is for libpq's internal use. 
 This extern is for the user's C++ application to call public libpq functions.

+#include "libpq-fe.h"
+#include "libpq-int.h"

I think these can be removed by declaring the struct and function as follows:

struct pg_conn;
extern void pqTraceOutputMessage(struct pg_conn *conn, const char *message, bool toServer);

But... after doing the above, only this declaration is left in libpq-trace.h.  
Why don't you put your original declaration using PGconn in libpq-int.h and 
remove libpq-trace.h?


(4)
+   /* Trace message only when there is first 1 byte */
+   if (conn->Pfdebug && (conn->outCount < conn->outMsgStart))
+   pqTraceOutputMessage(conn, conn->outBuffer + conn->outCount, true);

() surrounding the condition after && can be removed.


(5)
+static const char *pqGetProtocolMsgType(unsigned char c, bool toServer);

This is unnecessary because the function definition precedes its only call site.


(6)
+ * We accumulate frontend message pieces in an array as the libpq code writes
+ * them, and log the complete message when pqTraceOutputFeMsg is called.
+ * For backend, we print the pieces as soon as we receive them from the server.

The first paragraph is no longer true.  I think this entire comment is 
unnecessary.


(7)
+static const char *
+pqGetProtocolMsgType(unsigned char c, bool toServer)
+{
+   if (toServer == true &&
+   c < lengthof(protocol_message_type_f) &&
+   protocol_message_type_f[c] != NULL)
+   return protocol_message_type_f[c];
+
+   if (toServer == false &&
+   c < lengthof(protocol_message_type_b) &&
+   protocol_message_type_b[c] != NULL)
+   return protocol_message_type_b[c];
+
+   return "UnknownMessage:";
+}

"== true/false" can be removed.  libpq doesn't appear to use such style.

Why don't we embed this processing in pqTraceOutputMessage() because this 
function is short and called only once?  The added benefit would be that you 
can print an invalid message type byte like this:

fprintf(..., "Unknown message: %02x\n", ...);


Regards
Takayuki Tsunakawa





Re: Type of wait events WalReceiverWaitStart and WalSenderWaitForWAL

2021-03-15 Thread Kyotaro Horiguchi
At Tue, 16 Mar 2021 03:12:54 +0900, Fujii Masao  
wrote in 
> The wait event WalReceiverWaitStart has been categorized in the type Client.
> But why? Walreceiver is waiting for startup process to set the lsn and
> timeline while it is reporting WalReceiverWaitStart. So its type should be
> IPC, instead?
>
> The wait event WalSenderWaitForWAL has also been categorized in the type
> Client. While this wait event is being reported, logical replication
> walsender is waiting for not only new WAL to be flushed but also the socket
> to be readable and writeable (if there is pending data). I guess that this
> is why its type is Client. But ISTM walsender is *mainly* waiting for new
> WAL to be flushed by other processes during that period, so I think that
> it's better to use IPC as the type of the wait event WalSenderWaitForWAL.
> Thought?

I agree that it's definitely not a client wait. It would be either
Activity or IPC.  My reasoning for the latter is that it's similar to
WAIT_EVENT_WAL_RECEIVER_MAIN, since both are waits for WalReceiverMain to
continue, with the difference that in the latter state walreceiver is
waiting to hear where to start.

I don't object if it were categorized to IPC, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger


> On Mar 15, 2021, at 11:57 AM, Tom Lane  wrote:
> 
> Not sure that I believe the theory that this is from bad luck of
> concurrent autovacuum timing, though.  The fact that we're seeing
> this on just those two animals suggests strongly to me that it's
> architecture-correlated, instead.

I find it a little hard to see how amcheck is tripping over a toasted value 
just in this one table, pg_statistic, and not in any of the others.  The error 
message says the problem is in attribute 27, which I believe means it is 
stavalues2.  The comment in the header for this catalog is intriguing:

/*
 * Values in these arrays are values of the column's data type, or of some
 * related type such as an array element type.  We presently have to cheat
 * quite a bit to allow polymorphic arrays of this kind, but perhaps
 * someday it'll be a less bogus facility.
 */
anyarray    stavalues1;
anyarray    stavalues2;
anyarray    stavalues3;
anyarray    stavalues4;
anyarray    stavalues5;

This is hard to duplicate in a test, because you can't normally create tables 
with pseudo-type columns.  However, if amcheck is walking the tuple and does 
not correctly update the offset with the length of attribute 26, it may try to 
read attribute 27 at the wrong offset, unsurprisingly leading to garbage, 
perhaps a garbage toast pointer.  The attached patch v7-0004 adds a check to 
verify_heapam to see if the va_toastrelid matches the expected toast table oid 
for the table we're reading.  That check almost certainly should have been 
included in the initial version of verify_heapam, so even if it does nothing to 
help us in this issue, it's good that it be committed, I think.

It is unfortunate that the failing test only runs pg_amcheck after creating 
numerous corruptions, as we can't know if pg_amcheck would have complained 
about pg_statistic before the corruptions were created in other tables, or if 
it only does so after.  The attached patch v7-0003 adds a call to pg_amcheck 
after all tables are created and populated, but before any corruptions are 
caused.  This should help narrow down what is happening, and doesn't hurt to 
leave in place long-term.

I don't immediately see anything wrong with how pg_statistic uses a 
pseudo-type, but it leads me to want to poke a bit more at pg_statistic on 
hornet and tern, though I don't have any regression tests specifically for 
doing so.

Tests v7-0001 and v7-0002 are just repeats of the tests posted previously.



v7-0001-Turning-off-autovacuum-during-corruption-tests.patch
Description: Binary data


v7-0002-Fixing-a-confusing-amcheck-corruption-message.patch
Description: Binary data


v7-0003-Extend-pg_amcheck-test-suite.patch
Description: Binary data


v7-0004-Add-extra-check-of-toast-pointer-in-amcheck.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: shared-memory based stats collector

2021-03-15 Thread Andres Freund
Hi,

On 2021-03-13 10:05:21 -0800, Andres Freund wrote:
> Cool. I'll give it a try.


I have a few questions about the patch:

- Why was collect_oids() changed to a different hashtable as part of
  this change? Seems fairly independent?

- What's the point of all those cached_* stuff? There's not a single
  comment explaining it as far as I can tell...

  Several of them are never used as a cache! E.g. cached_archiverstats,
  cached_bgwriterstats, ...

- What is the idea behind pgstat_reset_shared_counters() using
  pgstat_copy_global_stats() to reset, using StatsShmem->*_reset_offset?
  But then still taking a lock in pgstat_fetch_stat_*?  Again, no
  comments explaining what the goal is.

  It kinda looks like you tried to make both read and write paths not
  use the lock, but then ended up using a lock?


Do you have some benchmarks that you used to verify performance?


I think I'm going to try to split the storage of fixed-size stats in
StatsShmemStruct into a separate patch. That's already a pretty large
change, and it's pretty much unrelated to the rest.

Greetings,

Andres Freund




Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

2021-03-15 Thread Thomas Munro
On Mon, Mar 15, 2021 at 8:25 PM Bharath Rupireddy
 wrote:
> > > The problem with a case like REFRESH MATERIALIZED VIEW is that there's
> > > nothing to prevent something that gets run in the course of the query
> > > from trying to access the view (and the heavyweight lock won't prevent
> > > that, due to group locking). That's probably a stupid thing to do,
> > > but it can't be allowed to break the world. The other cases are safe
> > > from that particular problem because the table doesn't exist yet.
>
> Please correct me if my understanding of the above comment (from the
> commit e9baa5e9) is wrong -  even if the leader opens the matview
> relation in exclusive mode, because of group locking(in case we allow
> parallel workers to feed in the data to the new heap that gets created
> for RMV, see ExecRefreshMatView->make_new_heap), can other sessions
> still access the matview relation with older data?
>
> I performed below testing to prove myself wrong for the above understanding:
> session 1:
> 1) added few rows to the table t1 on which the mv1 is defined;
> 2) refresh materialized view mv1;
>
> session 2:
> 1) select count(*) from mv1;   ---> this query is blocked until
> session 1's step (2) is completed and gives the latest result even if
> the underlying data-generating query runs select part in parallel.
>
> Is there anything I'm missing here?

I think he was talking about things like functions that try to access
the mv from inside the same query, in a worker.  I haven't figured out
exactly which hazards he meant.  I thought about wrong-relfilenode
hazards and combocid hazards, but considering the way this thing
always inserts into a fresh table before performing merge or swap
steps later, I don't yet see why this is different from any other
insert-from-select-with-gather.

With the script below you can reach this error in the leader:

postgres=# refresh materialized view CONCURRENTLY mv;
ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL function "crazy"

But that's reachable on master too, even if you change crazy() to do
"select 42 from pg_class limit 1" instead of reading mv, when
performing a CREATE MATERIALIZED VIEW without NO DATA.  :-(  Without
parallel leader participation it runs to completion.

===8<===

set parallel_leader_participation = off;
set parallel_setup_cost = 0;
set min_parallel_table_scan_size = 0;
set parallel_tuple_cost = 0;

drop table if exists t cascade;

create table t (i int);
insert into t select generate_series(1, 10);
create materialized view mv as select 42::int i;
create or replace function crazy() returns int as $$ select i from mv
limit 1; $$ language sql parallel safe;
drop materialized view mv;
create materialized view mv as select i + crazy() i from t with no data;
create unique index on mv(i);

refresh materialized view mv;
refresh materialized view concurrently mv;

begin;
refresh materialized view mv;
refresh materialized view mv;
commit;

begin;
refresh materialized view concurrently mv;
refresh materialized view concurrently mv;
commit;

===8<===

PS, off-topic observation made while trying to think of ways to break
your patch: I noticed that REFRESH CONCURRENTLY spends a lot of time
in refresh_by_match_merge()'s big FULL JOIN.  That is separate from
the view query that you're parallelising here, and is used to perform
the merge between a temporary table and the target MV table.  I hacked
the code a bit so that it wasn't scanning a temporary table
(unparallelisable), and tried out the Parallel Hash Full Join patch
which I intend to commit soon.  This allowed REFRESH CONCURRENTLY to
complete much faster.  Huzzah!  Unfortunately that query also does
ORDER BY tid; I guess we could remove that to skip a Sort and use
Gather instead of the more expensive Gather Merge, and hopefully soon
a pushed-down Insert.




Re: [PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..

2021-03-15 Thread Justin Pryzby
On Mon, Mar 15, 2021 at 08:30:18PM -0400, Tom Lane wrote:
> +   /* Pop the error callback */
> +   error_context_stack = error_context_stack->previous;
> +
> /*
>  * Once all parameters have been received, prepare for printing them
>  * in errors, if configured to do so.  (This is saved in the portal,
>  * so that they'll appear when the query is executed later.)
>  */
> if (log_parameter_max_length_on_error != 0)
> params->paramValuesStr =
> BuildParamLogString(params,
...

I think it's somewhat confusing that there's two callbacks.
The first one applies only during typinput/typreceive.
I guess the 2nd one should say that they're printed "in *future* errors".

> +# Check that errors are reported during BIND phase, too

Actually, the v13 log_parameter_max_length_on_error feature works during BIND
too, but only after typinput of *all* params.

For example, this shows params when failing in GetCachedPlan().

| python3.5 -c "import pg,time; db=pg.DB('dbname=postgres 
host=/var/run/postgresql port=5432 host=/tmp'); q = db.query(\"SET 
backtrace_functions=pg_strtoint16; SET 
log_parameter_max_length_on_error=-1;\"); db.prepare('p', 'SELECT a::smallint 
from (select \$1 a)a'); db.query_prepared('p',6);"

So my own description of the patch evolved from "show errors during BIND" to
"show errors during typinput of text params".

Thanks,
-- 
Justin




Re: Improvements and additions to COPY progress reporting

2021-03-15 Thread Michael Paquier
On Mon, Mar 15, 2021 at 12:43:40PM +0100, Matthias van de Meent wrote:
> Hmm, does CFBot not run checkout on windows with crlf line endings? I
> had expected it to do as such.

This is environment-sensitive, so I am not surprised that Appveyor
changes the way newlines are handled there.  I could see the
difference by running the tests manually on Windows command prompt for
example.

> That seems great, thanks for picking this up.

Okay.  Applied, then.
--
Michael


signature.asc
Description: PGP signature


Re: Enhance traceability of wal_level changes for backup management

2021-03-15 Thread Laurenz Albe
On Mon, 2021-03-15 at 17:32 -0400, Stephen Frost wrote:
> We explicitly document that people can switch the WAL level and restart
> to do bulk data loads faster, and there's certainly no shortage of
> discussion (including what prompted this thread..) about doing exactly
> that.  Adding more documentation around that would certainly be good,
> as would changing this:
> 
> ereport(WARNING,
> (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
> errhint("This happens if you temporarily set wal_level=minimal without taking 
> a new base backup.")));
> 
> into a PANIC instead of a WARNING.

There is a patch in the commitfest that does exactly that (except it
uses ERROR rather than PANIC):
https://postgr.es/m/osbpr01mb4888cbe1da08818fd2d90ed8ed...@osbpr01mb4888.jpnprd01.prod.outlook.com

Yours,
Laurenz Albe





Re: [PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..

2021-03-15 Thread Tom Lane
Justin Pryzby  writes:
> On Mon, Mar 15, 2021 at 06:45:49PM -0400, Tom Lane wrote:
>> I started to look at this, and immediately began to wonder where is the
>> previous discussion you're evidently referring to.  Can you dig up an
>> archives link?

> I think I was referring to this (from the commit message).
> https://www.postgresql.org/message-id/flat/canfkh5k-6nnt-4csv1vpb80nq2bzczhfvr5o4vznybsx0wz...@mail.gmail.com

Got it, thanks.

After studying the patch it seems like there's a better way to do it:
we should just log the single parameter that's at fault.  That saves
the user from having to dig through a bunch of parameters to figure
out which one we're complaining about, plus we can usefully identify
the parameter (by number) even if it was sent in binary.

This is a little bit more net code addition than what you had,
but only by ten lines or so.  On the plus side, it doesn't
require rearranging any existing code.

regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8a0332dde9..b9ac6b8f76 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -102,7 +102,18 @@ int			max_stack_depth = 100;
 /* wait N seconds to allow attach from a debugger */
 int			PostAuthDelay = 0;
 
+/* 
+ *		private typedefs etc
+ * 
+ */
 
+/* type of argument for bind_param_error_callback */
+typedef struct BindParamCbData
+{
+	const char *portalName;
+	int			paramno;		/* zero-based param number, or -1 initially */
+	const char *paramval;		/* textual input string, if available */
+} BindParamCbData;
 
 /* 
  *		private variables
@@ -183,6 +194,7 @@ static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
 static int	errdetail_recovery_conflict(void);
+static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
 static bool IsTransactionExitStmt(Node *parsetree);
@@ -1698,6 +1710,16 @@ exec_bind_message(StringInfo input_message)
 	if (numParams > 0)
 	{
 		char	  **knownTextValues = NULL; /* allocate on first use */
+		BindParamCbData one_param_data;
+
+		/* Set the error callback so that parameters are logged for errors */
+		one_param_data.portalName = portal->name;
+		one_param_data.paramno = -1;
+		one_param_data.paramval = NULL;
+		params_errcxt.previous = error_context_stack;
+		params_errcxt.callback = bind_param_error_callback;
+		params_errcxt.arg = (void *) &one_param_data;
+		error_context_stack = &params_errcxt;
 
 		params = makeParamList(numParams);
 
@@ -1711,6 +1733,9 @@ exec_bind_message(StringInfo input_message)
 			char		csave;
 			int16		pformat;
 
+			one_param_data.paramno = paramno;
+			one_param_data.paramval = NULL;
+
 			plength = pq_getmsgint(input_message, 4);
 			isNull = (plength == -1);
 
@@ -1764,8 +1789,13 @@ exec_bind_message(StringInfo input_message)
 else
 	pstring = pg_client_to_server(pbuf.data, plength);
 
+/* Now we can log the input string in case of error */
+one_param_data.paramval = pstring;
+
 pval = OidInputFunctionCall(typinput, pstring, typioparam, -1);
 
+one_param_data.paramval = NULL;
+
 /*
  * If we might need to log parameters later, save a copy of
  * the converted string in MessageContext; then free the
@@ -1855,6 +1885,9 @@ exec_bind_message(StringInfo input_message)
 			params->params[paramno].ptype = ptype;
 		}
 
+		/* Pop the error callback */
+		error_context_stack = error_context_stack->previous;
+
 		/*
 		 * Once all parameters have been received, prepare for printing them
 		 * in errors, if configured to do so.  (This is saved in the portal,
@@ -2413,6 +2446,55 @@ errdetail_recovery_conflict(void)
 	return 0;
 }
 
+/*
+ * bind_param_error_callback
+ *
+ * Error context callback used while parsing parameters in a Bind message
+ */
+static void
+bind_param_error_callback(void *arg)
+{
+	BindParamCbData *data = (BindParamCbData *) arg;
+	StringInfoData buf;
+	char	   *quotedval;
+
+	if (data->paramno < 0)
+		return;
+
+	/* If we have a textual value, quote it, and trim if necessary */
+	if (data->paramval)
+	{
+		initStringInfo(&buf);
+		appendStringInfoStringQuoted(&buf, data->paramval,
+	 log_parameter_max_length_on_error);
+		quotedval = buf.data;
+	}
+	else
+		quotedval = NULL;
+
+	if (data->portalName && data->portalName[0] != '\0')
+	{
+		if (quotedval)
+			errcontext("portal \"%s\" parameter $%d = %s",
+	   data->portalName, data->paramno + 1, quotedval);
+		else
+			errcontext("portal \"%s\" parameter $%d",
+	   data->portalName, data->paramno + 1);
+	}
+	else
+	{
+		if (quotedval)
+			errcontext("unnamed portal parameter $%d = %s",
+	   data->paramno + 1, quotedval);
+		else
+			errcontext("unnamed portal parameter $%d",
+	   data->paramno + 1);
+	}
+
+	if (quotedval)
+		pfree(quotedval);
+}
+
 /*
  * 

Re: New IndexAM API controlling index vacuum strategies

2021-03-15 Thread Peter Geoghegan
On Mon, Mar 15, 2021 at 4:11 PM Andres Freund  wrote:
> > > I'm not comfortable with this change without adding more safety
> > > checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit
> > > and the xid needs to be frozen, we'll either cause errors or
> > > corruption. Yes, that's already the case with params->index_cleanup ==
> > > DISABLED, but that's not that widely used.
> >
> > I noticed that Noah's similar 2013 patch [1] added a defensive
> > heap_tuple_needs_freeze() + elog(ERROR) to the HEAPTUPLE_DEAD case. I
> > suppose that that's roughly what you have in mind here?
>
> I'm not sure that's sufficient. If the case is legitimately reachable
> (I'm maybe 60% sure it's not, after staring at it for a long time, but ...),
> then we can't just error out when we didn't so far.

If you're only 60% sure that the heap_tuple_needs_freeze() error thing
doesn't break anything that should work by now then it seems unlikely
that you'll ever get past 90% sure. I think that we should make a
conservative assumption that the defensive elog(ERROR) won't be
sufficient, and proceed on that basis.

> I kinda wonder whether this case should just be handled by just gotoing
> back to the start of the blkno loop, and redoing the pruning. The only
> thing that makes that a bit more complicated is that we've already
> incremented vacrelstats->{scanned_pages,vacrelstats->tupcount_pages}.

That seems like a good solution to me -- this is a very seldom hit
path, so we can be a bit inefficient without it mattering.

It might make sense to *also* check some things (maybe using
heap_tuple_needs_freeze()) in passing, just for debugging purposes.

> We really should put the per-page work (i.e. the blkno loop body) of
> lazy_scan_heap() into a separate function, same with the
> too-many-dead-tuples branch.

+1.

BTW I've noticed that the code (and code like it) tends to confuse
things that the current VACUUM performed versus things performed by *some*
VACUUM (that may or may not be the current one). This refactoring might be
a good opportunity to think about that as well.

> >  * It is assumed that the caller has checked the tuple with
> >  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
> >  * (else we should be removing the tuple, not freezing it).
> >
> > Does that need work too?
>
> I'm pretty scared of the index-cleanup-disabled path, for that reason. I
> think the hot path is more likely to be unproblematic, but I'd not bet
> my (nonexistant) farm on it.

Well if we can solve the problem by simply doing pruning once again in
the event of a HEAPTUPLE_DEAD return value from the lazy_scan_heap()
HTSV call, then the comment becomes 100% true (which is not the case
even today).

-- 
Peter Geoghegan




make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-15 Thread Masahiro Ikeda

Hi,

This thread came from another thread about collecting the WAL stats([1]).

Is it better to make the stats collector shutdown without writing the stats
files if the immediate shutdown is requested?

There was a related discussion([2]), although it stopped on 12/1/2016.
IIUC, the thread's consensus is:

(1) It's useful to make the stats collector shutdown quickly, without writing
the stats files, if the immediate shutdown is requested, because in some cases
there is a possibility that it takes a long time until the failover happens.
The possible reasons are that the disk write speed is slow, the stats files
are big, and so on. And there is no negative impact on users because all stats
files are removed in the crash recovery phase now.

(2) As another aspect, it needs a change to the behavior of removing all stats
files, because autovacuum uses the stats. There are some ideas, for example
writing the stats files every X minutes (using WAL or another mechanism) so
that even if a crash occurs, the startup process can restore the stats with
slightly lower freshness. But it needs to find a way to handle the stats files
when deleting on a PITR rewind or when a stats collector crash happens.

I agree with the above consensus, and I think we can make two separate
patches: the first one is for (1) and the second one is for (2).

Why don't we apply the patch for (1) first?
I attached a patch based on tsunakawa-san's patch([2]).
(v1-0001-pgstat_avoid_writing_on_sigquit.patch)

At this time, there are no cons for users, because the stats files are removed
in the crash recovery phase, as pointed out in the discussion.


[1] 
https://www.postgresql.org/message-id/c616cf24bf4ecd818f7cab0900a40842%40oss.nttdata.com
[2] 
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F5EF25A%40G01JPEXMBYT05


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b1e2d94951..03d191dfe6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -386,6 +386,8 @@ static void pgstat_recv_connstat(PgStat_MsgConn *msg, int len);
 static void pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
+static void SignalPgstatHandlerForCrashExit(SIGNAL_ARGS);
+
 /* 
  * Public functions called from postmaster follow
  * 
@@ -725,6 +727,8 @@ pgstat_reset_remove_files(const char *directory)
 		snprintf(fname, sizeof(fname), "%s/%s", directory,
  entry->d_name);
 		unlink(fname);
+
+		elog(DEBUG2, "removing stats file \"%s\"", fname);
 	}
 	FreeDir(dir);
 }
@@ -4821,13 +4825,13 @@ PgstatCollectorMain(int argc, char *argv[])
 
 	/*
 	 * Ignore all signals usually bound to some action in the postmaster,
-	 * except SIGHUP and SIGQUIT.  Note we don't need a SIGUSR1 handler to
-	 * support latch operations, because we only use a local latch.
+	 * except SIGHUP, SIGTERM and SIGQUIT.  Note we don't need a SIGUSR1
+	 * handler to support latch operations, because we only use a local latch.
 	 */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
 	pqsignal(SIGINT, SIG_IGN);
-	pqsignal(SIGTERM, SIG_IGN);
-	pqsignal(SIGQUIT, SignalHandlerForShutdownRequest);
+	pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
+	pqsignal(SIGQUIT, SignalPgstatHandlerForCrashExit);
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
 	pqsignal(SIGUSR1, SIG_IGN);
@@ -4852,8 +4856,8 @@ PgstatCollectorMain(int argc, char *argv[])
 	AddWaitEventToSet(wes, WL_SOCKET_READABLE, pgStatSock, NULL, NULL);
 
 	/*
-	 * Loop to process messages until we get SIGQUIT or detect ungraceful
-	 * death of our parent postmaster.
+	 * Loop to process messages until we get SIGTERM or SIGQUIT of our parent
+	 * postmaster.
 	 *
 	 * For performance reasons, we don't want to do ResetLatch/WaitLatch after
 	 * every message; instead, do that only after a recv() fails to obtain a
@@ -4871,7 +4875,7 @@ PgstatCollectorMain(int argc, char *argv[])
 		ResetLatch(MyLatch);
 
 		/*
-		 * Quit if we get SIGQUIT from the postmaster.
+		 * Quit if we get SIGTERM from the postmaster.
 		 */
 		if (ShutdownRequestPending)
 			break;
@@ -7458,3 +7462,17 @@ pgstat_count_slru_truncate(int slru_idx)
 {
 	slru_entry(slru_idx)->m_truncate += 1;
 }
+
+/*
+ * Simple signal handler for handling SIGQUIT to exit quickly without
+ * doing any additional works, for example writing stats files.
+ */
+static void
+SignalPgstatHandlerForCrashExit(SIGNAL_ARGS)
+{
+	/*
+	 * We arrange to do _exit(1) because the stats collector doesn't touch
+	 * shared memory.
+	 */
+	_exit(1);
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6436ae0f48..b9ca39ef95 100644
--- a/src/backend/postmaster/postmaster.c
+++ 

Re: New IndexAM API controlling index vacuum strategies

2021-03-15 Thread Andres Freund
Hi,

On 2021-03-15 13:58:02 -0700, Peter Geoghegan wrote:
> On Mon, Mar 15, 2021 at 12:58 PM Peter Geoghegan  wrote:
> > > I'm not comfortable with this change without adding more safety
> > > checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit
> > > and the xid needs to be frozen, we'll either cause errors or
> > > corruption. Yes, that's already the case with params->index_cleanup ==
> > > DISABLED, but that's not that widely used.
> >
> > I noticed that Noah's similar 2013 patch [1] added a defensive
> > heap_tuple_needs_freeze() + elog(ERROR) to the HEAPTUPLE_DEAD case. I
> > suppose that that's roughly what you have in mind here?
> 
> I'm not sure if you're arguing that there might be (either now or in
> the future) a legitimate case (a case not involving data corruption)
> where we hit HEAPTUPLE_DEAD, and find we have an XID in the tuple that
> needs freezing. You seem to be suggesting that even throwing an error
> might not be acceptable, but what better alternative is there? Did you
> just mean that we should throw a *better*, more specific error right
> there, when we handle HEAPTUPLE_DEAD? (As opposed to relying on
> heap_prepare_freeze_tuple() to error out instead, which is what would
> happen today.)

Right now (outside of the index-cleanup-disabled path), we very well may
just actually successfully and correctly do the deletion? So there
clearly is another option?

See my email from a few minutes ago for a somewhat crude idea for how to
tackle the issue differently...

Greetings,

Andres Freund




Re: fdatasync performance problem with large number of DB files

2021-03-15 Thread Thomas Munro
On Tue, Mar 16, 2021 at 3:30 AM Paul Guo  wrote:
> By the way, there is a common case in which we could skip the fsync: an
> already-fsynced standby generated by pg_rewind/pg_basebackup. The state of
> those standbys is surely not DB_SHUTDOWNED/DB_SHUTDOWNED_IN_RECOVERY, so the
> pgdata directory is fsync-ed again during startup when starting those pg
> instances. We could ask users not to fsync during pg_rewind/pg_basebackup,
> but we probably want to fsync just some files in pg_rewind (see [1]), so
> better to let the startup process skip the unnecessary fsync? As to the
> solution, using a GUC or writing something to files like backup_label(?)
> does not seem to be a good idea, since:
> 1. With a GUC, we still expect the fsync after real crash recovery, so we
> would need to reset the GUC and also specify pgoptions in the pg_ctl command.
> 2. We could write some hint information to files like backup_label(?) in
> pg_rewind/pg_basebackup, but people might copy the pgdata directory and then
> we would still need the fsync.
> The only simple solution I can think of is to let the user touch a file to
> hint the startup process, before starting the pg instance.

As a thought experiment only, I wonder if there is a way to make your
touch-a-special-signal-file scheme more reliable and less dangerous
(considering people might copy the signal file around or otherwise
screw this up).  It seems to me that invalidation is the key, and
"unlink the signal file after the first crash recovery" isn't good
enough.  Hmm  What if the file contained a fingerprint containing...
let's see... checkpoint LSN, hostname, MAC address, pgdata path, ...
(add more seasoning to taste), and then also some flags to say what is
known to be fully fsync'd already: the WAL, pgdata but only as far as
changes up to the checkpoint LSN, or all of pgdata?  Then you could be
conservative for a non-match, but skip the extra work in some common
cases like pg_basebackup, as long as you trust the fingerprint scheme
not to produce false positives.  Or something like that...

I'm not too keen to invent clever new schemes for PG14, though.  This
sync_after_crash=syncfs scheme is pretty simple, and has the advantage
that it's very cheap to do it extra redundant times assuming nothing
else is creating new dirty kernel pages in serious quantities.  Is
that useful enough?  In particular it avoids the dreaded "open
1,000,000 uncached files over high latency network storage" problem.

I don't want to add a hypothetical sync_after_crash=none, because it
seems like generally a bad idea.  We already have a
running-with-scissors mode you could use for that: fsync=off.




Re: New IndexAM API controlling index vacuum strategies

2021-03-15 Thread Andres Freund
Hi,

On 2021-03-15 12:58:33 -0700, Peter Geoghegan wrote:
> On Mon, Mar 15, 2021 at 12:21 PM Andres Freund  wrote:
> > It's evil sorcery. Fragile sorcery. I think Robert, Tom and me all run
> > afoul of edge cases around it in the last few years.
> 
> Right, which is why I thought that I might be missing something; why
> put up with that at all for so long?
> 
> > > But removing the awful "tupgone = true" special case seems to buy us a
> > > lot -- it makes unifying everything relatively straightforward. In
> > > particular, it makes it possible to delay the decision to vacuum
> > > indexes until the last moment, which seems essential to making index
> > > vacuuming optional.
> >
> > You haven't really justified, in the patch or this email, why it's OK to
> > remove the whole logic around HEAPTUPLE_DEAD part of the logic.
> 
> I don't follow.
> 
> > VACUUM can take a long time, and not removing space for all the
> > transactions that aborted while it wa
> 
> I guess that you trailed off here. My understanding is that removing
> the special case results in practically no loss of dead tuples removed
> in practice -- so there are no practical performance considerations
> here.
> 
> Have I missed something?

Forget what I said above - I had intended to remove it after dislodging
something stuck in my brain... But apparently didn't :(. Sorry.


> > I'm not comfortable with this change without adding more safety
> > checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit
> > and the xid needs to be frozen, we'll either cause errors or
> > corruption. Yes, that's already the case with params->index_cleanup ==
> > DISABLED, but that's not that widely used.
> 
> I noticed that Noah's similar 2013 patch [1] added a defensive
> heap_tuple_needs_freeze() + elog(ERROR) to the HEAPTUPLE_DEAD case. I
> suppose that that's roughly what you have in mind here?

I'm not sure that's sufficient. If the case is legitimately reachable
(I'm maybe 60% sure it's not, after staring at it for a long time, but ...),
then we can't just error out when we didn't so far.


I kinda wonder whether this case should just be handled by just gotoing
back to the start of the blkno loop, and redoing the pruning. The only
thing that makes that a bit more complicated is that we've already
incremented vacrelstats->{scanned_pages,vacrelstats->tupcount_pages}.

We really should put the per-page work (i.e. the blkno loop body) of
lazy_scan_heap() into a separate function, same with the
too-many-dead-tuples branch.


> Comments above heap_prepare_freeze_tuple() say something about making
> sure that HTSV did not return HEAPTUPLE_DEAD...but that's already
> possible today:
> 
>  * It is assumed that the caller has checked the tuple with
>  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
>  * (else we should be removing the tuple, not freezing it).
> 
> Does that need work too?

I'm pretty scared of the index-cleanup-disabled path, for that reason. I
think the hot path is more likely to be unproblematic, but I'd not bet
my (nonexistant) farm on it.

Greetings,

Andres Freund




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-15 Thread Alvaro Herrera
On 2021-Feb-26, Alvaro Herrera wrote:

> Hmm, but if we take this approach, then we're still vulnerable to the
> problem that somebody can do DETACH CONCURRENTLY and cancel the wait (or
> crash the server), then mess up the state before doing DETACH FINALIZE:
> when they cancel the wait, the lock will be released.
> 
> I think the right fix is to disallow any action on a partition which is
> pending detach other than DETACH FINALIZE.  (Didn't do that here.)

Here's a fixup patch to do it that way.  I tried running the commands
you showed and one of them immediately dies with the new error message;
I can't cause the bogus constraint to show up anymore.

I'll clean this up for a real submission tomorrow.

-- 
Álvaro Herrera    39°49'30"S 73°17'W
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 3fd0880ca0..83093f9730 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -535,3 +535,39 @@ DeleteInheritsTuple(Oid inhrelid, Oid inhparent, bool detached,
 
return found;
 }
+
+bool
+PartitionHasPendingDetach(Oid partoid)
+{
+   Relation    catalogRelation;
+   ScanKeyData key;
+   SysScanDesc scan;
+   HeapTuple   inheritsTuple;
+
+   /*
+* Find pg_inherits entries by inhrelid.
+*/
+   catalogRelation = table_open(InheritsRelationId, RowExclusiveLock);
+   ScanKeyInit(&key,
+   Anum_pg_inherits_inhrelid,
+   BTEqualStrategyNumber, F_OIDEQ,
+   ObjectIdGetDatum(partoid));
+   scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
+ true, NULL, 1, &key);
+
+   while (HeapTupleIsValid(inheritsTuple = systable_getnext(scan)))
+   {
+   bool        detached;
+
+   detached = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhdetached;
+
+   /* Done */
+   systable_endscan(scan);
+   table_close(catalogRelation, RowExclusiveLock);
+
+   return detached;
+   }
+
+   elog(ERROR, "relation %u is not a partition", partoid);
+   return false;   /* keep compiler quiet */
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5986b4dd56..5074c0ff2b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17106,6 +17106,8 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
 */
partRel = table_openrv(name, concurrent ? ShareLock :
   AccessExclusiveLock);
+   if (PartitionHasPendingDetach(RelationGetRelid(partRel)))
+   elog(ERROR, "hey, don't!");
 
/*
 * Check inheritance conditions and either delete the pg_inherits row
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 155596907c..19243ea9e4 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -61,5 +61,6 @@ extern void StoreSingleInheritance(Oid relationId, Oid parentOid,
    int32 seqNumber);
 extern bool DeleteInheritsTuple(Oid inhrelid, Oid inhparent, bool allow_detached,
 const char *childname);
+extern bool PartitionHasPendingDetach(Oid partoid);
 
 #endif /* PG_INHERITS_H */


Re: [HACKERS] Custom compression methods

2021-03-15 Thread Andres Freund
Hi,

On 2021-03-15 15:29:05 -0400, Robert Haas wrote:
> On Mon, Mar 15, 2021 at 8:14 AM Dilip Kumar  wrote:
> > In the attached patches I have changed this, ...
>
> OK, so just looking over this patch series, here's what I think:
>
> - 0001 and 0002 are now somewhat independent of the rest of this work,
> and could be dropped, but I think they're a good idea, so I'd like to
> commit them. I went over 0001 carefully this morning and didn't find
> any problems. I still need to do some more review of 0002.

I don't particularly like PG_RETURN_HEAPTUPLEHEADER_RAW(). What is "raw"
about it?  It also seems to me like there needs to at least be a
sentence or two explaining when to use which of the functions.

I think heap_copy_tuple_as_raw_datum() should grow an assert checking
there are no external columns?

The commit messages could use a bit more explanation about motivation.


I don't like that after 0002 ExecEvalRow() and ExecEvalFieldStoreForm()
contain a nearly identical copy of the same code.  And
make_tuple_from_row() is also similar. It seems that there should be a
heap_form_tuple() version doing this for us?


> - 0003 through 0005 are the core of this patch set. I'd like to get
> them into this release, but I think we're likely to run out of time.


Comments about 0003:
- why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be
  comparable to HIDE_TABLEAM?

- (you comment on this later): toast_get_compression_method() needing to
  fetch some of the data to figure out the compression method is pretty
  painful. Especially because it then goes and throws away that data!

- Adding all these indirect function calls via toast_compression[] just
  for all of two builtin methods isn't fun either.

- I guess NO_LZ4_SUPPORT() is a macro so it shows the proper
  file/function name?

- I wonder if adding compression to the equalTupleDesc() is really
  necessary / won't cause problems (thinking of cases like the
  equalTupleDesc() call in pg_proc.c).

- Is nodeModifyTable.c really the right place for the logic around
  CompareCompressionMethodAndDecompress()?  And is doing it in every
  place that does "user initiated" inserts really the right way? Why
  isn't this done on the tuptoasting level?

- CompareCompressionMethodAndDecompress() is pretty deeply
  indented. Perhaps rewrite a few more of the conditions to be
  continue;?


Comments about 0005:
- I'm personally not really convinced tracking the compression type in
  pg_attribute the way you do is really worth it.  Especially given
  that it's right now only about new rows anyway.  Seems like it'd be
  easier to just treat it as a default for new rows, and dispense with
  all the logic around mismatching compression types etc?


> The biggest thing that jumps out at me while looking at this with
> fresh eyes is that the patch doesn't touch varatt_external.va_extsize
> at all. In a varatt_external, we can't use the va_rawsize to indicate
> the compression method, because there are no bits free, because the 2
> bits not required to store the size are used to indicate what type of
> varlena we've got.

Once you get to varatt_external, you could also just encode it via
vartag_external...


> But, that means that the size of a varlena is limited to 1GB, so there
> are 2 bits free in varatt_external.va_extsize, just like there are in
> va_compressed.va_rawsize. We could store the same two bits in
> varatt_external.va_extsize that we're storing in
> va_compressed.va_rawsize aka va_tcinfo. That's a big deal, because
> then toast_get_compression_method() doesn't have to call
> toast_fetch_datum_slice() any more, which is a rather large savings.
> If it's only impacting pg_column_compression() then whatever, but
> that's not the case: we've got calls to
> CompareCompressionMethodAndDecompress in places like intorel_receive()
> and ExecModifyTable() that look pretty performance-critical.

Yea, I agree, that does seem problematic.


> There's another, rather brute-force approach to this problem, too. We
> could just decide that lz4 will only be used for external data, and
> that there's no such thing as an inline-compressed lz4 varlena.
> detoast_fetch_datum() would just notice that the value is lz4'd and
> de-lz4 it before returning it, since a compressed lz4 datum is
> impossible.

That seems fairly terrible.


> I'm open to being convinced that we don't need to do either of these
> things, and that the cost of iterating over all varlenas in the tuple
> is not so bad as to preclude doing things as you have them here. But,
> I'm afraid it's going to be too expensive.

I mean, I would just define several of those places away by not caring
about tuples in a different compression format ending up in a
table...

Greetings,

Andres Freund




Re: [PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..

2021-03-15 Thread Justin Pryzby
On Mon, Mar 15, 2021 at 06:45:49PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > For example:
> > $ python3.5 -c "import pg; db=pg.DB(); q = db.query(\"SET 
> > log_parameter_max_length_on_error=-1;\"); db.prepare('p', 'SELECT 
> > \$1::smallint'); db.query_prepared('p',6);"
> > 2021-01-03 02:21:04.547 CST [20157] ERROR:  value "6" is out of range 
> > for type smallint
> > 2021-01-03 02:21:04.547 CST [20157] CONTEXT:  unnamed portal with 
> > parameters: $1 = '6'
> > 2021-01-03 02:21:04.547 CST [20157] STATEMENT:  SELECT $1::smallint
> 
> > When there are many bind params, this can be useful to determine which is 
> > out
> > of range.  Think 900 int/smallint columns, or less-wide tables being 
> > inserted
> > multiple rows at a time with VALUES(),(),()...
> 
> > Of course, this isn't as good as showing the column name, so I might pursue
> > Tom's suggestion for that at some point.
> 
> I started to look at this, and immediately began to wonder where is the
> previous discussion you're evidently referring to.  Can you dig up an
> archives link?

I think I was referring to this (from the commit message).
https://www.postgresql.org/message-id/flat/canfkh5k-6nnt-4csv1vpb80nq2bzczhfvr5o4vznybsx0wz...@mail.gmail.com

Also, I looked through the original thread, and found this was discussed at the
time:

https://www.postgresql.org/message-id/b1b68453-9756-bd92-306e-a29fc5ad7cd7%402ndquadrant.com
> >> ERROR:  value "62812" is out of range for type smallint
> >> STATEMENT:  SELECT abalance FROM pgbench_accounts WHERE aid = $1;
> >>
> >> (In this case the error message contains the parameter value, so it's
> >> not a very practical case, but it should work, it seems.)
> > I guess this error occurred /while/ binding, so the parameters probably
> > weren't yet all bound by the time of error reporting.
> > That's why the error message came without parameters.
> 
> I see.  But I think that could be fixed.  Change exec_bind_message() to
> loop over the parameters twice: once to save them away, once to actually
> process them.  I think the case of a faulty input value is probably very
> common, so it would be confusing if that didn't work.

https://www.postgresql.org/message-id/resend/20191205231550.GA28677%40alvherre.pgsql
> One problem I noticed is that we don't log parameters when
> exec_bind_message fetches the parameter values.  So the very first
> example problem in testlibpq5 fails to print the values of any
> parameters previously seen.  I don't think this is a real problem in
> practice.  You still get the unparseable value in the error message from
> the input function, so not having the errdetail() does not seem very
> important.

I see that as a deficiency (as Peter did), so I'm requesting to improve that
now.  It's not a bugfix, though.

-- 
Justin




Re: [PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..

2021-03-15 Thread Tom Lane
Justin Pryzby  writes:
> For example:
> $ python3.5 -c "import pg; db=pg.DB(); q = db.query(\"SET 
> log_parameter_max_length_on_error=-1;\"); db.prepare('p', 'SELECT 
> \$1::smallint'); db.query_prepared('p',6);"
> 2021-01-03 02:21:04.547 CST [20157] ERROR:  value "6" is out of range for 
> type smallint
> 2021-01-03 02:21:04.547 CST [20157] CONTEXT:  unnamed portal with parameters: 
> $1 = '6'
> 2021-01-03 02:21:04.547 CST [20157] STATEMENT:  SELECT $1::smallint

> When there are many bind params, this can be useful to determine which is out
> of range.  Think 900 int/smallint columns, or less-wide tables being inserted
> multiple rows at a time with VALUES(),(),()...

> Of course, this isn't as good as showing the column name, so I might pursue
> Tom's suggestion for that at some point.

I started to look at this, and immediately began to wonder where is the
previous discussion you're evidently referring to.  Can you dig up an
archives link?

regards, tom lane




pg_subscription - substream column?

2021-03-15 Thread Peter Smith
I noticed that the PG docs [1] for the catalog pg_subscription doesn't
have any mention of the substream column.

Accidental omission by commit 4648243 [2] from last year?


[1] https://www.postgresql.org/docs/devel/catalog-pg-subscription.html
[2] 
https://github.com/postgres/postgres/commit/464824323e57dc4b397e8b05854d779908b55304

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Calendar support in localization

2021-03-15 Thread Thomas Munro
Hi Surafel,

On Tue, Mar 16, 2021 at 3:48 AM Surafel Temesgen  wrote:
> My country (Ethiopia) is one of the nations that uses a different kind of 
> calendar than what PostgreSQL has, so we are deprived of the benefit of the 
> date datatype. We just use strings to store dates, which limits our application 
> quality greatly. The gap became even worse once application and system time 
> support became available, and it seems to me it is not fair to suggest adding 
> another kind of date datatype and implementation just for a different calendar 
> whose user group is not even a minor one. Having calendar support in 
> localization would be a very exciting feature for non-Gregorian calendar user 
> groups and would be much loved. As far as I can see, the difficult thing is 
> understanding a different calendar. I can prepare a patch for the Ethiopian 
> calendar once we have consensus.

One key question here is whether you need a different date type or
just different operations (functions, operators etc) on the existing
types.

> I CC Thomas Munro and Vik because they have an interest in this area

Last time it came up[1], I got as far as wondering if the best way
would be to write a set of ICU-based calendar functions.  Would it be
enough for your needs to have Ethiopic calendar-aware date arithmetic
(add, subtract a month etc), date part extraction (get the current
Ethiopic day/month/year of a date), display and parsing, and have all
of these as functions that you have to call explicitly, but have them
take the standard built-in date and timestamp types, so that your
tables would store regular date and timestamp values?  If not, what
else do you need?

ICU is very well maintained and widely used software, and PostgreSQL
already depends on it optionally, and that's enabled in all common
distributions.  In other words, maybe all the logic you want exists
already in your process's memory, we just have to figure out how to
reach it from SQL.  Another reason to use ICU is that we can solve
this problem once and then it'll work for many other calendars.

> Please don't suggests to fork from PostgreSQL just for this feature

I would start with an extension, and I'd try to do a small set of
simple functions, to let me write things like:

  icu_format(now(), 'fr_FR@calendar=buddhist') to get a Buddhist
calendar with French words

  icu_date_part('year', current_date, 'am_ET@calendar=traditional') to
get the current year in the Ethiopic calendar (2013 apparently)

Well, the first one probably also needs a format string too, actual
details to be worked out by reading the ICU manual...

Maybe instead of making a new extension, I might try to start from
https://github.com/dverite/icu_ext and see if it makes sense to extend
it to cover calendars.
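
To make the shape of that concrete, here is a purely illustrative sketch;
neither function exists today, the signatures are hypothetical, and
icu_ext's actual API may well differ:

CREATE FUNCTION icu_date_part(field text, d date, locale text)
    RETURNS integer
    AS 'MODULE_PATHNAME', 'icu_date_part'
    LANGUAGE C STRICT;

-- current Ethiopic year of an ordinary built-in date value
SELECT icu_date_part('year', current_date, 'am_ET@calendar=traditional');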

Maybe one day ICU will become a hard dependency of PostgreSQL and
someone will propose all that stuff into core, and then maybe we could
start to think about the possibility of tighter integration with the
built-in date/time functions (and LC_TIME setting?  seems complicated,
see also problems with setting LC_COLLATE/datcollate to an ICU
collation name, but I digress and that's a far off problem).  I would
also study the SQL standard and maybe DB2 (highly subjective comment:
at a wild guess, the most likely commercial RDBMS to have done a good
job of this if anyone has) to see if they contemplate non-Gregorian
calendars, to get some feel for whether that would eventually be a
possibility to conform with whatever the standard says.

In summary, getting something of very high quality by using a widely
used open source library that we already use seems like a better plan
than trying to write and maintain our own specialist knowledge about
individual calendars.  If there's something you need that can't be
done with its APIs working on top of our regular date and timestamp
types, could you elaborate?

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BybW0LJuLtj3yAUsbOw3DrzK00pGk8JyfpCREzi_LSsg%40mail.gmail.com#393d827f1be589d0ad6ca6b016905e80




Re: Nondeterministic collations and the value returned by GROUP BY x

2021-03-15 Thread Tom Lane
Jim Finnerty  writes:
> PostgreSQL 12 and onward supports nondeterministic collations.  For "GROUP
> BY x",  which value of 'x' will PostgreSQL return in this case?  The first
> value of x?

> The SQL standard (section 8.2) states that the specific value returned is
> implementation-defined, but requires that the value must be one of the
> specific values in the set of values that compare equally:

> d) Depending on the collation, two strings may compare as equal even if they
> are of different lengths or contain different sequences of characters. When
> any of the operations MAX, MIN, and DISTINCT reference a grouping column,
> and the UNION, EXCEPT, and INTERSECT operators refer to character strings,
> *the specific value selected by these operations from a set of such equal
> values is implementation- dependent*.

As I recall, "implementation-dependent" means specifically that we *don't*
have to make any promise about which particular value will be selected.
If it said "implementation-defined" then we would.

I expect that in practice it'd be the first of the group that arrives at
the grouping plan node --- but that doesn't really get you any closer
to being able to say which one it is exactly.  The input is either not
ordered at all, or ordered by something like a Sort node, which itself
is not going to make any promises about which one of a group of peers
is delivered first.
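
As an illustration (a minimal sketch, assuming an ICU build; the collation
definition follows the documentation's case-insensitive example):

CREATE COLLATION case_insensitive (
    provider = icu, locale = 'und-u-ks-level2', deterministic = false
);
CREATE TABLE t (x text COLLATE case_insensitive);
INSERT INTO t VALUES ('abc'), ('ABC'), ('Abc');

-- All three spellings compare equal, so they form a single group; which
-- spelling comes back as x is implementation-dependent (in practice, the
-- first row the grouping node happens to see).
SELECT x, count(*) FROM t GROUP BY x;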

regards, tom lane




Re: Enhance traceability of wal_level changes for backup management

2021-03-15 Thread Stephen Frost
Greetings,

* tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> From: David Steele 
> > As a backup software author, I don't see this feature as very useful.
> > 
> > The problem is that there are lots of ways for WAL to go missing so
> > monitoring the WAL archive for gaps is essential and this feature would
> > not replace that requirement. The only extra information you'd get is
> > the ability to classify the most recent gap as "intentional", maybe.
> 
> But how do you know there's any missing WAL?  I think there are the following 
> cases of missing WAL:

> 1. A WAL segment file is missing.  e.g., 0001 and 0003 exist, but 
> 0002 doesn't.
> 
> 2. All consecutive WAL segment files appear to exist, but some WAL records 
> are missing.
> This occurs *only* when some WAL-optimized statements are run while wal_level 
> = minimal.
> 
> Currently, backup management tools can detect 1 by scanning through the WAL 
> archive directory.  But the can't notice 2.  The patch addresses this.

They could notice #2 also by scanning the WAL, but that's certainly a
lot more work than just looking in pg_control.

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> On 08.03.21 03:45, osumi.takami...@fujitsu.com wrote:
> >OK. The basic idea is to enable backup management
> >tools to recognize wal_level drop between*snapshots*.
> >When you have a snapshot of the cluster at one time and another one
> >at different time, with this new parameter, you can see if
> >anything that causes discontinuity from the drop happens
> >in the middle of the two snapshots without efforts to have a look at the 
> >WALs in between.
> 
> Is this an actual problem?  Changing wal_level requires a restart.  Are
> users frequently restarting their servers to change wal_level and then
> wonder why their backups are misbehaving or incomplete?  Why?  Just like
> fsync is "breaks your database", wal_level might as well be "breaks your
> backups".  Is it not documented well enough?

We explicitly document that people can switch the WAL level and restart
to do bulk data loads faster, and there's certainly no shortage of
discussion (including what prompted this thread..) about doing exactly
that.  Adding more documentation around that would certainly be good,
as would changing this:

ereport(WARNING,
        (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
         errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));

into a PANIC instead of a WARNING.  It's simply far too easy to end up
with corruption in the system when doing PITR through a period of time
when the WAL level was set to minimal.  Unfortunately, if the user
didn't happen to know that they needed to take a new full backup after
flipping to minimal and back then they could end up with corruption at
restore/replay time which is certainly not when you want anything to go
wrong.  If it was available in the control file then we could more
proactively make noise at the user to take a new full backup.
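
A sketch of the suggested escalation (illustrative only, not a committed
change; same message, just promoted):

ereport(PANIC,
        (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
         errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));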

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: New IndexAM API controlling index vacuum strategies

2021-03-15 Thread Peter Geoghegan
On Mon, Mar 15, 2021 at 12:58 PM Peter Geoghegan  wrote:
> > I'm not comfortable with this change without adding more safety
> > checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit
> > and the xid needs to be frozen, we'll either cause errors or
> > corruption. Yes, that's already the case with params->index_cleanup ==
> > DISABLED, but that's not that widely used.
>
> I noticed that Noah's similar 2013 patch [1] added a defensive
> heap_tuple_needs_freeze() + elog(ERROR) to the HEAPTUPLE_DEAD case. I
> suppose that that's roughly what you have in mind here?

I'm not sure if you're arguing that there might be (either now or in
the future) a legitimate case (a case not involving data corruption)
where we hit HEAPTUPLE_DEAD, and find we have an XID in the tuple that
needs freezing. You seem to be suggesting that even throwing an error
might not be acceptable, but what better alternative is there? Did you
just mean that we should throw a *better*, more specific error right
there, when we handle HEAPTUPLE_DEAD? (As opposed to relying on
heap_prepare_freeze_tuple() to error out instead, which is what would
happen today.)

That seems like the most reasonable interpretation of your words to
me. That is, I think that you're saying (based in part on remarks on
that other thread [1]) that you believe that fully eliminating the
"tupgone = true" special case is okay in principle, but that more
hardening is needed -- if it ever breaks we want to hear about it.
And, while it would be better to do a much broader refactor to unite
heap_prune_chain() and lazy_scan_heap(), it is not essential (because
the issue is not really new, even without VACUUM (INDEX_CLEANUP
OFF)/"params->index_cleanup == DISABLED").

Which is it?

[1] 
https://www.postgresql.org/message-id/20200724165514.dnu5hr4vvgkssf5p%40alap3.anarazel.de
-- 
Peter Geoghegan




Re: logical replication worker accesses catalogs in error context callback

2021-03-15 Thread Tom Lane
Bharath Rupireddy  writes:
> Thanks for pointing to the changes in the commit message. I corrected
> them. Attaching v4 patch set, consider it for further review.

I took a quick look at this.  I'm quite worried about the potential
performance cost of the v4-0001 patch (the one for fixing
slot_store_error_callback).  Previously, we didn't pay any noticeable
cost for having the callback unless there actually was an error.
As patched, we perform several catalog lookups per column per row,
even in the non-error code path.  That seems like it'd be a noticeable
performance hit.  Just to add insult to injury, it leaks memory.

I propose a more radical but simpler solution: let's just not bother
with including the type names in the context message.  How much are
they really buying?

v4-0002 (for postgres_fdw's conversion_error_callback) has the same
problems, although mitigated a bit by not needing to do any catalog
lookups in the non-join case.  For the join case, I wonder whether
we could push the lookups back to plan setup time, so they don't
need to be done over again for each row.  (Not doing lookups at all
doesn't seem attractive there; it'd render the context message nearly
useless.)  A different idea is to try to get the column names out
of the query's rangetable instead of going to the catalogs.
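
For illustration, the general shape of a callback that relies only on
information captured at setup time (the names here are hypothetical, not
the actual patch):

typedef struct ConversionErrorCallbackArg
{
	const char *relname;		/* captured once, at plan setup time */
	const char *attname;		/* captured once, so no per-row syscache lookup */
} ConversionErrorCallbackArg;

static void
conversion_error_callback(void *arg)
{
	ConversionErrorCallbackArg *cb = (ConversionErrorCallbackArg *) arg;

	/* only evaluated when an error is actually being reported */
	errcontext("column \"%s\" of foreign table \"%s\"",
			   cb->attname, cb->relname);
}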

regards, tom lane




Re: Support tab completion for upper character inputs in psql

2021-03-15 Thread Peter Eisentraut

On 09.02.21 15:48, Tang, Haiying wrote:

I'm still confused about the APPROPRIATE behavior of tab completion.
It seems ALTER table/tablespace  SET/RESET is already case-insensitive.

For example
# alter tablespace dbspace set(e[tab]
# alter tablespace dbspace set(effective_io_concurrency

# alter tablespace dbspace set(E[tab]
# alter tablespace dbspace set(EFFECTIVE_IO_CONCURRENCY


This case completes with a hardcoded list, which is done  
case-insensitively by default.  The cases that complete with a query  
result are not case insensitive right now.  This affects things like


UPDATE T

as well.  I think your first patch was basically right.  But we need to  
understand that this affects all completions with query results, not  
just the one you wanted to fix.  So you should analyze all the callers  
and explain why the proposed change is appropriate.





Re: New IndexAM API controlling index vacuum strategies

2021-03-15 Thread Peter Geoghegan
On Mon, Mar 15, 2021 at 12:21 PM Andres Freund  wrote:
> It's evil sorcery. Fragile sorcery. I think Robert, Tom and me all run
> afoul of edge cases around it in the last few years.

Right, which is why I thought that I might be missing something; why
put up with that at all for so long?

> > But removing the awful "tupgone = true" special case seems to buy us a
> > lot -- it makes unifying everything relatively straightforward. In
> > particular, it makes it possible to delay the decision to vacuum
> > indexes until the last moment, which seems essential to making index
> > vacuuming optional.
>
> You haven't really justified, in the patch or this email, why it's OK to
> remove the whole logic around HEAPTUPLE_DEAD part of the logic.

I don't follow.

> VACUUM can take a long time, and not removing space for all the
> transactions that aborted while it wa

I guess that you trailed off here. My understanding is that removing
the special case results in practically no loss of dead tuples removed
in practice -- so there are no practical performance considerations
here.

Have I missed something?

> > Note that I've merged multiple existing functions in vacuumlazy.c into
> > one: the patch merges lazy_vacuum_all_indexes() and lazy_vacuum_heap()
> > into a single function named vacuum_indexes_mark_unused() (note also
> > that lazy_vacuum_page() has been renamed to mark_unused_page() to
> > reflect the fact that it is now strictly concerned with making LP_DEAD
> > line pointers LP_UNUSED).
>
> It doesn't really seem to be *just* doing that - doing the
> PageRepairFragmentation() and all-visible marking is relevant too?

I wrote it in a day, just to show what I had in mind. The renaming
stuff is a part of unifying those functions, which can be discussed
after the "tupgone = true" special case is removed. It's not like I'm
set on the details that you see in the patch.

> For me the patch does way too many things at once, making it harder than
> necessary to review, test (including later bisection). I'd much rather
> see the tupgone thing addressed on its own, without further changes, and
> then the rest done in separate commits subsequently.

I agree that it should be broken up for review.

> I'm not comfortable with this change without adding more safety
> checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit
> and the xid needs to be frozen, we'll either cause errors or
> corruption. Yes, that's already the case with params->index_cleanup ==
> DISABLED, but that's not that widely used.

I noticed that Noah's similar 2013 patch [1] added a defensive
heap_tuple_needs_freeze() + elog(ERROR) to the HEAPTUPLE_DEAD case. I
suppose that that's roughly what you have in mind here?

I suppose that that was pre-9.3-MultiXacts, and so now it's more complicated.

Comments above heap_prepare_freeze_tuple() say something about making
sure that HTSV did not return HEAPTUPLE_DEAD...but that's already
possible today:

 * It is assumed that the caller has checked the tuple with
 * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
 * (else we should be removing the tuple, not freezing it).

Does that need work too?

> See
> https://postgr.es/m/20200724165514.dnu5hr4vvgkssf5p%40alap3.anarazel.de
> for some discussion around the fragility.

That's a good reference, thanks.

[1] 
https://www.postgresql.org/message-id/20130130020456.GE3524%40tornado.leadboat.com
-- 
Peter Geoghegan




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2021-03-15 Thread Alvaro Herrera
On 2021-Mar-15, Justin Pryzby wrote:

> Are you going to update the assertion ?
> 
> +#if 0
>  	Assert((meta == META_NONE && varprefix == NULL) ||
>  		   ((meta == META_GSET || meta == META_ASET) && varprefix != NULL));
> +#endif

Yeah, caught that just after sending.  Here's a notpatch.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"La virtud es el justo medio entre dos defectos" (Aristóteles)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ba7b35d83c..e69d43b26b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2823,13 +2823,12 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 	int			qrynum = 0;
 
 	/*
-	 * varprefix should be set only with \gset or \aset, and SQL commands do
-	 * not need it.
+	 * varprefix should be set only with \gset or \aset, and \endpipeline and
+	 * SQL commands do not need it.
 	 */
-#if 0
 	Assert((meta == META_NONE && varprefix == NULL) ||
+		   ((meta == META_ENDPIPELINE) && varprefix == NULL) ||
 		   ((meta == META_GSET || meta == META_ASET) && varprefix != NULL));
-#endif
 
 	res = PQgetResult(st->con);
 


Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-03-15 Thread Tom Lane
[ Sorry for not looking at this thread sooner ]

Bharath Rupireddy  writes:
> Currently, $subject is not allowed. We do plan the mat view query
> before every refresh. I propose to show the explain/explain analyze of
> the select part of the mat view in case of Refresh Mat View(RMV).

TBH, I think we should reject this.  The problem with it is that it
binds us to the assumption that REFRESH MATERIALIZED VIEW has an
explainable plan.  There are various people poking at ideas like
incremental matview updates, which might rely on some implementation
that doesn't exactly equate to a SQL query.  Incremental updates are
hard enough already; they'll be even harder if they also have to
maintain compatibility with a pre-existing EXPLAIN behavior.

I don't really see that this feature buys us anything you can't
get by explaining the view's query, so I think we're better advised
to keep our options open about how REFRESH might be implemented
in future.

regards, tom lane




Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Tue, Mar 16, 2021 at 2:32 AM Mark Dilger
 wrote:
>
> We do test corrupt relations.  We intentionally corrupt the pages within 
> corrupted heap tables to check that they get reported as corrupt.  (See 
> src/bin/pg_amcheck/t/004_verify_heapam.pl)

I disagree.  You're testing a modified version of the pages in OS
cache, which is very likely to be different from real world
corruption.  Those usually end up with a discrepancy between storage
and OS cache and this scenario isn't tested nor documented.




Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger


> On Mar 15, 2021, at 11:57 AM, Tom Lane  wrote:
> 
> Yeah, that could be phrased better.

Attaching the 0001 patch submitted earlier, plus 0002 which fixes the confusing 
corruption message.




v6-0001-Turning-off-autovacuum-during-corruption-tests.patch
Description: Binary data


v6-0002-Fixing-a-confusing-amcheck-corruption-message.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [HACKERS] Custom compression methods

2021-03-15 Thread Robert Haas
On Mon, Mar 15, 2021 at 8:14 AM Dilip Kumar  wrote:
> In the attached patches I have changed this, ...

OK, so just looking over this patch series, here's what I think:

- 0001 and 0002 are now somewhat independent of the rest of this work,
and could be dropped, but I think they're a good idea, so I'd like to
commit them. I went over 0001 carefully this morning and didn't find
any problems. I still need to do some more review of 0002.
- 0003 through 0005 are the core of this patch set. I'd like to get
them into this release, but I think we're likely to run out of time.
- I don't think we want to proceed with 0006 at this time. It needs
broader buy-in, I think, and I think it also needs some other
improvements, at the least to the comments.
- 0007 is not intended for commit, but just exists to fool the
CommitFest bot into testing the feature.

Regarding 0003:

The biggest thing that jumps out at me while looking at this with
fresh eyes is that the patch doesn't touch varatt_external.va_extsize
at all. In a varatt_external, we can't use the va_rawsize to indicate
the compression method, because there are no bits free, because the 2
bits not required to store the size are used to indicate what type of
varlena we've got. But, that means that the size of a varlena is
limited to 1GB, so there are 2 bits free in
varatt_external.va_extsize, just like there are in
va_compressed.va_rawsize. We could store the same two bits in
varatt_external.va_extsize that we're storing in
va_compressed.va_rawsize aka va_tcinfo. That's a big deal, because
then toast_get_compression_method() doesn't have to call
toast_fetch_datum_slice() any more, which is a rather large savings.
If it's only impacting pg_column_compression() then whatever, but
that's not the case: we've got calls to
CompareCompressionMethodAndDecompress in places like intorel_receive()
and ExecModifyTable() that look pretty performance-critical.
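
For illustration, a minimal sketch of that idea (the macro and constant
names are hypothetical; nothing like them exists in the tree today):

/* External toast sizes fit in 30 bits, leaving the top 2 bits free. */
#define VARLENA_EXTSIZE_BITS	30
#define VARLENA_EXTSIZE_MASK	((1U << VARLENA_EXTSIZE_BITS) - 1)

#define VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) \
	((toast_pointer).va_extsize & VARLENA_EXTSIZE_MASK)
#define VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer) \
	((toast_pointer).va_extsize >> VARLENA_EXTSIZE_BITS)
#define VARATT_EXTERNAL_SET_SIZE_AND_COMPRESS_METHOD(toast_pointer, len, cm) \
	((toast_pointer).va_extsize = (len) | ((uint32) (cm) << VARLENA_EXTSIZE_BITS))

With something along those lines, toast_get_compression_method() could read
the method straight from the toast pointer instead of fetching a slice of
the value.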

I think that CompareCompressionMethodAndDecompress() could be
redesigned to be more efficient by moving more of the per-tuple work
into a separate setup phase. Consider a case where the tuple has 300
columns. 299 of them are fixed-with, but column 100 is a varlena. In
an ideal world, we would do slot_getsomeattrs(slot, 100). Then we
would check whether column is not null, whether it is compressed, and
whether the compression method is the one we want. If recompression is
required, then we must slot_getallattrs(slot), memcpy all the values
to the virtual slot created for this purpose, and decompress and
recompress column 100. But, if column 100 is not null, then we need
not ever deform beyond column 100, and in no case do we need to
iterate over all 300 attributes. But the current code will do just
that. For every new tuple, it loops again over every attribute and
re-discovers which ones are varlenas. That's kinda the pits.

I got thinking about this after looking at ExecFilterJunk(). That
function is extremely simple precisely because all the work of
figuring out what should be done has been precomputed. All the smarts
are in cleanmap[], which is set up before we actually begin execution.
In a similar way, you can imagine creating some sort of object, let's
call it a CompressedAttributeFilter, that looks at the tupledesc and
figures out which columns we need to consider
recompressing and puts them in an array. Then you have a struct that
stores a pointer to the array, the number of elements in the array,
and the value of the last array element. You pass this struct to what
is now CompareCompressionMethodAndDecompress() and it can now run more
like what I described above.

It's possible to imagine doing even better. Imagine that for every
column we maintain an attcompression value and an
attpreservecompression value. The former indicates the compression
type for the column or '\0' if it cannot be compressed, and the latter
indicates whether any other compression type might be present. Then,
when we build the CompressedAttributeFilter object, we can exclude
varlena attributes if attpreservecompression is false and
attcompression is '\0' or matches the attcompression value for the
corresponding attribute in the table into which ExecModifyTable() or
intorel_receive() will be putting the tuple. This seems quite complex
in terms of bookkeeping, but it would allow us to elide basically all
of the per-tuple bookkeeping in a lot of common cases, such as UPDATE,
or an INSERT or CTAS into a table that's using the same compression
method as the source data. You could probably contrive it so that you
have a CompressedAttributeFilter pointer that's NULL if no such
treatment is required, just like we already do for junkfilter.
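
A rough sketch of that precomputed object, just to make the idea concrete
(all names here are hypothetical; this is not a proposed patch):

typedef struct CompressedAttributeFilter
{
	AttrNumber *attnums;		/* varlena columns worth examining */
	int			natts;			/* number of entries in attnums */
	AttrNumber	maxattnum;		/* how far we must deform, at most */
} CompressedAttributeFilter;

/* Built once per target relation, before execution starts. */
static CompressedAttributeFilter *
build_compression_filter(TupleDesc tupdesc)
{
	CompressedAttributeFilter *filter = palloc(sizeof(*filter));
	int			i;

	filter->attnums = palloc(tupdesc->natts * sizeof(AttrNumber));
	filter->natts = 0;

	for (i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (!att->attisdropped && att->attlen == -1)
			filter->attnums[filter->natts++] = att->attnum;
	}

	filter->maxattnum = (filter->natts > 0) ?
		filter->attnums[filter->natts - 1] : 0;

	return filter;
}

Per tuple, the caller would then do slot_getsomeattrs(slot,
filter->maxattnum) and loop only over filter->attnums, falling back to
slot_getallattrs() only when a value actually needs recompression.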

There's another, rather brute-force approach to this problem, too. We
could just decide that lz4 will only be used for external data, and
that there's no such thing as an inline-compressed lz4 varlena.
deotast_fetch_datum() would just notice that the value is lz4'd and
de-lz4 it before returning it, since a compressed lz4 datum is impossible.

I'm open to being convinced that we don't need to do either of these
things, and that the cost of iterating over all varlenas in the tuple
is not so bad as to preclude doing things as you have them here. But,
I'm afraid it's going to be too expensive.

Re: New IndexAM API controlling index vacuum strategies

2021-03-15 Thread Andres Freund
Hi,

On 2021-03-14 19:04:34 -0700, Peter Geoghegan wrote:
> Attached is a POC-quality revision of Masahiko's
> skip_index_vacuum.patch [1]. There is an improved design for skipping
> index vacuuming (improved over the INDEX_CLEANUP stuff from Postgres
> 12). I'm particularly interested in your perspective on this
> refactoring stuff, Robert, because you ran into the same issues after
> initial commit of the INDEX_CLEANUP reloption feature [2] -- you ran
> into issues with the "tupgone = true" special case. This is the case
> where VACUUM considers a tuple dead that was not marked LP_DEAD by
> pruning, and so needs to be killed in the second heap scan in
> lazy_vacuum_heap() instead.

It's evil sorcery. Fragile sorcery. I think Robert, Tom and me all run
afoul of edge cases around it in the last few years.


> But removing the awful "tupgone = true" special case seems to buy us a
> lot -- it makes unifying everything relatively straightforward. In
> particular, it makes it possible to delay the decision to vacuum
> indexes until the last moment, which seems essential to making index
> vacuuming optional.

You haven't really justified, in the patch or this email, why it's OK to
remove the whole logic around HEAPTUPLE_DEAD part of the logic.

VACUUM can take a long time, and not removing space for all the
transactions that aborted while it wa


> Note that I've merged multiple existing functions in vacuumlazy.c into
> one: the patch merges lazy_vacuum_all_indexes() and lazy_vacuum_heap()
> into a single function named vacuum_indexes_mark_unused() (note also
> that lazy_vacuum_page() has been renamed to mark_unused_page() to
> reflect the fact that it is now strictly concerned with making LP_DEAD
> line pointers LP_UNUSED).

It doesn't really seem to be *just* doing that - doing the
PageRepairFragmentation() and all-visible marking is relevant too?

For me the patch does way too many things at once, making it harder than
necessary to review, test (including later bisection). I'd much rather
see the tupgone thing addressed on its own, without further changes, and
then the rest done in separate commits subsequently.


I don't like vacuum_indexes_mark_unused() as a name. That sounds like
the index is marked unused, not index entries pointing to tuples.  Don't
really like mark_unused_page() either for similar reasons - but it's not
quite as confusing.



> -			if (tupgone)
> -			{
> -				lazy_record_dead_tuple(dead_tuples, &(tuple.t_self));
> -				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
> -													   &vacrelstats->latestRemovedXid);
> -				tups_vacuumed += 1;
> -				has_dead_items = true;
> -			}
> -			else
> -			{
> -				bool		tuple_totally_frozen;
> +			num_tuples += 1;
> +			hastup = true;
>  
> -				num_tuples += 1;
> -				hastup = true;
> +			/*
> +			 * Each non-removable tuple must be checked to see if it needs
> +			 * freezing.  Note we already have exclusive buffer lock.
> +			 */
> +			if (heap_prepare_freeze_tuple(tuple.t_data,
> +										  relfrozenxid, relminmxid,
> +										  FreezeLimit, MultiXactCutoff,
> +										  &frozen[nfrozen],
> +										  &tuple_totally_frozen))
> +				frozen[nfrozen++].offset = offnum;

I'm not comfortable with this change without adding more safety
checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit
and the xid needs to be frozen, we'll either cause errors or
corruption. Yes, that's already the case with params->index_cleanup ==
DISABLED, but that's not that widely used.

See
https://postgr.es/m/20200724165514.dnu5hr4vvgkssf5p%40alap3.anarazel.de
for some discussion around the fragility.

Greetings,

Andres Freund




Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 11:57 AM, Tom Lane  wrote:
> 
> Do we have a strong enough lock on
> the table under examination to be sure that autovacuum couldn't remove
> a dead toast entry before we reach it?

The main table and the toast table are only locked with AccessShareLock.  Each 
page in the main table is locked with BUFFER_LOCK_SHARE.  Toast is not checked 
unless the tuple passes visibility checks verifying the tuple is not dead.

>  But this would only be an
> issue if we are trying to check validity of toasted fields within
> known-dead tuples, which I would argue we shouldn't, since lock or
> no lock there's no guarantee the toast entry is still there.

It does not intentionally check toasted fields within dead tuples.  If that is 
happening, it's a bug, possibly in the visibility function.  But I'm not seeing 
a specific reason to assume that is the issue.  If we still see the complaint 
on tern or hornet after committing the patch to turn off autovacuum, we will be 
able to rule out the theory that the toast was removed by autovacuum.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2021-03-15 Thread Justin Pryzby
Are you going to update the assertion ?

+#if 0
 	Assert((meta == META_NONE && varprefix == NULL) ||
 		   ((meta == META_GSET || meta == META_ASET) && varprefix != NULL));
+#endif





Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 11:32 AM, Mark Dilger  
> wrote:
> 
> If you had a real, not fake, collation provider which actually provided a 
> collation with an actual version number, stopped the server, changed the 
> behavior of the collation as well as its version number, started the server, 
> and ran REINDEX (OUTDATED), I think that would be a more real-world test.  
> I'm not demanding that you write such a test.  I'm just saying that it is 
> strange that we don't have coverage for this anywhere, and was asking if you 
> think there is such coverage, because, you know, maybe I just didn't see 
> where that test was lurking.

I should add some context regarding why I mentioned this issue at all.

Not long ago, if an upgrade of icu or libc broke your collations, you were sad. 
 But postgres didn't claim to be competent to deal with this problem, so it was 
just a missing feature.  Now, with REINDEX (OUTDATED), we're really implying, 
if not outright saying, that postgres knows how to deal with collation 
upgrades.  I feel uncomfortable that v14 will make such a claim with not a 
single regression test confirming such a claim.  I'm happy to discover that 
such a test is lurking somewhere and I just didn't see it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-15 Thread Tom Lane
Mark Dilger  writes:
> On Mar 15, 2021, at 10:04 AM, Tom Lane  wrote:
>> These animals have somewhat weird alignment properties: MAXALIGN is 8
>> but ALIGNOF_DOUBLE is only 4.  I speculate that that is affecting their
>> choices about whether an out-of-line TOAST value is needed, breaking
>> this test case.

> The logic in verify_heapam only looks for a value in the toast table if
> the tuple it gets from the main table (in this case, from pg_statistic)
> has an attribute that claims to be toasted.  The error message we're
> seeing that you quoted above simply means that no entry exists in the
> toast table.

Yeah, that could be phrased better.  Do we have a strong enough lock on
the table under examination to be sure that autovacuum couldn't remove
a dead toast entry before we reach it?  But this would only be an
issue if we are trying to check validity of toasted fields within
known-dead tuples, which I would argue we shouldn't, since lock or
no lock there's no guarantee the toast entry is still there.

Not sure that I believe the theory that this is from bad luck of
concurrent autovacuum timing, though.  The fact that we're seeing
this on just those two animals suggests strongly to me that it's
architecture-correlated, instead.

regards, tom lane




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2021-03-15 Thread Alvaro Herrera
Here's what seems a final version of the patch.  I renamed one more
function: PQsendPipeline is now PQpipelineSync.  I also reworded the
docs in a couple of places, added a few tests to the pgbench patch, and
made it work.

Note the pgbench results in pipeline mode:

./pgbench -r -Mextended -n -f /home/alvherre/Code/pgsql-build/pipeline/src/bin/pgbench/tmp_check/t_001_pgbench_with_server_main_data/001_pgbench_pipeline -c 100 -t1
pgbench (PostgreSQL) 14.0
transaction type: /home/alvherre/Code/pgsql-build/pipeline/src/bin/pgbench/tmp_check/t_001_pgbench_with_server_main_data/001_pgbench_pipeline
scaling factor: 1
query mode: extended
number of clients: 100
number of threads: 1
number of transactions per client: 1
number of transactions actually processed: 100/100
latency average = 2.316 ms
initial connection time = 113.859 ms
tps = 43182.438635 (without initial connection time)
statement latencies in milliseconds:
 0.000  \startpipeline
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 0.000  select 1;
 1.624  \endpipeline

If I just replace the \startpipeline and \endpipeline lines with BEGIN
and COMMIT respectively, I get this:

tps = 10220.259051 (without initial connection time)

 0.830  begin;
 0.765  select 1;
 0.752  select 1;
 0.753  select 1;
 0.755  select 1;
 0.754  select 1;
 0.755  select 1;
 0.757  select 1;
 0.756  select 1;
 0.756  select 1;
 0.756  select 1;
 0.750  commit;

Yes, you could say that this is a little bit unfair -- but it seems
quite impressive nonetheless.
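
For reference, the pipeline script being timed is essentially ten trivial
statements bracketed by the new meta-commands (reconstructed from the
per-statement report above; the actual file is generated by the TAP test):

\startpipeline
select 1;
select 1;
select 1;
select 1;
select 1;
select 1;
select 1;
select 1;
select 1;
select 1;
\endpipeline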

-- 
Álvaro Herrera39°49'30"S 73°17'W
From 5e4fdd5246d559caf0d75ad74001f09a48ec4c0e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 15 Mar 2021 15:05:22 -0300
Subject: [PATCH v37 1/2] Implement pipeline mode in libpq
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pipeline mode in libpq lets an application avoid the Sync messages in
the FE/BE protocol that are implicit in the old libpq API after each
query.  The application can then insert Sync at its leisure with a new
libpq function PQpipelineSync.  This can lead to substantial reductions
in query latency.

Co-authored-by: Craig Ringer 
Co-authored-by: Matthieu Garrigues 
Co-authored-by: Álvaro Herrera 
Reviewed-by: Andres Freund 
Reviewed-by: Aya Iwata 
Reviewed-by: Daniel Vérité 
Reviewed-by: David G. Johnston 
Reviewed-by: Justin Pryzby 
Reviewed-by: Kirk Jamison 
Reviewed-by: Michael Paquier 
Reviewed-by: Nikhil Sontakke 
Reviewed-by: Vaishnavi Prabakaran 
Reviewed-by: Zhihong Yu 

Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=b4dm...@mail.gmail.com
Discussion: https://postgr.es/m/cajkzx4t5e-2cqe3dtv2r78dyfvz+in8py7a8marvlhs_pg7...@mail.gmail.com
---
 doc/src/sgml/libpq.sgml   |  522 ++-
 doc/src/sgml/lobj.sgml|4 +
 doc/src/sgml/ref/pgbench.sgml |   21 +
 .../libpqwalreceiver/libpqwalreceiver.c   |6 +
 src/bin/pg_amcheck/pg_amcheck.c   |2 +
 src/interfaces/libpq/exports.txt  |4 +
 src/interfaces/libpq/fe-connect.c |   37 +-
 src/interfaces/libpq/fe-exec.c|  717 -
 src/interfaces/libpq/fe-protocol3.c   |   77 +-
 src/interfaces/libpq/libpq-fe.h   |   21 +-
 src/interfaces/libpq/libpq-int.h  |   60 +-
 src/test/modules/Makefile |1 +
 src/test/modules/libpq_pipeline/.gitignore|5 +
 src/test/modules/libpq_pipeline/Makefile  |   20 +
 src/test/modules/libpq_pipeline/README|1 +
 .../modules/libpq_pipeline/libpq_pipeline.c   | 1303 +
 .../libpq_pipeline/t/001_libpq_pipeline.pl|   28 +
 src/tools/msvc/Mkvcbuild.pm   |9 +-
 src/tools/pgindent/typedefs.list  |2 +
 19 files changed, 2727 insertions(+), 113 deletions(-)
 create mode 100644 src/test/modules/libpq_pipeline/.gitignore
 create mode 100644 src/test/modules/libpq_pipeline/Makefile
 create mode 100644 src/test/modules/libpq_pipeline/README
 create mode 100644 src/test/modules/libpq_pipeline/libpq_pipeline.c
 create mode 100644 src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 910e9a81ea..be674fbaa9 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3180,6 +3180,33 @@ ExecStatusType PQresultStatus(const PGresult *res);

   
  
+
+ 
+  PGRES_PIPELINE_SYNC
+  
+   
+The PGresult represents a
+synchronization point in pipeline mode, requested 

Re: Tightening up allowed custom GUC names

2021-03-15 Thread Tom Lane
[ getting back to this, after a bit of procrastination ]

Andrew Dunstan  writes:
> On 2/11/21 1:32 PM, Tom Lane wrote:
>> Noah Misch  writes:
>>> On Tue, Feb 09, 2021 at 05:34:37PM -0500, Tom Lane wrote:
 * A case could be made for tightening things up a lot more, and not
 allowing anything that doesn't look like an identifier.  I'm not
 pushing for that, as it seems more likely to break existing
 applications than the narrow restriction proposed here.  But I could
 live with it if people prefer that way.

>>> I'd prefer that.  Characters like backslash, space, and double quote have
>>> significant potential to reveal bugs, while having negligible application
>>> beyond revealing bugs.

> That might be a bit restrictive. I could at least see allowing '-' as
> reasonable, and maybe ':'. Not sure about other punctuation characters.
> OTOH I'd be surprised if the identifier restriction would burden a large
> number of people.

We can't allow '-', for the specific reason that it won't work as a -c
argument (thanks to -c's translation of '-' to '_').  The whole point here
is to prevent corner cases like that.  ':' would be all right, but I think
it's a lot simpler to explain and a lot harder to break in future if we
just say that the names have to be valid identifiers.

Patch that does it like that attached.
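
For illustration, the intended behavior under the identifier-only rule
would be roughly this (a sketch of intent; the exact error wording, and
where it is raised, depends on the final patch):

SET myext.fetch_count = 10;          -- accepted: identifier.identifier
SET "myext"."fetch-count" = 10;      -- now rejected: '-' is not an identifier character
SET "my ext".debug = on;             -- now rejected: embedded space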

(I concur with the downthread opinions that we shouldn't back-patch this.)

regards, tom lane

diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 7885a169bb..9498bbea2f 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -282,7 +282,7 @@ ProcessConfigFileInternal(GucContext context, bool applySettings, int elevel)
 		 * Try to find the variable; but do not create a custom placeholder if
 		 * it's not there already.
 		 */
-		record = find_option(item->name, false, elevel);
+		record = find_option(item->name, false, true, elevel);
 
 		if (record)
 		{
@@ -306,7 +306,7 @@ ProcessConfigFileInternal(GucContext context, bool applySettings, int elevel)
 			/* Now mark it as present in file */
 			record->status |= GUC_IS_IN_FILE;
 		}
-		else if (strchr(item->name, GUC_QUALIFIER_SEPARATOR) == NULL)
+		else if (!valid_custom_variable_name(item->name))
 		{
 			/* Invalid non-custom variable, so complain */
 			ereport(elevel,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 855076b1fd..46434950a7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -5252,6 +5252,45 @@ add_guc_variable(struct config_generic *var, int elevel)
 	return true;
 }
 
+/*
+ * Decide whether a proposed custom variable name is allowed.
+ *
+ * It must be "identifier.identifier", where the rules for what is an
+ * identifier agree with scan.l.
+ */
+static bool
+valid_custom_variable_name(const char *name)
+{
+	int			num_sep = 0;
+	bool		name_start = true;
+
+	for (const char *p = name; *p; p++)
+	{
+		if (*p == GUC_QUALIFIER_SEPARATOR)
+		{
+			if (name_start)
+return false;	/* empty name component */
+			num_sep++;
+			name_start = true;
+		}
+		else if (strchr("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+		"abcdefghijklmnopqrstuvwxyz", *p) != NULL ||
+ IS_HIGHBIT_SET(*p))
+		{
+			/* okay as first or non-first character */
+			name_start = false;
+		}
+		else if (!name_start && strchr("0123456789_$", *p) != NULL)
+			 /* okay as non-first character */ ;
+		else
+			return false;
+	}
+	if (name_start)
+		return false;			/* empty name component */
+	/* OK if we had exactly one separator */
+	return (num_sep == 1);
+}
+
 /*
  * Create and add a placeholder variable for a custom variable name.
  */
@@ -5299,12 +5338,23 @@ add_placeholder_variable(const char *name, int elevel)
 }
 
 /*
- * Look up option NAME.  If it exists, return a pointer to its record,
- * else return NULL.  If create_placeholders is true, we'll create a
- * placeholder record for a valid-looking custom variable name.
+ * Look up option "name".  If it exists, return a pointer to its record.
+ * Otherwise, if create_placeholders is true and name is a valid-looking
+ * custom variable name, we'll create and return a placeholder record.
+ * Otherwise, if skip_errors is true, then we silently return NULL for
+ * an unrecognized or invalid name.  Otherwise, the error is reported at
+ * error level elevel (and we return NULL if that's less than ERROR).
+ *
+ * Note: internal errors, primarily out-of-memory, draw an elevel-level
+ * report and NULL return regardless of skip_errors.  Hence, callers must
+ * handle a NULL return whenever elevel < ERROR, but they should not need
+ * to emit any additional error message.  (In practice, internal errors
+ * can only happen when create_placeholders is true, so callers passing
+ * false need not think terribly hard about this.)
  */
 static struct config_generic *
-find_option(const char *name, bool create_placeholders, int elevel)
+find_option(const char *name, 

Re: Parser Hook

2021-03-15 Thread Joshua Drake
>
>
>
> Also, I'm not sure that many extensions would really benefit from custom
> utility command, as you can already do pretty much anything you want using
> SQL
> functions.  For instance it would be nice for hypopg to be able to support
>
> CREATE HYPOTHETICAL INDEX ...
>
> rather than
>
> SELECT hypopg_create_index('CREATE INDEX...')
>
> But really the only benefit would be autocompletion, which still wouldn't
> be
> possible as psql autocompletion won't be extended.  And even if it somehow
> was,
> I wouldn't expect all psql clients to be setup as needed.
>

"technically" speaking you are correct, usability speaking you are not. We
ran into this discussion previously when dealing with replication. There is
certainly a history to calling functions to do what the grammar (from a
usability perspective) should do and that is not really a good history. It
is just what we are all used to. Looking at what you wrote above as a DBA
or even an average developer: CREATE HYPOTHETICAL INDEX makes much more
sense than the SELECT execution.

JD

P.S. I had to write HYPOTHETICAL 4 times, I kept typing HYPOTECHNICAL :/


Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger


> On Mar 15, 2021, at 11:11 AM, Mark Dilger  
> wrote:
> 
> I will submit a patch that turns off autovacuum for the test node shortly.



v5-0001-Turning-off-autovacuum-during-corruption-tests.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 11:10 AM, Julien Rouhaud  wrote:
> 
> On Mon, Mar 15, 2021 at 10:56:50AM -0700, Mark Dilger wrote:
>> 
>> 
>>> On Mar 15, 2021, at 10:50 AM, Julien Rouhaud  wrote:
>>> 
>>> On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
 I'm saying that your patch seems to call down to 
 get_collation_actual_version() via get_collation_version_for_oid() from 
 your new function do_check_index_has_outdated_collation(), but I'm not 
 seeing how that gets exercised.
>>> 
>>> It's a little bit late here so sorry if I'm missing something.
>>> 
>>> do_check_index_has_outdated_collation() is called from
>>> index_has_outdated_collation() which is called from
>>> index_has_outdated_dependency() which is called from
>>> RelationGetIndexListFiltered(), and that function is called when passing the
>>> OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
>>> with added tests for both matching and non matching collation version.
>> 
>> Ok, fair enough.  I was thinking about the case where the collation actually 
>> returns a different version number because it (the C library providing the 
>> collation) got updated, but I think you've answered already that you are not 
>> planning to test that case, only the case where pg_depend is modified to 
>> have a bogus version number.
> 
> This infrastructure is supposed to detect that the collation library *used to*
> return a different version before it was updated.  And that's exactly what
> we're testing by manually updating the refobjversion.
> 
>> It seems a bit odd to me that a feature intended to handle cases where 
>> collations are updated is not tested via having a collation be updated 
>> during the test.  It leaves open the possibility that something differs 
>> between the test and reindexed run after real world collation updates.  But 
>> that's for the committer who picks up your patch to decide, and perhaps it 
>> is unfair to make your patch depend on addressing that issue.
> 
> Why is that odd?  We're testing that we're correctly storing the collation
> version during index creating and correctly detecting a mismatch.  Having a
> fake collation provider to return a fake version number won't add any more
> coverage unless I'm missing something.
> 
> It's similar to how we test the various corruption scenario.  AFAIK we're not
> providing custom drivers to write corrupted data but we're simply simulating a
> corruption overwriting some blocks.

We do test corrupt relations.  We intentionally corrupt the pages within 
corrupted heap tables to check that they get reported as corrupt.  (See 
src/bin/pg_amcheck/t/004_verify_heapam.pl)  Admittedly, the corruptions used in 
the tests are not necessarily representative of corruptions that might occur in 
the wild, but that is a hard problem to solve, since we don't know the 
statistical distribution of corruptions in the wild.

If you had a real, not fake, collation provider which actually provided a 
collation with an actual version number, stopped the server, changed the 
behavior of the collation as well as its version number, started the server, 
and ran REINDEX (OUTDATED), I think that would be a more real-world test.  I'm 
not demanding that you write such a test.  I'm just saying that it is strange 
that we don't have coverage for this anywhere, and was asking if you think 
there is such coverage, because, you know, maybe I just didn't see where that 
test was lurking.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Postgres crashes at memcopy() after upgrade to PG 13.

2021-03-15 Thread Avinash Kumar
Hi,

On Mon, Mar 15, 2021 at 1:18 PM Peter Geoghegan  wrote:

> On Mon, Mar 15, 2021 at 6:56 AM Avinash Kumar
>  wrote:
> > psql:amchecksql.sql:17: DEBUG:  leaf block 1043751 of index
> "idx_id_mtime" has no first data item
>
> That one is harmless.
>
> > And one error as follows.
> >
> > psql:amchecksql.sql:17: ERROR:  down-link lower bound invariant violated
> for index "some_other_index"
>
> That indicates corruption. Can you tie this back to the crash? Is it
> the same index?
>
No, that's not the same index.  The Index discussed in the previous
messages shows the following output.

DEBUG:  verifying consistency of tree structure for index "idx_id_mtime"
with cross-level checks
DEBUG:  verifying level 2 (true root level)
DEBUG:  verifying level 1
DEBUG:  verifying level 0 (leaf level)
DEBUG:  verifying that tuples from index "idx_id_mtime" are present in
"player"
DEBUG:  finished verifying presence of 1966412 tuples from table "player"
with bitset 29.89% set
LOG:  duration: 3341.755 ms  statement: SELECT bt_index_parent_check(index
=> c.oid, heapallindexed => true), c.relname, c.relpages FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid JOIN pg_am am ON op.opcmethod
= am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON
c.relnamespace = n.oid WHERE am.amname = 'btree' AND c.relpersistence !=
't' AND c.relkind = 'i' AND i.indisready AND i.indisvalid AND indexrelid =
80774 AND n.nspname = 'public' ORDER BY c.relpages DESC;
 bt_index_parent_check |   relname    | relpages
-----------------------+--------------+----------
                       | idx_id_mtime |     8439
(1 row)


> --
> Peter Geoghegan
>


-- 
Regards,
Avi.


Type of wait events WalReceiverWaitStart and WalSenderWaitForWAL

2021-03-15 Thread Fujii Masao

The wait event WalReceiverWaitStart has been categorized in the type Client.
But why? Walreceiver is waiting for startup process to set the lsn and
timeline while it is reporting WalReceiverWaitStart. So its type should be IPC,
instead?

The wait event WalSenderWaitForWAL has also been categorized in the type
Client. While this wait event is being reported, logical replication walsender
is waiting for not only new WAL to be flushed but also the socket to be
readable and writeable (if there is pending data). I guess that this is why
its type is Client. But ISTM walsender is *mainly* waiting for new WAL to be
flushed by other processes during that period, so I think that it's better
to use IPC as the type of the wait event WalSenderWaitForWAL. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 10:04 AM, Tom Lane  wrote:
> 
> Looks like we're not quite out of the woods, as hornet and tern are
> still unhappy:
> 
> #   Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs 
> expected 0)'
> #   at t/003_check.pl line 498.
> 
> #   Failed test 'pg_amcheck excluding all corrupt schemas stdout /(?^:^$)/'
> #   at t/003_check.pl line 498.
> #   'heap table "db1"."pg_catalog"."pg_statistic", block 2, 
> offset 1, attribute 27:
> # final toast chunk number 0 differs from expected value 1
> # heap table "db1"."pg_catalog"."pg_statistic", block 2, offset 1, attribute 
> 27:
> # toasted value for attribute 27 missing from toast table
> # '
> # doesn't match '(?^:^$)'
> # Looks like you failed 2 tests of 60.
> [12:18:06] t/003_check.pl ... 
> Dubious, test returned 2 (wstat 512, 0x200)
> Failed 2/60 subtests 
> 
> These animals have somewhat weird alignment properties: MAXALIGN is 8
> but ALIGNOF_DOUBLE is only 4.  I speculate that that is affecting their
> choices about whether an out-of-line TOAST value is needed, breaking
> this test case.

The pg_amcheck test case is not corrupting any pg_catalog tables, but 
contrib/amcheck/verify_heapam is complaining about a corruption in 
pg_catalog.pg_statistic.

The logic in verify_heapam only looks for a value in the toast table if the 
tuple it gets from the main table (in this case, from pg_statistic) has an 
attribute that claims to be toasted.  The error message we're seeing that you 
quoted above simply means that no entry exists in the toast table.  The bit 
about "final toast chunk number 0 differs from expected value 1" is super 
unhelpful, as what it is really saying is that there were no chunks found.  I 
should submit a patch to not print that message in cases where the attribute is 
missing from the toast table.

Is it possible that pg_statistic really is corrupt here, and that this is not a 
bug in pg_amcheck?  It's not like we've been checking for corruption in the 
build farm up till now.  I notice that this test, as well as test 
005_opclass_damage.pl, neglects to turn off autovacuum for the test node.  So 
maybe the corruptions are getting propagated during autovacuum?  This is just a 
guess, but I will submit a patch that turns off autovacuum for the test node 
shortly.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 10:56:50AM -0700, Mark Dilger wrote:
> 
> 
> > On Mar 15, 2021, at 10:50 AM, Julien Rouhaud  wrote:
> > 
> > On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
> >> I'm saying that your patch seems to call down to 
> >> get_collation_actual_version() via get_collation_version_for_oid() from 
> >> your new function do_check_index_has_outdated_collation(), but I'm not 
> >> seeing how that gets exercised.
> > 
> > It's a little bit late here so sorry if I'm missing something.
> > 
> > do_check_index_has_outdated_collation() is called from
> > index_has_outdated_collation() which is called from
> > index_has_outdated_dependency() which is called from
> > RelationGetIndexListFiltered(), and that function is called when passing the
> > OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
> > with added tests for both matching and non matching collation version.
> 
> Ok, fair enough.  I was thinking about the case where the collation actually 
> returns a different version number because it (the C library providing the 
> collation) got updated, but I think you've answered already that you are not 
> planning to test that case, only the case where pg_depend is modified to have 
> a bogus version number.

This infrastructure is supposed to detect that the collation library *used to*
return a different version before it was updated.  And that's exactly what
we're testing by manually updating the refobjversion.

> It seems a bit odd to me that a feature intended to handle cases where 
> collations are updated is not tested via having a collation be updated during 
> the test.  It leaves open the possibility that something differs between the 
> test and reindexed run after real world collation updates.  But that's for 
> the committer who picks up your patch to decide, and perhaps it is unfair to 
> make your patch depend on addressing that issue.

Why is that odd?  We're testing that we're correctly storing the collation
version during index creation and correctly detecting a mismatch.  Having a
fake collation provider return a fake version number won't add any more
coverage unless I'm missing something.

It's similar to how we test the various corruption scenarios.  AFAIK we're not
providing custom drivers to write corrupted data; we're simply simulating
corruption by overwriting some blocks.




Re: Different compression methods for FPI

2021-03-15 Thread Justin Pryzby
On Sun, Mar 14, 2021 at 07:31:35PM -0500, Justin Pryzby wrote:
> On Sat, Mar 13, 2021 at 08:48:33PM +0500, Andrey Borodin wrote:
> > > 13 марта 2021 г., в 06:28, Justin Pryzby  
> > > написал(а):
> > > Updated patch with a minor fix to configure.ac to avoid warnings on OSX.
> > > And 2ndary patches from another thread to allow passing recovery tests.
> > > Renamed to WAL_COMPRESSION_*
> > > Split LZ4 support to a separate patch and support zstd.  These show that
> > > changes needed for a new compression method have been minimized, although 
> > > not
> > > yet moved to a separate, abstracted compression/decompression function.
> > 
> > Thanks! Awesome work!
> > 
> > > These two patches are a prerequisite for this patch to progress:
> > > * Run 011_crash_recovery.pl with wal_level=minimal
> > > * Make sure published XIDs are persistent
> > > 
> > > I don't know if anyone will consider this patch for v14 - if not, it 
> > > should be
> > > set to v15 and revisit in a month.  
> > 
> > I want to note, that fixes for 011_crash_recovery.pl are not strictly 
> > necessary for this patch set.
> > The problem in tests arises if we turn on wal_compression, absolutely 
> > independently from wal compression method.
> > We turn on wal_compression in this test only for CI purposes.
> 
> I rearranged the patches to reflect this.
> Change to zlib and zstd to level=1.
> Add support for negative "zstd fast" levels.
> Use correct length accounting for "hole" in LZ4 and ZSTD.
> Fixed Solution.pm for zstd on windows.
> Switch to zstd by default (for CI).
> Add docs.

Changes:
- Allocate buffer sufficient to accommodate any supported compression method;
- Use existing info flags argument rather than adding another byte for storing
  the compression method; this seems to be what was anticipated by commit
  57aa5b2bb and what Michael objected to.

I think the existing structures are ugly, so maybe this suggests using a GUC
assign hook to support an arbitrary compression level, and maybe other options.
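
To make that concrete, here is a minimal sketch of such a check/assign hook
pair for a hypothetical string-valued GUC of the form "method[:level]".  Only
the hook signatures are the real ones from guc.h; the GUC name, the method
numbering and the parsing are assumptions for illustration, not something the
attached patches implement.

/*
 * Hypothetical check/assign hooks for a string GUC like
 * wal_compression = 'zstd:3'.  Sketch only.
 */
#include "postgres.h"

#include "utils/guc.h"

int			wal_compression_method = 0; /* assumed: 0 = pglz */
int			wal_compression_level = 1;

typedef struct WalCompressionExtra
{
	int			method;
	int			level;
} WalCompressionExtra;

static bool
check_wal_compression(char **newval, void **extra, GucSource source)
{
	WalCompressionExtra *parsed;
	char	   *colon = strchr(*newval, ':');

	/* GUC "extra" data must be malloc'd, not palloc'd */
	parsed = malloc(sizeof(WalCompressionExtra));
	if (parsed == NULL)
		return false;

	/* Deliberately naive parsing, just to show the hook contract */
	if (strncmp(*newval, "zstd", 4) == 0)
		parsed->method = 2;		/* assumed enum value */
	else if (strncmp(*newval, "lz4", 3) == 0)
		parsed->method = 1;
	else
		parsed->method = 0;		/* pglz */

	parsed->level = colon ? atoi(colon + 1) : 1;

	*extra = parsed;
	return true;
}

static void
assign_wal_compression(const char *newval, void *extra)
{
	WalCompressionExtra *parsed = (WalCompressionExtra *) extra;

	wal_compression_method = parsed->method;
	wal_compression_level = parsed->level;
}

An extension-style registration would go through DefineCustomStringVariable(),
while a core GUC would instead get an entry in the ConfigureNamesString table
in guc.c.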

-- 
Justin
>From cbb75085d8755d5db09905a51e12ca47df151775 Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Sat, 27 Feb 2021 09:03:50 +0500
Subject: [PATCH 01/10] Allow alternate compression methods for wal_compression

TODO: bump XLOG_PAGE_MAGIC
---
 doc/src/sgml/config.sgml  | 17 +
 src/backend/Makefile  |  2 +-
 src/backend/access/transam/xlog.c | 10 +++
 src/backend/access/transam/xloginsert.c   | 67 ---
 src/backend/access/transam/xlogreader.c   | 64 +-
 src/backend/utils/misc/guc.c  | 11 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 src/include/access/xlog_internal.h| 16 +
 src/include/access/xlogrecord.h   | 11 ++-
 10 files changed, 187 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a218d78bef..7fb2a84626 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3072,6 +3072,23 @@ include_dir 'conf.d'
   
  
 
+ 
+  wal_compression_method (enum)
+  
+   wal_compression_method configuration parameter
+  
+  
+  
+   
+This parameter selects the compression method used to compress WAL when
+wal_compression is enabled.
+The supported methods are pglz and zlib.
+The default value is pglz.
+Only superusers can change this setting.
+   
+  
+ 
+
  
   wal_init_zero (boolean)
   
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
 LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
 
 # The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
 
 ifeq ($(with_systemd),yes)
 LIBS += -lsystemd
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f4d1ce5dea..15da91a8dd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -99,6 +99,7 @@ bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
 bool		wal_compression = false;
+int			wal_compression_method = WAL_COMPRESSION_PGLZ;
 char	   *wal_consistency_checking_string = NULL;
 bool	   *wal_consistency_checking = NULL;
 bool		wal_init_zero = true;
@@ -180,6 +181,15 @@ const struct config_enum_entry recovery_target_action_options[] = {
 	{NULL, 0, false}
 };
 
+/* Note that due to conditional compilation, offsets within the array are not static */
+const struct config_enum_entry wal_compression_options[] = {
+	{"pglz", WAL_COMPRESSION_PGLZ, false},
+#ifdef  HAVE_LIBZ
+	{"zlib", 

Re: Parser Hook

2021-03-15 Thread Pavel Stehule
po 15. 3. 2021 v 18:54 odesílatel Julien Rouhaud 
napsal:

> On Mon, Mar 15, 2021 at 06:41:36PM +0100, Pavel Stehule wrote:
> > po 15. 3. 2021 v 18:18 odesílatel Julien Rouhaud 
> > napsal:
> >
> > > On Mon, Mar 15, 2021 at 06:05:52PM +0100, Pavel Stehule wrote:
> > > >
> > > > Possibility to work with a parser is one main reason for forking
> > > postgres.
> > > > Lot of interestings projects fail on the cost of maintaining their
> own
> > > fork.
> > > >
> > > > Maybe a good enough possibility is the possibility to inject an own
> > > parser
> > > > called before Postgres parser. Then it can do a transformation from
> > > "CREATE
> > > > PIPE ..." to "SELECT extparse("CREATE PIPE()". There can be a switch
> if
> > > > returned content is string for reparsing or already prepared AST.
> > >
> > > Having a hook that returns a reformatted query string would definitely
> be
> > > easier to write compared to generating an AST, but the overhead of
> parsing
> > > the
> > > query twice plus deparsing it will probably make that approach way too
> > > expensive in many usecases, so we shouldn't go that way.
> > >
> >
> > yes - so it can be nice to have more possibilities.
> >
> > parsing is expensive - but on today computers, the cost of parsing is
> low -
> > the optimization is significantly more expensive.
> >
> > I wrote some patches in this area (all rejected by Tom :)), and a lot of
> > work can be done after parser and before the analysis stage. Probably,
> the
> > parser hook is not good enough, there should be an analysis stage hook
> too.
>
> If you need an parse/analyse hook, it means that you're extending the AST,
> so
> you probably also need executor support for that right?  Or is it only to
> support syntactic sugar in the analysis rather than parsing?
>

I think all the necessary executor hooks are available already.  At the
executor level, I was able to do everything I wanted.

I am only missing pre-parsing and post-parsing hooks.

Regards

Pavel


Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 10:50 AM, Julien Rouhaud  wrote:
> 
> On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
>> I'm saying that your patch seems to call down to 
>> get_collation_actual_version() via get_collation_version_for_oid() from your 
>> new function do_check_index_has_outdated_collation(), but I'm not seeing how 
>> that gets exercised.
> 
> It's a little bit late here so sorry if I'm missing something.
> 
> do_check_index_has_outdated_collation() is called from
> index_has_outdated_collation() which is called from
> index_has_outdated_dependency() which is called from
> RelationGetIndexListFiltered(), and that function is called when passing the
> OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
> with added tests for both matching and non matching collation version.

Ok, fair enough.  I was thinking about the case where the collation actually 
returns a different version number because it (the C library providing the 
collation) got updated, but I think you've answered already that you are not 
planning to test that case, only the case where pg_depend is modified to have a 
bogus version number.

It seems a bit odd to me that a feature intended to handle cases where 
collations are updated is not tested via having a collation be updated during 
the test.  It leaves open the possibility that something differs between the 
test and reindexed run after real world collation updates.  But that's for the 
committer who picks up your patch to decide, and perhaps it is unfair to make 
your patch depend on addressing that issue.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Parser Hook

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 06:41:36PM +0100, Pavel Stehule wrote:
> po 15. 3. 2021 v 18:18 odesílatel Julien Rouhaud 
> napsal:
> 
> > On Mon, Mar 15, 2021 at 06:05:52PM +0100, Pavel Stehule wrote:
> > >
> > > Possibility to work with a parser is one main reason for forking
> > postgres.
> > > Lot of interestings projects fail on the cost of maintaining their own
> > fork.
> > >
> > > Maybe a good enough possibility is the possibility to inject an own
> > parser
> > > called before Postgres parser. Then it can do a transformation from
> > "CREATE
> > > PIPE ..." to "SELECT extparse("CREATE PIPE()". There can be a switch if
> > > returned content is string for reparsing or already prepared AST.
> >
> > Having a hook that returns a reformatted query string would definitely be
> > easier to write compared to generating an AST, but the overhead of parsing
> > the
> > query twice plus deparsing it will probably make that approach way too
> > expensive in many usecases, so we shouldn't go that way.
> >
> 
> yes - so it can be nice to have more possibilities.
> 
> parsing is expensive - but on today computers, the cost of parsing is low -
> the optimization is significantly more expensive.
> 
> I wrote some patches in this area (all rejected by Tom :)), and a lot of
> work can be done after parser and before the analysis stage. Probably, the
> parser hook is not good enough, there should be an analysis stage hook too.

If you need a parse/analysis hook, it means that you're extending the AST, so
you probably also need executor support for that, right?  Or is it only to
support syntactic sugar in the analysis rather than parsing?




Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
> I'm saying that your patch seems to call down to 
> get_collation_actual_version() via get_collation_version_for_oid() from your 
> new function do_check_index_has_outdated_collation(), but I'm not seeing how 
> that gets exercised.

It's a little bit late here so sorry if I'm missing something.

do_check_index_has_outdated_collation() is called from
index_has_outdated_collation() which is called from
index_has_outdated_dependency() which is called from
RelationGetIndexListFiltered(), and that function is called when passing the
OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
with added tests for both matching and non-matching collation versions.




Re: Parser Hook

2021-03-15 Thread Pavel Stehule
po 15. 3. 2021 v 18:18 odesílatel Julien Rouhaud 
napsal:

> On Mon, Mar 15, 2021 at 06:05:52PM +0100, Pavel Stehule wrote:
> >
> > Possibility to work with a parser is one main reason for forking
> postgres.
> > Lot of interestings projects fail on the cost of maintaining their own
> fork.
> >
> > Maybe a good enough possibility is the possibility to inject an own
> parser
> > called before Postgres parser. Then it can do a transformation from
> "CREATE
> > PIPE ..." to "SELECT extparse("CREATE PIPE()". There can be a switch if
> > returned content is string for reparsing or already prepared AST.
>
> Having a hook that returns a reformatted query string would definitely be
> easier to write compared to generating an AST, but the overhead of parsing
> the
> query twice plus deparsing it will probably make that approach way too
> expensive in many usecases, so we shouldn't go that way.
>

Yes - so it would be nice to have more possibilities.

Parsing is expensive - but on today's computers the cost of parsing is low;
the optimization stage is significantly more expensive.

I wrote some patches in this area (all rejected by Tom :)), and a lot of
work can be done after the parser and before the analysis stage.  Probably a
parser hook alone is not good enough; there should be an analysis-stage hook too.


Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 10:34 AM, Julien Rouhaud  wrote:
> 
> On Mon, Mar 15, 2021 at 10:13:55AM -0700, Mark Dilger wrote:
>> 
>> 
>>> On Mar 15, 2021, at 9:52 AM, Julien Rouhaud  wrote:
>>> 
>>> But there are also the tests in collate.icu.utf8.out which will fake 
>>> outdated
>>> collations (that's the original tests for the collation tracking patches) 
>>> and
>>> then check that outdated indexes are reindexed with both REINDEX and REINDEX
>>> (OUDATED).
>>> 
>>> So I think that all cases are covered.  Do you want to have more test cases?
>> 
>> I thought that just checked cases where a bogus 'not a version' was put into 
>> pg_catalog.pg_depend.  I'm talking about having a collation provider who 
>> returns a different version string and has genuinely different collation 
>> rules between versions, thereby breaking the index until it is updated.  Is 
>> that being tested?
> 
> No, we're only checking that the infrastructure works as intended.
> 
> Are you saying that you want to implement a simplistic collation provider with
> "tunable" ordering, so that you can actually check that an ordering change 
> will
> be detected as a corrupted index, as in you'll get some error or incorrect
> results?

I'm saying that your patch seems to call down to get_collation_actual_version() 
via get_collation_version_for_oid() from your new function 
do_check_index_has_outdated_collation(), but I'm not seeing how that gets 
exercised.  

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2021-03-15 Thread Tom Lane
David Steele  writes:
> On 12/15/20 9:03 AM, Peter Eisentraut wrote:
>> Here is a new patch for this.  This now follows the implementation that 
>> Tom has suggested:  Leave date_part() alone, add a new set of extract() 
>> functions, and map the SQL EXTRACT construct to those.  I have basically 
>> just copied over the implementations from my previous patch and placed 
>> them next to the existing date_part() implementations.  So all the 
>> behavior is still the same as in the previous patches.
>> 
>> One thing I still need to look into is how to not lose all the test 
>> coverage for date_part().  But that should be fairly mechanical, so I'm 
>> leaving it off in this version.

> Tom, what do you think of the updated patch?

Oh, I didn't think I was on the hook to review this ;-)

Anyway, taking a quick look at the v4 patch, the only complaint
I have is that it seems a bit bulky and brute-force to duplicate
so much code.  Is it feasible to share most of the implementation
between old and new functions, returning (say) an int64 that can
then be converted to either numeric or float8 by a wrapper?  That
would also reduce the pressure to duplicate all the test cases.

(I don't intend this complaint as a deal-breaker; Peter may well
have considered this alternative already and rejected it for good
reasons.)

regards, tom lane




Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 10:13:55AM -0700, Mark Dilger wrote:
> 
> 
> > On Mar 15, 2021, at 9:52 AM, Julien Rouhaud  wrote:
> > 
> > But there are also the tests in collate.icu.utf8.out which will fake 
> > outdated
> > collations (that's the original tests for the collation tracking patches) 
> > and
> > then check that outdated indexes are reindexed with both REINDEX and REINDEX
> > (OUDATED).
> > 
> > So I think that all cases are covered.  Do you want to have more test cases?
> 
> I thought that just checked cases where a bogus 'not a version' was put into 
> pg_catalog.pg_depend.  I'm talking about having a collation provider who 
> returns a different version string and has genuinely different collation 
> rules between versions, thereby breaking the index until it is updated.  Is 
> that being tested?

No, we're only checking that the infrastructure works as intended.

Are you saying that you want to implement a simplistic collation provider with
"tunable" ordering, so that you can actually check that an ordering change will
be detected as a corrupted index, as in you'll get some error or incorrect
results?

I don't think that this infrastructure is the right place to do that, and I'm
not sure what would be the benefit here.  If a library was updated, the
underlying indexes may or may not be corrupted, and we only warn about the
discrepancy with a low overhead.




Re: Parser Hook

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 06:05:52PM +0100, Pavel Stehule wrote:
> 
> Possibility to work with a parser is one main reason for forking postgres.
> Lot of interestings projects fail on the cost of maintaining their own fork.
> 
> Maybe a good enough possibility is the possibility to inject an own parser
> called before Postgres parser. Then it can do a transformation from "CREATE
> PIPE ..." to "SELECT extparse("CREATE PIPE()". There can be a switch if
> returned content is string for reparsing or already prepared AST.

Having a hook that returns a reformatted query string would definitely be
easier to write compared to generating an AST, but the overhead of parsing the
query twice plus deparsing it will probably make that approach way too
expensive in many use cases, so we shouldn't go that way.




Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 9:52 AM, Julien Rouhaud  wrote:
> 
> But there are also the tests in collate.icu.utf8.out which will fake outdated
> collations (that's the original tests for the collation tracking patches) and
> then check that outdated indexes are reindexed with both REINDEX and REINDEX
> (OUDATED).
> 
> So I think that all cases are covered.  Do you want to have more test cases?

I thought that just checked cases where a bogus 'not a version' was put into 
pg_catalog.pg_depend.  I'm talking about having a collation provider who 
returns a different version string and has genuinely different collation rules 
between versions, thereby breaking the index until it is updated.  Is that 
being tested?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: documentation fix for SET ROLE

2021-03-15 Thread Bossart, Nathan
On 3/15/21, 7:06 AM, "Laurenz Albe"  wrote:
> On Fri, 2021-03-12 at 21:41 +, Bossart, Nathan wrote:
>> On 3/12/21, 11:14 AM, "Joe Conway"  wrote:
>> > Looking back at the commit history it seems to me that this only works
>> > accidentally. Perhaps it would be best to fix RESET ROLE and be done with 
>> > it.
>>
>> That seems reasonable to me.
>
> +1 from me too.

Here's my latest attempt.  I think it's important to state that RESET ROLE
restores the connection-time setting if there is one; otherwise it sets the
role to the current session user identifier, which might be different from
the authenticated user if you've run SET SESSION AUTHORIZATION.

diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml
index 739f2c5cdf..f02babf3af 100644
--- a/doc/src/sgml/ref/set_role.sgml
+++ b/doc/src/sgml/ref/set_role.sgml
@@ -53,9 +53,16 @@ RESET ROLE
   

   
-   The NONE and RESET forms reset the 
current
-   user identifier to be the current session user identifier.
-   These forms can be executed by any user.
+   SET ROLE NONE sets the current user identifier to the
+   current session user identifier, as returned by
+   session_user.  RESET ROLE sets the
+   current user identifier to the connection-time setting specified by the
+   command-line options,
+   ALTER ROLE, or
+   ALTER DATABASE,
+   if any such settings exist.  Otherwise, RESET ROLE sets
+   the current user identifier to the current session user identifier.  These
+   forms can be executed by any user.
   
  

Nathan



Re: Parser Hook

2021-03-15 Thread Pavel Stehule
> Also, I'm not sure that many extensions would really benefit from custom
> utility command, as you can already do pretty much anything you want using
> SQL
> functions.  For instance it would be nice for hypopg to be able to support
>
> CREATE HYPOTHETICAL INDEX ...
>
> rather than
>
> SELECT hypopg_create_index('CREATE INDEX...')
>
> But really the only benefit would be autocompletion, which still wouldn't
> be
> possible as psql autocompletion won't be extended.  And even if it somehow
> was,
> I wouldn't expect all psql clients to be setup as needed.
>

Extending the parser can be interesting for three cases:

a) compatibility with other databases

b) experimental support of some standard features (current or future)

c) some experiments - using

CREATE PIPE xxx(xxx) or CREATE HYPERCUBE xxx(xxx) is more readable and more
SQL-ish (a more natural syntax) than

SELECT create_pipe('name', 'a1', 'int', ...) or SELECT ext('CREATE PIPE ...')

The possibility to work with the parser is one main reason for forking
Postgres.  Lots of interesting projects fail because of the cost of
maintaining their own fork.

Maybe a good enough option is the possibility to inject one's own parser,
called before the Postgres parser.  Then it could do a transformation from
"CREATE PIPE ..." to "SELECT extparse('CREATE PIPE ...')".  There could be a
switch indicating whether the returned content is a string for reparsing or
an already prepared AST.

It could be a very interesting feature.

Pavel


Re: Parser Hook

2021-03-15 Thread Jim Mlodgenski
On Mon, Mar 15, 2021 at 12:58 PM Joel Jacobson  wrote:

> On Mon, Mar 15, 2021, at 16:48, Jim Mlodgenski wrote:
>
> The example I have is adding a CREATE JOB command that a scheduler may
> use.
>
>
> This CREATE JOB thing sounds interesting.
>
> Are you working on adding the ability to schedule SQL-commands to run in
> the background,
> similar to cronjob and/or adding ampersand ("&") to a command in the
> terminal?
>

No, it was just a sample of how the parser could be extended so that an
extension like pg_cron could use CREATE JOB instead of calling a function
like SELECT cron.schedule(...)


Re: pg_amcheck contrib application

2021-03-15 Thread Tom Lane
Looks like we're not quite out of the woods, as hornet and tern are
still unhappy:

#   Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs 
expected 0)'
#   at t/003_check.pl line 498.

#   Failed test 'pg_amcheck excluding all corrupt schemas stdout /(?^:^$)/'
#   at t/003_check.pl line 498.
#   'heap table "db1"."pg_catalog"."pg_statistic", block 2, 
offset 1, attribute 27:
# final toast chunk number 0 differs from expected value 1
# heap table "db1"."pg_catalog"."pg_statistic", block 2, offset 1, attribute 27:
# toasted value for attribute 27 missing from toast table
# '
# doesn't match '(?^:^$)'
# Looks like you failed 2 tests of 60.
[12:18:06] t/003_check.pl ... 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/60 subtests 

These animals have somewhat weird alignment properties: MAXALIGN is 8
but ALIGNOF_DOUBLE is only 4.  I speculate that that is affecting their
choices about whether an out-of-line TOAST value is needed, breaking
this test case.

regards, tom lane




Re: Parser Hook

2021-03-15 Thread Jim Mlodgenski
On Mon, Mar 15, 2021 at 12:43 PM Julien Rouhaud  wrote:

> On Mon, Mar 15, 2021 at 11:48:58AM -0400, Jim Mlodgenski wrote:
> >
> > Going deeper on this, I created another POC as an example. Yes, having a
> > hook at the top of the parser does mean an extension needs to copy the
> > existing grammar and modify it. Without a total redesign of how the
> grammar
> > is handled, I'm not seeing how else this could be accomplished. The
> example
> > I have is adding a CREATE JOB command that a scheduler may use. The
> amount
> > of effort needed for an extension maintainer doesn't appear to be that
> > onerous. Its not ideal having to copy and patch gram.y, but certainly
> > doable for someone wanting to extend the parser.
>
> AFAIK nothing in bison prevents you from silently ignoring unhandled
> grammar
> rather than erroring out.  So you could have a parser hook called first,
> and
> if no valid command was recognized fall back on the original parser.  I'm
> not
> saying that it's a good idea or will be performant (although the added
> grammar
> will likely be very small, so it may not be that bad), but you could
> definitely
> avoid the need to duplicate the whole grammar in each and every extension,
> and
> allow multiple extensions extending the grammar.
>
>
That's a good point. That does simplify it.


> That won't reduce the difficulty of producing a correct parse tree if you
> want
> to implement some syntactic sugar for already handled DML though.
>

Complex DML like Oracle's outer join syntax is tricky no matter which way
you slice it.


> > However we would want to modify the parser to allow it to be more
> plugable
> > in the future, we would very likely need to have a hook at the top of the
> > parser to intiailize things like keywords. Having a hook at the top of
> the
> > parser along with the existing ProcessUtility_hook allows extension to
> add
> > their own utility commands if they wish. I would image that covers many
> > existing use cases for extensions today.
>
> What happens if multiple extensions want to add their own new grammar?
> There
> will at least be possible conflicts with the additional node tags.
>

The extensions would need to play nice with one another like they do with
other hooks and properly call the previous hook.


> Also, I'm not sure that many extensions would really benefit from custom
> utility command, as you can already do pretty much anything you want using
> SQL
> functions.  For instance it would be nice for hypopg to be able to support
>
> CREATE HYPOTHETICAL INDEX ...
>
> rather than
>
> SELECT hypopg_create_index('CREATE INDEX...')
>
> But really the only benefit would be autocompletion, which still wouldn't
> be
> possible as psql autocompletion won't be extended.  And even if it somehow
> was,
> I wouldn't expect all psql clients to be setup as needed.
>

Having the functionality exposed through DDL gives it a more native feel for
users and makes it more likely that some will use the extensions.


Re: Parser Hook

2021-03-15 Thread Joel Jacobson
On Mon, Mar 15, 2021, at 16:48, Jim Mlodgenski wrote:
> The example I have is adding a CREATE JOB command that a scheduler may use. 

This CREATE JOB thing sounds interesting.

Are you working on adding the ability to schedule SQL-commands to run in the 
background,
similar to cronjob and/or adding ampersand ("&") to a command in the terminal?

I couldn't figure it out by reading the patch.
I noted the "insert into extended_parser.jobs" query,
which to me sounds like the job would be some kind of parsing job,
but that seems strange.

/Joel

Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 09:30:43AM -0700, Mark Dilger wrote:
> 
> In the docs, 0001, "Fow now, the only dependency handled currently",
> 
> "Fow now" is misspelled, and "For now" seems redundant when used with 
> "currently".
> 
> 
> In the docs, 0002, "For now only dependency on collations are supported."
> 
> "dependency" is singular, "are" is conjugated for plural.
> 
> 
> In the docs, 0002, you forgot to update doc/src/sgml/ref/reindexdb.sgml with 
> the documentation for the --outdated switch.

Thanks, I'll fix those and do a full round of doc / comment proofreading.

> In the tests, you check that REINDEX (OUTDATED) doesn't do anything crazy, 
> but you are not really testing the functionality so far as I can see, as you 
> don't have any tests which cause the collation to be outdated.   Am I right 
> about that?  I wonder if you could modify DefineCollation.  In addition to 
> the providers "icu" and "libc" that it currently accepts, I wonder if it 
> might accept "test" or similar, and then you could create a test in 
> src/test/modules that compiles a "test" provider, creates a database with 
> indexes dependent on something from that provider, stops the database, 
> updates the test collation, ...?  


Indeed the tests in create_index.sql (and similarly in 090_reindexdb.pl) check
that REINDEX (OUTDATED) will ignore non-outdated indexes as expected.

But there are also the tests in collate.icu.utf8.out which will fake outdated
collations (that's the original tests for the collation tracking patches) and
then check that outdated indexes are reindexed with both REINDEX and REINDEX
(OUTDATED).

So I think that all cases are covered.  Do you want to have more test cases?




Re: Parser Hook

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 11:48:58AM -0400, Jim Mlodgenski wrote:
> 
> Going deeper on this, I created another POC as an example. Yes, having a
> hook at the top of the parser does mean an extension needs to copy the
> existing grammar and modify it. Without a total redesign of how the grammar
> is handled, I'm not seeing how else this could be accomplished. The example
> I have is adding a CREATE JOB command that a scheduler may use. The amount
> of effort needed for an extension maintainer doesn't appear to be that
> onerous. Its not ideal having to copy and patch gram.y, but certainly
> doable for someone wanting to extend the parser.

AFAIK nothing in bison prevents you from silently ignoring unhandled grammar
rather than erroring out.  So you could have a parser hook called first, and
if no valid command was recognized fall back on the original parser.  I'm not
saying that it's a good idea or will be performant (although the added grammar
will likely be very small, so it may not be that bad), but you could definitely
avoid the need to duplicate the whole grammar in each and every extension, and
allow multiple extensions extending the grammar.

That won't reduce the difficulty of producing a correct parse tree if you want
to implement some syntactic sugar for already handled DML though.
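
As a sketch of that chaining idea (to be clear: parser_hook does not exist in
core today; the hook type, the NIL-means-fall-back convention and
try_extension_grammar() are all assumptions layered on top of the proposed POC,
not an existing API):

/*
 * Hypothetical extension module.  parser_hook, parser_hook_type and
 * try_extension_grammar() are assumed names; the convention assumed here
 * is that a hook returns NIL when it does not recognize the string, in
 * which case the next hook (or ultimately the core parser) gets to run.
 */
#include "postgres.h"
#include "fmgr.h"
#include "nodes/pg_list.h"

PG_MODULE_MAGIC;

typedef List *(*parser_hook_type) (const char *query_string);	/* assumed */
extern parser_hook_type parser_hook;							/* assumed */

static parser_hook_type prev_parser_hook = NULL;

/* Hypothetical: tiny bison grammar that only knows CREATE JOB ... */
extern List *try_extension_grammar(const char *query_string);

static List *
my_raw_parser(const char *query_string)
{
	List	   *parsetree = try_extension_grammar(query_string);

	if (parsetree != NIL)
		return parsetree;

	/* Not ours: let the previous hook, or the core grammar, handle it. */
	if (prev_parser_hook != NULL)
		return prev_parser_hook(query_string);
	return NIL;					/* assumed to mean "use the core parser" */
}

void
_PG_init(void)
{
	prev_parser_hook = parser_hook;
	parser_hook = my_raw_parser;
}

The stacking convention would be the same as for planner_hook or
ProcessUtility_hook: each extension saves the previous hook value in _PG_init()
and delegates to it when its own grammar does not match.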

> However we would want to modify the parser to allow it to be more plugable
> in the future, we would very likely need to have a hook at the top of the
> parser to intiailize things like keywords. Having a hook at the top of the
> parser along with the existing ProcessUtility_hook allows extension to add
> their own utility commands if they wish. I would image that covers many
> existing use cases for extensions today.

What happens if multiple extensions want to add their own new grammar?  There
will at least be possible conflicts with the additional node tags.

Also, I'm not sure that many extensions would really benefit from custom
utility commands, as you can already do pretty much anything you want using SQL
functions.  For instance it would be nice for hypopg to be able to support

CREATE HYPOTHETICAL INDEX ...

rather than

SELECT hypopg_create_index('CREATE INDEX...')

But really the only benefit would be autocompletion, which still wouldn't be
possible as psql autocompletion won't be extended.  And even if it somehow was,
I wouldn't expect all psql clients to be set up as needed.




Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 14, 2021, at 8:33 PM, Julien Rouhaud  wrote:
> 
> 


In the docs, 0001, "Fow now, the only dependency handled currently",

"Fow now" is misspelled, and "For now" seems redundant when used with 
"currently".


In the docs, 0002, "For now only dependency on collations are supported."

"dependency" is singular, "are" is conjugated for plural.


In the docs, 0002, you forgot to update doc/src/sgml/ref/reindexdb.sgml with 
the documentation for the --outdated switch.


In the tests, you check that REINDEX (OUTDATED) doesn't do anything crazy, but 
you are not really testing the functionality so far as I can see, as you don't 
have any tests which cause the collation to be outdated.   Am I right about 
that?  I wonder if you could modify DefineCollation.  In addition to the 
providers "icu" and "libc" that it currently accepts, I wonder if it might 
accept "test" or similar, and then you could create a test in src/test/modules 
that compiles a "test" provider, creates a database with indexes dependent on 
something from that provider, stops the database, updates the test collation, 
...?  

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Confusing behavior of psql's \e

2021-03-15 Thread Laurenz Albe
On Fri, 2021-03-12 at 13:12 -0500, Tom Lane wrote:
> I pushed the race-condition-fixing part of this, since that's an
> unarguable bug fix and hence seems OK to back-patch.  (I added a
> check on change of file size, because why not.)

Thank you!

> Attached is the rest, just to keep the cfbot happy.

Thanks for that as well.

> I don't think this is quite committable yet.  The division of
> labor between do_edit() and its callers seems to need more
> thought: in particular, I see that \ef now fails to print
> "No changes" when I would expect it to.

Hm.  My idea was that "No changes" is short for "No changes to
the query buffer" and is printed only if you quit the editor,
but the query buffer is retained.

But it also makes sense to interpret it as "No changes to the
function or view definition", so I have put it back according
to your expectations.

> But the real question
> is whether there is any non-error condition in which do_edit
> should not set the query_buffer, either to the edit result
> or empty.

I don't follow.  That's exactly what happens...
But I guess the confusion is due to my inadequate comments.

> If we're going to improve the header comment for
> do_edit, I would expect it to specify what happens to the
> query_buf, and the fact that I can't write a concise spec
> leads me to think that a bit more design effort is needed.

I have described the fate of the query buffer in the function
comment.  I hope it is clearer now.

> Also, somewhat cosmetic, but: I feel like the hunk that is
> setting discard_on_quit in exec_command_edit is assuming
> more than it ought to about what copy_previous_query will do.
> Perhaps it's worth making copy_previous_query return a bool
> indicating whether it copied previous_buf, and then
> exec_command_edit becomes
> 
> discard_on_quit = copy_previous_query(query_buf, previous_buf);

That is a clear improvement, and I have done it like that.


Perhaps I should restate the problem the patch is trying to solve:

test=> INSERT INTO dummy VALUES (DEFAULT);
INSERT 0 1
test=> -- quit the editor during the following
test=> \e script.sql
INSERT 0 1

Ouch.  A second row was inserted into "dummy".

The same happens with plain \e: if I edit the previous query
and quit, I don't expect it to be executed again.
Currently, the only way to achieve that is to empty the temporary
file and then save it, which is annoying.


Attached is version 6.

Yours,
Laurenz Albe
From 436f48e21adcf207dc267db834a2e80f846fb666 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 15 Mar 2021 17:09:22 +0100
Subject: [PATCH] Improve \e, \ef and \ev if the editor is quit

Before, if you edited a function or view definition or a
script file, quitting the editor without saving would cause
the *previous query* to be executed, which is certainly not
what anyone would expect.

It is better to execute nothing and clear the query buffer
in this case.

The behavior when editing the current query buffer is
unchanged: quitting without saving will retain it.

In passing, fix the misplaced function comment for do_edit().

Author: Laurenz Albe 
Reviewed-by: Chapman Flack, Jacob Champion
Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b.camel%40cybertec.at
---
 doc/src/sgml/ref/psql-ref.sgml | 10 --
 src/bin/psql/command.c | 63 +-
 2 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 13c1edfa4d..0bf69174ff 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1970,7 +1970,9 @@ testdb=
 
 
 
-The new contents of the query buffer are then re-parsed according to
+If you edit a file or the previous query, and you quit the editor without
+modifying the file, the query buffer is cleared.
+Otherwise, the new contents of the query buffer are re-parsed according to
 the normal rules of psql, treating the
 whole buffer as a single line.  Any complete queries are immediately
 executed; that is, if the query buffer contains or ends with a
@@ -2039,7 +2041,8 @@ Tue Oct 26 21:40:57 CEST 1999
  in the form of a CREATE OR REPLACE FUNCTION or
  CREATE OR REPLACE PROCEDURE command.
  Editing is done in the same way as for \edit.
- After the editor exits, the updated command is executed immediately
+ If you quit the editor without saving, the statement is discarded.
+ If you save and exit the editor, the updated command is executed immediately
  if you added a semicolon to it.  Otherwise it is redisplayed;
  type semicolon or \g to send it, or \r
  to cancel.
@@ -2115,7 +2118,8 @@ Tue Oct 26 21:40:57 CEST 1999
  This command fetches and edits the definition of the named view,
  in the form of a CREATE OR REPLACE VIEW command.
  Editing is done in the same way as for \edit.

Re: Postgres crashes at memcopy() after upgrade to PG 13.

2021-03-15 Thread Peter Geoghegan
On Mon, Mar 15, 2021 at 6:56 AM Avinash Kumar
 wrote:
> psql:amchecksql.sql:17: DEBUG:  leaf block 1043751 of index "idx_id_mtime" 
> has no first data item

That one is harmless.

> And one error as follows.
>
> psql:amchecksql.sql:17: ERROR:  down-link lower bound invariant violated for 
> index "some_other_index"

That indicates corruption. Can you tie this back to the crash? Is it
the same index?

-- 
Peter Geoghegan




Re: Proposal: Save user's original authenticated identity for logging

2021-03-15 Thread Jacob Champion
On Tue, 2021-03-09 at 19:10 +, Jacob Champion wrote:
> And v5 is rebased over this morning's SSL test changes.
Rebased again after the SSL test revert (this is the same as v4).

--Jacob
From 470bf11f4b8feb6c22dc72626f6f3fcb7971ac26 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 3 Feb 2021 11:04:42 -0800
Subject: [PATCH v6 1/3] test/kerberos: only search forward in logs

The log content tests searched through the entire log file, from the
beginning, for every match. If two tests shared the same expected log
content, it was possible for the second test to get a false positive by
matching against the first test's output. (This could be seen by
modifying one of the expected-failure tests to expect the same output as
a previous happy-path test.)

Store the offset of the previous match, and search forward from there
during the next match, resetting the offset every time the log file
changes. This could still result in false positives if one test spits
out two or more matching log lines, but it should be an improvement over
what's there now.
---
 src/test/kerberos/t/001_auth.pl | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 079321bbfc..c721d24dbd 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -182,6 +182,9 @@ $node->safe_psql('postgres', 'CREATE USER test1;');
 
 note "running tests";
 
+my $current_log_name = '';
+my $current_log_position;
+
 # Test connection success or failure, and if success, that query returns true.
 sub test_access
 {
@@ -221,18 +224,37 @@ sub test_access
 		$lfname =~ s/^stderr //;
 		chomp $lfname;
 
+		if ($lfname ne $current_log_name)
+		{
+			$current_log_name = $lfname;
+
+			# Search from the beginning of this new file.
+			$current_log_position = 0;
+			note "current_log_position = $current_log_position";
+		}
+
 		# might need to retry if logging collector process is slow...
 		my $max_attempts = 180 * 10;
 		my $first_logfile;
 		for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 		{
 			$first_logfile = slurp_file($node->data_dir . '/' . $lfname);
-			last if $first_logfile =~ m/\Q$expect_log_msg\E/;
+
+			# Don't include previously matched text in the search.
+			$first_logfile = substr $first_logfile, $current_log_position;
+			if ($first_logfile =~ m/\Q$expect_log_msg\E/g)
+			{
+$current_log_position += pos($first_logfile);
+last;
+			}
+
 			usleep(100_000);
 		}
 
 		like($first_logfile, qr/\Q$expect_log_msg\E/,
 			 'found expected log file content');
+
+		note "current_log_position = $current_log_position";
 	}
 
 	return;
-- 
2.25.1

From 7a0aded279d3d1f2e0ea5da4d69af20dd2628143 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 8 Feb 2021 10:53:20 -0800
Subject: [PATCH v6 2/3] ssl: store client's DN in port->peer_dn

Original patch by Andrew Dunstan:

https://www.postgresql.org/message-id/fd96ae76-a8e3-ef8e-a642-a592f5b76771%40dunslane.net

but I've taken out the clientname=DN functionality; all that will be
needed for the next patch is the DN string.
---
 src/backend/libpq/be-secure-openssl.c | 53 +++
 src/include/libpq/libpq-be.h  |  1 +
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 8c37381add..a7b09534a8 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -546,22 +546,25 @@ aloop:
 	/* Get client certificate, if available. */
 	port->peer = SSL_get_peer_certificate(port->ssl);
 
-	/* and extract the Common Name from it. */
+	/* and extract the Common Name / Distinguished Name from it. */
 	port->peer_cn = NULL;
+	port->peer_dn = NULL;
 	port->peer_cert_valid = false;
 	if (port->peer != NULL)
 	{
 		int			len;
+		X509_NAME  *x509name = X509_get_subject_name(port->peer);
+		char	   *peer_cn;
+		char	   *peer_dn;
+		BIO		   *bio = NULL;
+		BUF_MEM    *bio_buf = NULL;
 
-		len = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-		NID_commonName, NULL, 0);
+		len = X509_NAME_get_text_by_NID(x509name, NID_commonName, NULL, 0);
 		if (len != -1)
 		{
-			char	   *peer_cn;
-
 			peer_cn = MemoryContextAlloc(TopMemoryContext, len + 1);
-			r = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-		  NID_commonName, peer_cn, len + 1);
+			r = X509_NAME_get_text_by_NID(x509name, NID_commonName, peer_cn,
+		  len + 1);
 			peer_cn[len] = '\0';
 			if (r != len)
 			{
@@ -585,6 +588,36 @@ aloop:
 
 			port->peer_cn = peer_cn;
 		}
+
+		bio = BIO_new(BIO_s_mem());
+		if (!bio)
+		{
+			pfree(port->peer_cn);
+			port->peer_cn = NULL;
+			return -1;
+		}
+		/* use commas instead of slashes */
+		X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_RFC2253);
+		BIO_get_mem_ptr(bio, &bio_buf);
+		peer_dn = MemoryContextAlloc(TopMemoryContext, bio_buf->length + 1);
+		

Re: Parser Hook

2021-03-15 Thread Jim Mlodgenski
On Mon, Feb 22, 2021 at 3:52 PM Andres Freund  wrote:

> Hi,
>
> On 2021-02-22 11:20:54 -0500, Jim Mlodgenski wrote:
> > As Jan mentioned in his thread about a pluggable wire protocol [0], AWS
> is
> > working on a set of extensions for Babelfish. The intention is to not
> > necessarily have it as a single monolithic extension, but be possible for
> > people to use pieces of it as they need when they are migrating to
> > PostgreSQL. Some may just need the functions or data types. Others may
> need
> > the stored procedure language. Many times when enterprises are migrating
> > databases, they have satellite applications that they may not be able to
> > change or they are on a different schedules than the main application so
> > the database still needs to support some of the old syntax. A common need
> > in these situations is the parser.
> >
> > Attached is a patch to place a hook at the top of the parser to allow
> for a
> > pluggable parser. It is modeled after the planner_hook [1]. To test the
> > hook, I have also attached a simple proof of concept that wraps the
> parser
> > in a TRY/CATCH block to catch any parse errors. That could potentially
> help
> > a class of users who are sensitive to parse errors ending up in the logs
> > and leaking PII data or passwords.
>
> I don't think these are really comparable. In case of the planner hook
> you can reuse the normal planner pieces, and just deal with the one part
> you need to extend. But we have pretty much no infrastructure to use the
> parser in a piecemeal fashion (there's a tiny bit for plpgsql).
>
> Which in turn means that to effectively use the proposed hook to
> *extend* what postgres accepts, you need to copy the existing parser,
> and hack in your extensions. Which in turn invariably will lead to
> complaints about parser changes / breakages the community will get
> complaints about in minor releases etc.
>

Going deeper on this, I created another POC as an example. Yes, having a
hook at the top of the parser does mean an extension needs to copy the
existing grammar and modify it. Without a total redesign of how the grammar
is handled, I'm not seeing how else this could be accomplished. The example
I have is adding a CREATE JOB command that a scheduler may use. The amount
of effort needed for an extension maintainer doesn't appear to be that
onerous. It's not ideal having to copy and patch gram.y, but certainly
doable for someone wanting to extend the parser. I also extended the patch
to add another hook in parse_expr.c to see what we would need to add
another keyword and have it call a function like SYSDATE. That appears to
be a lot of work to get all of the potential hook points that an extension
may want to add, and there may not be that many use cases worth the effort.



> I think the cost incurred for providing a hook that only allows
> extensions to replace the parser with a modified copy of ours will be
> higher than the gain.  Note that I'm not saying that I'm against
> extending the parser, or hooks - just that I don't think just adding the
> hook is a step worth doing on its own.
>
>
However we would want to modify the parser to allow it to be more pluggable
in the future, we would very likely need to have a hook at the top of the
parser to initialize things like keywords. Having a hook at the top of the
parser along with the existing ProcessUtility_hook allows extensions to add
their own utility commands if they wish. I would imagine that covers many
existing use cases for extensions today.


> Imo a functional approach would really need to do the work to allow to
> extend & reuse the parser in a piecemeal fashion and *then* add a hook.
>
> Greetings,
>
> Andres Freund
>


poc_parser_hook.patch
Description: Binary data


poc_extended_parser.tar.gz
Description: GNU Zip compressed data


Re: Improve join selectivity estimation using extended statistics

2021-03-15 Thread Konstantin Knizhnik




On 11.03.2021 03:47, Tomas Vondra wrote:

Hi Konstantin,

Thanks for working on this! Using extended statistics to improve join
cardinality estimates was definitely on my radar, and this patch seems
like a good start.

I had two basic ideas about how we might improve join estimates:

(a) use per-table extended statistics to estimate join conditions

(b) invent multi-table extended statistics (requires inventing how to
sample the tables in a coordinated way, etc.)

This patch aims to do (a) which is perfectly reasonable - I think we can
achieve significant improvements this way. I have some ideas about (b),
but it seems harder and for a separate thread/patch.


The patch includes some *very* interesting ideas, but I think it does
them too late and at the wrong level of abstraction. I mean that:

1) I don't think the code in clausesel.c should deal with extended
statistics directly - it requires far too much knowledge about different
types of extended stats, what clauses are supported by them, etc.
Allowing stats on expressions will make this even worse.

Better do that in extended_stats.c, like statext_clauselist_selectivity.

2) in clauselist_selectivity_ext, functional dependencies are applied in
the part that processes remaining clauses, not estimated using extended
statistics. That seems a bit confusing, and I suspect it may lead to
issues - for example, it only processes the clauses incrementally, in a
particular order. That probably affects the  result, because it affects
which functional dependencies we can apply.

In the example query that's not an issue, because it only has two Vars,
so it either can't apply anything (with one Var) or it can apply
everything (with two Vars). But with 3 or more Vars the order would
certainly matter, so it's problematic.


Moreover, it seems a bit strange that this considers dependencies only
on the inner relation. Can't that lead to issues with different join
orders producing different cardinality estimates?


I think a better approach would be to either modify the existing block
dealing with extended stats for a single relation to also handle join
conditions. Or perhaps we should invent a separate block, dealing with
*pairs* of relations? And it should deal with *all* join clauses for
that pair of relations at once, not one by one.

As for the exact implementation, I'd imagine we call overall logic to be
something like (for clauses on two joined relations):

- pick a subset of clauses with the same type of extended statistics on
both sides (MCV, ndistinct, ...), repeat until we can't apply more
statistics

- estimate remaining clauses either using functional dependencies or in
the regular (old) way


As for how to use other types of extended statistics, I think eqjoinsel
could serve as an inspiration. We should look for an MCV list and
ndistinct stats on both sides of the join (possibly on some subset of
clauses), and then do the same thing eqjoinsel does, just with multiple
columns.
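
To sketch the ndistinct part of that (ignoring null fractions, and assuming no
usable MCV information), the eqjoinsel fallback of 1 / max(nd1, nd2) would
generalize from a single column to the whole group of join clauses, e.g. for
t1.a = t2.a AND t1.b = t2.b:

    selectivity ~ 1 / max( ndistinct_t1(a,b), ndistinct_t2(a,b) )

where ndistinct_t(a,b) comes from the extended ndistinct statistics when they
exist, and otherwise has to fall back to per-column estimates. That's only
meant to illustrate the direction, not what the current patch computes.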

Note: I'm not sure what to do when we find the stats only on one side.
Perhaps we should assume the other side does not have correlations and
use per-column statistics (seems reasonable), or maybe just not apply
anything (seems a bit too harsh).

Anyway, if there are some non-estimated clauses, we could try applying
functional dependencies similarly to what this patch does. It's also
consistent with statext_clauselist_selectivity - that also tries to
apply MCV lists first, and only then we try functional dependencies.


BTW, should this still rely on oprrest (e.g. F_EQSEL). That's the
selectivity function for restriction (non-join) clauses, so maybe we
should be looking at oprjoin when dealing with joins? Not sure.


One bit that I find *very* interesting is the calc_joinrel_size_estimate
part, with this comment:

   /*
* Try to take in account functional dependencies between attributes
* of clauses pushed-down to joined relations and retstrictlist
* clause. Right now we consider only case of restrictlist consists of
* one clause.
*/

If I understand the comment and the code after it, it essentially tries
to apply extended statistics from both the join clauses and filters at
the relation level. That is, with a query like

 SELECT * FROM t1 JOIN t2 ON (t1.a = t2.a) WHERE t1.b = 10

we would be looking at statistics on t1(a,b), because we're interested
in estimating conditional probability distribution

P(t1.a = a? | t1.b = 10)

I think that's extremely interesting and powerful, because it allows us
to "restrict" the multi-column MCV lists, we could probably estimate
number of distinct "a" values in rows with "b=10" like:

 ndistinct(a,b) / ndistinct(b)

and do various interesting stuff like this.

That will require some improvements to the extended statistics code (to
allow passing a list of conditions), but that's quite doable. I think
the code actually did something like that originally ;-)


Obviously, none of this is achievable for PG14, as 

Re: MultiXact\SLRU buffers configuration

2021-03-15 Thread Gilles Darold
Le 12/03/2021 à 13:44, Andrey Borodin a écrit :
>
>> 11 марта 2021 г., в 20:50, Gilles Darold  написал(а):
>>
>>
>> The patch doesn't apply anymore in master cause of error: patch failed: 
>> src/backend/utils/init/globals.c:150
>>
>>
>>
>> An other remark about this patch is that it should be mentionned in the 
>> documentation (doc/src/sgml/config.sgml) that the new configuration 
>> variables need a server restart, for example by adding "This parameter can 
>> only be set at server start." like for shared_buffers. Patch on 
>> postgresql.conf mention it.
>>
>> And some typo to be fixed:
>>
>>
>>
>> s/Tipically/Typically/
>>
>> s/asincronous/asyncronous/
>>
>> s/confugured/configured/
>>
>> s/substrnsactions/substransactions/
>>
>>
> Thanks, Gilles! Fixed.
>
> Best regards, Andrey Borodin.
>

Hi Andrey,

I found two problems in this patch, first in src/include/miscadmin.h
multixact_members_slru_buffers is declared twice:


 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_slru_buffers;
+extern PGDLLIMPORT int multixact_members_slru_buffers;
+extern PGDLLIMPORT int multixact_members_slru_buffers;  <-
+extern PGDLLIMPORT int subtrans_slru_buffers;


In file src/backend/access/transam/multixact.c, the second SimpleLruInit()
call should use multixact_members_slru_buffers and not
multixact_offsets_slru_buffers.


@@ -1848,13 +1848,13 @@ MultiXactShmemInit(void)
    MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;

 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_slru_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_offsets_slru_buffers, 0,    <--
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);



Please fix them so that I can end the review.


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/





Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-03-15 Thread Mark Dilger



> On Mar 2, 2021, at 6:08 AM, Pavel Borisov  wrote:
> 
> I completely agree that checking uniqueness requires looking at the heap, but 
> I don't agree that every caller of bt_index_check on an index wants that 
> particular check to be performed.  There are multiple ways in which an index 
> might be corrupt, and Peter wrote the code to only check some of them by 
> default, with options to expand the checks to other things.  This is why 
> heapallindexed is optional.  If you don't want to pay the price of checking 
> all entries in the heap against the btree, you don't have to.
> 
> I've got the idea and revised the patch accordingly. Thanks!
> Pfa v4 of a patch. I've added an optional argument to allow uniqueness checks 
> for the unique indexes.
> Also, I added a test variant to make them work on 32-bit systems. 
> Unfortunately, converting the regression test to TAP would be a pain for me. 
> Hope it can be used now as a 2-variant regression test for 32 and 64 bit 
> systems.
> 
> Thank you for your consideration!
> 
> -- 
> Best regards,
> Pavel Borisov
> 
> Postgres Professional: http://postgrespro.com
> 

Looking over v4, here are my review comments...

I created the file contrib/amcheck/amcheck--1.2--1.3.sql during the v14 
development cycle, so that is not released yet.  If your patch goes out in v14, 
does it need to create an amcheck--1.3--1.4.sql, or could you fit your changes 
into the 1.2--1.3.sql file?  (Does the project have a convention governing 
this?)  This is purely a question.  I'm not advising you to change anything 
here.

You need to update doc/src/sgml/amcheck.sgml to reflect the changes you made to 
the bt_index_check and bt_index_parent_check functions.

You need to update the recently committed src/bin/pg_amcheck project to include 
--checkunique as an option.  This client application already has flags for 
heapallindexed and rootdescend.  I can help with that if it isn't obvious what 
needs to be done.  Note that pg_amcheck/t contains TAP tests that exercise the 
options, so you'll need to extend code coverage to include this new option.
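
For reference, invoking the new check would presumably look something
like this (a sketch; the argument name checkunique comes from the patch
under review and could still change):

  SELECT bt_index_check(index => 'my_unique_idx'::regclass,
                        heapallindexed => true,
                        checkunique => true);

pg_amcheck would then expose the same thing through the --checkunique
command-line switch mentioned above.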


> On Mar 2, 2021, at 7:10 PM, Peter Geoghegan  wrote:
> 
> I don't think that it's acceptable for your new check to raise a
> WARNING instead of an ERROR.

You already responded to Peter, and I can see that after raising WARNINGs about 
an index, the code raises an ERROR.  That is different from behavior that 
pg_amcheck currently expects from contrib/amcheck functions.  It will be 
interesting to see if that makes integration harder.


> On Mar 2, 2021, at 6:54 PM, Peter Geoghegan  wrote:
> 
>> The regression test you provided is not portable.  I am getting lots of 
>> errors due to differing output of the form "page lsn=0/4DAD7E0".  You might 
>> turn this into a TAP test and use a regular expression to check the output.
> 
> I would test this using a custom opclass that does simple fault
> injection. For example, an opclass that indexes integers, but can be
> configured to dynamically make 0 values equal or unequal to each
> other. That's more representative of real-world problems.
> 
> You "break the warranty" by updating pg_index, even compared to
> updating other system catalogs. In particular, you break the
> "indcheckxmin wait -- wait for xmin to be old before using index"
> stuff in get_relation_info(). So it seems worse than updating
> pg_attribute, for example (which is something that the tests do
> already).

Take a look at src/bin/pg_amcheck/t/005_opclass_damage.pl for an example of 
changing an opclass to test btree index breakage.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Calendar support in localization

2021-03-15 Thread Surafel Temesgen
Hi all,
My country (Ethiopia) is one of the nations that uses a different
calendar than the one PostgreSQL supports, so we are deprived of the
benefits of the date datatype. We just use strings to store dates, which
greatly limits the quality of our applications. The gap became even worse
once application and system time support became available, and it does
not seem fair to suggest adding yet another date datatype and
implementation just for a different calendar, even though its users are
not a minor group. Having calendar support in localization would be a
very exciting feature for non-Gregorian calendar users and would be much
loved. As far as I can see, the difficult part is understanding the
different calendars. I can prepare a patch for the Ethiopian calendar
once we have consensus.

I have cc'd Thomas Munro and Vik because they have an interest in this area.

Please don't suggest forking PostgreSQL just for this feature.

regards
Surafel


Re: locking [user] catalog tables vs 2pc vs logical rep

2021-03-15 Thread vignesh C
On Tue, Feb 23, 2021 at 3:59 AM Andres Freund  wrote:
>
> Hi,
>
> The 2pc decoding added in
>
> commit a271a1b50e9bec07e2ef3a05e38e7285113e4ce6
> Author: Amit Kapila 
> Date:   2021-01-04 08:34:50 +0530
>
> Allow decoding at prepare time in ReorderBuffer.
>
> has a deadlock danger when used in a way that takes advantage of
> separate decoding of the 2PC PREPARE.
>
>
> I assume the goal of decoding the 2PC PREPARE is so one can wait for the
> PREPARE to have logically replicated, before doing the COMMIT PREPARED.
>
>
> However, currently it's pretty easy to get into a state where logical
> decoding cannot progress until the 2PC transaction has
> committed/aborted. Which essentially would lead to undetected deadlocks.
>
> The problem is that catalog tables accessed during logical decoding need
> to get locked (otherwise e.g. a table rewrite could happen
> concurrently). But if the prepared transaction itself holds a lock on a
> catalog table, logical decoding will block on that lock - which won't be
> released until replication progresses. A deadlock.
>
> A trivial example:
>
> SELECT pg_create_logical_replication_slot('test', 'test_decoding');
> CREATE TABLE foo(id serial primary key);
> BEGIN;
> LOCK pg_class;
> INSERT INTO foo DEFAULT VALUES;
> PREPARE TRANSACTION 'foo';
>
> -- hangs waiting for pg_class to be unlocked
> SELECT pg_logical_slot_get_changes('test', NULL, NULL, 'two-phase-commit', 
> '1');
>
>
> Now, more realistic versions of this scenario would probably lock a
> 'user catalog table' containing replication metadata instead of
> pg_class, but ...
>
>
> At first this seems to be a significant issue. But on the other hand, if
> you were to shut the cluster down in this situation (or disconnect all
> sessions), you have broken cluster on your hand - without logical
> decoding being involved. As it turns out, we need to read pg_class to
> log in...  And I can't remember this being reported to be a problem?
>
>
> Perhaps all that we need to do is to disallow 2PC prepare if [user]
> catalog tables have been locked exclusively? Similar to how we're
> disallowing preparing tables with temp table access.
>

I also feel we should not allow preparing a transaction that has locked
system tables, as that prevents creating a new login session after a
restart and also causes a deadlock during logical decoding of the
prepared transaction.
I have made a patch that makes PREPARE TRANSACTION fail in this
scenario; the patch is attached.
Thoughts?

Regards,
Vignesh
From 93014a12a6a7c576cdd11028eaa0e7fa2afa7205 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 15 Mar 2021 17:21:04 +0530
Subject: [PATCH v1] Fail prepared transaction if transaction has locked system
 tables/user catalog tables.

Don't allow PREPARE TRANSACTION if we've locked a system table or a user
catalog table in this transaction. Allowing prepared transactions that have
locked system tables would result in a hang during logical decoding of
prepared transactions, because logical decoding of prepared transactions
locks the system tables.
---
 doc/src/sgml/ref/prepare_transaction.sgml |   8 ++
 src/backend/access/transam/xact.c |  14 +++
 src/backend/commands/lockcmds.c   |  17 +++
 src/include/access/xact.h |   7 ++
 src/test/regress/expected/prepared_xacts.out  | 102 
 .../regress/expected/prepared_xacts_1.out | 112 ++
 src/test/regress/sql/prepared_xacts.sql   |  75 
 7 files changed, 335 insertions(+)

diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index f4f6118ac3..ee8e4072f5 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -106,6 +106,14 @@ PREPARE TRANSACTION transaction_id
tied to the current session to be useful in a transaction to be prepared.
   
 
+  
+   It is not currently allowed to PREPARE a transaction that
+   has locked system table(s) or user defined catalog table(s). These features
+   are too tightly tied to logical decoding of
+   PREPARE TRANSACTION and creation of new login session to
+   be useful in a transaction to be prepared.
+  
+
   
If the transaction modified any run-time parameters with SET
(without the LOCAL option),
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6395a9b240..d5d54a9e3b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2448,6 +2448,20 @@ PrepareTransaction(void)
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot PREPARE a transaction that has operated on temporary objects")));
 
+	/*
+	 * Don't allow PREPARE TRANSACTION if we've locked a system table or an user
+	 * catalog table in this transaction. Allowing the prepared transactions
+	 * when it locked system tables will result in deadlock while logical
+	 * decoding of prepared transactions as logical decoding of prepared
+	 * 

Re: Freeze the inserted tuples during CTAS?

2021-03-15 Thread Paul Guo
>   to set visibility map bits on materialized views. I'll start a new
>thread to discuss that.

Thanks. Also I withdrew the patch.



Re: fdatasync performance problem with large number of DB files

2021-03-15 Thread Paul Guo


> On 2021/3/15, 7:34 AM, "Thomas Munro"  wrote:

>> On Mon, Mar 15, 2021 at 11:52 AM Thomas Munro  
wrote:
>> Time being of the essence, here is the patch I posted last year, this
>> time with a GUC and some docs.  You can set sync_after_crash to
>> "fsync" (default) or "syncfs" if you have it.

>  Cfbot told me to add HAVE_SYNCFS to Solution.pm, and I fixed a couple of 
> typos.

By the way, there is a common case where we could skip the fsync: a standby
that has already been fsync-ed, generated by pg_rewind/pg_basebackup.
The state of those standbys is surely not
DB_SHUTDOWNED/DB_SHUTDOWNED_IN_RECOVERY, so the
pgdata directory is fsync-ed again during startup when starting those pg
instances. We could ask users not to fsync during pg_rewind/pg_basebackup,
but we probably want to just fsync some files in pg_rewind (see [1]), so
better to let the startup process skip the unnecessary fsync. As to the
solution, using a GUC or writing something into a file like backup_label(?)
does not seem to be a good idea, since:
1. With a GUC, we still expect fsync after real crash recovery, so we would
need to reset the GUC and also specify pgoptions in the pg_ctl command.
2. If we write some hint information to a file like backup_label(?) in
pg_rewind/pg_basebackup, people might copy the pgdata directory and then we
still need the fsync.
The only simple solution I can think of is to let the user touch a file to
hint the startup process, before starting the pg instance.

[1] 
https://www.postgresql.org/message-id/flat/25CFBDF2-5551-4CC3-ADEB-434B6B1BAD16%40vmware.com#734e7dc77f0760a3a64e808476ecc592
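
Something like the following, where the marker file name is purely
hypothetical and only meant to illustrate the idea, not something the
server currently recognizes:

  # sketch: tell the startup process the data directory is already fsync-ed
  touch $PGDATA/fsync_already_done
  pg_ctl -D $PGDATA start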



Re: Regression tests vs SERIALIZABLE

2021-03-15 Thread Tom Lane
Thomas Munro  writes:
> On Mon, Mar 15, 2021 at 5:24 PM Thomas Munro  wrote:
>> However, since commit 862ef372d6b, there *is* one test that fails if
>> you run make installcheck against a cluster running with -c
>> default_transaction_isolation=serializable: transaction.sql.  Is that
>> a mistake?  Is it a goal to be able to run this test suite against all
>> 3 isolation levels?

> Here's a fix.

Usually, if we issue a SET in the regression tests, we explicitly RESET
as soon thereafter as practical, so as to have a well-defined scope
where the script is running under unusual conditions.
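
For illustration, the convention looks roughly like this (a sketch, not
taken from the actual regression scripts):

  SET default_transaction_isolation = 'repeatable read';
  -- statements that depend on the unusual setting go here
  RESET default_transaction_isolation;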

regards, tom lane




Re: 2019-03 CF now in progress

2021-03-15 Thread David Steele

On 3/1/21 10:30 AM, David Steele wrote:

Hackers,

The 2019-03 commitfest is now in progress. It's a big one as usual.

Needs review: 213.
Waiting on Author: 21.
Ready for Committer: 28.
Committed: 29.
Withdrawn: 3.
Total: 294.


We are now halfway through the 2021-03 commitfest, though historically 
this CF goes a bit long.


Here is the updated summary:

Needs review: 152. (-61)
Waiting on Author: 42. (+21)
Ready for Committer: 26. (-2)
Committed: 55. (+26)
Moved to next CF: 5. (+5)
Returned with Feedback: 4. (+4)
Rejected: 1. (+1)
Withdrawn: 9. (+6)
Total: 294.

In short, 42 of 262 entries open at the beginning of the CF have been 
closed (16%).


Regards,
--
-David
da...@pgmasters.net




Re: documentation fix for SET ROLE

2021-03-15 Thread Laurenz Albe
On Fri, 2021-03-12 at 21:41 +, Bossart, Nathan wrote:
> On 3/12/21, 11:14 AM, "Joe Conway"  wrote:
> > Looking back at the commit history it seems to me that this only works
> > accidentally. Perhaps it would be best to fix RESET ROLE and be done with 
> > it.
> 
> That seems reasonable to me.

+1 from me too.

Yours,
Laurenz Albe





Re: dynamic result sets support in extended query protocol

2021-03-15 Thread David Steele

Hi Peter,

On 12/30/20 9:33 AM, Peter Eisentraut wrote:

On 2020-10-09 20:46, Andres Freund wrote:

Is there really a good reason for forcing the client to issue
NextResult, Describe, Execute for each of the dynamic result sets? It's
not like there's really a case for allowing the clients to skip them,
right?  Why aren't we sending something more like

S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
...
S: CommandComplete
C: Sync


I want to post my current patch, to keep this discussion moving.


CFBot reports that tests are failing, although the patch applies.

Also, you dropped all the driver authors from the thread. Not sure if 
that was intentional, but you might want to add them back if you need 
their input.


Regards,
--
-David
da...@pgmasters.net




Re: Postgres crashes at memcopy() after upgrade to PG 13.

2021-03-15 Thread Avinash Kumar
Hi,

On Sun, Mar 14, 2021 at 11:24 PM Peter Geoghegan  wrote:

> On Sun, Mar 14, 2021 at 6:54 PM Avinash Kumar
>  wrote:
> > Following may be helpful to understand what I meant.
> >
> > I have renamed the table and index names before adding it here.
>
> It should be possible to run amcheck on your database, which will
> detect corrupt posting list tuples on Postgres 13. It's a contrib
> extension, so you must first run "CREATE EXTENSION amcheck;". From
> there, you can run a query like the following (you may want to
> customize this):
>
> SELECT bt_index_parent_check(index => c.oid, heapallindexed => true),
> c.relname,
> c.relpages
> FROM pg_index i
> JOIN pg_opclass op ON i.indclass[0] = op.oid
> JOIN pg_am am ON op.opcmethod = am.oid
> JOIN pg_class c ON i.indexrelid = c.oid
> JOIN pg_namespace n ON c.relnamespace = n.oid
> WHERE am.amname = 'btree'
> -- Don't check temp tables, which may be from another session:
> AND c.relpersistence != 't'
> -- Function may throw an error when this is omitted:
> AND c.relkind = 'i' AND i.indisready AND i.indisvalid
> ORDER BY c.relpages DESC;
>
> If this query takes too long to complete you may find it useful to add
> something to limit the indexes check, such as: AND n.nspname =
> 'public' -- that change to the SQL will make the query just test
> indexes from the public schema.
>
> Do "SET client_min_messages=DEBUG1 " to get a kind of rudimentary
> progress indicator, if that seems useful to you.
>
I see that there are 26 indexes for which there are anywhere from 100 to
thousands of entries similar to the following. All are of course btree indexes.

psql:amchecksql.sql:17: DEBUG:  leaf block 1043751 of index "idx_id_mtime"
has no first data item

And one error as follows.

psql:amchecksql.sql:17: ERROR:  down-link lower bound invariant violated
for index "some_other_index"

>
> The docs have further information on what this bt_index_parent_check
> function does, should you need it:
> https://www.postgresql.org/docs/13/amcheck.html
>
> --
> Peter Geoghegan
>


-- 
Regards,
Avi.


RE: Enhance traceability of wal_level changes for backup management

2021-03-15 Thread osumi.takami...@fujitsu.com
On Friday, March 12, 2021 5:04 PM Peter Eisentraut 
 wrote:
> On 08.03.21 03:45, osumi.takami...@fujitsu.com wrote:
> > OK. The basic idea is to enable backup management tools to recognize
> > wal_level drop between*snapshots*.
> > When you have a snapshot of the cluster at one time and another one at
> > different time, with this new parameter, you can see if anything that
> > causes discontinuity from the drop happens in the middle of the two
> > snapshots without efforts to have a look at the WALs in between.
> 
> Is this an actual problem?  Changing wal_level requires a restart.  Are users
> frequently restarting their servers to change wal_level and then wonder why
> their backups are misbehaving or incomplete?  Why?  Just like fsync is
> "breaks your database", wal_level might as well be "breaks your backups".  Is
> it not documented well enough?
I understand what you mean.
However, this thread partly came from a concern raised in another thread
about introducing a new wal_level, which would make wal_level changes like
the above more common.
Therefore, I think it's preferable to discuss a better safeguard or a
better way of notifying users, instead of just leaving it without doing
anything.


Best Regards,
Takamichi Osumi





Re: Postgres crashes at memcopy() after upgrade to PG 13.

2021-03-15 Thread Avinash Kumar
Hi Thomas,

On Sun, Mar 14, 2021 at 10:01 PM Avinash Kumar 
wrote:

> Hi Thomas,
>
> On Sun, Mar 14, 2021 at 9:40 PM Thomas Munro 
> wrote:
>
>> On Mon, Mar 15, 2021 at 1:29 PM Avinash Kumar
>>  wrote:
>> > Is this expected when replication is happening between PostgreSQL
>> databases hosted on different OS versions like Ubuntu 16 and Ubuntu 20 ?
>> Or, do we think this is some sort of corruption ?
>>
>> Is this index on a text datatype, and using a collation other than "C"?
>>
> Its en_US.UTF-8
>


> Also the datatype is bigint
>


>
>

>> https://wiki.postgresql.org/wiki/Locale_data_changes
>>
>> Not that I expect it to crash if that's the cause, I thought it'd just
>> get confused.
>>
> On Ubuntu 16 server,
>
> *$* ldd --version
>
> ldd (Ubuntu GLIBC 2.23-0ubuntu11.2) 2.23
>
> On New Server Ubuntu 20,
>
> *$* ldd --version
>
> ldd (Ubuntu GLIBC 2.31-0ubuntu9.2) 2.31
>
>
> --
> Regards,
> Avi.
>


-- 
Regards,
Avinash Vallarapu
+1-902-221-5976


Re: Postgres crashes at memcopy() after upgrade to PG 13.

2021-03-15 Thread Avinash Kumar
Hi Thomas,

On Sun, Mar 14, 2021 at 9:40 PM Thomas Munro  wrote:

> On Mon, Mar 15, 2021 at 1:29 PM Avinash Kumar
>  wrote:
> > Is this expected when replication is happening between PostgreSQL
> databases hosted on different OS versions like Ubuntu 16 and Ubuntu 20 ?
> Or, do we think this is some sort of corruption ?
>
> Is this index on a text datatype, and using a collation other than "C"?
>
Its en_US.UTF-8

>
> https://wiki.postgresql.org/wiki/Locale_data_changes
>
> Not that I expect it to crash if that's the cause, I thought it'd just
> get confused.
>
On Ubuntu 16 server,

*$* ldd --version

ldd (Ubuntu GLIBC 2.23-0ubuntu11.2) 2.23

On New Server Ubuntu 20,

*$* ldd --version

ldd (Ubuntu GLIBC 2.31-0ubuntu9.2) 2.31


-- 
Regards,
Avi.


Re: Postgres crashes at memcopy() after upgrade to PG 13.

2021-03-15 Thread Thomas Munro
On Mon, Mar 15, 2021 at 1:29 PM Avinash Kumar
 wrote:
> Is this expected when replication is happening between PostgreSQL databases 
> hosted on different OS versions like Ubuntu 16 and Ubuntu 20 ? Or, do we 
> think this is some sort of corruption ?

Is this index on a text datatype, and using a collation other than "C"?

https://wiki.postgresql.org/wiki/Locale_data_changes

Not that I expect it to crash if that's the cause, I thought it'd just
get confused.




Re: allow partial union-all and improve parallel subquery costing

2021-03-15 Thread David Steele

Hi Luc,

On 12/30/20 8:54 AM, Luc Vlaming wrote:


Created a commitfest entry assuming this is the right thing to do so 
that someone can potentially pick it up during the commitfest.


Providing an updated patch based on latest master.


Looks like you need another rebase: 
http://cfbot.cputube.org/patch_32_2787.log. Marked as Waiting for Author.


You may also want to give a more detailed description of what you have 
done here and why it improves execution plans. This may help draw some 
reviewers.


Regards,
--
-David
da...@pgmasters.net




Re: Detecting File Damage & Inconsistencies

2021-03-15 Thread David Steele

On 11/18/20 5:23 AM, Simon Riggs wrote:

On Wed, 18 Nov 2020 at 06:42, Craig Ringer
 wrote:


On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs  wrote:



What I'm proposing is an option to add 16 bytes onto each COMMIT
record



Would it make sense to write this at the time we write a topxid assignment to 
WAL instead?

Otherwise it won't be accessible to streaming-mode logical decoding.


Do you mean extend the xl_xact_assignment record? My understanding is
that is not sent in all cases, so not sure what you mean by "instead".


Craig, can you clarify?

Cleysson, you are signed up as a reviewer. Do you know when you'll have 
a chance to have a look?


Regards,
--
-David
da...@pgmasters.net




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2021-03-15 Thread David Steele

On 12/23/20 2:27 PM, Stephen Frost wrote:

* Justin Pryzby (pry...@telsasoft.com) wrote:

On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote:

* I don't think it's okay to change the existing signatures of
pg_ls_logdir() et al.  Even if you can make an argument that it's
not too harmful to add more output columns, replacing pg_stat_file's
isdir output with something of a different name and datatype is most
surely not OK --- there is no possible way that doesn't break existing
user queries.

I think possibly a more acceptable approach is to leave these functions
alone but add documentation explaining how to get the additional info.
You could say things along the lines of "pg_ls_waldir() is the same as
pg_ls_dir_metadata('pg_wal') except for showing fewer columns."


On Mon, Nov 23, 2020 at 06:06:19PM -0500, Tom Lane wrote:

I'm mostly concerned about removing the isdir output of pg_stat_file().
Maybe we could compromise to the extent of keeping that, allowing it
to be partially duplicative of a file-type-code output column.


On Tue, Nov 24, 2020 at 11:53:22AM -0500, Stephen Frost wrote:

I don't have any particular issue with keeping isdir as a convenience
column.  I agree it'll now be a bit duplicative but that seems alright.


Maybe we should do what Tom said, and leave pg_ls_* unchanged, but also mark
them as deprecated in favour of:
| pg_ls_dir_metadata(dir), dir={'pg_wal/archive_status', 'log', 'pg_wal', 
'base/pgsql_tmp'}


Haven't really time to review the patches here in detail right now
(maybe next month), but in general, I dislike marking things as
deprecated.  

Stephen, are you still planning to review these patches?

Regards,
--
-David
da...@pgmasters.net




Re: Strange behavior with polygon and NaN

2021-03-15 Thread David Steele

On 12/21/20 3:30 AM, Kyotaro Horiguchi wrote:

At Tue, 01 Dec 2020 10:03:42 -0500, Tom Lane  wrote in

I think it should be "needs review" now.


Conflicted with some commit(s) uncertain to me. Rebased.


Tom, Georgios, thoughts on the new patch?

Regards,
--
-David
da...@pgmasters.net




subscriptionCheck failures

2021-03-15 Thread Thomas Munro
Hi,

This seems to be a new low frequency failure, I didn't see it mentioned already:

#   Failed test 'DROP SUBSCRIPTION during error can clean up the slots
on the publisher'
#   at t/004_sync.pl line 171.
#  got: '1'
# expected: '0'

 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2021-03-15%2010:40:10
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=2021-03-15%2001:10:14
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin=2021-03-15%2001:10:09
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern=2021-02-26%2006:44:45
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2021-02-23%2019:00:09
 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae=2021-02-23%2009:40:08




Re: jsonpath syntax extensions

2021-03-15 Thread David Steele

On 3/3/21 9:44 AM, David Steele wrote:

On 3/4/20 3:18 PM, Nikita Glukhov wrote:

On 04.03.2020 19:13, David Steele wrote:

On 2/27/20 10:57 AM, Nikita Glukhov wrote:


Attached patches implement several useful jsonpath syntax extensions.
I already published them two years ago in the original SQL/JSON thread,
but then after creation of separate threads for SQL/JSON functions and
JSON_TABLE I forgot about them.


Are these improvements targeted at PG13 or PG14?  This seems to be a 
pretty big change for the last CF of PG13.  I know these have been 
submitted before but that was a few years ago so I think they count 
as new.


I believe that some of these improvements can get into PG13.  There is no need
to review all of them, we can choose only the simplest ones.
to review all of them, we can choose only the simplest ones.

Another year has passed without any comment or review on this patch set.

I'm not sure why the feature is not generating any interest, but you 
might want to ask people who have been involved in JSON path before if 
they are interested in reviewing.


Since this is still essentially a new feature with no review before this 
CF I still don't think it is a good candidate for v14 but let's see if 
it gets some review.

Target version updated to 15.

Regards,
--
-David
da...@pgmasters.net




Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2021-03-15 Thread David Steele

On 12/15/20 9:03 AM, Peter Eisentraut wrote:
Here is a new patch for this.  This now follows the implementation that 
Tom has suggested:  Leave date_part() alone, add a new set of extract() 
functions, and map the SQL EXTRACT construct to those.  I have basically 
just copied over the implementations from my previous patch and placed 
them next to the existing date_part() implementations.  So all the 
behavior is still the same as in the previous patches.


One thing I still need to look into is how to not lose all the test 
coverage for date_part().  But that should be fairly mechanical, so I'm 
leaving it off in this version.
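
For reference, the case under discussion is along these lines (a sketch;
as I understand the proposal, the new extract() functions return numeric,
whereas date_part() returns float8):

  SELECT extract(epoch FROM timestamp '2001-09-09 01:46:40.123456');
  -- with the float8-based date_part() the microseconds can be lost to
  -- rounding; the new extract() implementation is meant to preserve them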


Tom, what do you think of the updated patch?

Regards,
--
-David
da...@pgmasters.net




Re: PATCH: Attempt to make dbsize a bit more consistent

2021-03-15 Thread gkokolatos





‐‐‐ Original Message ‐‐‐
On Monday, March 15, 2021 7:10 AM, Michael Paquier  wrote:

 On Wed, Feb 24, 2021 at 02:35:51PM +, gkokola...@pm.me wrote:

  Now with attachment. Apologies for the chatter.

 The patch has no documentation for the two new functions, so it is a
 bit hard to understand what is the value brought here, and what is the
 goal wanted just by reading the patch, except that this switches the
 size reported for views to NULL instead of zero bytes by reading the
 regression tests.

Understood and agreed. Thank you very much for looking.


 Please note that reporting zero is not wrong for views IMO as these
 have no physical storage, so, while it is possible to argue that a
 size of zero could imply that this relation size could not be zero, it
 will never change, so I'd rather let this behavior as it is. I
 would bet, additionally, that this could break existing use cases.
 Reporting NULL, on the contrary, as your patch does, makes things
 worse in some ways because that's what pg_relation_size would report
 when a relation does not exist anymore. Imagine for example the case
 of a DROP TABLE run in parallel of a scan of pg_class using
 pg_relation_size(). So it becomes impossible to know if a relation
 has been removed or not. This joins some points raised by Soumyadeep
 upthread.

Yeah, I would have agreed with you before actually looking a bit closer.
I think that the gist of it is that the intention of the functions and
the actual usage/documentation of those have diverged. I honestly thought
that pg_relation_size() was a general function intended for any kind of
relation, whereas pg_table_size() was a function intended only for tables
(toast and mat views included).

Then I read the comment describing pg_relation_size() which reads
'disk space usage for the main fork of the specified table or index' and
'disk space usage for the specified fork of a table or index', found
initially in commit 358a897fa1d and maintained since.

But I digress.



 Anyway, as mentioned by other people upthread, I am not really
 convinced either that we should have more flavors of size functions,
 particularly depending on the relkind as this would be confusing for
 the end-user. pg_relation_size() can already do this job for all
 relkinds that use segment files, so it should still be able to hold
 its ground and maintain a consistent behavior with what it does
 currently.

I must have misunderstood the other people upthread and thought that
new flavours were requested. Thank you for clarifying and correcting me.


 +static int64
 +calculate_table_fork_size(Relation rel, ForkNumber forkNum)
 +{
 +	return (int64) table_relation_size(rel, forkNum);
 +}
 Why isn't that just unsigned, like table_relation_size()? This is an
 internal function so it does not matter to change its signature, but
 I think that you had better do a cast at a higher level instead.

 	for (int i = 0; i < MAX_FORKNUM; i++)
 -		nblocks += smgrnblocks(rel->rd_smgr, i);
 +		nblocks += smgrexists(rel->rd_smgr, i)
 +			? smgrnblocks(rel->rd_smgr, i)
 +			: 0;

 Is that actually the part for views? Why is that done this way?

No, it is not for views. The codebase should never reach this function
for a view, or it would be a serious bug elsewhere.

This is addressing a bug in table_relation_size(). That function correctly
does not require its caller to know any specifics about the forks. A heap
table is not required to have created any fsm or vm forks either. Finally,
smgrnblocks() assumes that the fork actually exists, or it errors out.

This change makes certain that calling table_relation_size() with a
non-existing fork will not generate an error.


 +	if (rel->rd_rel->relkind == RELKIND_RELATION ||
 +		rel->rd_rel->relkind == RELKIND_TOASTVALUE ||
 +		rel->rd_rel->relkind == RELKIND_MATVIEW)
 +		size = calculate_table_fork_size(rel,
 +										 forkname_to_number(text_to_cstring(forkName)));
 +	else if (rel->rd_rel->relkind == RELKIND_INDEX)
 +		size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
 +									   forkname_to_number(text_to_cstring(forkName)));
 +	else
 +	{
 +		relation_close(rel, AccessShareLock);
 +		PG_RETURN_NULL();
 +	}
 The approach you are taking to bring some consistency in all that by
 making the size estimations go through table AMs via
 table_relation_size() is promising. However, this code breaks the
 size estimation for sequences, which is not good. If attempting to
 use an evaluation that's based on a table AM, shouldn't this code use
 a check based on rd_tableam rather than the relkind then? We assume
 that rd_tableam is set depending on the relkind in relcache.c, for
 one.


Thank you for the remarks. Please find v5 attached which is a minimal
patch to try to use the table am api 

  1   2   >