Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

2021-07-29 Thread Greg Nancarrow
On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila  wrote:
>
> > Besides, I think we need a new default value for parallel dml safety, maybe
> > 'auto' or 'null' (different from safe/restricted/unsafe). Because the user is
> > likely to alter the safety to the default value to get the automatic safety
> > check, an independent default value can make it more clear.
> >
>
> Hmm, but auto won't work for partitioned tables, right? If so, that
> might appear like an inconsistency to the user and we need to document
> the same. Let me summarize the discussion so far in this thread so
> that it is helpful to others.
>

To avoid that inconsistency, UNSAFE could be the default for
partitioned tables (and we would disallow setting AUTO for these).
So then AUTO is the default for non-partitioned tables only.

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Skipping logical replication transactions on subscriber side

2021-07-29 Thread houzj.f...@fujitsu.com
On July 29, 2021 1:48 PM Masahiko Sawada  wrote:
> 
> Sorry I've attached wrong ones. Reattached the correct version patches.

Hi,

I had some comments on the new version patches.

1)

-   relstate = (SubscriptionRelState *) 
palloc(sizeof(SubscriptionRelState));
-   relstate->relid = subrel->srrelid;
+   relstate = (SubscriptionRelState *) hash_search(htab, (void *) 
&subrel->srrelid,
+   HASH_ENTER, NULL);

I found that the new version patch changes 'relstate' from a List to a hash
table. Will this bring significant performance improvements?

2)
+ * PgStat_StatSubRelErrEntry represents a error happened during logical

a error => an error

3)
+CREATE VIEW pg_stat_subscription_errors AS
+SELECT
+   d.datname,
+   sr.subid,
+   s.subname,

It seems the 'subid' column is not mentioned in the documentation of the
pg_stat_subscription_errors view.


4)
+
+   if (fread(&nrels, 1, sizeof(long), fpin) != sizeof(long))
+   {
...
+   for (int i = 0; i < nrels; i++)

The type of 'i' (int) differs from the type of 'nrels' (long); it might be
better to use the same type.
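
For example (just a sketch of the kind of change I mean):

	for (long i = 0; i < nrels; i++)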

Best regards,
houzj


RE: Added schema level support for publication.

2021-07-29 Thread houzj.f...@fujitsu.com
On July 28, 2021 6:44 PM vignesh C  wrote:
> Thanks for the patch, I have merged the changes. Attached v16 patch has the
> fixes for the same.

Thanks for the new version patches.
Here are a few comments:

1)

+   /* Identify which schemas should be dropped */
+   foreach(oldlc, oldschemaids)
+   {
+   Oid oldschemaid = lfirst_oid(oldlc);
+
+   if (!list_member_oid(schemaoidlist, oldschemaid))
+   delschemas = lappend_oid(delschemas, oldschemaid);
+   }
+

We can use list_difference here to simplify the code.
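
For example, something like the following should work here (untested;
list_difference_oid() being the Oid-list variant):

	/* Identify which schemas should be dropped */
	delschemas = list_difference_oid(oldschemaids, schemaoidlist);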

2)

+
+   /* Filter out duplicates if user specifies "sch1, sch1" */
+   if (list_member_oid(schemaoidlist, schemaoid))
+   continue;
+
+   schemaoidlist = lappend_oid(schemaoidlist, schemaoid);

It might be more concise to use list_append_unique_oid() here.
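
Something like (untested):

	/* lappend_oid plus the duplicate check in one call */
	schemaoidlist = list_append_unique_oid(schemaoidlist, schemaoid);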

3)

+  
+   Create a publication that publishes all changes for all the tables present 
in
+the schema "production":
+

The second line is not aligned.
After:
+  
+   Create a publication that publishes all changes for all the tables present 
in
+   the schema "production":
+

4)

+   resetPQExpBuffer(query);
+
+   /* Get the publication membership for the schema */
+   appendPQExpBuffer(query,
+ "SELECT ps.psnspcid, ps.oid, p.pubname, p.oid AS 
pubid "
+ "FROM pg_publication_schema ps, pg_publication p "
+ "WHERE ps.psnspcid = '%u' "
+ "AND p.oid = ps.pspubid",
+ nsinfo->dobj.catId.oid);

It seems we can use printfPQExpBuffer() here which is a convenience routine
that does the same thing as resetPQExpBuffer() followed by appendPQExpBuffer().
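
That is, roughly (untested):

	/* Get the publication membership for the schema */
	printfPQExpBuffer(query,
					  "SELECT ps.psnspcid, ps.oid, p.pubname, p.oid AS pubid "
					  "FROM pg_publication_schema ps, pg_publication p "
					  "WHERE ps.psnspcid = '%u' "
					  "AND p.oid = ps.pspubid",
					  nsinfo->dobj.catId.oid);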

Best regards,
Houzj


Use generation context to speed up tuplesorts

2021-07-29 Thread David Rowley
Hackers,

While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
sorts for single column sorts), I noticed that we use a separate
memory context to store tuple data and we just reset when we're done
if the sort fits in memory, or we dump the tuples to disk in the same
order they're added and reset the context when it does not.  There is
a little pfree() work going on via writetup_heap() which I think
possibly could just be removed to get some additional gains.

Anyway, this context that stores tuples uses the standard aset.c
allocator which has the usual power of 2 wastage and additional
overheads of freelists etc.  I wondered how much faster it would go if
I set it to use a generation context instead of an aset.c one.

After running make installcheck to make the tenk1 table, running the
attached tuplesortbench script, I get this:

Master:
work_mem = '4MB';
   Sort Method: external merge  Disk: 2496kB
work_mem = '4GB';
   Sort Method: quicksort  Memory: 5541kB

Patched:
work_mem = '4MB';
   Sort Method: quicksort  Memory: 3197kB

So it seems to save quite a bit of memory getting away from rounding
up allocations to the next power of 2.

Performance-wise, there are some pretty good gains. (Results in TPS)

work_mem = '4GB';
Test     master   gen sort   compare
Test1    317.2    665.6      210%
Test2    228.6    388.9      170%
Test3    207.4    330.7      159%
Test4    185.5    279.4      151%
Test5    292.2    563.9      193%

If I drop work_mem down to the standard 4MB, the unpatched version goes to
disk, but the patched version does not.  The gains get a little
bigger.

work_mem = '4MB';
Test     master   gen sort   compare
Test1    177.5    658.2      371%
Test2    149.7    385.2      257%
Test3    137.5    330.0      240%
Test4    129.0    275.1      213%
Test5    161.7    546.4      338%

The patch is just a simple 1-liner at the moment.  I likely do need to
adjust what I'm passing as the blockSize to GenerationContextCreate().
Maybe a better number would be something that's calculated from
work_mem, e.g. Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64)
so that we just allocate at most a 16th of work_mem per chunk, but not
bigger than 8MB. I don't think changing this will affect the
performance of the above very much.
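
For anyone following along, the change being discussed essentially swaps the
allocator used for the tuple context in tuplesort_begin_common(), roughly like
this (untested sketch; as mentioned above, the blockSize passed here is still
up for debate):

	/* was: AllocSetContextCreate(..., ALLOCSET_DEFAULT_SIZES) */
	state->tuplecontext = GenerationContextCreate(state->sortcontext,
												  "Caller tuples",
												  ALLOCSET_DEFAULT_MAXSIZE);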

David
#!/bin/bash

sec=10

# This should improve
echo "select * from tenk1 order by two offset 100" > bench.sql
echo Test1
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by tenthous offset 100" > bench.sql
echo Test2
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by stringu1 offset 100" > bench.sql
echo Test3
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by stringu2 offset 100" > bench.sql
echo Test4
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by twenty offset 100" > bench.sql
echo Test5
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps




generation_tuplesort.patch
Description: Binary data


Re: [HACKERS] logical decoding of two-phase transactions

2021-07-29 Thread Greg Nancarrow
On Fri, Jul 30, 2021 at 2:02 PM Peter Smith  wrote:
>
> Please find attached the latest patch set v100*
>
> v99-0002 --> v100-0001
>

A few minor comments:

(1) doc/src/sgml/protocol.sgml

In the following description, is the word "large" really needed? Also
"the message ... for a ... message" sounds a bit odd, as does
"two-phase prepare".

What about the following:

BEFORE:
+Identifies the message as a two-phase prepare for a
large in-progress transaction message.
AFTER:
+Identifies the message as a prepare for an
in-progress two-phase transaction.

(2) src/backend/replication/logical/worker.c

These similarly formatted comments are near each other, but one uses a
full stop and the other doesn't, which looks a bit odd.

* 1. Replay all the spooled operations - Similar code as for

* 2. Mark the transaction as prepared. - Similar code as for

(3) src/test/subscription/t/023_twophase_stream.pl

Shouldn't the following comment mention, for example, "with streaming"
or something to that effect?

# logical replication of 2PC test


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Fix around conn_duration in pgbench

2021-07-29 Thread Fujii Masao




On 2021/07/30 14:43, Yugo NAGATA wrote:

This patch fixes three issues of connection time measurement:

1. The initial connection time is measured and stored into conn_duration
but the result is never used.
2. The disconnection time are not measured even when -C is specified.
3. The disconnection time measurement at the end of threadRun() is
not necessary.

The first one exists only in v14 and master, but others are also in v13 and
before. So, I think we can back-patch these fixes.


Yes, you're right.




But the patch fails to be back-patched to v13 or before because
pgbench's time logic was changed by commit 547f04e734.
Do you have the patches that can be back-patched to v13 or before?


I attached the patch for v13.


Thanks for the patch!

+   /*
+* In CSTATE_FINISHED state, this finishCon() 
is a no-op
+* under -C/--connect because all the 
connections that this
+* thread established should have already been 
closed at the end
+* of transactions. So we don't need to measure 
the disconnection
+* delays here.

In v13, the disconnection time needs to be measured in CSTATE_FINISHED
because the connection can be closed here when -C is not specified?


/* tps is about actually executed transactions */
tps_include = ntx / time_include;
tps_exclude = ntx /
(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / 
nclients));

BTW, this is a slightly different topic from the patch, but in v13,
tps excluding connection establishing is calculated in the above way.
The total connection time is divided by the number of clients,
but why do we need this division? Do you have any idea?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

2021-07-29 Thread Amit Kapila
Note: Changing the subject as I felt the topic has diverged from the
originally reported case, and it might also help others to pay attention.

On Wed, Jul 28, 2021 at 8:22 AM houzj.f...@fujitsu.com
 wrote:
> >
> > Consider below ways to allow the user to specify the parallel-safety option:
> >
> > (a)
> > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } 
> > ...
> > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> >
> > OR
> >
> > (b)
> > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)
>
> Personally, I think the approach (a) might be better. Since it's similar to
> ALTER FUNCTION PARALLEL XXX which user might be more familiar with.
>

Okay, and I think for (b) true/false won't be sufficient because one
might want to specify restricted.

> Besides, I think we need a new default value for parallel dml safety, maybe
> 'auto' or 'null' (different from safe/restricted/unsafe). Because the user is
> likely to alter the safety to the default value to get the automatic safety
> check, an independent default value can make it more clear.
>

Hmm, but auto won't work for partitioned tables, right? If so, that
might appear like an inconsistency to the user and we need to document
the same. Let me summarize the discussion so far in this thread so
that it is helpful to others.

We would like to parallelize INSERT SELECT (first step INSERT +
parallel SELECT and then Parallel (INSERT + SELECT)) and for that, we
have explored a couple of ways. The first approach is to automatically
detect if it is safe to parallelize insert and then do it without user
intervention. To detect automatically, we need to determine the
parallel-safety of various expressions (like default column
expressions, check constraints, index expressions, etc.) at the
planning time which can be costly but we can avoid most of the cost if
we cache the parallel safety for the relation. So, the cost needs to
be paid just once. Now, we can't cache this for partitioned relations
because it can be very costly (as we need to lock all the partitions)
and has deadlock risks (while processing invalidation); this has been
explained in email [1].

Now, as we can't think of a nice way to determine parallel safety
automatically for partitioned relations, we thought of providing an
option to the user. The next thing to decide here is: if we are
providing an option to the user in one of the ways mentioned above
in the email, what should we do if the user uses that option for
non-partitioned relations? Shall we just ignore it, or give an error
that this is not a valid syntax/option? The one idea which Dilip and I
are advocating is to respect the user's input for non-partitioned
relations and if it is not given then compute the parallel safety and
cache it.

To help users provide a parallel-safety option, we are thinking of
providing a utility function
"pg_get_table_parallel_dml_safety(regclass)" that
returns records of (objid, classid, parallel_safety) for all parallel
unsafe/restricted table-related objects from which the table's
parallel DML safety is determined. This will allow the user to identify
unsafe objects; if required, the user can change the parallel safety
of the relevant functions and then use the parallel-safety option for the
table.

Thoughts?

Note - This topic has been discussed in another thread as well [2] but
as many of the key technical points have been discussed here I thought
it is better to continue here.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1Jwz8xGss4b0-33eyX0i5W_1CnqT16DjB9snVC--DoOsQ%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709%40TYAPR01MB2990.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Fix around conn_duration in pgbench

2021-07-29 Thread Yugo NAGATA
Hello Fujii-san,

On Fri, 30 Jul 2021 02:01:08 +0900
Fujii Masao  wrote:

> 
> 
> On 2021/07/29 2:23, Fujii Masao wrote:
> > 
> > 
> > On 2021/07/28 16:15, Yugo NAGATA wrote:
> >>> I found another disconnect_all().
> >>>
> >>> /* XXX should this be connection time? */
> >>> disconnect_all(state, nclients);
> >>>
> >>> The measurement is also not necessary here.
> >>> So the above comment should be removed or updated?
> >>
> >> I think this disconnect_all will be a no-op because all connections should
> >> be already closed in threadRun(), but I left it just to be sure that
> >> connections are all cleaned-up. I updated the comment for explaining above.
> >>
> >> I attached the updated patch. Could you please look at this?
> > 
> > Thanks for updating the patch! LGTM.
> 
> This patch needs to be back-patched because it fixes the bug
> in measurement of disconnection delays. Thought?

This patch fixes three issues of connection time measurement:

1. The initial connection time is measured and stored into conn_duration
   but the result is never used.
2. The disconnection time are not measured even when -C is specified.
3. The disconnection time measurement at the end of threadRun() is
   not necessary.

The first one exists only in v14 and master, but others are also in v13 and
before. So, I think we can back-patch these fixes.

> But the patch fails to be back-patched to v13 or before because
> pgbench's time logic was changed by commit 547f04e734.
> Do you have the patches that can be back-patched to v13 or before?

I attached the patch for v13.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 338c0152a2..3411556df8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3285,8 +3285,12 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 if (is_connect)
 {
+	instr_time	start = now;
+
+	INSTR_TIME_SET_CURRENT_LAZY(start);
 	finishCon(st);
-	INSTR_TIME_SET_ZERO(now);
+	INSTR_TIME_SET_CURRENT(now);
+	INSTR_TIME_ACCUM_DIFF(thread->conn_time, now, start);
 }
 
 if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
@@ -3310,6 +3314,17 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
  */
 			case CSTATE_ABORTED:
 			case CSTATE_FINISHED:
+/*
+ * In CSTATE_FINISHED state, this finishCon() is a no-op
+ * under -C/--connect because all the connections that this
+ * thread established should have already been closed at the end
+ * of transactions. So we don't need to measure the disconnection
+ * delays here.
+ *
+ * In CSTATE_ABORTED state, the measurement is no longer
+ * necessary because we cannot report complete results anyways
+ * in this case.
+ */
 finishCon(st);
 return;
 		}
@@ -6210,6 +6225,12 @@ main(int argc, char **argv)
 		latency_late += thread->latency_late;
 		INSTR_TIME_ADD(conn_total_time, thread->conn_time);
 	}
+
+	/*
+	 * All connections should be already closed in threadRun(), so this
+	 * disconnect_all() will be a no-op, but clean up the connecions just
+	 * to be sure. We don't need to measure the disconnection delays here.
+	 */
 	disconnect_all(state, nclients);
 
 	/*
@@ -6237,8 +6258,7 @@ threadRun(void *arg)
 {
 	TState	   *thread = (TState *) arg;
 	CState	   *state = thread->state;
-	instr_time	start,
-end;
+	instr_time	start;
 	int			nstate = thread->nstate;
 	int			remains = nstate;	/* number of remaining clients */
 	socket_set *sockets = alloc_socket_set(nstate);
@@ -6502,10 +6522,7 @@ threadRun(void *arg)
 	}
 
 done:
-	INSTR_TIME_SET_CURRENT(start);
 	disconnect_all(state, nstate);
-	INSTR_TIME_SET_CURRENT(end);
-	INSTR_TIME_ACCUM_DIFF(thread->conn_time, end, start);
 	if (thread->logfile)
 	{
 		if (agg_interval > 0)


Re: [HACKERS] logical decoding of two-phase transactions

2021-07-29 Thread vignesh C
On Fri, Jul 30, 2021 at 9:32 AM Peter Smith  wrote:
>
> Please find attached the latest patch set v100*
>
> v99-0002 --> v100-0001
>
> Differences:
>
> * Rebased to HEAD @ today (needed because some recent commits [1][2] broke 
> v99)
>

The patch applies neatly, tests pass, and the documentation looks good.
A few minor comments:
1) This blank line is not required:
+-- two_phase and streaming are compatible.
+CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, streaming = true, two_phase = true);
+

2) A few points have a punctuation mark and a few don't; we can make it
consistent:
+###
+# Test 2PC PREPARE / ROLLBACK PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC
+# 3. Do rollback prepared.
+#
+# Expect data rolls back leaving only the original 2 rows.
+###

3) similarly here too:
+###
+# Do INSERT after the PREPARE but before ROLLBACK PREPARED.
+# 1. Table is deleted back to 2 rows which are replicated on subscriber.
+# 2. Data is streamed using 2PC.
+# 3. A single row INSERT is done which is after the PREPARE
+# 4. Then do a ROLLBACK PREPARED.
+#
+# Expect the 2PC data rolls back leaving only 3 rows on the subscriber.
+# (the original 2 + inserted 1)
+###

Regards,
Vignesh




Re: when the startup process doesn't (logging startup delays)

2021-07-29 Thread Nitin Jadhav
> Thanks. Can you please have a look at what I suggested down toward the
> bottom of 
> http://postgr.es/m/ca+tgmoap2wefsktmcgwt9lxuz7y99hnduyshpk7qnfuqb98...@mail.gmail.com
> ?
>
> If we're going to go the route of combining the functions, I agree
> that a Boolean is the way to go; I think we have some existing
> precedent for 'bool finished' rather than 'bool done'.
>
> But I kind of wonder if we should have an enum in the first place. It
> feels like we've got code in a bunch of places that just exists to
> decide which enum value to use, and then code someplace else that
> turns around and decides which message to produce. That would be
> sensible if we were using the same enum values in lots of places, but
> that's not the case here. So suppose we just moved the messages to the
> places where we're now calling LogStartupProgress() or
> LogStartupProgressComplete()? So something like this:

Sorry. I thought it was related to the discussion of deciding whether
LogStartupProgress() and LogStartupProgressComplete() should be
combined or not. I feel it's a really nice design. With this we avoid
an "action at a distance" issue and it's easy to use. If we were
reporting the same kind of messages at multiple places then the current
approach of using an enum would be more suitable, since we wouldn't have
to worry about matching the log message string. But in the current scenario,
we are not using the same kind of messages at multiple places (I feel such
a scenario will not occur in the future either. Even if there is a similar
operation, it can be distinguished, as resetting unlogged relations
is distinguished by init and clean. Please mention it if you can foresee
any such scenario), hence the approach you are suggesting is the
best fit.

Thanks & Regards,
Nitin Jadhav

On Thu, Jul 29, 2021 at 9:48 PM Robert Haas  wrote:
>
> On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav
>  wrote:
> > Thanks for sharing the information. I have done the necessary changes
> > to show the logs during the latter case (postgres --single) and
> > verified the log messages.
>
> Thanks. Can you please have a look at what I suggested down toward the
> bottom of 
> http://postgr.es/m/ca+tgmoap2wefsktmcgwt9lxuz7y99hnduyshpk7qnfuqb98...@mail.gmail.com
> ?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com




Re: [BUG]Update Toast data failure in logical replication

2021-07-29 Thread Amit Kapila
On Mon, Jul 26, 2021 at 10:45 AM Dilip Kumar  wrote:
>
> I was thinking more about this idea, but IMHO, unless we send the key
> toasted tuple from the publisher how is the subscriber supposed to
> fetch it.  Because that is the key value for finding the tuple on the
> subscriber side and if we haven't sent the key value, how are we
> supposed to find the tuple on the subscriber side?
>

I was thinking of using the toast pointer but that won't work because it
can be different on the subscriber side. I don't see any better ideas
to fix this issue. This problem seems to exist since the time Logical
Replication was introduced, so I am adding others (who are generally
involved in this area) to see what they think about this bug. I think
people might not be using toasted columns for Replica Identity, which is
why this problem has not been reported yet, but I feel this is quite a
fundamental issue and we should do something about this.

Let me summarize the problem for the ease of others.

The logical replica can go out of sync for UPDATES when there is a
toast column as part of REPLICA IDENTITY. In such cases, updates are
not replicated if the key column doesn't change because we don't log
the actual key value for the unchanged toast key. It is neither logged
as part of old_key_tuple nor for new tuple due to which we are not
able to find the tuple to be updated on the subscriber-side and the
update is ignored on the subscriber-side. We log this in DEBUG1 mode
but I don't think the user can do anything about this and the replica
will go out-of-sync. This works when the replica identity column value
is not toasted because then it will be part of the new tuple and we
use that to fetch the tuple on the subscriber.

Now, if the key value (for the toast column which is part of the replica
identity) is not present in the WAL, it is not clear how we can find the
tuple to update on the subscriber. We can't use the toast pointer in the
new tuple to fetch the toast information as that can be different on
subscribers. The simple way is to WAL-log the unchanged toasted value
as part of old_key_tuple; this will be required only for toast
columns.
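
Just to sketch the direction (hypothetical and untested; the names below are
illustrative, and the real change would live where the old key tuple is built
for WAL, e.g. around ExtractReplicaIdentity()): before forming old_key_tuple
we would expand any externally stored key column so that the full value ends
up in the WAL record.

	/* Hypothetical sketch: inline external TOAST values for key columns */
	if (!nulls[attno] &&
		VARATT_IS_EXTERNAL(DatumGetPointer(values[attno])))
		values[attno] = PointerGetDatum(
			detoast_external_attr((struct varlena *) DatumGetPointer(values[attno])));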

Note that Delete works because we WAL Log the unchanged key tuple in that case.

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] logical decoding of two-phase transactions

2021-07-29 Thread Peter Smith
On Thu, Jul 29, 2021 at 9:56 PM Amit Kapila  wrote:
>
> On Tue, Jul 27, 2021 at 11:41 AM Peter Smith  wrote:
> >
> > Please find attached the latest patch set v99*
> >
> > v98-0001 --> split into v99-0001 + v99-0002
> >
>
> Pushed the first refactoring patch after making few modifications as below.
> 1.
> - /* open the spool file for the committed transaction */
> + /* Open the spool file for the committed/prepared transaction */
>   changes_filename(path, MyLogicalRepWorker->subid, xid);
>
> In the above comment, we don't need to say prepared. It can be done as
> part of the second patch.
>
Updated comment in v100.

> 2.
> +apply_handle_prepare_internal(LogicalRepPreparedTxnData
> *prepare_data, char *gid)
>
> I don't think there is any need for this function to take gid as
> input. It can compute by itself instead of callers doing it.
>
OK.

> 3.
> +static TransactionId+logicalrep_read_prepare_common(StringInfo in,
> char *msgtype,
> +   LogicalRepPreparedTxnData *prepare_data)
>
> I don't think the above function needs to return xid because it is
> already present as part of prepare_data. Even, if it is required due
> to some reason for the second patch then let's do it as part of if but
> I don't think it is required for the second patch.
>
OK.

> 4.
>  /*
> - * Write PREPARE to the output stream.
> + * Common code for logicalrep_write_prepare and
> logicalrep_write_stream_prepare.
>   */
>
> Here and at a similar another place, we don't need to refer to
> logicalrep_write_stream_prepare as that is part of the second patch.
>
Updated comment in v100

> Few comments on 0002 patch:
> ==
> 1.
> +# -
> +# 2PC + STREAMING TESTS
> +# -
> +
> +# Setup logical replication (streaming = on)
> +
> +$node_B->safe_psql('postgres', "
> + ALTER SUBSCRIPTION tap_sub_B
> + SET (streaming = on);");
> +
> +$node_C->safe_psql('postgres', "
> + ALTER SUBSCRIPTION tap_sub_C
> + SET (streaming = on)");
> +
> +# Wait for subscribers to finish initialization
> +$node_A->wait_for_catchup($appname_B);
> +$node_B->wait_for_catchup($appname_C);
>
> This is not the right way to determine if the new streaming option is
> enabled on the publisher. Even if there is no restart of apply workers
> (and walsender) after you have enabled the option, the above wait will
> succeed. You need to do something like below as we are doing in
> 001_rep_changes.pl:
>
> $oldpid = $node_publisher->safe_psql('postgres',
> "SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub';"
> );
> $node_subscriber->safe_psql('postgres',
> "ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_ins_only WITH
> (copy_data = false)"
> );
> $node_publisher->poll_query_until('postgres',
> "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name
> = 'tap_sub';"
> ) or die "Timed out while waiting for apply to restart";
>
Fixed in v100 as suggested.

> 2.
> +# Create some pre-existing content on publisher (uses same DDL as
> 015_stream test)
>
> Here, in the comments, I don't see the need to same uses same DDL ...
>
Fixed in v100. Comment removed.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [HACKERS] logical decoding of two-phase transactions

2021-07-29 Thread Peter Smith
Please find attached the latest patch set v100*

v99-0002 --> v100-0001

Differences:

* Rebased to HEAD @ today (needed because some recent commits [1][2] broke v99)

* Addresses v99 review comments from Amit [1].


[1] 
https://github.com/postgres/postgres/commit/201a76183e2056c2217129e12d68c25ec9c559c8
[2] 
https://github.com/postgres/postgres/commit/91f9861242cd7dcf28fae216b1d8b47551c9159d
[1] 
https://www.postgresql.org/message-id/CAA4eK1%2BNzcz%3DhzZJO16ZcqA7hZRa4RzGRwL_XXM%2Bdin8ehROaQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v100-0001-Add-prepare-API-support-for-streaming-transacti.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2021-07-29 Thread Amit Kapila
On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada  wrote:
>
> On Thu, Jul 29, 2021 at 2:04 PM Masahiko Sawada  wrote:
> >
> > > Yeah, it seems to be introduced by commit 0926e96c493. I've attached
> > > the patch for that.
> > >
> > > Also, I've attached the updated version patches. This version patch
> > > has pg_stat_reset_subscription_error() SQL function and sends a clear
> > > message after skipping the transaction. 0004 patch includes the
> > > skipping transaction feature and introducing RESET to ALTER
> > > SUBSCRIPTION. It would be better to separate them.
> > >

+1, to separate out the reset part.

> >
> > I've attached the new version patches that fix cfbot failure.
>
> Sorry I've attached wrong ones. Reattached the correct version patches.
>

Pushed the 0001* patch that removes the unused parameter.

Few comments on v4-0001-Add-errcontext-to-errors-of-the-applying-logical-
===
1.
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -78,6 +78,7 @@
 #include "partitioning/partbounds.h"
 #include "partitioning/partdesc.h"
 #include "pgstat.h"
+#include "replication/logicalworker.h"
 #include "rewrite/rewriteDefine.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
@@ -1899,6 +1900,9 @@ ExecuteTruncateGuts(List *explicit_rels,
  continue;
  }

+ /* Set logical replication error callback info if necessary */
+ set_logicalrep_error_context_rel(rel);
+
  /*
  * Build the lists of foreign tables belonging to each foreign server
  * and pass each list to the foreign data wrapper's callback function,
@@ -2006,6 +2010,9 @@ ExecuteTruncateGuts(List *explicit_rels,
  pgstat_count_truncate(rel);
  }

+ /* Reset logical replication error callback info */
+ reset_logicalrep_error_context_rel();
+

Setting up the logical rep error context in a generic function looks a bit
odd to me. Do we really need to set up the error context here? I
understand we can't do this in the caller, but we are not
sending this to the logical replication view either, so I'm not sure we need
to do it here.

2.
+/* Struct for saving and restoring apply information */
+typedef struct ApplyErrCallbackArg
+{
+ LogicalRepMsgType command; /* 0 if invalid */
+
+ /* Local relation information */
+ char*nspname; /* used for error context */
+ char*relname; /* used for error context */
+
+ TransactionId remote_xid;
+ TimestampTz committs;
+} ApplyErrCallbackArg;
+static ApplyErrCallbackArg apply_error_callback_arg =
+{
+ .command = 0,
+ .relname = NULL,
+ .nspname = NULL,
+ .remote_xid = InvalidTransactionId,
+ .committs = 0,
+};
+

Better to have a space between the above two declarations.

3. commit message:
This commit adds the error context to errors happening during applying
logical replication changes, showing the command, the relation
relation, transaction ID, and commit timestamp in the server log.

'relation' is mentioned twice.

The patch is not getting applied probably due to yesterday's commit in
this area.

-- 
With Regards,
Amit Kapila.




RE: [postgres_fdw] add local pid to fallback_application_name

2021-07-29 Thread kuroda.hay...@fujitsu.com
Dear Tom,

Thank you for replying!

> I don't think this is a great idea as-is.  People who need to do this
> sort of thing will all have their own ideas of what they need to track
> --- most obviously, it might be appropriate to include the originating
> server's name, else you don't know what machine the PID is for.

I thought this was not a big problem because the hostname (or IP address) can be
added to log_line_prefix. I added only the local PID because this is the only
thing that cannot be set via that parameter.

> So I think most people with this sort of requirement will be overriding
> the default application name anyway, so we might as well keep the
> default behavior simple.

Yeah, this patch assumed that application_name will not be overridden.
There is another approach where the PID is added to application_name, but it
might be stupid.

> What would be better to think about is how to let users specify this
> kind of behavior for themselves.  I think it's possible to set
> application_name as part of a foreign server's connection options,
> but at present the result would only be a constant string.  Somebody
> who wished the PID to be in there would like to have some sort of
> formatting escape, say "%p" for PID.  Extrapolating wildly, maybe we
> could make all the %-codes known to log_line_prefix available here.

I think your argument is better than mine. I will try to implement this
approach. If anyone has another suggestion, please tell me.
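
As a first rough idea of what that could look like (hypothetical sketch only;
buf and appname_template are made-up names, and a real patch would presumably
share the escape definitions with log_line_prefix):

	/* Expand a hypothetical "%p" escape in the configured application_name */
	for (const char *p = appname_template; *p != '\0'; p++)
	{
		if (p[0] == '%' && p[1] == 'p')
		{
			appendStringInfo(&buf, "%d", MyProcPid);
			p++;				/* skip the 'p' */
		}
		else
			appendStringInfoChar(&buf, *p);
	}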

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Reduce the number of special cases to build contrib modules on windows

2021-07-29 Thread David Rowley
On Thu, 29 Jul 2021 at 00:05, David Rowley  wrote:
> I pushed the v9 0001 and 0005 patch after adjusting the AddFile($self,
> ...) to become $self->AddFile(...)

I pushed all the patches, apart from the 0004 patch.

One weird thing I noticed with the -D define patch (245de4845) and the
LOWER_NODE adjustment is that in crc32.c we do:

#ifdef LOWER_NODE
#include 
#define TOLOWER(x) tolower((unsigned char) (x))
#else
#define TOLOWER(x) (x)
#endif

meaning that when LOWER_NODE is defined, the CRC is computed over the
lowercased input, effectively making it case-insensitive. Whereas in ltree.h we
do:

#ifdef LOWER_NODE
#define FLG_CANLOOKSIGN(x) ( ( (x) & ( LQL_NOT | LVAR_ANYEND |
LVAR_SUBLEXEME ) ) == 0 )
#else
#define FLG_CANLOOKSIGN(x) ( ( (x) & ( LQL_NOT | LVAR_ANYEND |
LVAR_SUBLEXEME | LVAR_INCASE ) ) == 0 )
#endif

So effectively if LOWER_NODE is defined then we *don't* pass the
LVAR_INCASE which makes the comparisons case-sensitive!  I've never
used ltree before, so might just be misunderstanding something, but
this does smell quite buggy to me. However, I just made it work how it
always has worked.

> 0004 still needs work.

I adjusted this one so that it does the right thing for all the
existing .l and .y files and correctly adds the relevant .c file when
required, but to be honest, I just made this work by checking that
each of the vcxproj files matches before and after the change.

There is code that parses the likes of the following in the cube
contrib module's Makefile:

# cubescan is compiled as part of cubeparse
cubeparse.o: cubescan.c

Here, since cubescan.c is not added to the project files for
compilation, I made that just call the AddDependantFiles function,
which just searches for .l and .y files that exist with the same name,
but does not add the actual file passed to the function. This means
that we add cubescan.l to the project but not cubescan.c.

This is different from what happens with ecpg with pgc.l. We also add
pgc.c to the project in that case because it's mentioned in OBJS in
src/interfaces/ecpg/preproc/Makefile.

The only change in the outputted vcxproj files that the attached
produces is an order change in the AdditionalDependencies of
libpq_pipeline.vcxproj

I also managed to remove libpq_pipeline from contrib_uselibpgport and
contrib_uselibpgcommon. The parsing for SHLIB_LINK_INTERNAL and
PG_LIBS_INTERNAL only allowed for = not +=.

Does anyone have any thoughts on where we should draw the line on
parsing Makefiles?  I'm a little worried that I'm adding parsing just
for exactly how the Makefiles are today and that it could easily be
broken if something is adjusted later. I'm not all that skilled with
make, so I'm struggling to judge this for myself.

David


v10-0001-Remove-some-special-cases-from-MSVC-build-script.patch
Description: Binary data


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Julien Rouhaud
On Thu, Jul 29, 2021 at 02:14:56PM -0400, Jan Wieck wrote:
> 
> I presume that pg_upgrade on a database with that extension installed would
> silently succeed and have the pg_catalog as well as public (or wherever)
> version of that function present.

I'll have to run a pg_upgrade with it to be 100% sure, but given that this is a
plpgsql function and since the created function is part of the extension
dependencies (and looking at pg_dump source code for binary-upgrade mode), I'm
almost certain that the upgraded cluster would have the pg96- version of the
function even if upgrading to pg9.6+.

Note that in that case the extension would appear to work normally, but the
only way to simulate missing_ok = true is to add a BEGIN/EXCEPTION block.

Since this wrapper function is extensively used, it seems quite possible
to overflow the snapshot subxip array, as the extension basically runs
many functions every x minutes in a single transaction to retrieve many
performance metrics.  This can ruin performance.

This was an acceptable trade off for people still using pg96- in 2021, but
would be silly to have on more recent versions.

Unfortunately I don't see any easy way to avoid that, as there isn't any
guarantee that a new version will be available after the upgrade.  AFAICT the
only way to ensure that the correct version of the function is present from an
extension point of view would be to add a dedicated function to overwrite any
object that depends on the servers version and document the need to call that
after a pg_upgrade.




Re: speed up verifying UTF-8

2021-07-29 Thread John Naylor
On Mon, Jul 26, 2021 at 8:56 AM John Naylor 
wrote:
>
> >
> > Does that (and "len >= 32" condition) mean the patch does not improve
validation of the shorter strings (the ones less than 32 bytes)?
>
> Right. Also, the 32 byte threshold was just a temporary need for testing
32-byte stride -- testing different thresholds wouldn't hurt.  I'm not
terribly concerned about short strings, though, as long as we don't
regress.

I put together the attached quick test to try to rationalize the fast-path
threshold. (In case it isn't obvious, it must be at least 16 on all builds,
since wchar.c doesn't know which implementation it's calling, and SSE
register width sets the lower bound.) I changed the threshold first to 16,
and then 10, which will force using the byte-at-a-time code.

If we have only 16 bytes in the input, it still seems to be faster to use
SSE, even though it's called through a function pointer on x86. I didn't
test the DFA path, but I don't think the conclusion would be different.
I'll include the 16 threshold next time I need to update the patch.

Macbook x86, clang 12:

master + use 16:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+-----
   270 |   279 |   282 |  291 |  296 |  304

force byte-at-a-time:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+-----
   277 |   292 |   310 |  296 |  317 |  362

--
John Naylor
EDB: http://www.enterprisedb.com


mbverifystr-threshold.sql
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2021-07-29 Thread Andres Freund
Hi,

CCing RMT because I think we need to do something about this for v14.

On 2021-07-27 19:23:42 -0700, Andres Freund wrote:
> On 2021-07-22 13:54:58 -0700, Andres Freund wrote:
> > On 2021-04-08 01:20:14 -0400, Alvaro Herrera wrote:
> > > On 2021-Apr-07, Alvaro Herrera wrote:
> > >
> > > > OK, I bit the bullet and re-did the logic in the way I had proposed
> > > > earlier in the thread: do the propagation on the collector's side, by
> > > > sending only the list of ancestors: the collector can read the tuple
> > > > change count by itself, to add it to each ancestor.  This seems less
> > > > wasteful.  Attached is v16 which does it that way and seems to work
> > > > nicely under my testing.
> > >
> > > Pushed with this approach.  Thanks for persisting with this.
> >
> > I'm looking at this in the context of rebasing & polishing the shared
> > memory stats patch.
> >
> > I have a few questions / concerns:
>
> Another one, and I think this might warrant thinking about for v14:
>
> Isn't this going to create a *lot* of redundant sampling?  Especially if you
> have any sort of nested partition tree. In the most absurd case a partition
> with n parents will get sampled n times, solely due to changes to itself.
>
> Look at the following example:
>
> BEGIN;
> DROP TABLE if exists p;
> CREATE TABLE p (i int) partition by range(i);
> CREATE TABLE p_0 PARTITION OF p FOR VALUES FROM (   0) to (5000) 
> partition by range(i);
> CREATE TABLE p_0_0 PARTITION OF p_0 FOR VALUES FROM (   0) to (1000);
> CREATE TABLE p_0_1 PARTITION OF p_0 FOR VALUES FROM (1000) to (2000);
> CREATE TABLE p_0_2 PARTITION OF p_0 FOR VALUES FROM (2000) to (3000);
> CREATE TABLE p_0_3 PARTITION OF p_0 FOR VALUES FROM (3000) to (4000);
> CREATE TABLE p_0_4 PARTITION OF p_0 FOR VALUES FROM (4000) to (5000);
> -- create some initial data
> INSERT INTO p select generate_series(0, 5000 - 1) data FROM 
> generate_series(1, 100) reps;
> COMMIT;
>
> UPDATE p_0_4 SET i = i;
>
>
> Whenever the update is executed, all partitions will be sampled at least twice
> (once for p and once for p_0), with p_0_4 sampled three times.
>
> Of course, this is an extreme example, but it's not hard to imagine cases
> where v14 will cause the number of auto-analyzes increase sufficiently to bog
> down autovacuum to a problematic degree.
>
>
> Additionally, while analyzing all child partitions for a partitioned tables
> are AccessShareLock'ed at once. If a partition hierarchy has more than one
> level, it actually is likely that multiple autovacuum workers will end up
> processing the ancestors separately.  This seems like it might contribute to
> lock exhaustion issues with larger partition hierarchies?


I started to write a patch rejiggering autovacuum.c portion of this
change. While testing it I hit the case of manual ANALYZEs leaving
changes_since_analyze for partitioned tables in a bogus state - without a
minimally invasive way to fix that. After a bit of confused staring I realized
that the current code has a very similar problem:

Using the same setup as above:

INSERT INTO p VALUES (0,0); /* repeat as many times as desired */
ANALYZE p_0_0;

At this point the system will have lost track of the changes to p_0_0, unless
an autovacuum worker was launched between the INSERTs and the ANALYZE (which
would cause pgstat_report_anl_ancestors() to report the change count upwards).

There appears to be code trying to address that, but I don't see how it
ever does anything meaningful?

/*
 * Now report ANALYZE to the stats collector.  For regular tables, we do
 * it only if not doing inherited stats.  For partitioned tables, we 
only
 * do it for inherited stats. (We're never called for not-inherited 
stats
 * on partitioned tables anyway.)
 *
 * Reset the changes_since_analyze counter only if we analyzed all
 * columns; otherwise, there is still work for auto-analyze to do.
 */
if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
pgstat_report_analyze(onerel, totalrows, totaldeadrows,
  (va_cols == NIL));

/*
 * If this is a manual analyze of all columns of a permanent leaf
 * partition, and not doing inherited stats, also let the collector know
 * about the ancestor tables of this partition.  Autovacuum does the
 * equivalent of this at the start of its run, so there's no reason to 
do
 * it there.
 */
if (!inh && !IsAutoVacuumWorkerProcess() &&
(va_cols == NIL) &&
onerel->rd_rel->relispartition &&
onerel->rd_rel->relkind == RELKIND_RELATION &&
onerel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
{
pgstat_report_anl_ancestors(RelationGetRelid(onerel));
}

The pgstat_report_analyze() triggers pgstat_recv_analyze() to reset the
counter that pg

Re: Replace l337sp34k in comments.

2021-07-29 Thread Michael Paquier
On Fri, Jul 30, 2021 at 09:46:59AM +1200, Gavin Flower wrote:
> That 'feels' right to me.
> 
> Though in code, possibly it would be better to just use 'up-to-date' in code
> for consistency and to make it easier to grep?

The change in llvmjit_expr.c may not look like an adjective though,
which I admit can be a bit confusing.  Still that does not look
completely wrong to me either.
--
Michael


signature.asc
Description: PGP signature


Re: A qsort template

2021-07-29 Thread John Naylor
DER BY 1 DESC OFFSET '$ROWS''
#		run_query $1 $wm 0 'SELECT * FROM txt_test ORDER BY 1 DESC OFFSET '$ROWS''
		run_index $1 $wm 0 'CREATE INDEX x ON txt_test (a); DROP INDEX x'

		run_query $1 $wm 0 'SELECT * FROM num_test ORDER BY 1 OFFSET '$ROWS''
#		run_query $1 $wm 0 'SELECT * FROM num_test ORDER BY 1 DESC OFFSET '$ROWS''
		run_index $1 $wm 0 'CREATE INDEX x ON num_test (a); DROP INDEX x'

	done

}

create_tables

log "sort benchmark $ROWS"

load_random
run_queries "random"

load_sorted
run_queries "sorted"

load_almost_sorted
run_queries "almost_sorted"

load_reversed
run_queries "reversed"

load_organ_pipe
run_queries "organ_pipe"

load_rotated
run_queries "rotated"

load_0_1
run_queries "0_1"


test-acc-tuplesort-20210729.ods
Description: application/vnd.oasis.opendocument.spreadsheet


0001-WIP-Accelerate-tuple-sorting-for-common-types.patch
Description: Binary data


Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep

2021-07-29 Thread Bossart, Nathan
On 7/29/21, 12:59 AM, "Kyotaro Horiguchi"  wrote:
> At Thu, 29 Jul 2021 09:52:08 +0900, Michael Paquier  
> wrote in
>> On Wed, Jul 28, 2021 at 08:28:12PM +, Bossart, Nathan wrote:
>> > On 7/28/21, 11:32 AM, "Tom Lane"  wrote:
>> >> I follow the idea of using WaitLatch to ensure that the delays are
>> >> interruptible by postmaster signals, but even that isn't worth a
>> >> lot given the expected use of these things.  I don't see a need to
>> >> expend any extra effort on wait-reporting.
>> >
>> > +1.  The proposed patch doesn't make the delay visibility any worse
>> > than what's already there.
>>
>> Agreed to just drop the patch (my opinion about this patch is
>> unchanged).  Not to mention that wait events are not available at SQL
>> level at this stage yet.
>
> I'm +1 to not adding wait event stuff at all. So the only advantage
> this patch would offer is interruptivity. I vote +-0.0 for adding that
> interruptivity (+1.0 from the previous opinion of mine:p).

I'm still in favor of moving to WaitLatch() for pre/post_auth_delay,
but I don't think we should worry about the wait reporting stuff.  The
patch doesn't add a tremendous amount of complexity, it improves the
behavior on postmaster crashes, and it follows the best practice
described in pgsleep.c of using WaitLatch() for long sleeps.
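
For reference, the pattern being discussed is roughly the following (a sketch,
not the exact patch text; it reuses an existing wait event purely for
illustration), which returns promptly if the postmaster dies instead of
sleeping out the full delay:

	if (PostAuthDelay > 0)
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 PostAuthDelay * 1000L,		/* GUC is in seconds */
						 WAIT_EVENT_PG_SLEEP);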

Nathan



Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Bruce Momjian wrote:

> On Thu, Jul 29, 2021 at 06:29:11PM -0400, Tom Lane wrote:

> > No, because dump/restore does not have this issue.  Regular pg_dump just
> > issues "CREATE EXTENSION" commands, so you automatically get the target
> > server's default version.
> 
> Oh, so pg_upgrade does it differently so the oids are preserved?

Have a look at pg_dump --binary-upgrade output :-)

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 18:39, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 06:29:11PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
> > >> I suggested awhile ago that pg_upgrade should look into
> > >> pg_available_extensions in the new cluster, and prepare
> > >> a script with ALTER EXTENSION UPDATE commands for
> > >> anything that's installed but is not the (new cluster's)
> > >> default version.
> >
> > > I can do that, but I would think a pg_dump/restore would also have this
> > > issue, so should this be more generic?
> >
> > No, because dump/restore does not have this issue.  Regular pg_dump just
> > issues "CREATE EXTENSION" commands, so you automatically get the target
> > server's default version.
>
> Oh, so pg_upgrade does it differently so the oids are preserved?
>
>
I suspect this is part of --binary-upgrade mode

Dave

> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 06:29:11PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
> >> I suggested awhile ago that pg_upgrade should look into
> >> pg_available_extensions in the new cluster, and prepare
> >> a script with ALTER EXTENSION UPDATE commands for
> >> anything that's installed but is not the (new cluster's)
> >> default version.
> 
> > I can do that, but I would think a pg_dump/restore would also have this
> > issue, so should this be more generic?
> 
> No, because dump/restore does not have this issue.  Regular pg_dump just
> issues "CREATE EXTENSION" commands, so you automatically get the target
> server's default version.

Oh, so pg_upgrade does it differently so the oids are preserved?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
>> I suggested awhile ago that pg_upgrade should look into
>> pg_available_extensions in the new cluster, and prepare
>> a script with ALTER EXTENSION UPDATE commands for
>> anything that's installed but is not the (new cluster's)
>> default version.

> I can do that, but I would think a pg_dump/restore would also have this
> issue, so should this be more generic?

No, because dump/restore does not have this issue.  Regular pg_dump just
issues "CREATE EXTENSION" commands, so you automatically get the target
server's default version.

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > I think we need to first give clear instructions on how to find out if
> > an extension update is available, and then how to update it.  I am
> > thinking we should supply a query which reports all extensions that can
> > be upgraded, at least for contrib.
> 
> I suggested awhile ago that pg_upgrade should look into
> pg_available_extensions in the new cluster, and prepare
> a script with ALTER EXTENSION UPDATE commands for
> anything that's installed but is not the (new cluster's)
> default version.

I can do that, but I would think a pg_dump/restore would also have this
issue, so should this be more generic?  If we had a doc section about
that, we could add it a step to run in the pg_upgrade instructions.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Tom Lane
Bruce Momjian  writes:
> I think we need to first give clear instructions on how to find out if
> an extension update is available, and then how to update it.  I am
> thinking we should supply a query which reports all extensions that can
> be upgraded, at least for contrib.

I suggested awhile ago that pg_upgrade should look into
pg_available_extensions in the new cluster, and prepare
a script with ALTER EXTENSION UPDATE commands for
anything that's installed but is not the (new cluster's)
default version.

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
 On Thu, Jul 29, 2021 at 05:01:24PM -0400, Tom Lane wrote:
> Dave Cramer  writes:
> > So back to the original motivation for bringing this up. Recall that a
> > cluster was upgraded. Everything appeared to work fine, except that the
> > definitions of the functions were slightly different enough to cause a
> > fatal issue on restoring a dump from pg_dump.
> > Since pg_upgrade is now part of the core project, we need to make sure this
> > is not possible or be much more insistent that the user needs to upgrade
> > any extensions that require it.
> 
> TBH, this seems like mostly the fault of the extension author.
> The established design is that the SQL-level objects will be
> carried forward verbatim by pg_upgrade.  Therefore, any newer-version
> shared library MUST be able to cope sanely with SQL objects from
> some previous revision.  The contrib modules provide multiple
> examples of ways to do this.
> 
> If particular extension authors don't care to go to that much
> trouble, it's on their heads to document that their extensions
> are unsafe to pg_upgrade.

I think we need to first give clear instructions on how to find out if
an extension update is available, and then how to update it.  I am
thinking we should supply a query which reports all extensions that can
be upgraded, at least for contrib.
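
As a starting point, the check could boil down to pg_upgrade running something
like this against the new cluster and emitting an ALTER EXTENSION ... UPDATE
for each row returned (a rough, untested sketch; the query text is only
illustrative):

	/* Find installed extensions whose version lags the new default */
	res = executeQueryOrDie(conn,
							"SELECT name, installed_version, default_version "
							"FROM pg_available_extensions "
							"WHERE installed_version IS NOT NULL "
							"  AND installed_version IS DISTINCT FROM default_version");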

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Replace l337sp34k in comments.

2021-07-29 Thread Gavin Flower

On 30/07/21 12:51 am, Geoff Winkless wrote:
On Thu, 29 Jul 2021 at 11:22, Andrew Dunstan wrote:


Personally, I would have written this as just "up to date", I don't
think the hyphens are required.

FWIW Merriam-Webster and the CED suggest "up-to-date" when before a
noun, so the changes should be "up-to-date answer" but "are up to date".


https://dictionary.cambridge.org/dictionary/english/up-to-date 



Geoff


That 'feels' right to me.

Though in code, possibly it would be better to just use 'up-to-date' in 
code for consistency and to make it easier to grep?


As a minor aside: double quotes should be used for speech and single 
quotes for quoting!



Cheers,
Gavin





Re: Doc: Fixed the result of the bit_count example

2021-07-29 Thread Daniel Gustafsson
> On 29 Jul 2021, at 11:35, Daniel Gustafsson  wrote:

> I'll apply this shortly backpatched to 14 where bit_count was introduced.


And done, thanks!

--
Daniel Gustafsson   https://vmware.com/





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Tom Lane
Dave Cramer  writes:
> So back to the original motivation for bringing this up. Recall that a
> cluster was upgraded. Everything appeared to work fine, except that the
> definitions of the functions were slightly different enough to cause a
> fatal issue on restoring a dump from pg_dump.
> Since pg_upgrade is now part of the core project, we need to make sure this
> is not possible or be much more insistent that the user needs to upgrade
> any extensions that require it.

TBH, this seems like mostly the fault of the extension author.
The established design is that the SQL-level objects will be
carried forward verbatim by pg_upgrade.  Therefore, any newer-version
shared library MUST be able to cope sanely with SQL objects from
some previous revision.  The contrib modules provide multiple
examples of ways to do this.

If particular extension authors don't care to go to that much
trouble, it's on their heads to document that their extensions
are unsafe to pg_upgrade.

regards, tom lane




Re: Case expression pushdown

2021-07-29 Thread Tom Lane
Alexander Pyhalov  writes:
> [ 0001-Allow-pushing-CASE-expression-to-foreign-server-v7.patch ]

I looked this over.  It's better than before, but the collation
handling is still not at all correct.  We have to consider that a
CASE's arg expression supplies the collation for a contained
CaseTestExpr, otherwise we'll come to the wrong conclusions about
whether "CASE foreignvar WHEN ..." is shippable, if the foreignvar
is what's determining collation of the comparisons.

This means that the CaseExpr level of recursion has to pass data down
to the CaseTestExpr level.  In the attached, I did that by adding an
additional argument to foreign_expr_walker().  That's a bit invasive,
but it's not awful.  I thought about instead adding fields to the
foreign_loc_cxt struct.  But that seemed considerably messier in the
end, because we'd then have some fields that are information sourced
at one recursion level and some that are info sourced at another
level.

I also whacked the regression test cases around a lot.  They seemed
to spend a lot of time on irrelevant combinations, while failing to
check the things that matter, namely whether collation-based pushdown
decisions are made correctly.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c..d0cbfa8ad9 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -116,7 +116,8 @@ typedef struct deparse_expr_cxt
  */
 static bool foreign_expr_walker(Node *node,
 foreign_glob_cxt *glob_cxt,
-foreign_loc_cxt *outer_cxt);
+foreign_loc_cxt *outer_cxt,
+foreign_loc_cxt *case_arg_cxt);
 static char *deparse_type_name(Oid type_oid, int32 typemod);
 
 /*
@@ -158,6 +159,7 @@ static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node,
 static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context);
 static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context);
 static void deparseNullTest(NullTest *node, deparse_expr_cxt *context);
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
 static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 			 deparse_expr_cxt *context);
@@ -254,7 +256,7 @@ is_foreign_expr(PlannerInfo *root,
 		glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
-	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
+	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt, NULL))
 		return false;
 
 	/*
@@ -283,6 +285,10 @@ is_foreign_expr(PlannerInfo *root,
  *
  * In addition, *outer_cxt is updated with collation information.
  *
+ * case_arg_cxt is NULL if this subexpression is not inside a CASE-with-arg.
+ * Otherwise, it points to the collation info derived from the arg expression,
+ * which must be consulted by any CaseTestExpr.
+ *
  * We must check that the expression contains only node types we can deparse,
  * that all types/functions/operators are safe to send (they are "shippable"),
  * and that all collations used in the expression derive from Vars of the
@@ -294,7 +300,8 @@ is_foreign_expr(PlannerInfo *root,
 static bool
 foreign_expr_walker(Node *node,
 	foreign_glob_cxt *glob_cxt,
-	foreign_loc_cxt *outer_cxt)
+	foreign_loc_cxt *outer_cxt,
+	foreign_loc_cxt *case_arg_cxt)
 {
 	bool		check_type = true;
 	PgFdwRelationInfo *fpinfo;
@@ -432,17 +439,17 @@ foreign_expr_walker(Node *node,
  * result, so do those first and reset inner_cxt afterwards.
  */
 if (!foreign_expr_walker((Node *) sr->refupperindexpr,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 inner_cxt.collation = InvalidOid;
 inner_cxt.state = FDW_COLLATE_NONE;
 if (!foreign_expr_walker((Node *) sr->reflowerindexpr,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 inner_cxt.collation = InvalidOid;
 inner_cxt.state = FDW_COLLATE_NONE;
 if (!foreign_expr_walker((Node *) sr->refexpr,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 
 /*
@@ -478,7 +485,7 @@ foreign_expr_walker(Node *node,
  * Recurse to input subexpressions.
  */
 if (!foreign_expr_walker((Node *) fe->args,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 
 /*
@@ -526,7 +533,7 @@ foreign_expr_walker(Node *node,
  * Recurse to input subexpressions.
  */
 if (!foreign_expr_walker((Node *) oe->args,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 
 /*
@@ -566,7 +573,7 @@ foreign_expr_walker(Node *node,
  * Recurse to input subexpressions.
  */
 if (!foreign_expr_walker((Node *) oe->args,
-		 glob_c

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 12:28 PM Dave Cramer  wrote:

> So back to the original motivation for bringing this up. Recall that a
> cluster was upgraded. Everything appeared to work fine, except that the
> definitions of the functions were slightly different enough to cause a
> fatal issue on restoring a dump from pg_dump.
> Since pg_upgrade is now part of the core project, we need to make sure
> this is not possible or be much more insistent that the user needs to
> upgrade any extensions that require it.
>

I'm missing something here because I do not recall that level of detail
being provided.  The first email was simply an observation that the
pg_upgraded version and the create extension version were different in the
new cluster - which is working as designed (opinions of said design, at
least right here and now, are immaterial).

From your piecemeal follow-on descriptions I do see that pg_dump seems to
be involved - though a self-contained demonstration is not available that I
can find.  But so far as pg_dump is concerned, it just needs to export the
current version each database is running for a given extension, and
pg_restore to issue a CREATE EXTENSION for the same version when prompted.  I
presume it does this correctly but admittedly haven't checked.  IOW, if
pg_dump is failing here it is more likely its own bug and should be fixed
rather than blaming pg_upgrade.  Or it's pg_stat_statements' bug and it should
be fixed.

In theory the procedure and requirements imposed by pg_upgrade here seem
reasonable.  Fewer moving parts during the upgrade is strictly better.  The
documentation was not clear on how things worked, and so it's being cleaned
up, but the "how" hasn't been shown to be a problem, nor has it been shown
that simply running alter extension would be an appropriate solution for this
single case, let alone in general.  Since running alter extension manually is
simple, constructing such a test case and proving that the alter extension
at least works for it should be straightforward.

Without that I cannot support changing the behavior or even saying that
users must run alter extension manually to overcome a limitation in
pg_upgrade.  They should do so in order to keep their code base current and
running supported code - but that is a judgement we've always left to the
DBA, with the exception of strongly discouraging not updating to the newest
point release and getting off unsupported major releases.

David J.




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
Dave Cramer


On Thu, 29 Jul 2021 at 15:06, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 01:43:09PM -0400, Álvaro Herrera wrote:
> > On 2021-Jul-29, Bruce Momjian wrote:
> >
> > > + If the old cluster used extensions, whether from
> > > + contrib or some other source, it used
> > > + shared object files (or DLLs) to implement these extensions,
> e.g.,
> > > + pgcrypto.so.  Now, shared object files
> matching
> > > + the new server binary must be installed in the new cluster,
> usually
> > > + via operating system commands.  Do not load the schema
> definitions,
> > > + e.g., CREATE EXTENSION pgcrypto, because these
> > > + will be copied from the old cluster.  (Extensions should be
> upgraded
> > > + later using ALTER EXTENSION ... UPGRADE.)
> >
> > I propose this:
> >
> > 
> >   If the old cluster used shared-object files (or DLLs) for extensions
> >   or other loadable modules, install recompiled versions of those files
> >   onto the new cluster.
> >   Do not install the extension themselves (i.e., do not run
> >   CREATE EXTENSION), because extension definitions
> >   will be carried forward from the old cluster.
> > 
> >
> > 
> >   Extensions can be upgraded after pg_upgrade completes using
> >   ALTER EXTENSION ... UPGRADE, on a per-database
> >   basis.
> > 
> >
> > I suggest " ... for extensions or other loadable modules" because
> > loadable modules aren't necessarily for extensions.  Also, it's
> > perfectly possible to have extension that don't have a loadable module.
>
> Yes, good point.
>
> > I suggest "extension definitions ... carried forward" instead of
> > "extensions ... copied" (your proposed text) to avoid the idea that
> > files are copied; use it instead of "schema definitions ... upgraded"
> > (the current docs) to avoid the idea that they are actually upgraded;
> > also, "schema definition" seems a misleading term to use here.
>
> I used the term "duplicated".
>
> > I suggest "can be upgraded" rather than "should be upgraded" because
> > we're not making a recommendation, merely stating the fact that it is
> > possible to do so.
>
> Agreed.  Most extensions don't have updates between major versions.
>
> > I suggest using the imperative mood, to be consistent with the
> > surrounding text.  (Applies to the first para; the second para is
> > informative.)
>
> OK.
>
> > I haven't seen it mentioned in the thread, but I think the final phrase
> > of this  should be a separate step,
> >
> > 
> >  Copy custom full-text search files
> >  
> >   Copy any custom full text search file (dictionary, synonym, thesaurus,
> >   stop word list) to the new server.
> >  
> > 
> >
> > While this is closely related to extensions, it's completely different.
>
> Agreed.  See attached patch.
>

So back to the original motivation for bringing this up. Recall that a
cluster was upgraded. Everything appeared to work fine, except that the
definitions of the functions were slightly different enough to cause a
fatal issue on restoring a dump from pg_dump.
Since pg_upgrade is now part of the core project, we need to make sure this
is not possible or be much more insistent that the user needs to upgrade
any extensions that require it.

I believe we should be doing more than making a recommendation.

Dave

>
>


Re: needless complexity in StartupXLOG

2021-07-29 Thread Andres Freund
Hi,

On 2021-07-29 12:49:19 +0800, Julien Rouhaud wrote:
> On Thu, Jul 29, 2021 at 3:28 AM Andres Freund  wrote:
> >
> > [1] I really wish somebody had the energy to just remove single user and
> > bootstrap modes. The degree to which they increase complexity in the rest of
> > the system is entirely unreasonable. There's not actually any reason
> > bootstrapping can't happen with checkpointer et al running, it's just
> > autovacuum that'd need to be blocked.
> 
> Any objection to adding an entry for that in the wiki TODO?

Not sure there's enough consensus on the idea for that. I personally
think that's a good approach at reducing relevant complexity, but I
don't know if anybody agrees...

Greetings,

Andres Freund




Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Andres Freund
Hi,

On 2021-07-29 13:15:53 -0400, Robert Haas wrote:
> I don't know if this is better, but I do kind of like the fact that
> the basic representation is just an array. It makes it really easy to
> predict how much memory will be needed for a given number of dead
> TIDs, and it's very DSM-friendly as well.

I think those advantages are far outstripped by the big disadvantage of
needing to either size the array accurately from the start, or to
reallocate the whole array.  Our current pre-allocation behaviour is
very wasteful for most vacuums but doesn't handle large work_mem at all,
causing unnecessary index scans.

Greetings,

Andres Freund




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 02:19:17PM -0400, Álvaro Herrera wrote:
> On 2021-Jul-29, Dave Cramer wrote:
> 
> > > I suggest "can be upgraded" rather than "should be upgraded" because
> > > we're not making a recommendation, merely stating the fact that it is
> > > possible to do so.
> >
> > Why not recommend it? I was going to suggest that we actually run alter
> > extension upgrade ... on all of them as a solution.
> > 
> > What are the downsides to upgrading them all by default ? AFAIK if they
> > need upgrading this should run all of the SQL scripts, if they don't then
> > this should be a NO-OP.
> 
> I'm not aware of any downsides, and I think it would be a good idea to
> do so, but I also think that it would be good to sort out the docs
> precisely (a backpatchable doc change, IMV) and once that is done we can
> discuss how to improve pg_upgrade so that users no longer need to do
> that (a non-backpatchable code change).  Incremental improvements and
> all that ...

Agreed.  I don't think we have any consistent set of steps for detecting
and upgrading extensions --- that needs a lot more research.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 01:43:09PM -0400, Álvaro Herrera wrote:
> On 2021-Jul-29, Bruce Momjian wrote:
> 
> > + If the old cluster used extensions, whether from
> > + contrib or some other source, it used
> > + shared object files (or DLLs) to implement these extensions, e.g.,
> > + pgcrypto.so.  Now, shared object files matching
> > + the new server binary must be installed in the new cluster, usually
> > + via operating system commands.  Do not load the schema definitions,
> > + e.g., CREATE EXTENSION pgcrypto, because these
> > + will be copied from the old cluster.  (Extensions should be upgraded
> > + later using ALTER EXTENSION ... UPGRADE.)
> 
> I propose this:
> 
> 
>   If the old cluster used shared-object files (or DLLs) for extensions
>   or other loadable modules, install recompiled versions of those files
>   onto the new cluster.
>   Do not install the extension themselves (i.e., do not run
>   CREATE EXTENSION), because extension definitions
>   will be carried forward from the old cluster.
> 
> 
> 
>   Extensions can be upgraded after pg_upgrade completes using
>   ALTER EXTENSION ... UPGRADE, on a per-database
>   basis.
> 
> 
> I suggest " ... for extensions or other loadable modules" because
> loadable modules aren't necessarily for extensions.  Also, it's
> perfectly possible to have extensions that don't have a loadable module.

Yes, good point.

> I suggest "extension definitions ... carried forward" instead of
> "extensions ... copied" (your proposed text) to avoid the idea that
> files are copied; use it instead of "schema definitions ... upgraded"
> (the current docs) to avoid the idea that they are actually upgraded;
> also, "schema definition" seems a misleading term to use here.

I used the term "duplicated".

> I suggest "can be upgraded" rather than "should be upgraded" because
> we're not making a recommendation, merely stating the fact that it is
> possible to do so.

Agreed.  Most extensions don't have updates between major versions.

> I suggest using the imperative mood, to be consistent with the
> surrounding text.  (Applies to the first para; the second para is
> informative.)

OK.

> I haven't seen it mentioned in the thread, but I think the final phrase
> of this  should be a separate step,
> 
> 
>  Copy custom full-text search files
>  
>   Copy any custom full text search file (dictionary, synonym, thesaurus,
>   stop word list) to the new server.
>  
> 
> 
> While this is closely related to extensions, it's completely different.

Agreed.  See attached patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..0a71465805 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,17 +299,27 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
- Also, any custom full text search files (dictionary, synonym,
- thesaurus, stop words) must also be copied to the new cluster.
+ Many extensions and custom modules, whether from
+ contrib or another source, use shared object
+ files (or DLLs), e.g., pgcrypto.so.  If the old
+ cluster used these, shared object files matching the new server binary
+ must be installed in the new cluster, usually via operating system
+ commands.  Do not load the schema definitions, e.g., CREATE
+ EXTENSION pgcrypto, because these will be duplicated from
+ the old cluster.  (Extensions with available updates can be processed
+ later using ALTER EXTENSION ... UPDATE.)
+
+   
+
+   
+Copy custom full-text search files
+
+
+ Copy any custom full text search files (dictionary, synonym,
+ thesaurus, stop words) from the old to the new cluster.
 

 
@@ -494,10 +504,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov

Robert Haas wrote on 2021-07-29 20:15:
On Thu, Jul 29, 2021 at 5:11 AM Masahiko Sawada  
wrote:

Indeed. Given that the radix tree itself has other use cases, I have
no concern about using radix tree for vacuum's dead tuples storage. It
will be better to have one that can be generally used and has some
optimizations that are helpful also for vacuum's use case, rather than
having one that is very optimized only for vacuum's use case.


What I'm about to say might be a really stupid idea, especially since
I haven't looked at any of the code already posted, but what I'm
wondering about is whether we need a full radix tree or maybe just a
radix-like lookup aid. For example, suppose that for a relation <= 8MB
in size, we create an array of 1024 elements indexed by block number.
Each element of the array stores an offset into the dead TID array.
When you need to probe for a TID, you look up blkno and blkno + 1 in
the array and then bsearch only between those two offsets. For bigger
relations, a two or three level structure could be built, or it could
always be 3 levels. This could even be done on demand, so you
initialize all of the elements to some special value that means "not
computed yet" and then fill them the first time they're needed,
perhaps with another special value that means "no TIDs in that block".


An 8MB relation is not a problem, IMO. There is no need to do anything
special to handle an 8MB relation.

The problem is a 2TB relation. It has 256M pages and, let's suppose, 3G dead
tuples.

Then the per-page offset array will be 2GB and the tuple offset array will be
6GB (a 2-byte offset per tuple), 8GB in total.

We could build the offset array over only the high 3 bytes of the block
number. We would then have a 1M-entry offset array weighing 8MB, and an array
of 3-byte tuple pointers (1 remaining byte of the block number plus 2 bytes
for the tuple offset) weighing 9GB.
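
(A rough restatement of that arithmetic, assuming 8-byte entries in the
per-page offset array as implied by the sizes above:
  plain scheme:  256M pages * 8 bytes = 2GB, plus 3G tuples * 2 bytes = 6GB;
  prefix scheme: ~1M prefixes * 8 bytes = 8MB, plus 3G tuples * 3 bytes = 9GB.)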

But using per-batch compression schemes, the amortized cost could be about
4 bytes per page and 1 byte per tuple: 1GB + 3GB = 4GB of memory.
Yes, it is not as guaranteed as with the array approach, but 95% of the time
it is that low or even lower. And better: the more tuples are dead, the
better the compression works. A page with all of its tuples dead could be
encoded in as little as 5 bytes. Therefore, overall memory consumption is
more stable and predictable.

Lower memory consumption for the tuple storage means there is less chance
that indexes have to be scanned two or more times, which gives a more
predictable user experience.


I don't know if this is better, but I do kind of like the fact that
the basic representation is just an array. It makes it really easy to
predict how much memory will be needed for a given number of dead
TIDs, and it's very DSM-friendly as well.


The whole thing could be encoded in one single array of bytes. Just give
"pointer-to-array" + "array-size" to the constructor, and use a "bump
allocator" inside. A complex logical structure doesn't imply
"DSM-unfriendliness", provided it is suitably designed.

In fact, my code uses a bump allocator internally to avoid the per-allocation
overhead of "aset", "slab" or "generational". And the IntegerSet2 version
even uses it for all allocations, since it has no reallocatable parts.

Well, if a data structure has reallocatable parts, it could be less friendly
to DSM.

regards,

---
Yura Sokolov
y.soko...@postgrespro.ru
funny.fal...@gmail.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 11:35 AM Bruce Momjian  wrote:

>
> Oh, and you can't use the same installation procedures as when you
> installed the extension because that probably included CREATE EXTENSION.
> This really highlights why this is tricky to explain --- we need the
> binaries, but not the SQL that goes with it.
>

Maybe... but given that "installation to the O/S" is cluster-wide and
"CREATE EXTENSION" is database-specific, I believe this will generally take
care of itself in practice, especially if we leave in the part "(but ignore
any installation instructions that advise executing CREATE EXTENSION)".

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 11:28 AM Bruce Momjian  wrote:

> I am sorry but none of your suggestions are exciting me --- they seem to
> get into too much detail for the context.
>

Fair, I still need to consider Alvaro's anyway, but given the amount of
general angst surrounding performing a pg_upgrade I do not feel that being
detailed is necessarily a bad thing, so long as the detail is relevant.
But I'll keep this in mind for my next reply.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 02:28:55PM -0400, Bruce Momjian wrote:
> On Thu, Jul 29, 2021 at 10:00:12AM -0700, David G. Johnston wrote:
> > I'm warming up to "should" but maybe add a "why" such as "the old versions 
> > are
> > considered unsupported on the newer server".
> > 
> > I dislike "usually via operating system commands", just offload this to the
> > extension, i.e., "must be installed in the new cluster via installation
> > procedures specific to, and documented by, each extension (for contrib it is
> > usually enough to ensure the -contrib package was chosen to be installed 
> > along
> > with the -server package for your operating system.)"
> > 
> > I would simplify the first two sentences to just:
> > 
> > If the old cluster used extensions those same extensions must be installed 
> > in
> > the new cluster via installation procedures specific to, and documented by,
> > each extension.  For contrib extensions it is usually enough to install the
> > -contrib package via the same method you used to install the PostgreSQL 
> > server.

Oh, and you can't use the same installation procedures as when you
installed the extension because that probably included CREATE EXTENSION.
This really highlights why this is tricky to explain --- we need the
binaries, but not the SQL that goes with it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: amcheck verification for GiST and GIN

2021-07-29 Thread Nikolay Samokhvalov
Hello,

First of all, thank you all -- Andrey, Peter, Heikki and others -- for this
work; GIN support in amcheck is *really* needed, especially for OS upgrades
such as from Ubuntu 16.04 (which is EOL now) to 18.04 or 20.04.

I was trying to check a bunch of GIN indexes on a production system after
switching from Ubuntu 16.04 to 18.04 and got many errors. So I decided to
check on 16.04 first (which is still used in production for that DB),
without any OS/glibc changes.

On 16.04, I still saw errors, which was not really expected, because it
would mean that production is corrupted too. In that case, REINDEX should
fix it. But it didn't -- see output below. I cannot share the data, and I am
thinking about how to create a synthetic demo of this. Any suggestions?

And is this a sign that the tool is wrong rather than that we have real
corruption? (I assume that if we did, we would see no errors after
REINDEXing -- of course, assuming GIN itself doesn't have bugs.)

Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
slightly adjusted to work with PG12
(https://gitlab.com/postgres/postgres/-/merge_requests/5); snippet used to
run amcheck: https://gitlab.com/-/snippets/2001962 (see file #3)

Before reindex:


INFO:  [2021-07-29 17:44:42.525+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 117935; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 65754, offset
232


test=# reindex index index_XXX_trigram;

REINDEX



After REINDEX:


INFO:  [2021-07-29 18:01:23.339+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 70100; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 70048, offset
253



On Thu, Jul 15, 2021 at 00:03 Heikki Linnakangas  wrote:

> On 07/08/2020 00:33, Peter Geoghegan wrote:
> > On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko 
> wrote:
> >> Here is the patch which I (with Andrey as my advisor) built on the top
> of the last patch from this thread:
> https://commitfest.postgresql.org/25/1800/ .
> >> It adds an ability to verify validity  of GIN index. It is not polished
> yet, but it works and we wanted to show it to you so you can give us some
> feedback, and also let you know about this work if you have any plans of
> writing something like that yourselves, so that you do not redo what is
> already done.
> >
> > Can you rebase this patch, please?
> >
> > Also suggest breaking out the series into distinct patch files using
> > "git format-patch master".
>
> I rebased the GIN parts of this patch, see attached. I also ran pgindent
> and made some other tiny cosmetic fixes, but I didn't review the patch,
> only rebased it in the state it was.
>
> I was hoping that this would be useful to track down the bug we're
> discussing here:
>
> https://www.postgresql.org/message-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw%40mail.gmail.com.
>
> But now that I look what checks this performs, I doubt this will catch
> the kind of corruption that's happened there. I suspect it's more subtle
> than an inconsistencies between parent and child pages, because only a
> few rows are affected. But doesn't hurt to try.
>
> - Heikki
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 10:00:12AM -0700, David G. Johnston wrote:
> I'm warming up to "should" but maybe add a "why" such as "the old versions are
> considered unsupported on the newer server".
> 
> I dislike "usually via operating system commands", just offload this to the
> extension, i.e., "must be installed in the new cluster via installation
> procedures specific to, and documented by, each extension (for contrib it is
> usually enough to ensure the -contrib package was chosen to be installed along
> with the -server package for your operating system.)"
> 
> I would simplify the first two sentences to just:
> 
> If the old cluster used extensions those same extensions must be installed in
> the new cluster via installation procedures specific to, and documented by,
> each extension.  For contrib extensions it is usually enough to install the
> -contrib package via the same method you used to install the PostgreSQL 
> server.
> 
> I would consider my suggestion for "copy as-is/alter extension" wording in my
> previous email in lieu of the existing third and fourth sentences, with the
> "should" and "why" wording possibly worked in.  But the existing works ok.

I am sorry but none of your suggestions are exciting me --- they seem to
get into too much detail for the context.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Dave Cramer wrote:

> > I suggest "can be upgraded" rather than "should be upgraded" because
> > we're not making a recommendation, merely stating the fact that it is
> > possible to do so.
>
> Why not recommend it? I was going to suggest that we actually run alter
> extension upgrade ... on all of them as a solution.
> 
> What are the downsides to upgrading them all by default ? AFAIK if they
> need upgrading this should run all of the SQL scripts, if they don't then
> this should be a NO-OP.

I'm not aware of any downsides, and I think it would be a good idea to
do so, but I also think that it would be good to sort out the docs
precisely (a backpatchable doc change, IMV) and once that is done we can
discuss how to improve pg_upgrade so that users no longer need to do
that (a non-backpatchable code change).  Incremental improvements and
all that ...

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 2:04 PM, Julien Rouhaud wrote:

On Thu, Jul 29, 2021 at 11:46:12AM -0400, Jan Wieck wrote:



> I don't have a ready example of such an extension, but if we ever would
> convert the backend parts of Slony into an extension, it would be one.



FWIW I have an example of such an extension: powa-archivist extension
script runs an anonymous block code and creates if needed a custom
wrapper to emulate the current_setting(text, boolean) variant that
doesn't exist on pre-pg96 servers.



Thank you!

I presume that pg_upgrade on a database with that extension installed 
would silently succeed and have the pg_catalog as well as public (or 
wherever) version of that function present.



Regards, Jan

--
Jan Wieck




Re: Numeric x^y for negative x

2021-07-29 Thread Dean Rasheed
On Thu, 22 Jul 2021 at 16:19, Dean Rasheed  wrote:
>
> On Thu, 22 Jul 2021 at 06:13, Yugo NAGATA  wrote:
> >
> > Thank you for updating the patch. I am fine with the additional comments.
> > I don't think there is any other problem left, so I marked it 
> > Ready-for-Committers.
>
> Thanks for looking. Barring any further comments, I'll push this in a few 
> days.
>

So I have been testing this a lot over the last couple of days, and I
have concluded that the patch works well as far as it goes, but I did
manage to construct another case where numeric_power() loses
precision. I think, though, that it would be better to tackle that as
a separate patch.

In writing the commit message for this patch, I realised that it was
possible to tidy up the local_rscale calculation part of it a little,
to make it more obvious what was going wrong, so attached is a
slightly tweaked version. I'll hold off on committing it for a few
more days in case anyone else wants to have a look. Tom?

The other issue I found is related to the first part of power_var(),
where it does a low-precision calculation to get an estimate for the
weight of the result. It occurred to me that for certain input bases,
that calculation could be made to be quite inaccurate, and therefore
lead to choosing the wrong rscale later. This is the test case I
constructed:

  (1-1.5123456789012345678e-1000) ^ 1.15e1003

Here, the base is a sliver under 1, and so ln(base) is approximately
-1.5e-1000, and ln_dweight is -1000 (the decimal weight of ln(base)).
The problem is that the local_rscale used for the first low-precision
calculation is limited to NUMERIC_MAX_DISPLAY_SCALE (which is 1000),
so we only compute ln_base to a scale of 1000 at that stage, and the
result is rounded to exactly 2e-1000, which is off by a factor of
around 1.33.

That makes it think the result weight will be -998, when actually it's
-755, so it then chooses a local_rscale for the full calculation
that's far too small, and the result is very inaccurate.
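
(A rough back-of-envelope check of those weight estimates, using the
approximate values quoted above: with the true logarithm,
1.15e1003 * -1.512e-1000 ~ -1739 and e^-1739 ~ 10^-755; with the rounded
value, 1.15e1003 * -2e-1000 = -2300 and e^-2300 ~ 10^-999, which is where
the estimated weight of about -998 comes from.)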

To fix this, I think it's necessary to remove the line that limits the
initial local_rscale. I tried that in a debugger, and managed to get a
result that agreed with the result from "bc -l" with a scale of 2000.

The reason I think it will be OK to remove that line is that it only
ever comes into play when ln_dweight is a large negative number (and
the smallest it can be is -16383). But that can only happen in
instances like this, where the base is very very close to 1. In such
cases, the ln(base) calculation is very fast, because it basically
only has to do a couple of Taylor series terms, and it's done. This
will still only be a low-precision estimate of the result (about 8
significant digits, shifted down a long way).
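
(For reference, ln(1 - x) = -x - x^2/2 - x^3/3 - ..., and with
x ~ 1.5e-1000 the second term is already ~1.1e-2000, so only a couple of
terms contribute at the precision needed here.)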

It might also be necessary to re-think the choice of local_rscale for
the mul_var() that follows. If the weight of exp is much larger than
the weight of ln_base (or vice versa), it doesn't really make sense to
compute the product to the same local_rscale. That might be a source
of other inaccuracies. I'll try to investigate some more.

Anyway, I don't think any of that should get in the way of committing
the current patch.

Regards,
Dean
From 071d877af3ffcb8e2abd445cb5963d90a2e766bc Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Thu, 29 Jul 2021 17:26:08 +0100
Subject: [PATCH] Fix corner-case errors and loss of precision in
 numeric_power().

This fixes a couple of related problems that arise when raising
numbers to very large powers.

Firstly, when raising a negative number to a very large integer power,
the result should be well-defined, but the previous code would only
cope if the exponent was small enough to go through power_var_int().
Otherwise it would throw an internal error, attempting to take the
logarithm of a negative number. Fix this by adding suitable handling
to the general case in power_var() to cope with negative bases,
checking for integer powers there.

Next, when raising a (positive or negative) number whose absolute
value is slightly less than 1 to a very large power, the result should
approach zero as the power is increased. However, in some cases, for
sufficiently large powers, this would lose all precision and return 1
instead. This was due to the way that the local_rscale was being
calculated for the final full-precision calculation:

  local_rscale = rscale + (int) val - ln_dweight + 8

The first two terms on the right hand side are meant to give the
number of significant digits required in the result ("val" being the
estimated result weight). However, this failed to account for the fact
that rscale is clipped to a maximum of NUMERIC_MAX_DISPLAY_SCALE
(1000), and the result weight might be less then -1000, causing their
sum to be negative, leading to a loss of precision. Fix this by
forcing the number of significant digits to be nonnegative. It's OK
for it to be zero (when the result weight is less than -1000), since
the local_rscale value then includes a few extra digits to ensure an

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
Dave Cramer


On Thu, 29 Jul 2021 at 13:43, Alvaro Herrera 
wrote:

> On 2021-Jul-29, Bruce Momjian wrote:
>
> > + If the old cluster used extensions, whether from
> > + contrib or some other source, it used
> > + shared object files (or DLLs) to implement these extensions, e.g.,
> > + pgcrypto.so.  Now, shared object files
> matching
> > + the new server binary must be installed in the new cluster, usually
> > + via operating system commands.  Do not load the schema definitions,
> > + e.g., CREATE EXTENSION pgcrypto, because these
> > + will be copied from the old cluster.  (Extensions should be
> upgraded
> > + later using ALTER EXTENSION ... UPGRADE.)
>
> I propose this:
>
> 
>   If the old cluster used shared-object files (or DLLs) for extensions
>   or other loadable modules, install recompiled versions of those files
>   onto the new cluster.
>   Do not install the extension themselves (i.e., do not run
>   CREATE EXTENSION), because extension definitions
>   will be carried forward from the old cluster.
> 
>
> 
>   Extensions can be upgraded after pg_upgrade completes using
>   ALTER EXTENSION ... UPGRADE, on a per-database
>   basis.
> 
>
> I suggest " ... for extensions or other loadable modules" because
> loadable modules aren't necessarily for extensions.  Also, it's
> perfectly possible to have extensions that don't have a loadable module.
>
> I suggest "extension definitions ... carried forward" instead of
> "extensions ... copied" (your proposed text) to avoid the idea that
> files are copied; use it instead of "schema definitions ... upgraded"
> (the current docs) to avoid the idea that they are actually upgraded;
> also, "schema definition" seems a misleading term to use here.
>

I like "carried forward", however it presumes quite a bit of knowledge of
what is going on inside pg_upgrade.
That said I don't have a better option short of explaining the whole thing
which is clearly out of scope.

>
> I suggest "can be upgraded" rather than "should be upgraded" because
> we're not making a recommendation, merely stating the fact that it is
> possible to do so.
>
Why not recommend it? I was going to suggest that we actually run alter
extension upgrade ... on all of them as a solution.

What are the downsides to upgrading them all by default ? AFAIK if they
need upgrading this should run all of the SQL scripts, if they don't then
this should be a NO-OP.

Dave


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Julien Rouhaud
On Fri, Jul 30, 2021 at 12:14 AM Bruce Momjian  wrote:
>
> On Thu, Jul 29, 2021 at 11:46:12AM -0400, Jan Wieck wrote:
> >
> > This assumes that the scripts executed during CREATE EXTENSION have no
> > conditional code in them that depends on the server version. Running the
> > same SQL script on different server versions can have different effects.
> >
> > I don't have a ready example of such an extension, but if we ever would
> > convert the backend parts of Slony into an extension, it would be one.
>
> The bottom line is that we have _no_ mechanism to handle this except
> uninstalling the extension before the upgrade and re-installing it
> afterward, which isn't clearly spelled out for each extension, as far as
> I know, and would not work for extensions that create data types.
>
> Yes, I do feel this is a big hold in our upgrade instructions.

FWIW I have an example of such an extension: powa-archivist extension
script runs an anonymous block code and creates if needed a custom
wrapper to emulate the current_setting(text, boolean) variant that
doesn't exist on pre-pg96 servers.
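
For illustration, such a wrapper might look roughly like the following
sketch (this is not the actual powa-archivist code, and emulating the
missing_ok behaviour through an exception handler is my assumption):

  DO $do$
  BEGIN
      IF NOT EXISTS (SELECT 1
                     FROM pg_proc p
                     JOIN pg_namespace n ON n.oid = p.pronamespace
                     WHERE n.nspname = 'pg_catalog'
                       AND p.proname = 'current_setting'
                       AND p.pronargs = 2)
      THEN
          CREATE FUNCTION public.current_setting(setting text,
                                                 missing_ok boolean)
          RETURNS text LANGUAGE plpgsql AS $fn$
          BEGIN
              RETURN pg_catalog.current_setting(setting);
          EXCEPTION WHEN undefined_object THEN
              IF missing_ok THEN
                  RETURN NULL;
              END IF;
              RAISE;
          END;
          $fn$;
      END IF;
  END;
  $do$;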




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Bruce Momjian wrote:

> + If the old cluster used extensions, whether from
> + contrib or some other source, it used
> + shared object files (or DLLs) to implement these extensions, e.g.,
> + pgcrypto.so.  Now, shared object files matching
> + the new server binary must be installed in the new cluster, usually
> + via operating system commands.  Do not load the schema definitions,
> + e.g., CREATE EXTENSION pgcrypto, because these
> + will be copied from the old cluster.  (Extensions should be upgraded
> + later using ALTER EXTENSION ... UPGRADE.)

I propose this:


  If the old cluster used shared-object files (or DLLs) for extensions
  or other loadable modules, install recompiled versions of those files
  onto the new cluster.
  Do not install the extension themselves (i.e., do not run
  CREATE EXTENSION), because extension definitions
  will be carried forward from the old cluster.



  Extensions can be upgraded after pg_upgrade completes using
  ALTER EXTENSION ... UPGRADE, on a per-database
  basis.


I suggest " ... for extensions or other loadable modules" because
loadable modules aren't necessarily for extensions.  Also, it's
perfectly possible to have extensions that don't have a loadable module.

I suggest "extension definitions ... carried forward" instead of
"extensions ... copied" (your proposed text) to avoid the idea that
files are copied; use it instead of "schema definitions ... upgraded"
(the current docs) to avoid the idea that they are actually upgraded;
also, "schema definition" seems a misleading term to use here.

I suggest "can be upgraded" rather than "should be upgraded" because
we're not making a recommendation, merely stating the fact that it is
possible to do so.

I suggest using the imperative mood, to be consistent with the
surrounding text.  (Applies to the first para; the second para is
informative.)


I haven't seen it mentioned in the thread, but I think the final phrase
of this  should be a separate step,


 Copy custom full-text search files
 
  Copy any custom full text search file (dictionary, synonym, thesaurus,
  stop word list) to the new server.
 


While this is closely related to extensions, it's completely different.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)




Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 5:11 AM Masahiko Sawada  wrote:
> Indeed. Given that the radix tree itself has other use cases, I have
> no concern about using radix tree for vacuum's dead tuples storage. It
> will be better to have one that can be generally used and has some
> optimizations that are helpful also for vacuum's use case, rather than
> having one that is very optimized only for vacuum's use case.

What I'm about to say might be a really stupid idea, especially since
I haven't looked at any of the code already posted, but what I'm
wondering about is whether we need a full radix tree or maybe just a
radix-like lookup aid. For example, suppose that for a relation <= 8MB
in size, we create an array of 1024 elements indexed by block number.
Each element of the array stores an offset into the dead TID array.
When you need to probe for a TID, you look up blkno and blkno + 1 in
the array and then bsearch only between those two offsets. For bigger
relations, a two or three level structure could be built, or it could
always be 3 levels. This could even be done on demand, so you
initialize all of the elements to some special value that means "not
computed yet" and then fill them the first time they're needed,
perhaps with another special value that means "no TIDs in that block".
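
To make that concrete, here is a minimal sketch (hypothetical names, not
taken from any posted patch) of the single-level variant for a relation of
at most 1024 blocks: a per-block offset array over the sorted dead-TID
array, with the binary search confined to one block's slice:

  #include "postgres.h"
  #include "storage/itemptr.h"

  #define LOOKUP_BLOCKS 1024

  typedef struct DeadTidLookup
  {
      ItemPointerData *tids;    /* dead TIDs, sorted by (block, offset) */
      int         ntids;
      int         block_start[LOOKUP_BLOCKS + 1]; /* slice bounds per block */
  } DeadTidLookup;

  /* Is (blkno, offnum) present among the dead TIDs? */
  static bool
  dead_tid_exists(DeadTidLookup *lookup, BlockNumber blkno,
                  OffsetNumber offnum)
  {
      int         lo = lookup->block_start[blkno];
      int         hi = lookup->block_start[blkno + 1];

      while (lo < hi)
      {
          int         mid = lo + (hi - lo) / 2;
          OffsetNumber cur = ItemPointerGetOffsetNumber(&lookup->tids[mid]);

          if (cur == offnum)
              return true;
          else if (cur < offnum)
              lo = mid + 1;
          else
              hi = mid;
      }
      return false;
  }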

I don't know if this is better, but I do kind of like the fact that
the basic representation is just an array. It makes it really easy to
predict how much memory will be needed for a given number of dead
TIDs, and it's very DSM-friendly as well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 13:13, Alvaro Herrera 
wrote:

> On 2021-Jul-29, Dave Cramer wrote:
>
> > > If the old cluster used extensions those same extensions must be
> > > installed in the new cluster via installation procedures specific
> > > to, and documented by, each extension.  For contrib extensions it is
> > > usually enough to install the -contrib package via the same method
> > > you used to install the PostgreSQL server.
> >
> > Well this is not strictly true. There are many extensions that would
> > work just fine with the current pg_upgrade. It may not even be
> > necessary to recompile them.
>
> It is always necessary to recompile because of the PG_MODULE_MAGIC
> declaration, which is mandatory and contains a check that the major
> version matches.  Copying the original shared object never works.
>

Thx, I knew I was on shaky ground with that last statement :)

Dave

>
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Dave Cramer wrote:

> > If the old cluster used extensions those same extensions must be
> > installed in the new cluster via installation procedures specific
> > to, and documented by, each extension.  For contrib extensions it is
> > usually enough to install the -contrib package via the same method
> > you used to install the PostgreSQL server.
> 
> Well this is not strictly true. There are many extensions that would
> work just fine with the current pg_upgrade. It may not even be
> necessary to recompile them.

It is always necessary to recompile because of the PG_MODULE_MAGIC
declaration, which is mandatory and contains a check that the major
version matches.  Copying the original shared object never works.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"But static content is just dynamic content that isn't moving!"
http://smylers.hates-software.com/2007/08/15/fe244d0c.html




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
>
>
>
> I would simplify the first two sentences to just:
>
> If the old cluster used extensions those same extensions must be installed
> in the new cluster via installation procedures specific to, and documented
> by, each extension.  For contrib extensions it is usually enough to install
> the -contrib package via the same method you used to install the PostgreSQL
> server.
>

Well this is not strictly true. There are many extensions that would work
just fine with the current pg_upgrade. It may not even be necessary to
recompile them.

Dave


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:55 AM Jan Wieck  wrote:

> How exactly do you envision that we, the PostgreSQL project, make sure
> that extension developers provide those mechanisms in the future?
>
>
I have no suggestions here, and don't really plan to get deeply involved in
this area of the project anytime soon.  But it is definitely a topic worthy
of discussion on a new thread.

David J.


Re: Fix around conn_duration in pgbench

2021-07-29 Thread Fujii Masao




On 2021/07/29 2:23, Fujii Masao wrote:



On 2021/07/28 16:15, Yugo NAGATA wrote:

I found another disconnect_all().

/* XXX should this be connection time? */
disconnect_all(state, nclients);

The measurement is also not necessary here.
So the above comment should be removed or updated?


I think this disconnect_all will be a no-op because all connections should
be already closed in threadRun(), but I left it just to be sure that
connections are all cleaned-up. I updated the comment for explaining above.

I attached the updated patch. Could you please look at this?


Thanks for updating the patch! LGTM.


This patch needs to be back-patched because it fixes the bug
in measurement of disconnection delays. Thought?

But the patch fails to be back-patched to v13 or before because
pgbench's time logic was changed by commit 547f04e734.
Do you have the patches that can be back-patched to v13 or before?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:34 AM Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 12:22:59PM -0400, Dave Cramer wrote:
> > On Thu, 29 Jul 2021 at 12:16, Bruce Momjian  wrote:
> > Can you review the text I just posted?  Thanks.   I think we are
> making
> > progress.  ;-)
> >
> >
> > I am OK with Everything except
> >
> > Do not load the schema definitions,
> > e.g., CREATE EXTENSION pgcrypto, because these
> > will be recreated from the old cluster.  (The extensions may be
> > upgraded later using ALTER EXTENSION ... UPGRADE.)
> >
> >  I take issue with the word "recreated". This implies something new is
> created,
> > when in fact the old definitions are simply copied over.
> >
> > As I said earlier; using the wording "may be upgraded" is not nearly
> cautionary
> > enough.
>
> OK, I changed it to "copy" though I used "recreated" earlier since I was
> worried "copy" would be confused with file copy.  I changed the
> recommendation to "should be".
>
>
I'm warming up to "should" but maybe add a "why" such as "the old versions
are considered unsupported on the newer server".

I dislike "usually via operating system commands", just offload this to the
extension, i.e., "must be installed in the new cluster via installation
procedures specific to, and documented by, each extension (for contrib it
is usually enough to ensure the -contrib package was chosen to be installed
along with the -server package for your operating system.)"

I would simplify the first two sentences to just:

If the old cluster used extensions those same extensions must be installed
in the new cluster via installation procedures specific to, and documented
by, each extension.  For contrib extensions it is usually enough to install
the -contrib package via the same method you used to install the PostgreSQL
server.

I would consider my suggestion for "copy as-is/alter extension" wording in
my previous email in lieu of the existing third and fourth sentences, with
the "should" and "why" wording possibly worked in.  But the existing works
ok.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 12:44 PM, David G. Johnston wrote:

... - but at present pg_upgrade simply 
doesn't care and hopes that the extension author's documentation deals 
with these possibilities in a sane manner.


pg_upgrade is not able to address this in a guaranteed, consistent 
fashion. At this point there is no way to even make sure that a 3rd 
party extension provides the scripts needed to upgrade from one 
extension version to the next. We don't have the mechanism to upgrade 
the same extension version from one server version to the next, in case 
there are any modifications in place necessary.


How exactly do you envision that we, the PostgreSQL project, make sure 
that extension developers provide those mechanisms in the future?



Regards, Jan

--
Jan Wieck




Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov

Yura Sokolov wrote on 2021-07-29 18:29:


I've attached IntegerSet2 patch for pgtools repo and benchmark results.
Branch https://github.com/funny-falcon/pgtools/tree/integerset2


Strange web-mail client... I can never be sure what it will attach...

Reattaching the benchmark results.



regards,

Yura Sokolov
y.soko...@postgrespro.ru
funny.fal...@gmail.com
* Test 1
select prepare(
100, -- max block
10, -- # of dead tuples per page
20, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |   size    | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 57.23 MB  |  0.029 |   96.650 | 572.21 MB  | _ | ___
 intset | 46.88 MB  |  0.095 |   82.797 | 468.67 MB  | _ | ___
 radix  | 40.26 MB  |  0.089 |   12.082 | 401.27 MB  | 0.999 | 196.490
 rtbm   | 64.02 MB  |  0.164 |   14.300 | 512.02 MB  | 1.991 | 189.343
 svtm   | 17.78 MB  |  0.098 |   12.411 | 178.59 MB  | 0.973 | 210.901
 tbm| 96.01 MB  |  0.239 |8.486 | 768.01 MB  | 2.607 | 119.593
 intset2| 17.78 MB  |  0.220 |   19.203 | 175.84 MB  | 2.030 | 320.488

* Test 2
select prepare(
100, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |   size    | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 57.23 MB  |  0.030 |4.737 | 572.21 MB  | _ | ___
 intset | 46.88 MB  |  0.096 |4.077 | 468.67 MB  | _ | ___
 radix  | 9.95 MB   |  0.051 |0.752 | 98.38 MB   | 0.531 |  18.642
 rtbm   | 34.02 MB  |  0.166 |0.424 | 288.02 MB  | 2.036 |   6.294
 svtm   | 5.78 MB   |  0.037 |0.203 | 54.60 MB   | 0.400 |   5.682
 tbm| 96.01 MB  |  0.245 |0.423 | 768.01 MB  | 2.624 |   5.848
 intset2| 6.27 MB   |  0.097 |0.414 | 61.29 MB   | 1.002 |  12.823

* Test 3
select prepare(
100, -- max block
2, -- # of dead tuples per page
100, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |   size    | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 11.45 MB  |  0.006 |   53.623 | 114.45 MB  | _ | ___
 intset | 15.63 MB  |  0.024 |   56.291 | 156.23 MB  | _ | ___
 radix  | 40.26 MB  |  0.055 |   10.928 | 401.27 MB  | 0.622 | 173.953
 rtbm   | 36.02 MB  |  0.116 |8.452 | 320.02 MB  | 1.370 | 125.438
 svtm   |  9.28 MB  |  0.069 |6.780 | 102.10 MB  | 0.690 | 167.220
 tbm| 96.01 MB  |  0.188 |8.569 | 768.01 MB  | 1.935 | 118.841
 intset2|  4.28 MB  |  0.022 |5.624 |  42.04 MB  | 0.229 | 146.096

* Test 4
select prepare(
100, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |   size    | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 572.21 MB |  0.304 |   76.625 | 5722.05 MB | _ | ___
 intset | 93.74 MB  |  0.795 |   47.977 | 937.34 MB  | _ | ___
 radix  | 40.26 MB  |  0.264 |6.322 | 401.27 MB  | 2.674 | 101.887
 rtbm   | 36.02 MB  |  0.360 |4.195 | 320.02 MB  | 4.009 |  62.441
 svtm   | 7.28 MB   |  0.329 |2.729 | 73.60 MB   | 3.271 |  74.725
 tbm| 96.01 MB  |  0.564 |4.220 | 768.01 MB  | 5.711 |  59.047
 svtm   | 7.28 MB   |  0.908 |5.826 | 70.55 MB   | 9.089 | 137.902


* Test 5
select prepare(
100, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
2 -- page interval
);

There are 1 consecutive pages that have 150 dead tuples at every
2 pages.

  name  |   size    | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 429.16 MB |  0.225 |   65.404 | 4291.54 MB | _ | ___
 intset | 46.88 MB  |  0.510 |   39.460 | 468.67 MB  | _ | ___
 radix  | 20.26 MB  |  0.192 |6.440 | 201.48 MB  | 1.954 | 111.539
 rtbm   | 18.02 MB  |  0.225 |7.077 | 160.02 MB  | 2.515 | 120.816
 svtm   | 3.66 MB   |  0.233 |3.493 | 37.10 MB   | 2.348 |  73.092
 tbm| 48.01 MB  |  0.341 |9.146 | 384.01 MB  | 3.544 | 161.435
 intset2| 3.78 MB   |  0.671 |5.081 | 35.29 MB   | 6.791 | 117.617

* Test 6
select prepare(
100, -- max block
10, -- # of

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:28 AM Jan Wieck  wrote:

> On 7/29/21 12:00 PM, David G. Johnston wrote:
> > Ok, looking at the flow again, where exactly would the user even be able
> > to execute "CREATE EXTENSION" meaningfully?  The relevant databases do
> > not exist (not totally sure what happens to the postgres database
> > created during the initdb step...) so at the point where the user is
> > "installing the extension" all they can reasonably do is a server-level
> > install (they could maybe create extension in the postgres database, but
> > does that even matter?).
> >
> > So, I'd propose simplifying this all to something like:
> >
> > Install extensions on the new server
>
> Extensions are not installed on the server level. Their binary
> components (shared objects) are, but the actual catalog modifications
> that make them accessible are performed per database by CREATE
> EXTENSION, which executes the SQL files associated with the extension.
> And they can be performed differently per database, like for example
> placing one and the same extension into different schemas in different
> databases.
>
> pg_upgrade is not (and should not be) concerned with placing the
> extension's installation components into the new version's lib and share
> directories. But it is pg_upgrade's job to perform the correct catalog
> modification per database during the upgrade.
>

That is exactly the point I am making.  The section is informing the user
of things to do that the server will not do, namely "install extension
code into the O/S"; mentioning CREATE EXTENSION at this point in the
process is talking about something that is simply out of scope.



> > Any extensions that are used by the old cluster need to be installed
> > into the new cluster.  Each database in the old cluster will have its
> > current version of all extensions migrated to the new cluster as-is.
> > You can use the ALTER EXTENSION command, on a per-database basis, to
> > update its extensions post-upgrade.
>
> That assumes that the extension SQL files are capable of detecting a
> server version change and perform the necessary (if any) steps to alter
> the extension's objects accordingly.
>
> Off the top of my head I don't remember what happens when one executes
> ALTER EXTENSION ... UPGRADE ... when it is already on the latest version
> *of the extension*. Might be an error or a no-op.
>
> And to make matters worse, it is not possible to work around this with a
> DROP EXTENSION ... CREATE EXTENSION. There are extensions that create
> objects, like user defined data types and functions, that will be
> referenced by end user objects like tables and views.
>
>
These are all excellent points - but at present pg_upgrade simply doesn't
care and hopes that the extension author's documentation deals with these
possibilities in a sane manner.
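
For concreteness, the per-database post-upgrade step proposed above could look
roughly like this (just a sketch; the actual command is ALTER EXTENSION ... UPDATE,
and pg_stat_statements is only an example):

-- run in each database of the new cluster, after pg_upgrade:
SELECT e.extname,
       e.extversion      AS installed_version,
       a.default_version AS available_version
FROM pg_extension e
JOIN pg_available_extensions a ON a.name = e.extname
WHERE e.extversion IS DISTINCT FROM a.default_version;

-- then, for each extension reported above:
ALTER EXTENSION pg_stat_statements UPDATE;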

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 12:22:59PM -0400, Dave Cramer wrote:
> On Thu, 29 Jul 2021 at 12:16, Bruce Momjian  wrote:
> Can you review the text I just posted?  Thanks.   I think we are making
> progress.  ;-)
> 
> 
> I am OK with Everything except
> 
> Do not load the schema definitions,
> e.g., CREATE EXTENSION pgcrypto, because these
> will be recreated from the old cluster.  (The extensions may be
> upgraded later using ALTER EXTENSION ... UPGRADE.)
> 
>  I take issue with the word "recreated". This implies something new is 
> created,
> when in fact the old definitions are simply copied over.
> 
> As I said earlier; using the wording "may be upgraded" is not nearly 
> cautionary
> enough.

OK, I changed it to "copy" though I used "recreated" earlier since I was
worried "copy" would be confused with file copy.  I changed the
recommendation to "should be".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..0e69f26628 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,15 +299,18 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
+ If the old cluster used extensions, whether from
+ contrib or some other source, it used
+ shared object files (or DLLs) to implement these extensions, e.g.,
+ pgcrypto.so.  Now, shared object files matching
+ the new server binary must be installed in the new cluster, usually
+ via operating system commands.  Do not load the schema definitions,
+ e.g., CREATE EXTENSION pgcrypto, because these
+ will be copied from the old cluster.  (Extensions should be upgraded
+ later using ALTER EXTENSION ... UPGRADE.)
  Also, any custom full text search files (dictionary, synonym,
  thesaurus, stop words) must also be copied to the new cluster.
 
@@ -494,10 +497,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 12:00 PM, David G. Johnston wrote:
> Ok, looking at the flow again, where exactly would the user even be able
> to execute "CREATE EXTENSION" meaningfully?  The relevant databases do
> not exist (not totally sure what happens to the postgres database
> created during the initdb step...) so at the point where the user is
> "installing the extension" all they can reasonably do is a server-level
> install (they could maybe create extension in the postgres database, but
> does that even matter?).
>
> So, I'd propose simplifying this all to something like:
>
> Install extensions on the new server


Extensions are not installed on the server level. Their binary 
components (shared objects) are, but the actual catalog modifications 
that make them accessible are performed per database by CREATE 
EXTENSION, which executes the SQL files associated with the extension. 
And they can be performed differently per database, like for example 
placing one and the same extension into different schemas in different 
databases.


pg_upgrade is not (and should not be) concerned with placing the 
extension's installation components into the new version's lib and share 
directories. But it is pg_upgrade's job to perform the correct catalog 
modification per database during the upgrade.


> Any extensions that are used by the old cluster need to be installed
> into the new cluster.  Each database in the old cluster will have its
> current version of all extensions migrated to the new cluster as-is.
> You can use the ALTER EXTENSION command, on a per-database basis, to
> update its extensions post-upgrade.


That assumes that the extension SQL files are capable of detecting a 
server version change and perform the necessary (if any) steps to alter 
the extension's objects accordingly.


Off the top of my head I don't remember what happens when one executes 
ALTER EXTENSION ... UPGRADE ... when it is already on the latest version 
*of the extension*. Might be an error or a no-op.


And to make matters worse, it is not possible to work around this with a 
DROP EXTENSION ... CREATE EXTENSION. There are extensions that create 
objects, like user defined data types and functions, that will be 
referenced by end user objects like tables and views.



Regards, Jan

--
Jan Wieck




Re: fixing pg_basebackup tests for modern Windows/msys2

2021-07-29 Thread Andrew Dunstan


On 7/28/21 9:31 AM, Andrew Dunstan wrote:
> While looking into issues with fairywren and pg_basebackup tests, I
> created a similar environment but with more modern Windows / msys2.
> Before it even got to the test that failed on fairywren it failed the
> first TAP test for a variety of reasons, all connected to
> TestLib::perl2host.
>
> First, this function is in some cases returning paths for directories
> with trailing slashes and or embedded double slashes.  Both of these can
> cause problems, especially when written to a tablespace map file. Also,
> the cygpath invocation is returning a path with backslashes whereas "pwd
> -W' returns a path with forward slashes.
>
> So the first attached patch rectifies these problems. It fixes issues
> with doubles and trailing slashes and makes cygpath return a path with
> forward slashes just like the non-cygpath branch.
>
> However, there is another problem, which is that if called on a path
> that includes a symlink, on the test platform I set up it actually
> resolves that link rather than just following it. The end result is that
> the use of a shorter path via a symlink is effectively defeated. I
> haven't found any way to stop this behaviour.
>
> The second patch therefore adjusts the test to avoid calling perl2host
> on such a path. It just calls perl2host on the symlink's parent, and
> thereafter uses that result.
>
>


I've pushed these in master and REL_14_STABLE.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
Dave Cramer


On Thu, 29 Jul 2021 at 12:16, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 09:00:36AM -0700, David G. Johnston wrote:
> > On Thu, Jul 29, 2021 at 8:42 AM Bruce Momjian  wrote:
> >
> > On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> > >
> > >
> > > I have an issue with the fragment "whether they are from
> contrib" -
> > my
> > > understanding at this point is that because of the way we
> package and
> > > version contrib it should not be necessary to copy those shared
> > object
> > > files from the old to the new server (maybe, just maybe, with a
> > > qualification that you are upgrading between two versions that
> were
> > in
> > > support during the same time period).
> > >
> > >
> > > Just to clarify. In no case are binaries copied from the old
> server to
> > the new
> > > server. Whether from contrib or otherwise.
> >
> > Right.  Those are _binaries_ and therefore made to match a specific
> > Postgres binary.  They might work or might not, but copying them is
> > never a good idea --- they should be recompiled to match the new
> server
> > binary, even if the extension had no version/API changes.
> >
> >
> > Ok, looking at the flow again, where exactly would the user even be able
> to
> > execute "CREATE EXTENSION" meaningfully?  The relevant databases do not
> exist
> > (not totally sure what happens to the postgres database created during
> the
> > initdb step...) so at the point where the user is "installing the
> extension"
> > all they can reasonably do is a server-level install (they could maybe
> create
> > extension in the postgres database, but does that even matter?).
>
> They could technically start the new cluster and use "CREATE EXTENSION"
> before the upgrade, and then the upgrade would fail since there would be
> duplicate object errors.
>
> > So, I'd propose simplifying this all to something like:
> >
> > Install extensions on the new server
> >
> > Any extensions that are used by the old cluster need to be installed
> into the
> > new cluster.  Each database in the old cluster will have its current
> version of
> > all extensions migrated to the new cluster as-is.  You can use the ALTER
> > EXTENSION command, on a per-database basis, to update its extensions
> > post-upgrade.
>
> Can you review the text I just posted?  Thanks.   I think we are making
> progress.  ;-)
>

I am OK with Everything except

Do not load the schema definitions,
e.g., CREATE EXTENSION pgcrypto, because these
will be recreated from the old cluster.  (The extensions may be
upgraded later using ALTER EXTENSION ... UPGRADE.)

 I take issue with the word "recreated". This implies something new is
created, when in fact the old definitions are simply copied over.

As I said earlier; using the wording "may be upgraded" is not nearly
cautionary enough.

Dave


Re: needless complexity in StartupXLOG

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 1:47 AM Amul Sul  wrote:
> Can we have an elog() fatal error or warning to make sure that the
> last checkpoint is still readable? Since the case where the user
> (knowingly or unknowingly) or some buggy code has removed the WAL file
> containing the last checkpoint could be possible. If it is then we
> would have a hard time finding out when we get further unexpected
> behavior due to this. Thoughts?

Sure, we could, but I don't think we should. Such crazy things can
happen any time, not just at the point where this check is happening.
It's not particularly more likely to happen here vs. any other place
where we could insert a check. Should we check everywhere, all the
time, just in case?

> > I realize that conservatism may have played a role in this code ending
> > up looking the way that it does; someone seems to have thought it
> > would be better not to rely on a new idea in all cases. From my point
> > of view, though, it's scary to have so many cases, especially cases
> > that don't seem like they should ever be reached. I think that
> > simplifying the logic here and trying to do the same things in as many
> > cases as we can will lead to better robustness. Imagine if instead of
> > all the hairy logic we have now we just replaced this whole if
> > (IsInRecovery) stanza with this:
> >
> > if (InRecovery)
> > CreateEndOfRecoveryRecord();
>
> +1, and do the checkpoint at the end unconditionally as we are doing
> for the promotion.

Yeah, that was my thought, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: when the startup process doesn't (logging startup delays)

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav
 wrote:
> Thanks for sharing the information. I have done the necessary changes
> to show the logs during the latter case (postgres --single) and
> verified the log messages.

Thanks. Can you please have a look at what I suggested down toward the
bottom of 
http://postgr.es/m/ca+tgmoap2wefsktmcgwt9lxuz7y99hnduyshpk7qnfuqb98...@mail.gmail.com
?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Patch] ALTER SYSTEM READ ONLY

2021-07-29 Thread Robert Haas
On Wed, Jul 28, 2021 at 7:33 AM Amul Sul  wrote:
> I was too worried about how I could miss that & after thinking more
> about that, I realized that the operation for ArchiveRecoveryRequested
> is never going to be skipped in the startup process and that never
> left for the checkpoint process to do that later. That is the reason
> that assert was added there.
>
> When ArchiveRecoveryRequested, the server will no longer be in
> the wal prohibited mode, we implicitly change the state to
> wal-permitted. Here is the snip from the 0003 patch:

Ugh, OK. That makes sense, but I'm still not sure that I like it. I've
kind of been wondering: why not have XLogAcceptWrites() be the
responsibility of the checkpointer all the time, in every case? That
would require fixing some more things, and this is one of them, but
then it would be consistent, which means that any bugs would be likely
to get found and fixed. If calling XLogAcceptWrites() from the
checkpointer is some funny case that only happens when the system
crashes while WAL is prohibited, then we might fail to notice that we
have a bug.

This is especially true given that we have very little test coverage
in this area. Andres was ranting to me about this earlier this week,
and I wasn't sure he was right, but then I noticed that we have
exactly zero tests in the entire source tree that make use of
recovery_end_command. We really need a TAP test for that, I think.
It's too scary to do much reorganization of the code without having
any tests at all for the stuff we're moving around. Likewise, we're
going to need TAP tests for the stuff that is specific to this patch.
For example, we should have a test that crashes the server while it's
read only, brings it back up, checks that we still can't write WAL,
then re-enables WAL, and checks that we now can write WAL. There are
probably a bunch of other things that we should test, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 09:00:36AM -0700, David G. Johnston wrote:
> On Thu, Jul 29, 2021 at 8:42 AM Bruce Momjian  wrote:
> 
> On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> >
> >
> >     I have an issue with the fragment "whether they are from contrib" -
> my
> >     understanding at this point is that because of the way we package 
> and
> >     version contrib it should not be necessary to copy those shared
> object
> >     files from the old to the new server (maybe, just maybe, with a
> >     qualification that you are upgrading between two versions that were
> in
> >     support during the same time period).
> >
> >
> > Just to clarify. In no case are binaries copied from the old server to
> the new
> > server. Whether from contrib or otherwise.
> 
> Right.  Those are _binaries_ and therefore made to match a specific
> Postgres binary.  They might work or might not, but copying them is
> never a good idea --- they should be recompiled to match the new server
> binary, even if the extension had no version/API changes.
> 
> 
> Ok, looking at the flow again, where exactly would the user even be able to
> execute "CREATE EXTENSION" meaningfully?  The relevant databases do not exist
> (not totally sure what happens to the postgres database created during the
> initdb step...) so at the point where the user is "installing the extension"
> all they can reasonably do is a server-level install (they could maybe create
> extension in the postgres database, but does that even matter?).

They could technically start the new cluster and use "CREATE EXTENSION"
before the upgrade, and then the upgrade would fail since there would be
duplicate object errors.

> So, I'd propose simplifying this all to something like:
> 
> Install extensions on the new server
> 
> Any extensions that are used by the old cluster need to be installed into the
> new cluster.  Each database in the old cluster will have its current version 
> of
> all extensions migrated to the new cluster as-is.  You can use the ALTER
> EXTENSION command, on a per-database basis, to update its extensions
> post-upgrade.

Can you review the text I just posted?  Thanks.   I think we are making
progress.  ;-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:46:12AM -0400, Jan Wieck wrote:
> On 7/29/21 11:10 AM, Bruce Momjian wrote:
> > On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:
> > > Much better, however I'm unclear on whether CREATE EXTENSION is actually
> > > executed on the upgraded server.
> > > 
> > > From what I could gather it is not, otherwise the new function definitions
> > > should have been in place. I think what happens is that the function
> > > definitions are copied as part of the
> > > schema and pg_extension is also copied.
> > 
> > Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
> > the new cluster as object definitions.  CREATE EXTENSION runs the SQL
> > files associated with the extension.
> > 
> 
> This assumes that the scripts executed during CREATE EXTENSION have no
> conditional code in them that depends on the server version. Running the
> same SQL script on different server versions can have different effects.
> 
> I don't have a ready example of such an extension, but if we ever would
> convert the backend parts of Slony into an extension, it would be one.

The bottom line is that we have _no_ mechanism to handle this except
uninstalling the extension before the upgrade and re-installing it
afterward, which isn't clearly spelled out for each extension, as far as
I know, and would not work for extensions that create data types.

Yes, I do feel this is a big hole in our upgrade instructions.
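
To illustrate (pgcrypto and hstore are only arbitrary examples here, and the
error text is approximate):

-- the only workaround today: drop before the upgrade, re-create afterward
DROP EXTENSION pgcrypto;
-- ... run pg_upgrade ...
CREATE EXTENSION pgcrypto;

-- but for an extension whose data type is used by user tables, the drop fails:
DROP EXTENSION hstore;
-- ERROR:  cannot drop extension hstore because other objects depend on it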

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 08:39:32AM -0700, David G. Johnston wrote:
> On Thu, Jul 29, 2021 at 8:36 AM Dave Cramer  wrote:
> 
> 
> 
> 
> I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and
> version contrib it should not be necessary to copy those shared object
> files from the old to the new server (maybe, just maybe, with a
> qualification that you are upgrading between two versions that were in
> support during the same time period).
> 
> 
> Just to clarify. In no case are binaries copied from the old server to the
> new server. Whether from contrib or otherwise.
>
> I had used "binaries" when I should have written "shared object files".  I 
> just
> imagine a DLL as being a binary file so it seems accurate but we use the term
> differently I suppose?

Uh, technically, the _executable_ binary should only use shared object /
loadable libraries that were compiled against that binary's exported
API.  Sometimes mismatches work (if the API used by the shared object
has not changed in the binary) so people get used to it working, and
then one day it doesn't, but it is never a safe process.

If two people here are confused about this, obviously others will be as
well.  I think we were trying to do too much in that first sentence, so
I split it into two in the attached patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..7352936a94 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,15 +299,18 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
+ If the old cluster used extensions, whether from
+ contrib or some other source, it used
+ shared object files (or DLLs) to implement these extensions, e.g.,
+ pgcrypto.so.  Now, shared object files matching
+ the new server binary must be installed in the new cluster, usually
+ via operating system commands.  Do not load the schema definitions,
+ e.g., CREATE EXTENSION pgcrypto, because these
+ will be recreated from the old cluster.  (The extensions may be
+ upgraded later using ALTER EXTENSION ... UPGRADE.)
  Also, any custom full text search files (dictionary, synonym,
  thesaurus, stop words) must also be copied to the new cluster.
 
@@ -494,10 +497,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: Out-of-memory error reports in libpq

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 9:57 AM Tom Lane  wrote:
> In the case at hand, I'd personally avoid a ternary op for the first
> assignment because then the line would run over 80 characters, and
> you'd have to make decisions about where to break it.  (We don't have
> a standardized convention about that, and none of the alternatives
> look very good to my eye.)

This is exactly why I rarely use ?:

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 8:42 AM Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> >
> >
> > I have an issue with the fragment "whether they are from contrib" -
> my
> > understanding at this point is that because of the way we package and
> > version contrib it should not be necessary to copy those shared
> object
> > files from the old to the new server (maybe, just maybe, with a
> > qualification that you are upgrading between two versions that were
> in
> > support during the same time period).
> >
> >
> > Just to clarify. In no case are binaries copied from the old server to
> the new
> > server. Whether from contrib or otherwise.
>
> Right.  Those are _binaries_ and therefore made to match a specific
> Postgres binary.  They might work or might not, but copying them is
> never a good idea --- they should be recompiled to match the new server
> binary, even if the extension had no version/API changes.
>

Ok, looking at the flow again, where exactly would the user even be able to
execute "CREATE EXTENSION" meaningfully?  The relevant databases do not
exist (not totally sure what happens to the postgres database created
during the initdb step...) so at the point where the user is "installing
the extension" all they can reasonably do is a server-level install (they
could maybe create extension in the postgres database, but does that even
matter?).

So, I'd propose simplifying this all to something like:

Install extensions on the new server

Any extensions that are used by the old cluster need to be installed into
the new cluster.  Each database in the old cluster will have its current
version of all extensions migrated to the new cluster as-is.  You can use
the ALTER EXTENSION command, on a per-database basis, to update its
extensions post-upgrade.

David J.


Re: alter table set TABLE ACCESS METHOD

2021-07-29 Thread Jeff Davis
On Thu, 2021-07-29 at 15:27 +0900, Michael Paquier wrote:
> Doing any checks around the hooks of objectaccess.h is very annoying,
> because we have no modules to check after them easily except sepgsql.
> Anyway, I have been checking that, with the hack-ish module attached
> and tracked down that swap_relation_files() calls
> InvokeObjectPostAlterHookArg() already (as you already spotted?), but
> that's an internal change when it comes to SET LOGGED/UNLOGGED/ACCESS
> METHOD :(
> 
> Attached is a small module I have used for those tests, for
> reference.  It passes on HEAD, and with the patch attached you can
> see
> the extra entries.

I see that ATExecSetTableSpace() also invokes the hook even for a no-
op. Should we do the same thing for setting the AM?

> > > > Also, I agree with Justin that it should fail when there are
> > > > multiple
> > > > SET ACCESS METHOD subcommands consistently, regardless of
> > > > whether
> > > > one
> > > > is a no-op, and it should probably throw a syntax error to
> > > > match
> > > > SET
> > > > TABLESPACE.
> > > 
> > > Hmm.  Okay.
> 
> I'd still disagree with that.

OK, I won't press for a change here.
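
For reference, the two cases being discussed, written with the patch's syntax
(heap2 standing in for some other table access method):

ALTER TABLE t SET ACCESS METHOD heap2;
ALTER TABLE t SET ACCESS METHOD heap, SET ACCESS METHOD heap2;  -- the disputed repeated-subcommand case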

Regards,
Jeff Davis






Re: needless complexity in StartupXLOG

2021-07-29 Thread Robert Haas
On Wed, Jul 28, 2021 at 3:28 PM Andres Freund  wrote:
> > Imagine if instead of
> > all the hairy logic we have now we just replaced this whole if
> > (IsInRecovery) stanza with this:
> >
> > if (InRecovery)
> > CreateEndOfRecoveryRecord();
> >
> > That would be WAY easier to reason about than the rat's nest we have
> > here today. Now, I am not sure what it would take to get there, but I
> > think that is the direction we ought to be heading.
>
> What are we going to do in the single user ([1]) case in this awesome future?
> I guess we could just not create a checkpoint until single user mode is shut
> down / creates a checkpoint for other reasons?

It probably depends on who writes this thus-far-hypothetical patch,
but my thought is that we'd unconditionally request a checkpoint after
writing the end-of-recovery record, same as we do now if (promoted).
If we happened to be in single-user mode, then that checkpoint request
would turn into performing a checkpoint on the spot in the one and
only process we've got, but StartupXLOG() wouldn't really need to care
what happens under the hood after it called RequestCheckpoint().

> [1] I really wish somebody had the energy to just remove single user and
> bootstrap modes. The degree to which they increase complexity in the rest of
> the system is entirely unreasonable. There's not actually any reason
> bootstrapping can't happen with checkpointer et al running, it's just
> autovacuum that'd need to be blocked.

I don't know whether this is the way forward or not. I think a lot of
the complexity of the current system is incidental rather than
intrinsic. If I were going to work on this, I'd probably working on
trying to tidy up the code and reduce the number of places that need
to care about IsUnderPostmaster and IsPostmasterEnvironment, rather
than trying to get rid of them. I suspect that's a necessary
prerequisite step anyway, and not a trivial effort either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 11:39, David G. Johnston 
wrote:

> On Thu, Jul 29, 2021 at 8:36 AM Dave Cramer  wrote:
>
>>
>>
>> I have an issue with the fragment "whether they are from contrib" - my
>>> understanding at this point is that because of the way we package and
>>> version contrib it should not be necessary to copy those shared object
>>> files from the old to the new server (maybe, just maybe, with a
>>> qualification that you are upgrading between two versions that were in
>>> support during the same time period).
>>>
>>
>> Just to clarify. In no case are binaries copied from the old server to
>> the new server. Whether from contrib or otherwise.
>>
>>
> I had used "binaries" when I should have written "shared object files".  I
> just imagine a DLL as being a binary file so it seems accurate but we use
> the term differently I suppose?
>

No, we are using the same term. pg_upgrade does not copy anything that was
compiled, whether it is called a DLL or otherwise.


Dave

>
> David J.
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 11:10 AM, Bruce Momjian wrote:

> On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:
> > Much better, however I'm unclear on whether CREATE EXTENSION is actually
> > executed on the upgraded server.
> >
> > From what I could gather it is not, otherwise the new function definitions
> > should have been in place.
> > I think what happens is that the function definitions are copied as part of the
> > schema and pg_extension is also copied.
>
> Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
> the new cluster as object definitions.  CREATE EXTENSION runs the SQL
> files associated with the extension.



This assumes that the scripts executed during CREATE EXTENSION have no 
conditional code in them that depends on the server version. Running the 
same SQL script on different server versions can have different effects.


I don't have a ready example of such an extension, but if we ever would 
convert the backend parts of Slony into an extension, it would be one.



Regards, Jan

--
Jan Wieck




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> 
> 
> I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and
> version contrib it should not be necessary to copy those shared object
> files from the old to the new server (maybe, just maybe, with a
> qualification that you are upgrading between two versions that were in
> support during the same time period).
> 
> 
> Just to clarify. In no case are binaries copied from the old server to the new
> server. Whether from contrib or otherwise.

Right.  Those are _binaries_ and therefore made to match a specific
Postgres binary.  They might work or might not, but copying them is
never a good idea --- they should be recompiled to match the new server
binary, even if the extension had no version/API changes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 08:28:12AM -0700, David G. Johnston wrote:
> On Thu, Jul 29, 2021 at 7:56 AM Bruce Momjian  wrote:
> 
> On Wed, Jul 28, 2021 at 09:35:28PM -0700, David G. Johnston wrote:
> > On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
> >
> >     I came up with the attached patch.
> >
> >
> > Thank you.  It is an improvement but I think more could be done here 
> (not
> > exactly sure what - though removing the "copy binaries for contrib
> modules from
> > the old server" seems like a decent second step.)
> 
> Uh, I don't see that text.
> """
>  5. Install custom shared object files
> 
> Install any custom shared object files (or DLLs) used by the old cluster into
> the new cluster, e.g., pgcrypto.so, whether they are from contrib or some 
> other
> source. Do not install the schema definitions, e.g., CREATE EXTENSION 
> pgcrypto,
> because these will be upgraded from the old cluster. Also, any custom full 
> text
> search files (dictionary, synonym, thesaurus, stop words) must also be copied
> to the new cluster.
> """
> I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and version
> contrib it should not be necessary to copy those shared object files from the
> old to the new server (maybe, just maybe, with a qualification that you are
> upgrading between two versions that were in support during the same time
> period).

OK, so this is the confusion I was talking about.  You are supposed to
install _new_ _versions_ of the extensions that are in the old cluster
to the new cluster.  You are not supposed to _copy_ the files from the
old to new cluster.  I think my new patch makes that clearer, but can it
be improved?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 8:36 AM Dave Cramer  wrote:

>
>
> I have an issue with the fragment "whether they are from contrib" - my
>> understanding at this point is that because of the way we package and
>> version contrib it should not be necessary to copy those shared object
>> files from the old to the new server (maybe, just maybe, with a
>> qualification that you are upgrading between two versions that were in
>> support during the same time period).
>>
>
> Just to clarify. In no case are binaries copied from the old server to the
> new server. Whether from contrib or otherwise.
>
>
I had used "binaries" when I should have written "shared object files".  I
just imagine a DLL as being a binary file so it seems accurate but we use
the term differently I suppose?

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and
> version contrib it should not be necessary to copy those shared object
> files from the old to the new server (maybe, just maybe, with a
> qualification that you are upgrading between two versions that were in
> support during the same time period).
>

Just to clarify. In no case are binaries copied from the old server to the
new server. Whether from contrib or otherwise.

Dave


Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov

Masahiko Sawada wrote 2021-07-29 17:29:
On Thu, Jul 29, 2021 at 8:03 PM Yura Sokolov  
wrote:


Masahiko Sawada wrote 2021-07-29 12:11:
> On Thu, Jul 29, 2021 at 3:53 AM Andres Freund 
> wrote:
>>
>> Hi,
>>
>> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote:
>> > Apart from performance and memory usage points of view, we also need
>> > to consider the reusability of the code. When I started this thread, I
>> > thought the best data structure would be the one optimized for
>> > vacuum's dead tuple storage. However, if we can use a data structure
>> > that can also be used in general, we can use it also for other
>> > purposes. Moreover, if it's too optimized for the current TID system
>> > (32 bits block number, 16 bits offset number, maximum block/offset
>> > number, etc.) it may become a blocker for future changes.
>>
>> Indeed.
>>
>>
>> > In that sense, radix tree also seems good since it can also be used in
>> > gist vacuum as a replacement for intset, or a replacement for hash
>> > table for shared buffer as discussed before. Are there any other use
>> > cases?
>>
>> Yes, I think there are. Whenever there is some spatial locality it has
>> a
>> decent chance of winning over a hash table, and it will most of the
>> time
>> win over ordered datastructures like rbtrees (which perform very
>> poorly
>> due to the number of branches and pointer dispatches). There's plenty
>> hashtables, e.g. for caches, locks, etc, in PG that have a medium-high
>> degree of locality, so I'd expect a few potential uses. When adding
>> "tree compression" (i.e. skip inner nodes that have a single incoming
>> &
>> outgoing node) radix trees even can deal quite performantly with
>> variable width keys.
>
> Good point.
>
>>
>> > On the other hand, I’m concerned that radix tree would be an
>> > over-engineering in terms of vacuum's dead tuples storage since the
>> > dead tuple storage is static data and requires only lookup operation,
>> > so if we want to use radix tree as dead tuple storage, I'd like to see
>> > further use cases.
>>
>> I don't think we should rely on the read-only-ness. It seems pretty
>> clear that we'd want parallel dead-tuple scans at a point not too far
>> into the future?
>
> Indeed. Given that the radix tree itself has other use cases, I have
> no concern about using radix tree for vacuum's dead tuples storage. It
> will be better to have one that can be generally used and has some
> optimizations that are helpful also for vacuum's use case, rather than
> having one that is very optimized only for vacuum's use case.

The main portion of svtm that leads to memory saving is compression of many
pages at once (CHUNK). It could be combined with radix as a storage for
pointers to CHUNKs.

For now I'm benchmarking an IntegerSet replacement based on a Trie (HATM-like)
and CHUNK compression, so the data structure could be used for gist
vacuum as well.

Since it is generic (it allows indexing the full 64-bit range) it lacks the trick
used to speed up svtm. Still, on the 10x test it is faster than radix.


I've attached IntegerSet2 patch for pgtools repo and benchmark results.
Branch https://github.com/funny-falcon/pgtools/tree/integerset2

SVTM is measured with a couple of changes from commit 5055ef72d23482dd3e11ce
in that branch: 1) compress the bitmap more often, which is slower, 2) a couple of
popcount tricks.

IntegerSet2 consists of a trie index to CHUNKS. A CHUNK is a compressed bitmap
of 2^15 (6+9) bits (almost like in SVTM, but for a fixed bit width).

Well, IntegerSet2 is always faster than IntegerSet and always uses
significantly less memory (radix uses more memory than IntegerSet in a
couple of tests and a comparable amount in the others).

IntegerSet2 is not always faster than radix. It is more like radix.
That is because both are generic prefix trees with a comparable number of
memory accesses. SVTM did the trick by not being a multilevel prefix tree but
just a one-level bitmap index to chunks.

I believe the trie part of IntegerSet2 could be replaced with radix,
i.e. use radix as the storage for pointers to CHUNKS.


BTW, how does svtm work when we add two sets of dead tuple TIDs to one
svtm? Dead tuple TIDs are unique sets but those sets could have TIDs
of the different offsets on the same block. The case I imagine is the
idea discussed on this thread[1]. With this idea, we store the
collected dead tuple TIDs somewhere and skip index vacuuming for some
reason (index skipping optimization, failsafe mode, or interruptions
etc.). Then, in the next lazy vacuum timing, we load the dead tuple
TIDs and start to scan the heap. During the heap scan in the second
lazy vacuum, it's possible that new dead tuples will be found on the
pages that we have already stored in svtm during the first lazy
vacuum. How can we efficiently update the chunk in the svtm?


If we store the tidmap to disk, then it will be serialized. Since SVTM/
IntegerSet2 are ordered, they could be loaded in order. Then we
can just merge tuples on a per-page basis: deserialize the page (or CHUNK),
put new tu

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 7:56 AM Bruce Momjian  wrote:

> On Wed, Jul 28, 2021 at 09:35:28PM -0700, David G. Johnston wrote:
> > On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
> >
> > I came up with the attached patch.
> >
> >
> > Thank you.  It is an improvement but I think more could be done here (not
> > exactly sure what - though removing the "copy binaries for contrib
> modules from
> > the old server" seems like a decent second step.)
>
> Uh, I don't see that text.
>
>
"""
 5. Install custom shared object files

Install any custom shared object files (or DLLs) used by the old cluster
into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
some other source. Do not install the schema definitions, e.g., CREATE
EXTENSION pgcrypto, because these will be upgraded from the old cluster.
Also, any custom full text search files (dictionary, synonym, thesaurus,
stop words) must also be copied to the new cluster.
"""
I have an issue with the fragment "whether they are from contrib" - my
understanding at this point is that because of the way we package and
version contrib it should not be necessary to copy those shared object
files from the old to the new server (maybe, just maybe, with a
qualification that you are upgrading between two versions that were in
support during the same time period).

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:17:52AM -0400, Dave Cramer wrote:
> On Thu, 29 Jul 2021 at 11:10, Bruce Momjian  wrote:
> OK, I think we should be more clear as to what is happening.  Saying they will
> be recreated is misleading.
> The extension definitions are being copied from the old server to the new
> server.

I think my wording is says exactly that:

Do not load the schema definitions, e.g., CREATE EXTENSION
pgcrypto, because these will be recreated from the old
cluster.  

> I also think we should have stronger wording in the "The extensions may be
> upgraded ..." statement. 
> I'd suggest "Any new versions of extensions should be upgraded using"

I can't really comment on that since I see little mention of upgrading
extensions in our docs.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [Proposal] Global temporary tables

2021-07-29 Thread wenjing zeng


> On Jul 28, 2021, at 23:09, Tony Zhu  wrote:
> 
> Hi Wenjing
> 
> would you please rebase the code?
Thank you for your attention.
According to my testing, the latest patch can be merged onto the latest pgmaster
code and passes the tests.
https://www.travis-ci.com/github/wjzeng/postgres/builds 

If you have any questions, please give me feedback.


Wenjing


> 
> Thank you very much
> Tony
> 
> The new status of this patch is: Waiting on Author



Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 11:10, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:
> >
> >
> >
> >
> > OK, I went with this new text.  There is confusion over install vs
> copy,
> > and whether this is happening at the operating system level or the
> SQL
> > level.  I tried to clarify that, but I am not sure I was
> successful.  I
> > also used the word "extension" since this is more common than
> "custom".
> >
> >
> > Much better, however I'm unclear on whether CREATE EXTENSION is actually
> > executed on the upgraded server.
> >
> > From what I could gather it is not, otherwise the new function
> definitions
> > should have been in place.
> > I think what happens is that the function definitions are copied as part
> of the
> > schema and pg_extension is also copied.
>
> Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
> the new cluster as object definitions.  CREATE EXTENSION runs the SQL
> files associated with the extension.
>

OK, I think we should be more clear as to what is happening.  Saying they
will be recreated is misleading.
The extension definitions are being copied from the old server to the new
server.

I also think we should have stronger wording in the "The extensions may be
upgraded ..." statement.
I'd suggest "Any new versions of extensions should be upgraded using"

Dave

>
>


Re: Replace l337sp34k in comments.

2021-07-29 Thread Andrew Dunstan


On 7/29/21 8:51 AM, Geoff Winkless wrote:
> On Thu, 29 Jul 2021 at 11:22, Andrew Dunstan  > wrote:
>
> Personally, I would have written this as just "up to date", I don't
> think the hyphens are required.
>
>  
> FWIW Merriam-Webster and the CED suggest "up-to-date" when before a
> noun, so the changes should be "up-to-date answer" but "are up to date".
>
> https://dictionary.cambridge.org/dictionary/english/up-to-date
> 
>
>

Interesting, thanks. My (admittedly old) Concise OED only has the
version with spaces, while my (also old) Collins Concise has the
hyphenated version. I learn something new every day, no matter how trivial.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:
> 
> 
> 
> 
> OK, I went with this new text.  There is confusion over install vs copy,
> and whether this is happening at the operating system level or the SQL
> level.  I tried to clarify that, but I am not sure I was successful.  I
> also used the word "extension" since this is more common than "custom".
> 
> 
> Much better, however I'm unclear on whether CREATE EXTENSION is actually
> executed on the upgraded server.
> 
> From what I could gather it is not, otherwise the new function definitions
> should have been in place. 
> I think what happens is that the function definitions are copied as part of 
> the
> schema and pg_extension is also copied.

Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
the new cluster as object definitions.  CREATE EXTENSION runs the SQL
files associated with the extension.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [postgres_fdw] add local pid to fallback_application_name

2021-07-29 Thread Tom Lane
"kuroda.hay...@fujitsu.com"  writes:
> I propose adding trackable information in postgres_fdw, in order to track 
> remote query correctly.

I don't think this is a great idea as-is.  People who need to do this
sort of thing will all have their own ideas of what they need to track
--- most obviously, it might be appropriate to include the originating
server's name, else you don't know what machine the PID is for.
So I think most people with this sort of requirement will be overriding
the default application name anyway, so we might as well keep the
default behavior simple.

What would be better to think about is how to let users specify this
kind of behavior for themselves.  I think it's possible to set
application_name as part of a foreign server's connection options,
but at present the result would only be a constant string.  Somebody
who wished the PID to be in there would like to have some sort of
formatting escape, say "%p" for PID.  Extrapolating wildly, maybe we
could make all the %-codes known to log_line_prefix available here.

Perhaps this is overkill.  But I think the patch you have here is
not going to make very many people happy: it'll either be detail
they don't want, or too little detail.
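
For illustration only, the kind of per-server override meant above might be
written like this (assuming application_name is accepted among postgres_fdw's
server options; a %p-style escape would be a new feature, not something that
works today):

CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'appdb',
             application_name 'fdw_from_server1');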

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
>
> OK, I went with this new text.  There is confusion over install vs copy,
> and whether this is happening at the operating system level or the SQL
> level.  I tried to clarify that, but I am not sure I was successful.  I
> also used the word "extension" since this is more common than "custom".
>

Much better, however I'm unclear on whether CREATE EXTENSION is actually
executed on the upgraded server.

From what I could gather it is not, otherwise the new function definitions
should have been in place.
I think what happens is that the function definitions are copied as part of
the schema and pg_extension is also copied.

Dave

>
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Wed, Jul 28, 2021 at 09:35:28PM -0700, David G. Johnston wrote:
> On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
> 
> I came up with the attached patch.
> 
> 
> Thank you.  It is an improvement but I think more could be done here (not
> exactly sure what - though removing the "copy binaries for contrib modules 
> from
> the old server" seems like a decent second step.)

Uh, I don't see that text.

> I'm not sure it really needs a parenthetical, and I personally dislike using
> "Consider" to start the sentence.
> 
> "Bringing extensions up to the newest version available on the new server can
> be done later using ALTER EXTENSION UPGRADE (after ensuring the correct
> binaries are installed)."

OK, I went with this new text.  There is confusion over install vs copy,
and whether this is happening at the operating system level or the SQL
level.  I tried to clarify that, but I am not sure I was successful.  I
also used the word "extension" since this is more common than "custom".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..5ab7c57cd9 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,15 +299,17 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
+ If the old cluster used extensions, whether from
+ contrib or some other source, install new
+ versions of the extension shared object files (or DLLs) into the
+ new cluster, e.g., pgcrypto.so.
+ Do not load the schema definitions, e.g.,
+ CREATE EXTENSION pgcrypto, because these will be
+ recreated from the old cluster.  (The extensions may be
+ upgraded later using ALTER EXTENSION ... UPGRADE.)
  Also, any custom full text search files (dictionary, synonym,
  thesaurus, stop words) must also be copied to the new cluster.
 
@@ -494,10 +496,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: libpq compression

2021-07-29 Thread Daniil Zakhlystov
Hi!

I made some noticeable changes to the patch and fixed the previously mentioned 
issues.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby  wrote:
> Previously, I suggested that the server should have a "policy" GUC defining
> which compression methods are allowed.  Possibly including compression 
> "level".
> For example, the default might be to allow zstd, but only up to level 9.

Now the libpq_compression GUC server setting controls the available client-server
traffic compression methods.
It allows specifying the exact list of the allowed compression algorithms.

For example, to allow only zstd and zlib, set the setting to "zstd,zlib". Also,
the maximal allowed compression level can be specified for each
method, e.g. the "zstd:1,zlib:2" setting will set the maximal compression level for
zstd to 1 and for zlib to 2.
If a client requests compression with a higher compression level, it will
be set to the maximal allowed one.
The default (and recommended) maximal compression level for each algorithm is 1.
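
A minimal server-side setup following the description above (libpq_compression
is the GUC added by this patch set, not a setting in released PostgreSQL):

ALTER SYSTEM SET libpq_compression = 'zstd:1,zlib:2';
SELECT pg_reload_conf();  -- or restart the server, if the setting requires it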

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby  wrote:
> There's commit messages claiming that the compression can be "asymmetric"
> between client-server vs server-client, but the commit seems to be unrelated,
> and the protocol documentation doesn't describe how this works.

On Dec 10, 2020, at 1:39 AM, Robert Haas  wrote:
> Another idea is that you could have a new message type that says "hey,
> the payload of this is 1 or more compressed messages." It uses the
> most-recently set compression method. This would make switching
> compression methods easier since the SetCompressionMethod message
> itself could always be sent uncompressed and/or not take effect until
> the next compressed message


I rewrote the compression initialization logic. Now it is asymmetric and the 
compression method is changeable on the fly.

Now compression initialization consists only of two phases:
1. Client sends startup packet with _pq_.compression containing the list of 
compression algorithms specified by the client with an optional
specification of compression level (e.g. “zstd:2,zlib:1”)
2. The server intersects the requested compression algorithms with the allowed 
ones (controlled via the libpq_compression server config setting).
If the intersection is not empty, the server responds with CompressionAck 
containing the final list of the compression algorithms that can be used for 
the compression of libpq messages between the client and server.
If the intersection is empty (server does not accept any of the requested 
algorithms), then it replies with CompressionAck containing the empty list.

After sending the CompressionAck message, the server can send the 
SetCompressionMethod message to set the current compression algorithm for 
server-to-client traffic compression.
After receiving the CompressionAck message, the client can send the 
SetCompressionMethod message to set the current compression algorithm for 
client-to-server traffic compression.

SetCompressionMethod message contains the index of the compression algorithm in 
the final list of the compression algorithms which is generated during 
compression initialization (described above).

Compressed data is wrapped into CompressedData messages.

Rules for compressing the protocol messages are defined in zpq_stream.c. For 
each protocol message, the preferred compression algorithm can be chosen.

On Wed, Apr 21, 2021 at 15:35 AM Ian Zagorskikh  
wrote:
> Let me drop my humble 2 cents in this thread. At this moment I checked only 
> 0001-Add-zlib-and-zstd-streaming-compression patch. With no order:
> 
> * No checks for failures. For example, value from malloc() is not checked for 
> NULL and used as-is right after the call. Also as a result possibly NULL 
> values passed into ZSTD functions which are explicitly not NULL-tolerant and 
> so dereference pointers without checks.
> 
> * Used memory management does not follow the schema used in the common 
> module. Direct calls to std C malloc/free are hard coded. AFAIU this is not 
> backend friendly. Looking at the code around I believe they should be wrapped 
> to ALLOC/FREE local macroses that depend on FRONTEND define. As it's done for 
> example. in src/common/hmac.c.
> 
> * If we're going to fix our code to be palloc/pfree friendly there's also a 
> way to pass our custom allocator callbacks inside ZSTD functions. By default 
> ZSTD uses malloc/free but it can be overwritten by the caller in e.g. 
> ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of API . IMHO 
> that would be good. If a 3rd party component allows us to inject  a custom 
> memory allocator and we do have it why not use this feature?
> 
> * Most zs_foo() functions do not check ptr arguments, though some like 
> zs_free() do. If we're speaking about a "common API" that can be used by a 
> wide range of different modules around the project, such a relaxed way to 
> treat input arguments IMHO can be an evil. Or at least add Assert(ptr) 
> ass

Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2021-07-29 Thread Daniel Verite
   Hi,

Trying the v7a patch, here are a few comments:

* SIGSEGV with ON HOLD cursors.

Reproducer:

declare c cursor with hold for select oid,relname
  from pg_class order by 1 limit 10;

select * from rows_in('c') as x(f1 oid,f2 name);

consumes a bit of time, then crashes and generates a 13 GB core file
without a usable stacktrace:

Core was generated by `postgres: daniel postgres [local] SELECT  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f4c5b2f3dc9 in ?? ()
(gdb) bt
#0  0x7f4c5b2f3dc9 in ?? ()
#1  0x564567efc505 in ?? ()
#2  0x0001 in ?? ()
#3  0x56456a4b28f8 in ?? ()
#4  0x56456a4b2908 in ?? ()
#5  0x56456a4b2774 in ?? ()
#6  0x56456a4ad218 in ?? ()
#7  0x56456a4b1590 in ?? ()
#8  0x0010 in ?? ()
#9  0x in ?? ()


* rows_in() does not fetch from the current position of the cursor,
but from the start. For instance, I would expect that if doing
FETCH FROM cursor followed by SELECT * FROM rows_in('cursor'), the first
row would be ignored by rows_in(). That seems more convenient and more
principled.
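
A sketch of the behavior I would expect (not what the patch does today):

begin;
declare c2 cursor for select g from generate_series(1, 3) g;
fetch 1 from c2;                          -- consumes the first row
select * from rows_in('c2') as t(g int);  -- would return only rows 2 and 3
commit;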


* 
+  
+   This section describes functions that cursors to be manipulated
+   in normal SELECT queries.
+  

A verb seems to be missing.
It should be "function that *allow* cursors to be..." or something
like that?

* 
+   The REFCURSOR must be open, and the query must be a
+   SELECT statement. If the REFCURSOR’s
+   output does not

After  there is a fancy quote (codepoint U+2019). There is
currently no codepoint outside of US-ASCII in *.sgml ref/*.sgml, so
they're probably not welcome.


* Also: does the community want it as a built-in function in core?
As mentioned in a previous round of review, a function like this in
plpgsql comes close:

create function rows_in(x refcursor) returns setof record as $$
declare
 r record;
begin
  loop
fetch x into r;
exit when not found;
return next r;
  end loop;
end $$ language plpgsql;



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Masahiko Sawada
On Thu, Jul 29, 2021 at 8:03 PM Yura Sokolov  wrote:
>
> Masahiko Sawada wrote on 2021-07-29 12:11:
> > On Thu, Jul 29, 2021 at 3:53 AM Andres Freund 
> > wrote:
> >>
> >> Hi,
> >>
> >> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote:
> >> > Apart from performance and memory usage points of view, we also need
> >> > to consider the reusability of the code. When I started this thread, I
> >> > thought the best data structure would be the one optimized for
> >> > vacuum's dead tuple storage. However, if we can use a data structure
> >> > that can also be used in general, we can use it also for other
> >> > purposes. Moreover, if it's too optimized for the current TID system
> >> > (32 bits block number, 16 bits offset number, maximum block/offset
> >> > number, etc.) it may become a blocker for future changes.
> >>
> >> Indeed.
> >>
> >>
> >> > In that sense, radix tree also seems good since it can also be used in
> >> > gist vacuum as a replacement for intset, or a replacement for hash
> >> > table for shared buffer as discussed before. Are there any other use
> >> > cases?
> >>
> >> Yes, I think there are. Whenever there is some spatial locality it has
> >> a
> >> decent chance of winning over a hash table, and it will most of the
> >> time
> >> win over ordered datastructures like rbtrees (which perform very
> >> poorly
> >> due to the number of branches and pointer dispatches). There's plenty
> >> hashtables, e.g. for caches, locks, etc, in PG that have a medium-high
> >> degree of locality, so I'd expect a few potential uses. When adding
> >> "tree compression" (i.e. skip inner nodes that have a single incoming
> >> &
> >> outgoing node) radix trees even can deal quite performantly with
> >> variable width keys.
> >
> > Good point.
> >
> >>
> >> > On the other hand, I’m concerned that radix tree would be an
> >> > over-engineering in terms of vacuum's dead tuples storage since the
> >> > dead tuple storage is static data and requires only lookup operation,
> >> > so if we want to use radix tree as dead tuple storage, I'd like to see
> >> > further use cases.
> >>
> >> I don't think we should rely on the read-only-ness. It seems pretty
> >> clear that we'd want parallel dead-tuple scans at a point not too far
> >> into the future?
> >
> > Indeed. Given that the radix tree itself has other use cases, I have
> > no concern about using radix tree for vacuum's dead tuples storage. It
> > will be better to have one that can be generally used and has some
> > optimizations that are helpful also for vacuum's use case, rather than
> > having one that is very optimized only for vacuum's use case.
>
> The main part of svtm that leads to the memory savings is compression of
> many pages at once (CHUNK). It could be combined with radix as a storage
> for pointers to CHUNKs.
>
> At the moment I'm benchmarking an IntegerSet replacement based on a Trie
> (HATM-like) and CHUNK compression, so the data structure could be used for
> gist vacuum as well.
>
> Since it is generic (it can index the full 64 bits) it lacks the trick used
> to speed up svtm. Still, on the 10x test it is faster than radix.

BTW, how does svtm work when we add two sets of dead tuple TIDs to one
svtm? Each set of dead tuple TIDs is unique, but the two sets could have
TIDs with different offsets on the same block. The case I imagine is the
idea discussed on this thread[1]. With this idea, we store the
collected dead tuple TIDs somewhere and skip index vacuuming for some
reason (index skipping optimization, failsafe mode, or interruptions
etc.). Then, in the next lazy vacuum timing, we load the dead tuple
TIDs and start to scan the heap. During the heap scan in the second
lazy vacuum, it's possible that new dead tuples will be found on the
pages that we have already stored in svtm during the first lazy
vacuum. How can we efficiently update the chunk in the svtm?
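
(To make the scenario concrete: with a plain, uncompressed per-block
representation the second vacuum's update would just be a bitwise OR of the
offset bitmaps, roughly as below. This is not svtm code, only an
illustration of the operation I'm asking about; the question is how cheaply
the equivalent can be done once many blocks are packed into one compressed
CHUNK.)

#include "postgres.h"

#include "access/htup_details.h"    /* MaxHeapTuplesPerPage */
#include "storage/block.h"          /* BlockNumber */

#define OFFSET_WORDS ((MaxHeapTuplesPerPage + 63) / 64)

typedef struct BlockDeadTuples
{
    BlockNumber blkno;
    uint64      offsets[OFFSET_WORDS];  /* bit i set => offset i+1 is dead */
} BlockDeadTuples;

/* Merge the second vacuum's dead offsets into an already-stored block. */
static void
merge_block_dead_tuples(BlockDeadTuples *dst, const BlockDeadTuples *src)
{
    Assert(dst->blkno == src->blkno);

    for (int i = 0; i < OFFSET_WORDS; i++)
        dst->offsets[i] |= src->offsets[i];
}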

Regards,

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Showing applied extended statistics in explain

2021-07-29 Thread Dmitry Dolgov
> On Tue, Jul 27, 2021 at 10:20:34PM +0200, Tomas Vondra wrote:
>
> >> 1) The information is stashed in multiple lists added to a Plan. Maybe
> >> there's a better place, and maybe we need to invent a better way to
> >> track the info (a new node stashed in a single List).
> >>
> >> ...
> >>
> >> 3) It does not work for functional dependencies, because we effectively
> >> "merge" all functional dependencies and apply the entries. Not sure how
> >> to display this, but I think it should show the individual dependencies
> >> actually applied.
> >>
> >> ...
> >>
> >> 5) It includes just statistics name + clauses, but maybe we should
> >> include additional info (e.g estimate for that combination of clauses).
> >
> > Yes, a new node would be nice to have. The other questions above are
> > somewhat related to what it should contain, and I guess it depends on
> > the use case this patch targets. E.g. for the case "help to figure out
> > if an extended statistics was applied" even "merged" functional
> > dependencies would be fine I believe.
>
> What do you mean by "merged" functional dependencies? I guess we could
> say "these clauses were estimated using dependencies" without listing
> which exact dependencies were applied.

Yes, that's exactly what I was thinking. From the original email I got
the impression that in the case of functional dependencies you plan to
display the info only with the individual dependencies (when
implemented) or nothing at all. By "merged" I was referring to the
statement about "merging" all functional dependencies and applying the
entries.

> > More advanced plan troubleshooting may benefit from estimates.
>
> I'm sorry, I don't know what you mean by this. Can you elaborate?

Yeah, sorry for not being clear. The idea was that the question about
including "additional info" strongly depends on which use cases the patch
tries to address, and I was following up on that. There is no deeper hidden
meaning here :)

> > I've got few more questions after reading the patch:
> >
> > * Is there any particular reason behind choosing only some scan nodes to
> >   display extended stats? E.g. BitmapHeapScan is missing.
> >
>
> All nodes that may apply extended stats to estimate quals should include
> this info. I guess BitmapHeapScan may do that for the filter, right?
>

Yes, something like this (stats output added by me):

 Bitmap Heap Scan on public.tenk1
   Output: unique1
   Recheck Cond: (tenk1.unique1 < 1000)
   Filter: (tenk1.stringu1 = 'xxx'::name)
   Statistics: public.s Clauses: ((unique1 < 1000) AND (stringu1 = 'xxx'::name))
   ->  Bitmap Index Scan on tenk1_unique1
         Index Cond: (tenk1.unique1 < 1000)




Re: Out-of-memory error reports in libpq

2021-07-29 Thread Tom Lane
Andrew Dunstan  writes:
> On 7/29/21 3:01 AM, Peter Smith wrote:
>> I've seen lots of code like this where I may have been tempted to use
>> a ternary operator for readability, so I was wondering is there a PG
>> convention to avoid such ternary operator assignments, or is it simply
>> a personal taste thing, or is there some other reason?

> A simple grep on the sources should disabuse you of any idea that there
> is such a convention. The code is littered with examples of the ?: operator.

Yeah.  I happened not to write it that way here, but if I'd been reviewing
someone else's code and they'd done it that way, I'd not have objected.

In the case at hand, I'd personally avoid a ternary op for the first
assignment because then the line would run over 80 characters, and
you'd have to make decisions about where to break it.  (We don't have
a standardized convention about that, and none of the alternatives
look very good to my eye.)  Then it seemed to make sense to also
write the second step as an "if" not a ternary op.

regards, tom lane




Re: Replace l337sp34k in comments.

2021-07-29 Thread Geoff Winkless
On Thu, 29 Jul 2021 at 11:22, Andrew Dunstan  wrote:

> Personally, I would have written this as just "up to date", I don't
> think the hyphens are required.
>

FWIW Merriam-Webster and the CED suggest "up-to-date" when it comes before a
noun, so the changes should be "up-to-date answer" but "are up to date".

https://dictionary.cambridge.org/dictionary/english/up-to-date

Geoff


Re: [HACKERS] logical decoding of two-phase transactions

2021-07-29 Thread Amit Kapila
On Tue, Jul 27, 2021 at 11:41 AM Peter Smith  wrote:
>
> Please find attached the latest patch set v99*
>
> v98-0001 --> split into v99-0001 + v99-0002
>

Pushed the first refactoring patch after making a few modifications, as below.
1.
- /* open the spool file for the committed transaction */
+ /* Open the spool file for the committed/prepared transaction */
  changes_filename(path, MyLogicalRepWorker->subid, xid);

In the above comment, we don't need to say prepared. It can be done as
part of the second patch.

2.
+apply_handle_prepare_internal(LogicalRepPreparedTxnData *prepare_data, char *gid)

I don't think there is any need for this function to take gid as
input. It can compute it by itself instead of the callers doing it (see
the sketch after these comments).

3.
+static TransactionId
+logicalrep_read_prepare_common(StringInfo in, char *msgtype,
+   LogicalRepPreparedTxnData *prepare_data)

I don't think the above function needs to return xid because it is
already present as part of prepare_data. Even if it is required for
some reason by the second patch, let's do it as part of it, but I
don't think it is required for the second patch.

4.
 /*
- * Write PREPARE to the output stream.
+ * Common code for logicalrep_write_prepare and logicalrep_write_stream_prepare.
  */

Here and at another similar place, we don't need to refer to
logicalrep_write_stream_prepare as that is part of the second patch.
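
For (2), a rough sketch of what I mean (untested, and the helper name is
from memory, so it may not match the patch exactly):

static void
apply_handle_prepare_internal(LogicalRepPreparedTxnData *prepare_data)
{
    char        gid[GIDSIZE];

    /*
     * Compute the gid locally from the subscription id and the remote
     * xid instead of having every caller build and pass it.
     */
    TwoPhaseTransactionGid(MyLogicalRepWorker->subid, prepare_data->xid,
                           gid, sizeof(gid));

    /* ... the rest of the function stays as it is ... */
}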

Few comments on 0002 patch:
==
1.
+# -
+# 2PC + STREAMING TESTS
+# -
+
+# Setup logical replication (streaming = on)
+
+$node_B->safe_psql('postgres', "
+ ALTER SUBSCRIPTION tap_sub_B
+ SET (streaming = on);");
+
+$node_C->safe_psql('postgres', "
+ ALTER SUBSCRIPTION tap_sub_C
+ SET (streaming = on)");
+
+# Wait for subscribers to finish initialization
+$node_A->wait_for_catchup($appname_B);
+$node_B->wait_for_catchup($appname_C);

This is not the right way to determine if the new streaming option is
enabled on the publisher. Even if there is no restart of apply workers
(and walsender) after you have enabled the option, the above wait will
succeed. You need to do something like below as we are doing in
001_rep_changes.pl:

$oldpid = $node_publisher->safe_psql('postgres',
"SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub';"
);
$node_subscriber->safe_psql('postgres',
"ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_ins_only WITH
(copy_data = false)"
);
$node_publisher->poll_query_until('postgres',
"SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name
= 'tap_sub';"
) or die "Timed out while waiting for apply to restart";

2.
+# Create some pre-existing content on publisher (uses same DDL as
015_stream test)

Here, in the comments, I don't see the need to say "uses same DDL ...".

-- 
With Regards,
Amit Kapila.




Re: [Patch] ALTER SYSTEM READ ONLY

2021-07-29 Thread Amul Sul
On Thu, Jul 29, 2021 at 4:47 PM Dilip Kumar  wrote:
>
> On Wed, Jul 28, 2021 at 5:03 PM Amul Sul  wrote:
> >
> > I was quite worried about how I could have missed that, but after thinking
> > more about it, I realized that the operation for ArchiveRecoveryRequested
> > is never going to be skipped in the startup process and is never left for
> > the checkpoint process to do later. That is the reason the assert was
> > added there.
> >
> > When ArchiveRecoveryRequested is set, the server will no longer be in
> > the wal-prohibited mode; we implicitly change the state to
> > wal-permitted. Here is the snippet from the 0003 patch:
> >
> > @@ -6614,13 +6629,30 @@ StartupXLOG(void)
> >   (errmsg("starting archive recovery")));
> >   }
> >
> > - /*
> > - * Take ownership of the wakeup latch if we're going to sleep during
> > - * recovery.
> > - */
> >   if (ArchiveRecoveryRequested)
> > + {
> > + /*
> > + * Take ownership of the wakeup latch if we're going to sleep during
> > + * recovery.
> > + */
> >   OwnLatch(&XLogCtl->recoveryWakeupLatch);
> >
> > + /*
> > + * Since archive recovery is requested, we cannot be in a wal prohibited
> > + * state.
> > + */
> > + if (ControlFile->wal_prohibited)
> > + {
> > + /* No need to hold ControlFileLock yet, we aren't up far enough */
> > + ControlFile->wal_prohibited = false;
> > + ControlFile->time = (pg_time_t) time(NULL);
> > + UpdateControlFile();
> > +
>
> Is there some reason why we are forcing 'wal_prohibited' to off if we
> are doing archive recovery?  It might have already been discussed, but
> I could not find it on a quick look into the thread.
>

Here is: 
https://postgr.es/m/CA+TgmoZ=cctbaxxmtyzogxegqzoz9smkbwrdpsacpjvfcgc...@mail.gmail.com

Regards,
Amul




Re: Out-of-memory error reports in libpq

2021-07-29 Thread Ranier Vilela
On Thu, Jul 29, 2021 at 04:02, Peter Smith wrote:

> (This is not a code review - this is just to satisfy my curiosity)
>
> I've seen lots of code like this where I may have been tempted to use
> a ternary operator for readability, so I was wondering is there a PG
> convention to avoid such ternary operator assignments, or is it simply
> a personal taste thing, or is there some other reason?
>
> For example:
>
> if (msg)
>   res->errMsg = msg;
> else
>   res->errMsg = libpq_gettext("out of memory\n");
>
The C compiler will expand:

res->errMsg = msg ? msg : libpq_gettext("out of memory\n");

to

if (msg)
 res->errMsg = msg;
else
 res->errMsg = libpq_gettext("out of memory\n");

Which IMHO is much more readable.

regards,
Ranier Vilela

