Re: Allow file inclusion in pg_hba and pg_ident files

2022-10-23 Thread Julien Rouhaud
On Mon, Oct 10, 2022 at 10:51:32AM +0800, Julien Rouhaud wrote:
> On Sun, Sep 18, 2022 at 01:06:12AM +0800, Julien Rouhaud wrote:
> > On Tue, Aug 16, 2022 at 02:10:30PM +0800, Julien Rouhaud wrote:
> > >
> > > On Sat, Jul 30, 2022 at 04:09:36PM +0800, Julien Rouhaud wrote:
> > >
> > > >
> > > > - 0001: the rule_number / mapping_number addition in the views in a
> > > >   separate commit
> > > > - 0002: the main file inclusion patch.  Only a few minor bugfixes since the
> > > >   previous version, discovered thanks to the tests (a bit more about it
> > > >   after), and documentation tweaks based on previous discussions
> > > > - 0003: the pg_hba_matches() POC, no changes

v12 attached, fixing multiple conflicts with recent activity.
>From 3361344bc0e907b341df912cf8ffcf28440b2370 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 30 May 2022 10:59:51 +0800
Subject: [PATCH v12 1/3] Add rule_number / mapping_number to the
 pg_hba/pg_ident views.

Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya%40jrouhaud
---
 doc/src/sgml/system-views.sgml  | 22 +
 src/backend/utils/adt/hbafuncs.c| 50 ++---
 src/include/catalog/pg_proc.dat | 11 ---
 src/test/regress/expected/rules.out | 10 +++---
 4 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 1ca7c3f9bf..4723f712a7 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -991,6 +991,18 @@
 
 
 
+ 
+  
+   rule_number int4
+  
+  
+   Rule number of this rule among all rules if the rule is valid, otherwise
+   null. This indicates the order in which each rule will be considered
+   until the first matching one, if any, is used to perform authentication
+   with the client.
+  
+ 
+
  
   
line_number int4
@@ -1131,6 +1143,16 @@
 
 
 
+ 
+  
+   mapping_number int4
+  
+  
+   Mapping number, in priority order, of this mapping if the mapping is
+   valid, otherwise null
+  
+ 
+
  
   
line_number int4
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index cfdc4d8b39..21a451e391 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -26,10 +26,12 @@
 
 static ArrayType *get_hba_options(HbaLine *hba);
 static void fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
- int lineno, HbaLine *hba, const char *err_msg);
+ int rule_number, int lineno, HbaLine *hba,
+ const char *err_msg);
 static void fill_hba_view(Tuplestorestate *tuple_store, TupleDesc tupdesc);
 static void fill_ident_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
-   int lineno, IdentLine *ident, const char *err_msg);
+   int mapping_number, int lineno, IdentLine *ident,
+   const char *err_msg);
 static void fill_ident_view(Tuplestorestate *tuple_store, TupleDesc tupdesc);
 
 
@@ -157,7 +159,7 @@ get_hba_options(HbaLine *hba)
 }
 
 /* Number of columns in pg_hba_file_rules view */
-#define NUM_PG_HBA_FILE_RULES_ATTS  9
+#define NUM_PG_HBA_FILE_RULES_ATTS  10
 
 /*
  * fill_hba_line
@@ -165,6 +167,7 @@ get_hba_options(HbaLine *hba)
  *
  * tuple_store: where to store data
  * tupdesc: tuple descriptor for the view
+ * rule_number: unique rule identifier among all valid rules
  * lineno: pg_hba.conf line number (must always be valid)
  * hba: parsed line data (can be NULL, in which case err_msg should be set)
  * err_msg: error message (NULL if none)
@@ -174,7 +177,8 @@ get_hba_options(HbaLine *hba)
  */
 static void
 fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
- int lineno, HbaLine *hba, const char *err_msg)
+ int rule_number, int lineno, HbaLine *hba,
+ const char *err_msg)
 {
Datum   values[NUM_PG_HBA_FILE_RULES_ATTS];
boolnulls[NUM_PG_HBA_FILE_RULES_ATTS];
@@ -193,6 +197,11 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
memset(nulls, 0, sizeof(nulls));
index = 0;
 
+   /* rule_number */
+   if (err_msg)
+   nulls[index++] = true;
+   else
+   values[index++] = Int32GetDatum(rule_number);
/* line_number */
values[index++] = Int32GetDatum(lineno);
 
@@ -336,7 +345,7 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
else
{
/* no parsing result, so set relevant fields to nulls */
-   memset(&nulls[1], true, 

Re: Crash after a call to pg_backup_start()

2022-10-23 Thread Michael Paquier
On Mon, Oct 24, 2022 at 11:42:58AM +0900, Kyotaro Horiguchi wrote:
> At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera  
> wrote in 
>> My intention here was that the Assert should be inside the block, that
>> is, we already know that at least one is true, and we want to make sure
>> that they are not *both* true.
>> 
>> AFAICT the attached patch also fixes the bug without making the assert
>> weaker.

On the contrary, it seems to me that putting the assertion within the
if() block makes the assertion weaker, because we would never check for
an incorrect state after do_pg_abort_backup() has been registered (i.e.
after any pg_backup_start() call) if we do not enter this if() block.
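
To make the point concrete, here is a stripped-down sketch (invented
names and condition only, not the actual xlog.c code):

#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the backend's session backup state. */
typedef enum { SESSION_BACKUP_NONE, SESSION_BACKUP_RUNNING } SessionBackupState;
static SessionBackupState sessionBackupState = SESSION_BACKUP_NONE;

static void
abort_backup_callback_sketch(bool during_backup_start)
{
    /*
     * Placement A (before the if): checked on every invocation of the
     * callback, i.e. whenever it has been registered by pg_backup_start().
     */
    assert(!during_backup_start || sessionBackupState == SESSION_BACKUP_NONE);

    if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
    {
        /*
         * Placement B (inside the if): the same check here only fires when
         * this block is entered, so an inconsistent state that skips the
         * block is never detected.
         */

        /* ... clean up the aborted backup ... */
    }
}

int
main(void)
{
    abort_backup_callback_sketch(false);    /* nothing running: block skipped */
    return 0;
}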

That said, if you feel otherwise I am fine with your conclusion as
well, so feel free to solve this issue as you see fit.  :p
--
Michael


signature.asc
Description: PGP signature


Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-23 Thread Jeff Davis
On Fri, 2022-10-21 at 17:39 -0700, Peter Geoghegan wrote:
> One small thought on the presentation/docs side of this: maybe it
> would be better to invent a new kind of autovacuum

It's possible this would be easier for users to understand: one process
that does cleanup work over time in a way that minimizes interference;
and another process that activates in more urgent situations (perhaps
due to misconfiguration of the first process).

But we should be careful that we don't end up with more confusion. For
something like that to work, we'd probably want the second process to
not be configurable at all, and we'd want it to be issuing WARNINGs
pointing to what might be misconfigured, and otherwise just be
invisible.

>  That way we wouldn't be fighting against the widely held perception
> that antiwraparound autovacuums are scary.

There's certainly a terminology problem there. Just to brainstorm on
some new names, we might want to call it something like "xid
reclamation" or "xid horizon advancement".

When it starts to run out, we can use words like "wraparound" or
"exhaustion".


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Question about savepoint level?

2022-10-23 Thread Japin Li


Hi, hackers

TransactionStateData has a savepointLevel field; however, I do not understand
what the savepoint level is. It seems all savepoints have the same
savepointLevel, so I would like to know how the savepoint level changes.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: [PoC] Reducing planning time when tables have many partitions

2022-10-23 Thread Yuya Watari
Hello,

On Wed, Sep 21, 2022 at 6:43 PM Yuya Watari  wrote:
> 1.1. Revert one of our optimizations (v5)
>
> As I mentioned in the comment in
> v[5|6|7]-0002-Revert-one-of-the-optimizations.patch, I reverted one of
> our optimizations. This code tries to find EquivalenceMembers that do
> not satisfy the bms_overlap condition. We encounter such members early
> in the loop, so the linear search is enough, and our optimization is
> too excessive here. As a result of experiments, I found this
> optimization was a bottleneck, so I reverted it.

In the previous mail, I proposed a revert of one excessive
optimization. In addition, I found a new bottleneck and attached a new
version of the patch solving it to this email.

The new bottleneck exists in the select_outer_pathkeys_for_merge()
function. At the end of this function, we count EquivalenceMembers
that satisfy the specific condition. To count them, we have used
Bitmapset operations. Through experiments, I concluded that this
optimization is effective for larger cases but leads to some degradation
for smaller numbers of partitions. The new patch switches between the
two algorithms depending on the problem size.
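
Just to illustrate the shape of this change, here is a self-contained
sketch (the names and the threshold below are invented for illustration
and are not taken from the patch):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SWITCH_THRESHOLD 64     /* invented cut-over point */

/* count the set bits in one word of a precomputed bitmap */
static int
popcount64(uint64_t w)
{
    int     c = 0;

    while (w)
    {
        c += (int) (w & UINT64_C(1));
        w >>= 1;
    }
    return c;
}

/*
 * For few members, a plain linear scan has almost no constant overhead;
 * for many members, intersecting two precomputed bitmaps and counting the
 * bits wins.  This size-based dispatch is the trade-off v8-0003 switches on.
 */
static int
count_matching_members(const bool *member_matches, size_t nmembers,
                       const uint64_t *candidate_words,
                       const uint64_t *matching_words, size_t nwords)
{
    int     count = 0;

    if (nmembers < SWITCH_THRESHOLD)
    {
        for (size_t i = 0; i < nmembers; i++)
            if (member_matches[i])
                count++;
    }
    else
    {
        for (size_t i = 0; i < nwords; i++)
            count += popcount64(candidate_words[i] & matching_words[i]);
    }
    return count;
}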

1. Experimental result

1.1. Join Order Benchmark

As in the previous email, I used the Join Order Benchmark to evaluate
the patches' performance. The correspondence between each version and
patches is as follows.

v3: v8-0001-*.patch
v5 (v3 with revert): v8-0001-*.patch + v8-0002-*.patch
v8 (v5 with revert): v8-0001-*.patch + v8-0002-*.patch + v8-0003-*.patch

I show the speed-up of each method compared with the master branch in
Table 1. When the number of partitions is 1, performance degradation
is kept to 1.1% in v8, compared with 4.2% and 1.8% in v3 and v5. This
result indicates that the newly introduced revert is effective.

Table 1: Speedup of Join Order Benchmark (higher is better)
(n = the number of partitions)
---------------------------------------------------------
   n |     v3 | v5 (v3 with revert) | v8 (v5 with revert)
---------------------------------------------------------
   2 |  95.8% |               98.2% |               98.9%
   4 |  97.2% |               99.7% |               99.3%
   8 | 101.4% |              102.5% |              103.4%
  16 | 108.7% |              111.4% |              110.2%
  32 | 127.1% |              127.6% |              128.8%
  64 | 169.5% |              172.1% |              172.4%
 128 | 330.1% |              335.2% |              332.3%
 256 | 815.1% |              826.4% |              821.8%
---------------------------------------------------------

1.2. pgbench

The following table describes the result of pgbench. The v5 and v8
performed clearly better than the v3 patch. The difference between v5
and v8 is not so significant, but v8's performance is close to the
master branch.

Table 2: The result of pgbench (tps)
----------------------------------------------------------------
   n | Master |   v3 | v5 (v3 with revert) | v8 (v5 with revert)
----------------------------------------------------------------
   1 |   7550 | 7422 |                7474 |                7521
   2 |   7594 | 7381 |                7536 |                7529
   4 |   7518 | 7362 |                7461 |                7524
   8 |   7459 | 7340 |                7424 |                7460
----------------------------------------------------------------
 Avg |   7531 | 7377 |                7474 |                7509
----------------------------------------------------------------

2. Conclusion and future works

The revert in the v8-0003-*.patch is effective in preventing
performance degradation for smaller numbers of partitions. However, I
don't think what I have done in the patch is the best or ideal
solution. As I mentioned in the comments in the patch, switching
between two algorithms may be ugly because it introduces code
duplication. We need a wiser solution to this problem.

-- 
Best regards,
Yuya Watari


v8-0001-Apply-eclass_member_speedup_v3.patch.patch
Description: Binary data


v8-0002-Revert-one-of-the-optimizations.patch
Description: Binary data


v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patch
Description: Binary data


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-23 Thread Michael Paquier
On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:
> On 10/21/22 2:58 AM, Michael Paquier wrote:
>> The same approach with keywords first, regex, and exact match could be
>> applied as well for the databases?  Perhaps it is just mainly a matter
>> of taste,
> 
> Yeah, I think it is.

;)

Still, it looks like this makes for less confusion with a minimal
footprint once the new additions are in place.

>> In the same fashion as load_ident(), it seems to me that we
>> need two extra things for this patch:
>> - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
>> through new_parsed_lines and free for each line the AuthTokens for the
>> database and user lists.
>> - if ok and new_parsed_lines != NIL, the same cleanup needs to
>> happen.
> 
> Right, but I think that should be "parsed_hba_lines != NIL".

For the second case, where we need to free the past contents after a
success, yes.

> Right. To avoid code duplication in the !ok/ok cases, the function
> free_hba_line() has been added in v2: it goes through the list of databases
> and roles tokens and call free_auth_token() for each of them.

Having a small-ish routine for that is fine.
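
For what it's worth, such a routine could look roughly like the sketch
below (an illustration only, assuming hba.h's HbaLine lists and the
patch's free_auth_token(); not necessarily the committed shape):

#include "postgres.h"

#include "libpq/hba.h"
#include "nodes/pg_list.h"

/* assumed to be provided by the patch */
extern void free_auth_token(AuthToken *token);

/*
 * Sketch only: release the AuthTokens kept in the database and role
 * lists of one parsed pg_hba.conf line.
 */
static void
free_hba_line_sketch(HbaLine *hbaline)
{
    ListCell   *lc;

    foreach(lc, hbaline->databases)
        free_auth_token((AuthToken *) lfirst(lc));

    foreach(lc, hbaline->roles)
        free_auth_token((AuthToken *) lfirst(lc));
}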

I have spent a couple of hours doing a pass over v2, playing manually
with regex patterns, reloads, the system views and item lists.  The
logic was fine, but I have adjusted a few things related to the
comments and the documentation (particularly with the examples,
removing one example and updating one with a regex that has a comma,
needing double quotes).  The CI and all my machines were green, and
the test coverage looked sufficient.  So, applied.  I'll keep an eye
on the buildfarm.
--
Michael


signature.asc
Description: PGP signature


Re: doubt about FullTransactionIdAdvance()

2022-10-23 Thread Zhang Mingli
Hi, Andres

On Oct 24, 2022, 01:16 +0800, Andres Freund , wrote:
> Hi,
>
> On 2022-10-22 11:32:47 +0800, Zhang Mingli wrote:
> > Hi, hackers
> >
> > I don't quite understand FullTransactionIdAdvance(), correct me if I’m 
> > wrong.
> >
> >
> > /*
> >  * Advance a FullTransactionId variable, stepping over xids that would 
> > appear
> >  * to be special only when viewed as 32bit XIDs.
> >  */
> > static inline void
> > FullTransactionIdAdvance(FullTransactionId *dest)
> > {
> > dest->value++;
> >
> > /* see FullTransactionIdAdvance() */
> > if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
> >  return;
> >
> > while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
> >  dest->value++;
> > }
> >
> > #define XidFromFullTransactionId(x) ((x).value)
> > #define FullTransactionIdPrecedes(a, b) ((a).value < (b).value)
> >
> > Can the codes reach line: while (XidFromFullTransactionId(*dest) < 
> > FirstNormalTransactionId)?
> > As we will return if (FullTransactionIdPrecedes(*dest, 
> > FirstNormalFullTransactionId)), and the two judgements seem equal.
> > Another confusion is the comment /* see FullTransactionIdAdvance() */,
> > which refers to the function itself.
> > Could anyone explain this? Thanks in advance.
>
> FullTransactionId is 64bit. An "old school" xid is 32bit. The first branch is
> to protect against the special fxids that are actually below
> FirstNormalFullTransactionId:
>
> if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
> return;
>
> The second branch is to protect against 64bit xids that would yield a 32bit
> xid below FirstNormalTransactionId after truncating to 32bit:
>
> while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
> dest->value++;
>
> E.g. we don't want to modify the 64bit xid 0 (meaning InvalidTransactionId) as
> it has special meaning. But we have to make sure that e.g. the 64bit xid
> 0x100000000 won't exist, as it'd also result in InvalidTransactionId if
> truncated to 32bit.
>
>
> However, it looks like this comment:
> /* see FullTransactionIdAdvance() */
> if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
> return;
>
> is bogus, and it's my fault. Looks like it's intending to reference
> FullTransactionIdRetreat().
>
> Greetings,
>
> Andres Freund
Now I get it, thanks for your explanation.

Regards,
Zhang Mingli


Re: csv_populate_recordset and csv_agg

2022-10-23 Thread Tom Lane
Steve Chavez  writes:
> CSV processing is also a common use case and PostgreSQL has the COPY ..
> FROM .. CSV form but COPY is not compatible with libpq pipeline mode and
> the interface is clunkier to use.

> I propose to include two new functions:

> - csv_populate_recordset ( base anyelement, from_csv text )
> - csv_agg ( anyelement )

The trouble with CSV is there are so many mildly-incompatible
versions of it.  I'm okay with supporting it in COPY, where
we have the freedom to add random sub-options (QUOTE, ESCAPE,
FORCE_QUOTE, yadda yadda) to cope with those variants.
I don't see a nice way to handle that issue in the functions
you propose --- you'd have to assume that there is One True CSV,
which sadly ain't so, or else complicate the functions beyond
usability.

Also, in the end CSV is a surface presentation layer, and as
such it's not terribly well suited as the calculation representation
for aggregates and other functions.  I think these proposed functions
would have pretty terrible performance as a consequence of the
need to constantly re-parse the surface format.  The same point
could be made about JSON ... which is why we prefer to implement
processing functions with JSONB.

regards, tom lane




Re: pg_recvlogical prints bogus error when interrupted

2022-10-23 Thread Bharath Rupireddy
On Fri, Oct 21, 2022 at 7:52 AM Kyotaro Horiguchi
 wrote:
>
> > +1. How about emitting a message like its friend pg_receivewal, like
> > the attached patch?
>
> I'm not a fan of treating SIGINT as an error in this case. It calls
> prepareToTerminate() when time_to_abort and everything goes fine after
> then. So I think we should do the same thing after receiving an
> interrupt.  This also does file-sync naturally as a part of normal
> shutdown.  I'm also not a fan of doing fsync at error.

I think pg_recvlogical can gracefully exit on both SIGINT and
SIGTERM to keep things simple.

> > > I also then noticed that we don't fsync the output file in cases of 
> > > errors -
> > > that seems wrong to me? Looks to me like that block should be moved till 
> > > after
> > > the error:?
> >
> > How about something like the attached patch?

The attached patch (pg_recvlogical_graceful_interrupt.text) has a
couple of problems, I believe. We're losing prepareToTerminate() with
keepalive true, and we're not skipping the pg_log_error("unexpected
termination of replication stream: %s") upon interrupt, which is, after
all, what we're here discussing how to avoid.

I came up with the attached v2 patch, please have a look.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 56e25373796b114254f5995701b07b05381f28ef Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 22 Oct 2022 08:35:16 +
Subject: [PATCH v2] pg_recvlogical fixes

---
 src/bin/pg_basebackup/pg_recvlogical.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 5f2e6af445..849e9d9071 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -54,7 +54,8 @@ static const char *plugin = "test_decoding";
 
 /* Global State */
 static int	outfd = -1;
-static volatile sig_atomic_t time_to_abort = false;
+static bool	time_to_abort = false;
+static volatile sig_atomic_t ready_to_exit = false;
 static volatile sig_atomic_t output_reopen = false;
 static bool output_isfile;
 static TimestampTz output_last_fsync = -1;
@@ -283,6 +284,23 @@ StreamLogicalLog(void)
 			copybuf = NULL;
 		}
 
+		/* When we get SIGINT/SIGTERM, we exit */
+		if (ready_to_exit)
+		{
+			/*
+			 * Try informing the server about our exit, but don't wait around
+			 * or retry on failure.
+			 */
+			(void) PQputCopyEnd(conn, NULL);
+			(void) PQflush(conn);
+			time_to_abort = ready_to_exit;
+
+			if (verbose)
+pg_log_info("received interrupt signal, exiting");
+
+			break;
+		}
+
 		/*
 		 * Potentially send a status message to the primary.
 		 */
@@ -614,7 +632,8 @@ StreamLogicalLog(void)
 
 		res = PQgetResult(conn);
 	}
-	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	if (!ready_to_exit &&
+		PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
 		pg_log_error("unexpected termination of replication stream: %s",
 	 PQresultErrorMessage(res));
@@ -656,7 +675,7 @@ error:
 static void
 sigexit_handler(SIGNAL_ARGS)
 {
-	time_to_abort = true;
+	ready_to_exit = true;
 }
 
 /*
-- 
2.34.1



Re: cross-platform pg_basebackup

2022-10-23 Thread Julien Rouhaud
Hi,

On Fri, Oct 21, 2022 at 09:15:39AM -0400, Robert Haas wrote:
>
> Committed and back-patched to all supported branches.

Are there any additional things to be taken care of, or should
https://commitfest.postgresql.org/40/3954/ be closed?




Re: Crash after a call to pg_backup_start()

2022-10-23 Thread Kyotaro Horiguchi
At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera  
wrote in 
> On 2022-Oct-21, Michael Paquier wrote:
> 
> > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> 
> > > /* These conditions can not be both true */
> > 
> > If you do that, it would be a bit easier to read as of the following
> > assertion instead?  Like:
> > Assert(!during_backup_start ||
> >sessionBackupState == SESSION_BACKUP_NONE);
> 
> My intention here was that the Assert should be inside the block, that
> is, we already know that at least one is true, and we want to make sure
> that they are not *both* true.
> 
> AFAICT the attached patch also fixes the bug without making the assert
> weaker.

I'm fine with either of them, but..

The reason it works the same way is that the if() block excludes the
case of (!during_backup_start && S_B_RUNNING)<*1>. In other words, the
strictness is a kind of illusion [*a]. Actually, the assertion does not
detect the case <*1>.  In this regard, moving the current assertion
into the if() block might be confusing.

regards,

[*a]: It's evidently superfluous, but "strictness" and "illusion" share
 exactly the same pronunciation in Japanese, "Ghen-Ka-Ku".

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [DOCS] Stats views and functions not in order?

2022-10-23 Thread Peter Smith
A rebase was needed.

PSA v3*.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




csv_populate_recordset and csv_agg

2022-10-23 Thread Steve Chavez
Hello hackers,

The `json_populate_recordset` and `json_agg` functions allow systems to
process/generate JSON directly in the database. This "cuts out the middle
tier"[1] and notably reduces the complexity of web applications.

CSV processing is also a common use case, and PostgreSQL has the COPY ..
FROM .. CSV form, but COPY is not compatible with libpq pipeline mode and
its interface is clunkier to use.

I propose to include two new functions:

- csv_populate_recordset ( base anyelement, from_csv text )
- csv_agg ( anyelement )

I would gladly implement these if it sounds like a good idea.

I see there's already some code that deals with CSV on

- src/backend/commands/copyfromparse.c(CopyReadAttributesCSV)
- src/fe_utils/print.c(csv_print_field)
- src/backend/utils/error/csvlog(write_csvlog)

So perhaps a new csv module could benefit the codebase as well.

Best regards,
Steve

[1]: https://www.crunchydata.com/blog/generating-json-directly-from-postgres


Re: [PATCH] minor bug fix for pg_dump --clean

2022-10-23 Thread Tom Lane
Frédéric Yhuel  writes:
> When using pg_dump (or pg_restore) with option "--clean", there is some 
> SQL code to drop every objects at the beginning.

Yup ...

> The DROP statement for a view involving circular dependencies is :
> CREATE OR REPLACE VIEW [...]
> (see commit message of d8c05aff for a much better explanation)
> If the view is not in the "public" schema, and the target database is 
> empty, this statement fails, because the schema hasn't been created yet.
> The attached patches are a TAP test which can be used to reproduce the 
> bug, and a proposed fix. They apply to the master branch.

I am disinclined to accept this as a valid bug, because there's never
been any guarantee that a --clean script would execute error-free in
a database that doesn't match what the source database contains.

(The pg_dump documentation used to say that in so many words.
I see that whoever added the --if-exists option was under the
fond illusion that that fixes all cases, which it surely does not.
We need to back off the promises a bit there.)

An example of a case that won't execute error-free is if the view
having a circular dependency includes a column of a non-built-in
data type.  If you try to run that in an empty database, you'll
get an error from the CREATE OR REPLACE VIEW's reference to that
data type.  For instance, if I adjust your test case to make
the "payload" column be of type hstore, I get something like

psql:dumpresult.sql:22: ERROR:  type "public.hstore" does not exist
LINE 4: NULL::public.hstore AS payload;
  ^

The same type of failure occurs for user-defined functions and
operators that use a non-built-in type, and I'm sure there are
more cases in the same vein.  But it gets *really* messy if
the target database isn't completely empty, but contains objects
with different properties than the dump script expects; for example,
if the view being discussed here exists with a different column set
than the script thinks, or if the dependency chains aren't all the
same.

If this fix were cleaner I might be inclined to accept it anyway,
but it's not very clean at all --- for example, it's far from
obvious to me what are the side-effects of changing the filter
in RestoreArchive like that.  Nor am I sure that the schema
you want to create is guaranteed to get dropped again later in
every use-case.

So I think mainly what we ought to do here is to adjust the
documentation to make it clearer that --clean is not guaranteed
to work without errors unless the target database has the same
set of objects as the source.  --if-exists can reduce the set
of error cases, but not eliminate it.  Possibly we should be
more enthusiastic about recommending --create --clean (ie,
drop and recreate the whole database) instead.

regards, tom lane




Re: [PATCH] minor bug fix for pg_dump --clean

2022-10-23 Thread Виктория Шепард
Hi,

Good catch! Here are several points for improvement:
1. pg_dump.c:17380
Maybe it would be better to write it more simply as

appendPQExpBuffer(delcmd, "CREATE SCHEMA IF NOT EXISTS %s;\n",
                  tbinfo->dobj.namespace->dobj.name);

because there is a schema name inside `tbinfo->dobj.namespace->dobj.name`.

2. pg_backup_archiver.c:588
The spaces just inside the parentheses are not needed, and there are no
spaces around the '+' sign.

( strncmp(dropStmt, "CREATE SCHEMA IF NOT EXISTS", 27) == 0 &&
  strstr(dropStmt+29, "CREATE OR REPLACE VIEW") ))


Best regards,
Viktoria Shepard

Thu, 1 Sept 2022 at 12:13, Frédéric Yhuel :

> Hello,
>
> When using pg_dump (or pg_restore) with option "--clean", there is some
> SQL code to drop every objects at the beginning.
>
> The DROP statement for a view involving circular dependencies is :
>
> CREATE OR REPLACE VIEW [...]
>
> (see commit message of d8c05aff for a much better explanation)
>
> If the view is not in the "public" schema, and the target database is
> empty, this statement fails, because the schema hasn't been created yet.
>
> The attached patches are a TAP test which can be used to reproduce the
> bug, and a proposed fix. They apply to the master branch.
>
> Best regards,
> Frédéric


Re: Use simplehash.h instead of dynahash in SMgr

2022-10-23 Thread Andres Freund
Hi,

On 2021-04-25 03:58:38 +1200, David Rowley wrote:
> Currently, we use dynahash hash tables to store the SMgrRelation so we
> can perform fast lookups by RelFileNodeBackend. However, I had in mind
> that a simplehash table might perform better. So I tried it...

> The test case was basically inserting 100 million rows one at a time
> into a hash partitioned table with 1000 partitions and 2 int columns
> and a primary key on one of those columns. It was about 12GB of WAL. I
> used a hash partitioned table in the hope to create a fairly
> random-looking SMgr hash table access pattern.  Hopefully something
> similar to what might happen in the real world.

A potentially stupid question: Do we actually need to do smgr lookups in this
path? Afaict nearly all of the buffer lookups here will end up as cache hits in
shared buffers, correct?

Afaict we'll do two smgropens in a lot of paths:
1) XLogReadBufferExtended() does smgropen so it can do smgrnblocks()
2) ReadBufferWithoutRelcache() does an smgropen()

It's pretty sad that we constantly do two smgropen()s to start with. But in
the cache hit path we don't actually need an smgropen in either case afaict.

ReadBufferWithoutRelcache() does an smgropen, because that's
ReadBuffer_common()'s API. Which in turn has that API because it wants to use
RelationGetSmgr() when coming from ReadBufferExtended().  It doesn't seem
awful to allow smgr to be NULL and to pass in the rlocator in addition.

XLogReadBufferExtended() does an smgropen() so it can do smgrcreate() and
smgrnblocks(). But neither is needed in the cache hit case, I think. We could
do a "read-only" lookup in s_b, and only do the current logic in case that
fails?
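
Roughly the shape I have in mind, as a hand-wavy sketch (every type and
helper below is an invented stand-in, not the real bufmgr/smgr
interfaces):

/* Invented stand-ins, not the backend's real types or functions. */
typedef struct BufferTagSketch
{
    unsigned    rel;
    unsigned    fork;
    unsigned    block;
} BufferTagSketch;

typedef int BufferSketch;
#define INVALID_BUFFER_SKETCH (-1)

/* hypothetical: hash-table probe of shared buffers, hit-only */
extern BufferSketch lookup_shared_buffer(BufferTagSketch tag);
/* hypothetical: the existing path, which does smgropen()/smgrnblocks() etc. */
extern BufferSketch read_buffer_slow_path(BufferTagSketch tag);

/*
 * Only fall back to opening the SMgrRelation when the shared-buffers
 * lookup misses; on a hit no smgropen() is needed at all.
 */
static BufferSketch
read_buffer_sketch(BufferTagSketch tag)
{
    BufferSketch buf = lookup_shared_buffer(tag);

    if (buf != INVALID_BUFFER_SKETCH)
        return buf;             /* cache hit */

    return read_buffer_slow_path(tag);
}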

Greetings,

Andres Freund




Re: Multiple grouping set specs referencing duplicate alias

2022-10-23 Thread Tom Lane
David Kimura  writes:
> I think I may have stumbled across a case of wrong results on HEAD (same
> through version 9.6, though interestingly 9.5 produces different results
> altogether).

> test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
> ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

Yeah, this is an instance of an issue we've known about for awhile:
when using grouping sets (ROLLUP), the planner fails to distinguish
between "ai1" and "ai1 as possibly nulled by the action of the
grouping node".  This has been discussed at, eg, [1] and [2].
The direction I'd like to take to fix it is to invent explicit
labeling of Vars that have been nulled by some operation such as
outer joins or grouping, and then represent grouping set outputs
as either PlaceHolderVars or Vars tied to a new RTE that represents
the grouping step.  I have been working on a patch that'd do the
first half of that [3], but it's been slow going, because we've
indulged in a lot of semantic squishiness in this area and cleaning
it all up is a large undertaking.

> I tinkered a bit and hacked together an admittedly ugly patch that triggers an
> explicit sort constructed from the parse tree.

I seriously doubt that that'll fix all the issues in this area.
We really really need to understand that a PathKey based on
the scan-level value of a Var is different from a PathKey based
on a post-nulling-step value.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/7dbdcf5c-b5a6-ef89-4958-da212fe10176%40iki.fi
[3] https://www.postgresql.org/message-id/flat/830269.1656693...@sss.pgh.pa.us




Re: Re[2]: Possible solution for masking chosen columns when using pg_dump

2022-10-23 Thread Виктория Шепард
Hi,

Thank you, Oleg Tselebrovskiy, for your valuable review; here are the fixes.

Best regards,
Viktoria Shepard

Wed, 12 Oct 2022 at 12:19, Виктория Шепард :

> Hi,
>
> Here is an idea of how to read masking options from a file. Please, take a
> look.
>
> Mon, 10 Oct 2022 at 14:54, Олег Целебровский :
>
>> Hi,
>>
>> I applied most of the suggestions: used separate files for most of the added
>> code, fixed typos/mistakes, got rid of that pesky TODO that was already
>> implemented, just not deleted.
>>
>> Added tests (and functionality) for cases when you need to mask columns
>> in tables with the same name in different schemas. If a schema is not
>> specified, then columns in all tables with the specified name are masked
>> (Example - pg_dump -t ‘t0’ --mask-columns id --mask-function mask_int will
>> mask all ids in all tables with names ‘t0’ in all existing schemas).
>>
>> Wrote comments for all ‘magic numbers’
>>
>> About that
>>
>> >- Also it can be hard to use a lot of different functions for different
>> fields, maybe it would be better to set up functions in a file.
>>
>> I agree with that, but I know about at least 2 other patches (both are
>> WIP, but still) that are interacting with reading command-line options from
>> a file. And if everyone writes their own version of reading command-line
>> options from a file, it will quickly get confusing.
>>
>> A solution to that problem is another patch that will put all options
>> from a file (one file for any possible options, from existing to future ones)
>> into **argv in main, so that pg_dump can process them as if they came from
>> the command line.
>>
>>
>> Friday, 7 October 2022, 8:01 +07:00 from Виктория Шепард <
>> we.vikt...@gmail.com>:
>>
>> Hi,
>> I took a look, here are several suggestions for improvement:
>>
>> - Masking is not the main functionality of pg_dump, and it is better to
>> write most of the related things in a separate file like parallel.c or
>> dumputils.c. This will help slow down the growth of an already huge pg_dump
>> file.
>>
>> - Also it can be hard to use a lot of different functions for different
>> fields, maybe it would be better to set up functions in a file.
>>
>> - How will it work for the same field and tables in the different
>> schemas? Can we set up the exact schema for the field?
>>
>> - misspelling in a word
>> >/*
>> >* Add all columns and funcions to list of MaskColumnInfo structures,
>> >*/
>>
>> - Why did you use 256 here?
>> > char* table = (char*) pg_malloc(256 * sizeof(char));
>> Also, malloc needs one more byte than the string length, because you
>> have to store the terminating '\0'.
>>
>> - Instead of addFuncToDatabase you can run your query using something
>> already defined in fe_utils/query_utils.c. And it would be better to set
>> up a connection only once and create all the functions; establishing a
>> connection is a resource-intensive procedure. There are a lot of magic
>> numbers; it would be better to leave some comments explaining why there are 64 or 512.
>>
>> - It seems that you are not using temp_string
>> > char   *temp_string = (char*)malloc(256 * sizeof(char));
>>
>> - Grammar issues
>> >/*
>> >* mask_column_info_list contains info about every to-be-masked column:
>> >* its name, a name its table (if nothing is specified - mask all columns
>> with this name),
>> >* name of masking function and name of schema containing this function
>> (public if not specified)
>> >*/
>> the name of its table
>>
>>
>> Mon, 3 Oct 2022 at 20:45, Julien Rouhaud :
>>
>> Hi,
>>
>> On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote:
>> >
>> > Hello, here's my take on masking data when using pg_dump
>> >
>> > The main idea is using PostgreSQL functions to replace data during a
>> SELECT.
>> > When table data is dumped SELECT a,b,c,d ... from ... query is
>> generated, the columns that are marked for masking are replaced with result
>> of functions on those columns
>> > Example: columns name, count are to be masked, so the query will look
>> as such: SELECT id, mask_text(name), mask_int(count), date from ...
>> >
>> > So about the interface: I added 2 more command-line options:
>> >
>> > --mask-columns, which specifies what columns from what tables will be
>> masked
>> > usage example:
>> > --mask-columns "t1.name, t2.description" - both columns
>> will be masked with the same corresponding function
>> > or --mask-columns name - ALL columns with name "name" from
>> all dumped tables will be masked with correspoding function
>> >
>> > --mask-function, which specifies what functions will mask data
>> > usage example:
>> > --mask-function mask_int - corresponding columns will be
>> masked with function named "mask_int" from default schema (public)
>> > or --mask-function my_schema.mask_varchar - same as above
>> but with specified schema where the function is stored
>> > or --mask-function somedir/filename - the function is
>> "defined" here - more on the structure below

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-10-23 Thread Maciek Sakrejda
On Thu, Oct 20, 2022 at 10:31 AM Andres Freund  wrote:
> - "repossession" is a very unintuitive name for me. If we want something like
>   it, can't we just name it reuse_failed or such?

+1, I think "repossessed" is awkward. I think "reuse_failed" works,
but no strong opinions on an alternate name.

> - Wonder if the column names should be reads, writes, extends, etc instead of
>   the current naming pattern

Why? Lukas suggested alignment with existing views like
pg_stat_database and pg_stat_statements. It doesn't make sense to use
the blks_ prefix since it's not all blocks, but otherwise it seems
like we should be consistent, no?

> > "freelist_acquired" is confusing for local buffers but I wanted to
> > distinguish between reuse/eviction of local buffers and initial
> > allocation. "freelist_acquired" seemed more fitting because there is a
> > clocksweep to find a local buffer and if it hasn't been allocated yet it
> > is allocated in a place similar to where shared buffers acquire a buffer
> > from the freelist. If I didn't count it here, I would need to make a new
> > column only for local buffers called "allocated" or something like that.
>
> I think you're making this too granular. We need to have more detail than
> today. But we don't necessarily need to catch every nuance.

In general I agree that coarser granularity here may be easier to use.
I do think the current docs explain what's going on pretty well,
though, and I worry if merging too many concepts will make that harder
to follow. But if a less detailed breakdown still communicates
potential problems, +1.

> > This seems not too bad at first, but if you consider that later we will
> > add other kinds of IO -- eg WAL IO or temporary file IO, we won't be
> > able to use these existing columns and will need to add even more
> > columns describing the exact behavior in those cases.
>
> I think it's clearly not the right direction.

+1, I think the existing approach makes more sense.




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-10-23 Thread Maciek Sakrejda
On Wed, Oct 19, 2022 at 12:27 PM Melanie Plageman
 wrote:
>
> v34 is attached.
> I think the column names need discussion. Also, the docs need more work
> (I added a lot of new content there). I could use feedback on the column
> names and definitions and review/rephrasing ideas for the docs
> additions.

Nice! I think the expanded docs are great, and make this information
much easier to interpret.

>+   io_context bulkread, existing
>+   dirty buffers in the ring requirng flush are

"requiring"

>+   shared buffers were acquired from the freelist and added to the
>+   fixed-size strategy ring buffer. Shared buffers are added to the
>+   strategy ring lazily. If the current buffer in the ring is pinned or in

This is the first mention of the term "strategy" in these docs. It's
not totally opaque, since there's some context, but maybe we should
either try to avoid that term or define it more explicitly?

>+   io_contexts. This is equivalent to
>+   evicted for shared buffers in
>+   io_context shared, as the 
>contents
>+   of the buffer are evicted but refers to the case when 
>the

I don't quite follow this: does this mean that I should expect
'reused' and 'evicted' to be equal in the 'shared' context, because
they represent the same thing? Or will 'reused' just be null because
it's not distinct from 'evicted'? It looks like it's null right now,
but I find the wording here confusing.

>+  future with a new shared buffer. A high number of
>+  bulkread rejections can indicate a need for more
>+  frequent vacuuming or more aggressive autovacuum settings, as buffers 
>are
>+  dirtied during a bulkread operation when updating the hint bit or when
>+  performing on-access pruning.

This is great. Just wanted to re-iterate that notes like this are
really helpful to understanding this view.

> I've implemented a change using the same function pg_settings uses to
> turn the build-time parameter BLCKSZ into 8kB (get_config_unit_name())
> using the flag GUC_UNIT_BLOCKS. I am unsure if this is better or worse
> than "block_size". I am feeling very conflicted about this column.

Yeah, I guess it feels less natural here than in pg_settings, but it
still kind of feels like one way of doing this is better than two...




Re: Pluggable toaster

2022-10-23 Thread Nikita Malakhov
Hi!

Aleksander,
>Don't you think that this is an arguable design decision? Basically
>all we know about the underlying TableAM is that it stores tuples
>_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
>it even has any sort of pages, whether they are fixed in size or not,
>whether it uses shared buffers, etc. It may not even require TOAST.
>(Not to mention the fact that when you have N TOAST implementations
>and M TableAM implementations now you have to run N x M compatibility
>tests. And this doesn't account for different versions of Ns and Ms,
>different platforms and different versions of PostgreSQL.)

>I believe the proposed approach is architecturally broken from the
>beginning.

The existing TOAST mechanism just works, but for certain types of data it
does so very poorly, and, let's face it, it has very strict limitations
that limit the overall capabilities of the DBMS, because TOAST was designed
when today's usual amounts of data were not the case - I mean tables with
hundreds of billions of rows, with sizes measured in hundreds of GB and
even in terabytes.

But TOAST itself is a good solution to the problem of storing oversized
attributes, and though it has some limitations, it is unwise to just throw
it away. A better way is to bring it up to date by revising it, getting rid
of the most painful limitations, and allowing different (custom) TOAST
strategies to be used for special cases.

The main idea of Pluggable TOAST is to extend TOAST capabilities by
providing a common API that allows different strategies to be used
uniformly to TOAST different data. By the acronym "TOAST" I mean that the
data would be stored externally to the source table, somewhere only its
Toaster knows where and how - it may be regular Heap tables, Heap tables
with a different table structure, tables of some other AM, files outside of
the database, even files on different storage systems. Pluggable TOAST
allows using advanced compression methods and complex operations on
externally stored data, like searching without fully de-TOASTing the data,
etc.

Also, the existing TOAST is a part of the Heap AM and is restricted to
using Heap only. To make it extensible, we have to separate TOAST from the
Heap AM. The default TOAST in Pluggable TOAST still uses Heap, but Heap
knows nothing about TOAST. It fits perfectly into OOP paradigms.

>It looks like the idea should be actually turned inside out. I.e. what
>would be nice to have is some sort of _framework_ that helps TableAM
>authors to implement TOAST (alternatively, the rest of the TableAM
>except for TOAST) if the TableAM is similar to the default one. In
>other words the idea is not to implement alternative TOASTers that
>will work with all possible TableAMs but rather to simplify the task
>of implementing an alternative TableAM which is similar to the default
>one except for TOAST. These TableAMs should reuse as much common code
>as possible except for the parts where they differ.

To implement different TOAST strategies you must have an API to plug them
in; otherwise, for each strategy you'd have to change the core. The TOAST
API allows plugging in custom TOAST strategies just by adding contrib
modules, once the API is merged into the core. I have to make the point
that different TOAST strategies do not have to store data with other TAMs;
they could just store the data in Heap, but use knowledge of the internal
data structure or workflow to store it in a more optimal way - like fast,
partially compressed and decompressed JSON, or lots of large chunks of
binary data stored in the database (as you know, large objects are not of
much help with this), and so on.
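
Purely as an illustration, the plug-in point is a routine table in the
spirit of the TableAM/IndexAM APIs; the callback names below are invented
for this sketch and are not the actual Pluggable TOAST API:

#include "postgres.h"

/*
 * Invented illustration of a pluggable-toaster callback table; NOT the
 * actual Pluggable TOAST API.
 */
typedef struct ToasterRoutineSketch
{
    /* toast one oversized value, returning an external TOAST pointer */
    Datum       (*toast_value) (Oid relid, Datum value, int attnum);

    /* detoast a full value, or a slice of it, toasted by the above */
    struct varlena *(*detoast_value) (Datum toast_ptr, int32 sliceoffset,
                                      int32 slicelength);

    /* drop whatever external storage backs the value */
    void        (*delete_value) (Datum toast_ptr);

    /* can this toaster handle the column's type/storage/compression? */
    bool        (*validate) (Oid typeoid, char storage, char compression);
} ToasterRoutineSketch;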

Implementing another Table AM just to implement another TOAST strategy
seems like too much: the TAM API is very heavy and complex, and you would
have to add it as a contrib. Lots of different TAMs would cause many more
problems than lots of Toasters, because such a solution results in data
incompatibility between installations with different TAMs, and some minor
change in a custom TAM contrib could lead to losing all data stored with
that TAM, whereas with a custom TOAST you could (in the worst case) lose
just the TOASTed data and nothing else.

We have lots of requests from clients and tickets related to TOAST
limitations and
extending Postgres this way - this growing need made us develop Pluggable
TOAST.



On Sun, Oct 23, 2022 at 12:38 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > Pluggable TOAST API was designed with storage flexibility in mind, and
> Custom TOAST mechanics is
> > free to use any storage methods
>
> Don't you think that this is an arguable design decision? Basically
> all we know about the underlying TableAM is that it stores tuples
> _somehow_ and that tuples have TIDs [1]. That's it. We don't know if
> it even has any sort of pages, whether they are fixed in size or not,
> whether it uses shared buffers, etc. It may not even require TOAST.
> (Not to mention the fact that when you have N TOAST implementations
> and M TableAM 

Re: Interesting areas for beginners

2022-10-23 Thread Robert Treat
On Sun, Oct 23, 2022 at 6:24 AM Aleksander Alekseev
 wrote:
>
> Hi Matheus,
>
> > Some months ago I've got my first patch accept [1], and I'm looking to try 
> > to
> > make other contributions.
>
> In personal experience reviewing other people's code is a good
> starting point. Firstly, IMO this is one of the most valuable
> contributions, since the community is always short on reviewers.
> Secondly, in the process you will learn what the rest of the community
> is working on, which patches have good chances to be accepted, and
> learn the implementation details of the system.
>
> Additionally I would like to recommend the following materials for self-study:
>
> * 
> https://www.amazon.com/Database-System-Concepts-Abraham-Silberschatz/dp/1260084507/
> ** Especially the chapter available online about PostgreSQL
> * https://www.youtube.com/playlist?list=PLSE8ODhjZXjZaHA6QcxDfJ0SIWBzQFKEG
> * 
> https://www.timescale.com/blog/how-and-why-to-become-a-postgresql-contributor/
>

I would second the recommendation to help with patch reviewing because
it is one of the most valuable contributions you can make to the
project as well as a good way to start to build relationships with
other contributors, which will be helpful the next time they are
tearing apart one of your patches ;-)

In addition, two other resources to be aware of:
* Paul Ramsey has a really nice write up on his thoughts on getting
started hacking Postgres:
http://blog.cleverelephant.ca/2022/10/postgresql-links.html

* I suspect you may have seen these, but in case not, the wiki has
several key pages to be aware of, which are linked to from
https://wiki.postgresql.org/wiki/Development_information


Robert Treat
https://xzilla.net




Re: Tracking last scan time

2022-10-23 Thread Robert Treat
On Fri, Oct 14, 2022 at 2:55 PM Dave Page  wrote:
> On Fri, 14 Oct 2022 at 19:16, Andres Freund  wrote:
>> On 2022-10-13 14:38:06 +0100, Dave Page wrote:
>> > Thanks for that. It looks good to me, bar one comment (repeated 3 times in
>> > the sql and expected files):
>> >
>> > fetch timestamps from before the next test
>> >
>> > "from " should be removed.
>>
>> I was trying to say something with that from, but clearly it wasn't
>> understandable :). Removed.
>>
>> With that I pushed the changes and marked the CF entry as committed.
>
>
> Thanks!
>

Hey folks,

I was looking at this a bit further (great addition btw) and noticed
the following behavior (this is a mre of the original testing that
uncovered this):

pagila=# select * from pg_stat_user_tables ;
 relid | schemaname | relname | seq_scan | last_seq_scan |
seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins |
n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | autoanalyze_count
---++-+--+---+--+--+---+---+---+---+---+---+++-++-+-+--+--+--+--+---+---
(0 rows)

pagila=# create table x (xx int);
CREATE TABLE
Time: 2.145 ms
pagila=# select * from pg_stat_user_tables ;
 relid | schemaname | relname | seq_scan | last_seq_scan |
seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins |
n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | autoanalyze_count
---++-+--+---+--+--+---+---+---+---+---+---+++-++-+-+--+--+--+--+---+---
 16392 | public | x   |0 | [null]|
0 |   [null] | [null]|[null] | 0 | 0 |
0 | 0 |  0 |  0 |
 0 |  0 | [null]  | [null]  | [null]
| [null]   |0 |0 | 0 |
0
(1 row)

pagila=# insert into x select 1;
INSERT 0 1
pagila=# select * from pg_stat_user_tables ;
 relid | schemaname | relname | seq_scan | last_seq_scan  |
seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins |
n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | autoanalyze_count
---++-+--++--+--+---+---+---+---+---+---+++-++-+-+--+--+--+--+---+---
 16392 | public | x   |0 | 1999-12-31 19:00:00-05 |
0 |   [null] | [null]|[null] | 1 |
0 | 0 | 0 |  1 |  0 |
 1 |  1 | [null]  | [null]  |
[null]   | [null]   |0 |0 |
 0 | 0
(1 row)

Normally we populate "last" columns with a NULL value when the
corresponding marker is zero, which seems correct in the first query,
but no longer matches in the second. I can see an argument that this
is a necessary exception to that rule (I'm not sure I agree with it,
but I see it) but even in that scenario, ISTM we should avoid
populating the table with a "special value", which generally goes
against observability best practices, and I believe we've been able to
avoid it elsewhere.

Beyond that, I also notice the behavior changes when adding a table
with a PK, though not necessarily better...

pagila=# drop table x;
DROP TABLE
Time: 2.896 ms
pagila=# select * from pg_stat_user_tables ;
 relid | schemaname | relname | seq_scan | last_seq_scan |
seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins |
n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | 

Re: doubt about FullTransactionIdAdvance()

2022-10-23 Thread Andres Freund
Hi,

On 2022-10-22 11:32:47 +0800, Zhang Mingli wrote:
> Hi, hackers
> 
> I don't quite understand FullTransactionIdAdvance(), correct me if I’m wrong.
> 
> 
> /*
>  * Advance a FullTransactionId variable, stepping over xids that would appear
>  * to be special only when viewed as 32bit XIDs.
>  */
> static inline void
> FullTransactionIdAdvance(FullTransactionId *dest)
> {
>   dest->value++;
> 
>   /* see FullTransactionIdAdvance() */
>   if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
>  return;
> 
>   while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
>  dest->value++;
> }
> 
> #define XidFromFullTransactionId(x) ((x).value)
> #define FullTransactionIdPrecedes(a, b)   ((a).value < (b).value)
> 
> Can the codes reach line: while (XidFromFullTransactionId(*dest) < 
> FirstNormalTransactionId)?
> As we will return if (FullTransactionIdPrecedes(*dest, 
> FirstNormalFullTransactionId)), and the two judgements seem equal.
> Another confusion is the comment /* see FullTransactionIdAdvance() */,
> which refers to the function itself.
> Could anyone explain this? Thanks in advance.

FullTransactionId is 64bit. An "old school" xid is 32bit. The first branch is
to protect against the special fxids that are actually below
FirstNormalFullTransactionId:

if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
return;

The second branch is to protect against 64bit xids that would yield a 32bit
xid below FirstNormalTransactionId after truncating to 32bit:

while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
dest->value++;

E.g. we don't want to modify the 64bit xid 0 (meaning InvalidTransactionId) as
it has special meaning. But we have to make sure that e.g. the 64bit xid
0x100000000 won't exist, as it'd also result in InvalidTransactionId if
truncated to 32bit.
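
A tiny standalone illustration of that truncation (FirstNormalTransactionId
is 3, per transam.h):

#include <assert.h>
#include <stdint.h>

int
main(void)
{
    uint64_t    fxid = UINT64_C(0x100000000);   /* 2^32 */
    uint32_t    xid = (uint32_t) fxid;          /* what 32-bit code sees */

    assert(xid == 0);           /* looks like InvalidTransactionId */

    /*
     * So FullTransactionIdAdvance() bumps the value until the low 32 bits
     * reach FirstNormalTransactionId (3), skipping 2^32 + {0, 1, 2}.
     */
    return 0;
}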


However, it looks like this comment:
/* see FullTransactionIdAdvance() */
if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
return;

is bogus, and it's my fault. Looks like it's intending to reference
FullTransactionIdRetreat().

Greetings,

Andres Freund




Re: [PATCH] Replace "implementor" with "implementer"

2022-10-23 Thread Aleksander Alekseev
Hi Tom,

> They're both valid according to the dictionaries I looked
> at, eg [1].  I don't feel a need to change anything.

OK, thanks. And sorry for the noise.

-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] Replace "implementor" with "implementer"

2022-10-23 Thread Tom Lane
Aleksander Alekseev  writes:
> I noticed that there are several places where we use the spelling
> "implementOr" while the correct one seems to be "implementEr". Here is
> the patch.

They're both valid according to the dictionaries I looked
at, eg [1].  I don't feel a need to change anything.

regards, tom lane

[1] https://www.merriam-webster.com/dictionary/implement




Re: [PATCH] Replace "implementor" with "implementer"

2022-10-23 Thread Aleksander Alekseev
Hi hackers,

> I noticed that there are several places where we use the spelling
> "implementOr" while the correct one seems to be "implementEr". Here is
> the patch.

After a little more study I found evidence that both spellings can be
acceptable [1]. As a non-native speaker I can't judge whether this is
true or not and which spelling is preferable. I believe we should
unify the spelling though. The reason why I initially thought
"implementOr" is an incorrect spelling is because most spell-checking
tools I personally use indicated so.

[1] https://english.stackexchange.com/a/358111

-- 
Best regards,
Aleksander Alekseev




Re: Interesting areas for beginners

2022-10-23 Thread Aleksander Alekseev
Hi Matheus,

> Some months ago I've got my first patch accept [1], and I'm looking to try to
> make other contributions.

In personal experience reviewing other people's code is a good
starting point. Firstly, IMO this is one of the most valuable
contributions, since the community is always short on reviewers.
Secondly, in the process you will learn what the rest of the community
is working on, which patches have good chances to be accepted, and
learn the implementation details of the system.

Additionally I would like to recommend the following materials for self-study:

* 
https://www.amazon.com/Database-System-Concepts-Abraham-Silberschatz/dp/1260084507/
** Especially the chapter available online about PostgreSQL
* https://www.youtube.com/playlist?list=PLSE8ODhjZXjZaHA6QcxDfJ0SIWBzQFKEG
* https://www.timescale.com/blog/how-and-why-to-become-a-postgresql-contributor/

-- 
Best regards,
Aleksander Alekseev




[PATCH] Replace "implementor" with "implementer"

2022-10-23 Thread Aleksander Alekseev
Hi hackers,

I noticed that there are several places where we use the spelling
"implementOr" while the correct one seems to be "implementEr". Here is
the patch.

-- 
Best regards,
Aleksander Alekseev


v1-0001-Fix-incorrect-spelling-of-the-word-implementer.patch
Description: Binary data


Re: Difference between HeapTupleData and TupleTableSlot structures

2022-10-23 Thread Aleksander Alekseev
Hi hackers,

> TupleTableSlot is a more abstract concept, being a container
> for a tuple that can be present in several different forms.
> It can contain a concrete tuple (HeapTupleData), or a "virtual"
> tuple that is just an array of Datum+isnull values.  The executor
> usually uses tuple slots to return tuples out of plan nodes;
> they're not very common elsewhere.

I came across another little piece of information about
TupleTableSlots [1] and recalled this thread:

"""
To implement an access method, an implementer will typically need to
implement an AM-specific type of tuple table slot (see
src/include/executor/tuptable.h), which allows code outside the access
method to hold references to tuples of the AM, and to access the
columns of the tuple.
"""

Hopefully this is helpful.

[1] https://www.postgresql.org/docs/current/tableam.html

-- 
Best regards,
Aleksander Alekseev




Re: Pluggable toaster

2022-10-23 Thread Aleksander Alekseev
Hi Nikita,

> Pluggable TOAST API was designed with storage flexibility in mind, and Custom 
> TOAST mechanics is
> free to use any storage methods

Don't you think that this is an arguable design decision? Basically
all we know about the underlying TableAM is that it stores tuples
_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
it even has any sort of pages, whether they are fixed in size or not,
whether it uses shared buffers, etc. It may not even require TOAST.
(Not to mention the fact that when you have N TOAST implementations
and M TableAM implementations now you have to run N x M compatibility
tests. And this doesn't account for different versions of Ns and Ms,
different platforms and different versions of PostgreSQL.)

I believe the proposed approach is architecturally broken from the beginning.

It looks like the idea should be actually turned inside out. I.e. what
would be nice to have is some sort of _framework_ that helps TableAM
authors to implement TOAST (alternatively, the rest of the TableAM
except for TOAST) if the TableAM is similar to the default one. In
other words the idea is not to implement alternative TOASTers that
will work with all possible TableAMs but rather to simplify the task
of implementing an alternative TableAM which is similar to the default
one except for TOAST. These TableAMs should reuse as much common code
as possible except for the parts where they differ.

Does it make sense?

Sorry, I realize this will probably imply a complete rewrite of the
patch. This is the reason why one should start proposing changes from
gathering the requirements, writing an RFC and run it through several
rounds of discussion.

[1]: https://www.postgresql.org/docs/current/tableam.html

-- 
Best regards,
Aleksander Alekseev