date:20210420

Hello Satyanarayana,

This can be an option for us in our case. But there also needs to be a 
process how to detect these "stuck commits" and how to invalidate/remove 
them, because in reality, if the app/user would not see the change in 
the database, it/he/she will try to insert/delete it again. If it just 
stuck without management, it will create a queue which can cause, that 
in the queue there will be 2 similar inserts/deletes which can again 
cause issues (like with the primary key I mentioned before).

So the process should be in this case:

- DBA receives information, that write operations stuck (DBA in 
coordination with the infrastructure team disconnects all clients and 
prevent new ones to create a new connection).
- DBA will recognize, that there is an issue in communication between 
the primary and the sync replica (caused the issue with the propagation 
of commits)

- DBA will see that there are some commits that are in the "stuck state"
- DBA removes these stuck commits. Note: Because the client never 
received a confirmation about the successful commit -> changes in the DB 
client tried to perform can't be considered as successful.
- DBA and infrastructure team restore the communication between server 
nodes to be able to propagate commits from the primary node to sync replica.

- DBA and infrastructure team allows new connections to the database

This approach would require external monitoring and alerting, but I 
would say, that this is an acceptable solution. Would your patch be able 
to perform that?

Thank you
Ondrej

On 20/04/2021 22:19, SATYANARAYANA NARLAPURAM wrote:
One idea here is to make the backend ignore query cancellation/backend 
termination while waiting for the synchronous commit ACK. This way 
client never reads the data that was never flushed remotely. The 
problem with this approach is that your backends get stuck until your 
commit log record is flushed on the remote side. Also, the client can 
see the data not flushed remotely if the server crashes and comes back 
online. You can prevent the latter case by making a SyncRepWaitForLSN 
before opening up the connections to the non-superusers. I have a 
working prototype of this logic, if there is enough interest I can 
post the patch.

On Tue, Apr 20, 2021 at 11:25 AM Ondřej Žižka > wrote:

I am sorry, I forgot mentioned, that in the second situation I
added a
primary key to the table.

Ondrej

On 20/04/2021 18:49, Ondřej Žižka wrote:
> Hello Aleksander,
>
> Thank you for the reaction. This was tested on version 13.2.
>
> There are also other possible situations with the same setup and
> similar issue:
>
> -
> When the background process on server fails
>
> On postgresql1:
> tecmint=# select * from a; --> LAN on sync replica is OK
>  id
> 
>   1
> (1 row)
>
> tecmint=# insert into a values (2); ---> LAN on sync replica is
DOWN
> and insert is waiting. During this time kill the background
process on
> the PostgreSQL server for this session
> WARNING:  canceling the wait for synchronous replication and
> terminating connection due to administrator command
> DETAIL:  The transaction has already committed locally, but
might not
> have been replicated to the standby.
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
> tecmint=# select * from a;
>  id
> 
>   1
>   2
> (2 rows)
>
> tecmint=# ---> LAN on sync replica is still DOWN
>
> The potgres session will restore after the background process
failed.
> When you run select on master, it still looks OK. But data is still
> not replicated on the sync replica. If we lost the master now, we
> would lost this data as well.
>
> **
> Another case
> **
>
> Kill the client process.
>
> tecmint=# select * from a;
>  id
> 
>   1
>   2
>   3
> (3 rows)
> tecmint=#    --> Disconnect the sync replica now.
LAN on
> replica is DOWN
> tecmint=# insert into a values (4); --> Kill the client process
> Terminated
> xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432
-d tecmint
> Password for user postgres:
> psql (13.2 (Debian 13.2-1.pgdg100+1))
> Type "help" for help.
>
> tecmint=# select * from a;
>  id
> 
>   1
>   2
>   3
> (3 rows)
>
> tecmint=# --> Number 4 is not there. Now switch the LAN on sync
> replica ON.
>
> --
>
> Result from sync replica after the LAN is again UP:
> tecmint=# select * from a;
>  id
> 
>   1
>

Re: Replication slot stats misgivings

2021-04-20 Thread Amit Kapila

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada  wrote:
>
>
> I've attached the patch. In addition to the test Vignesh prepared, I
> added one test for the message for creating a slot that checks if the
> statistics are initialized after re-creating the same name slot.
>

I am not sure how much useful your new test is because you are testing
it for slot name for which we have removed the slot file. It is not
related to stat messages this patch is sending. I think we can leave
that for now. One other minor comment:

- * create the statistics for the replication slot.
+ * create the statistics for the replication slot.  In the cases where the
+ * message for dropping the old slot gets lost and a slot with the same
+ * name is created, since the stats will be initialized by the message
+ * for creating the slot the statistics are not accumulated into the
+ * old slot unless the messages for both creating and dropping slots with
+ * the same name got lost.  Just in case it happens, the user can reset
+ * the particular stats by pg_stat_reset_replication_slot().

I think we can change it to something like: " XXX In case, the
messages for creation and drop slot of the same name get lost and
create happens before (auto)vacuum cleans up the dead slot, the stats
will be accumulated into the old slot. One can imagine having OIDs for
each slot to avoid the accumulation of stats but that doesn't seem
worth doing as in practice this won't happen frequently.". Also, I am
not sure after your recent change whether it is a good idea to mention
something in docs. What do you think?

-- 
With Regards,
Amit Kapila.

Re: wal stats questions

2021-04-20 Thread torikoshia


On 2021-04-16 10:27, Masahiro Ikeda wrote:

On 2021/04/13 9:33, Fujii Masao wrote:



On 2021/03/30 20:37, Masahiro Ikeda wrote:

OK, I added the condition to the fast-return check. I noticed that I
misunderstood that the purpose is to avoid expanding a clock check 
using WAL
stats counters. But, the purpose is to make the conditions stricter, 
right?


Yes. Currently if the following condition is false even when the WAL 
counters
are updated, nothing is sent to the stats collector. But with your 
patch,

in this case the WAL stats are sent.

if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
    pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
    !have_function_stats && !disconnect)

Thanks for the patch! It now fails to be applied to the master 
cleanly.

So could you rebase the patch?


Thanks for your comments!
I rebased it.


Thanks for working on this!

I have some minor comments on 
performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch.



177 @@ -3094,20 +3066,33 @@ pgstat_report_wal(void)
178   * Return true if the message is sent, and false otherwise.

Since you changed the return value to void, it seems the description is
not necessary anymore.

208 +* generate wal records. 'wal_writes' and 'wal_sync' are 
zero means the


It may be better to change 'wal_writes' to 'wal_write' since single
quotation seems to mean variable name.

234 +* set the counters related to generated WAL data if the 
counters are



set -> Set?


Regards,

Re: multi-install PostgresNode fails with older postgres versions

On Tue, Apr 20, 2021 at 01:11:59PM -0400, Andrew Dunstan wrote:
> Here's the patch for that.

Thanks.

> + # Accept standard formats, in case caller has handed us the output of a
> + # postgres command line tool
> + $arg = $1
> + if ($arg =~ m/\(?PostgreSQL\)? (\d+(?:\.\d+)*(?:devel)?)/);

Interesting.  This would work even if using --with-extra-version,
which is a good thing.

> +# render the version number in the standard "joined by dots" notation if
> +# interpolated into a string
> +sub _stringify
> +{
> +   my $self = shift;
> +   return join('.',  @$self);
> +}

This comes out a bit strangely when using a devel build as this
appends -1 as sub-version number, becoming say 14.-1.  It may be
clearer to add back "devel" in this case?

Wouldn't it be better to add some perldoc to PostgresVersion.pm?
--
Michael

signature.asc
Description: PGP signature

Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?

On Wed, Apr 21, 2021 at 8:02 AM Kyotaro Horiguchi
 wrote:
> > Thanks! I think we could avoid extra processing costs for cases like
> > VACUUM/ANALYZE foo, foo; when no explicit columns are specified. The
> > avoided costs can be lock acquire, relation open, vacuum/analyze,
> > relation close, starting new xact command, command counter increment
> > in case of analyze etc. This can be done with a simple patch like the
> > attached. When explicit columns are specified along with relations
> > i.e. VACUUM/ANALYZE foo(c1), foo(c2); we don't want to do the extra
> > complex processing to optimize the cases when c1 = c2.
> >
> > Note that the TRUNCATE command currently skips processing repeated
> > relations (see ExecuteTruncate). For example, TRUNCATE foo, foo; and
> > TRUNCATE foo, ONLY foo, foo; first instance of relation foo is taken
> > into consideration for processing and other relation instances
> > (options specified if any) are ignored.
> >
> > Thoughts?
>
> Although I don't strongly oppose to check that, the check of truncate
> is natural and required. The relation list is anyway used afterwards,
> and we cannot truncate the same relation twice or more since a
> relation under "use" cannot be truncated. (Truncation is one form of
> use).  In short, TRUNCATE runs no checking just for the check's own
> sake.

Thanks for the point. Yes, if we don't skip repeated instances we do
get below error:
postgres=# truncate t1, t1;
ERROR:  cannot TRUNCATE "t1" because it is being used by active
queries in this session

> On the other hand the patch creates a relation list just for this
> purpose, which is not needed to run VACUUM/ANALYZE, and VACUUM/ANALYE
> works well with duplicates in target relations.

Yeah, the relids list is only used to skip the duplicates. I feel
that's okay given the negligible extra processing (searching for the
relids in the list) we add with it versus the extra processing we
avoid with skipping duplicates, see [1].

Although VACUUM/ANALYZE works well with duplicate relations without
any error (unlike TRUNCATE), is there any benefit if we run
back-to-back VACUUM/ANALYZE within a single command? I assume that
there's no benefit. My only point was that even if somebody specifies
duplicate relations, we could avoid some processing effort see [1] for
the gain. For ANALYZE, we can avoid doing extra
StartTransactionCommand, CommitTransactionCommand and
CommandCounterIncrement as well.

I know the use cases that I'm trying to optimize with the patch are
worthless and unrealistic (may be written by someone like me). Since
we generally don't optimize for rare and unrecommended scenarios, I'm
okay if we drop this patch. But I would like to mention [1] the gain
we get with the patch.

[1] tested on my dev system, with default postgresql.conf, t1 is
having 10mn rows:
HEAD:
postgres=# analyze t1;
Time: 363.580 ms
postgres=# analyze t1;
Time: 384.760 ms

postgres=# analyze t1, t1;
Time: 687.976 ms
postgres=# analyze t1, t1;
Time: 664.420 ms

postgres=# analyze t1, t1, t1;
Time: 1010.855 ms (00:01.011)
postgres=# analyze t1, t1, t1;
Time: 1119.970 ms (00:01.120)

postgres=# analyze t1, t1, t1, t1;
Time: 1350.345 ms (00:01.350)
postgres=# analyze t1, t1, t1, t1;
Time: 1316.738 ms (00:01.317)

postgres=# analyze t1, t1, t1, t1, t1;
Time: 1651.780 ms (00:01.652)
postgres=# analyze t1, t1, t1, t1, t1, t1;
Time: 1983.163 ms (00:01.983)

PATCHed:
postgres=# analyze t1;
Time: 356.709 ms
postgres=# analyze t1;
Time: 360.780 ms

postgres=# analyze t1, t1;
Time: 377.193 ms
postgres=# analyze t1, t1;
Time: 370.636 ms

postgres=# analyze t1, t1, t1;
Time: 364.271 ms
postgres=# analyze t1, t1, t1;
Time: 349.988 ms

postgres=# analyze t1, t1, t1, t1;
Time: 362.567 ms
postgres=# analyze t1, t1, t1, t1;
Time: 383.292 ms

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Replication slot stats misgivings

2021-04-20 Thread vignesh C

On Wed, Apr 21, 2021 at 9:47 AM Amit Kapila  wrote:
>
> On Wed, Apr 21, 2021 at 9:39 AM vignesh C  wrote:
> >
> > I feel we can change CATALOG_VERSION_NO so that we will get this error
> > "The database cluster was initialized with CATALOG_VERSION_NO
> > 2021X, but the server was compiled with CATALOG_VERSION_NO
> > 2021X." which will prevent the above issue.
> >
>
> Right, but we normally do that just before commit. We might want to
> mention it in the commit message just as a Note so that we don't
> forget to bump it before commit but otherwise, we don't need to change
> it in the patch.
>

Yes, that is fine with me.

Regards,
Vignesh

Re: Support tab completion for upper character inputs in psql

2021-04-20 Thread Peter Smith

On Wed, Apr 14, 2021 at 11:34 PM tanghy.f...@fujitsu.com
 wrote:
>
> On Thursday, April 8, 2021 4:14 PM, Peter Eisentraut 
>  wrote
>
> >Seeing the tests you provided, it's pretty obvious that the current
> >behavior is insufficient.  I think we could probably think of a few more
> >tests, for example exercising the "If case insensitive matching was
> >requested initially, adjust the case according to setting." case, or
> >something with quoted identifiers.
>
> Thanks for your review and suggestions on my patch.
> I've added more tests in the latest patch V5, the added tests helped me find 
> some bugs in my patch and I fixed them.
> Now the patch can support not only the SET/SHOW [PARAMETER] but also UPDATE 
> ["aTable"|ATABLE], also UPDATE atable SET ["aColumn"|ACOLUMN].
>
> I really hope someone can have more tests suggestions on my patch or kindly 
> do some tests on my patch and share me if any bugs happened.
>
> Differences from V4 are:
> * fix some bugs related to quoted identifiers.
> * add some tap tests.

I tried playing a bit with your psql patch V5 and I did not find any
problems - it seemed to work as advertised.

Below are a few code review comments.



1. Patch applies with whitespace warnings.

[postgres@CentOS7-x64 oss_postgres_2PC]$ git apply
../patches_misc/V5-0001-Support-tab-completion-with-a-query-result-for-upper.patch
../patches_misc/V5-0001-Support-tab-completion-with-a-query-result-for-upper.patch:130:
trailing whitespace.
}
warning: 1 line adds whitespace errors.



2. Unrelated "code tidy" fixes maybe should be another patch?

I noticed there are a couple of "code tidy" fixes combined with this
patch - e.g. passing fixes to some code comments and blank lines etc
(see below). Although they are all good improvements, they maybe don't
really have anything to do with your feature/bugfix so I am not sure
if they should be included here. Maybe post a separate patch for these
ones?

@@ -1028,7 +1032,7 @@ static const VersionedQuery
Query_for_list_of_subscriptions[] = {
 };

 /*
- * This is a list of all "things" in Pgsql, which can show up after CREATE or
+ * This is a list of all "things" in pgsql, which can show up after CREATE or
  * DROP; and there is also a query to get a list of them.
  */

@@ -4607,7 +4642,6 @@ complete_from_list(const char *text, int state)
  if (completion_case_sensitive)
  return pg_strdup(item);
  else
-
  /*
  * If case insensitive matching was requested initially,
  * adjust the case according to setting.
@@ -4660,7 +4694,6 @@ complete_from_const(const char *text, int state)
  if (completion_case_sensitive)
  return pg_strdup(completion_charp);
  else
-
  /*
  * If case insensitive matching was requested initially, adjust
  * the case according to setting.



3. Unnecessary NULL check?

@@ -4420,16 +4425,37 @@ _complete_from_query(const char *simple_query,
  PQclear(result);
  result = NULL;

- /* Set up suitably-escaped copies of textual inputs */
+ /* Set up suitably-escaped copies of textual inputs,
+ * then change the textual inputs to lower case.
+ */
  e_text = escape_string(text);
+ if(e_text != NULL)
+ {
+ if(e_text[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_text = pg_string_tolower(e_text);
+ }

Perhaps that check "if(e_text != NULL)" is unnecessary. That function
hardly looks capable of returning a NULL, and other callers are not
checking the return like this.



4. Memory not freed in multiple places?

@@ -4420,16 +4425,37 @@ _complete_from_query(const char *simple_query,
  PQclear(result);
  result = NULL;

- /* Set up suitably-escaped copies of textual inputs */
+ /* Set up suitably-escaped copies of textual inputs,
+ * then change the textual inputs to lower case.
+ */
  e_text = escape_string(text);
+ if(e_text != NULL)
+ {
+ if(e_text[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_text = pg_string_tolower(e_text);
+ }

  if (completion_info_charp)
+ {
  e_info_charp = escape_string(completion_info_charp);
+ if(e_info_charp[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_info_charp = pg_string_tolower(e_info_charp);
+ }
  else
  e_info_charp = NULL;

  if (completion_info_charp2)
+ {
  e_info_charp2 = escape_string(completion_info_charp2);
+ if(e_info_charp2[0] == '"')
+ completion_case_sensitive = true;
+ else
+ e_info_charp2 = pg_string_tolower(e_info_charp2);
+ }
  else
  e_info_charp2 = NULL;

The function escape_string has a comment saying "The returned value
has to be freed." but in the above code you are overwriting the
escape_string result with the strdup'ed pg_string_tolower but without
free-ing the original e_text/e_info_charp/e_info_charp2.

==

5. strncmp replacement?

@@ -4464,7 +4490,7 @@ _complete_from_query(const char *simple_query,
  */
  if (strcmp(schema_query->catname,
 "pg_catalog.pg_class c") == 0 &&
- strncmp(text, "pg_", 3) != 0)
+ strncmp(pg_string_tolower(text), "pg_", 3) != 0)
  {
  appendPQExpBufferStr(&query_buffer,
  " AND c.relnamespace <> (SELECT oid

Re: Replication slot stats misgivings

2021-04-20 Thread Amit Kapila

On Wed, Apr 21, 2021 at 9:39 AM vignesh C  wrote:
>
> I feel we can change CATALOG_VERSION_NO so that we will get this error
> "The database cluster was initialized with CATALOG_VERSION_NO
> 2021X, but the server was compiled with CATALOG_VERSION_NO
> 2021X." which will prevent the above issue.
>

Right, but we normally do that just before commit. We might want to
mention it in the commit message just as a Note so that we don't
forget to bump it before commit but otherwise, we don't need to change
it in the patch.

-- 
With Regards,
Amit Kapila.

Re: Replication slot stats misgivings

2021-04-20 Thread vignesh C

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada  wrote:
>
> On Tue, Apr 20, 2021 at 7:22 PM vignesh C  wrote:
> >
> > On Tue, Apr 20, 2021 at 9:08 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Mon, Apr 19, 2021 at 4:48 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila 
> > > > > >  wrote:
> > > > > > >
> > > > > > >
> > > > > > > 4.
> > > > > > > +CREATE VIEW pg_stat_replication_slots AS
> > > > > > > +SELECT
> > > > > > > +s.slot_name,
> > > > > > > +s.spill_txns,
> > > > > > > +s.spill_count,
> > > > > > > +s.spill_bytes,
> > > > > > > +s.stream_txns,
> > > > > > > +s.stream_count,
> > > > > > > +s.stream_bytes,
> > > > > > > +s.total_txns,
> > > > > > > +s.total_bytes,
> > > > > > > +s.stats_reset
> > > > > > > +FROM pg_replication_slots as r,
> > > > > > > +LATERAL pg_stat_get_replication_slot(slot_name) as s
> > > > > > > +WHERE r.datoid IS NOT NULL; -- excluding physical slots
> > > > > > > ..
> > > > > > > ..
> > > > > > >
> > > > > > > -/* Get the statistics for the replication slots */
> > > > > > > +/* Get the statistics for the replication slot */
> > > > > > >  Datum
> > > > > > > -pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
> > > > > > > +pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
> > > > > > >  {
> > > > > > >  #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
> > > > > > > - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > > > > > > + text *slotname_text = PG_GETARG_TEXT_P(0);
> > > > > > > + NameData slotname;
> > > > > > >
> > > > > > > I think with the above changes getting all the slot stats has 
> > > > > > > become
> > > > > > > much costlier. Is there any reason why can't we get all the stats 
> > > > > > > from
> > > > > > > the new hash_table in one shot and return them to the user?
> > > > > >
> > > > > > I think the advantage of this approach would be that it can avoid
> > > > > > showing the stats for already-dropped slots. Like other statistics
> > > > > > views such as pg_stat_all_tables and pg_stat_all_functions, 
> > > > > > searching
> > > > > > the stats by the name got from pg_replication_slots can show only
> > > > > > available slot stats even if the hash table has garbage slot stats.
> > > > > >
> > > > >
> > > > > Sounds reasonable. However, if the create_slot message is missed, it
> > > > > will show an empty row for it. See below:
> > > > >
> > > > > postgres=# select slot_name, total_txns from 
> > > > > pg_stat_replication_slots;
> > > > >  slot_name | total_txns
> > > > > ---+
> > > > >  s1|  0
> > > > >  s2|  0
> > > > >|
> > > > > (3 rows)
> > > > >
> > > > > Here, I have manually via debugger skipped sending the create_slot
> > > > > message for the third slot and we are showing an empty for it. This
> > > > > won't happen for pg_stat_all_tables, as it will set 0 or other initial
> > > > > values in such a case. I think we need to address this case.
> > > >
> > > > Good catch. I think it's better to set 0 to all counters and NULL to
> > > > reset_stats.
> > > >
> > > > >
> > > > > > Given that pg_stat_replication_slots doesn’t show garbage slot stats
> > > > > > even if it has, I thought we can avoid checking those garbage stats
> > > > > > frequently. It should not essentially be a problem for the hash 
> > > > > > table
> > > > > > to have entries up to max_replication_slots regardless of live or
> > > > > > already-dropped.
> > > > > >
> > > > >
> > > > > Yeah, but I guess we still might not save much by not doing it,
> > > > > especially because for the other cases like tables/functions, we are
> > > > > doing it without any threshold limit.
> > > >
> > > > Agreed.
> > > >
> > > > I've attached the updated patch, please review it.
> > >
> > > I've attached the new version patch that fixed the compilation error
> > > reported off-line by Amit.
> >
> > Thanks for the updated patch, few comments:
>
> Thank you for the review comments.
>
> > 1) We can change "slotent = pgstat_get_replslot_entry(slotname,
> > false);" to "return pgstat_get_replslot_entry(slotname, false);" and
> > remove the slotent variable.
> >
> > +   PgStat_StatReplSlotEntry *slotent = NULL;
> > +
> > backend_read_statsfile();
> >
> > -   *nslots_p = nReplSlotStats;
> > -   return replSlotStats;
> > +   slotent = pgstat_get_replslot_entry(slotname, false);
> > +
> > +   return slotent;
>
> Fixed.
>
> >
> > 2) Should we change PGSTAT_FILE_FORMAT_ID as the statistic file format
> > has changed for replication statistics?
>
> The struct name is changed but I think the statistics file format has
> not changed by this patch. No?

I trie

Re: Replication slot stats misgivings

2021-04-20 Thread Amit Kapila

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada  wrote:
>

I have one question:

+ /*
+ * Create the replication slot stats hash table if we don't have
+ * it already.
+ */
+ if (replSlotStats == NULL)
  {
- if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
- return i; /* found */
+ HASHCTL hash_ctl;
+
+ hash_ctl.keysize = sizeof(NameData);
+ hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+ hash_ctl.hcxt = pgStatLocalContext;
+
+ replSlotStats = hash_create("Replication slots hash",
+ PGSTAT_REPLSLOT_HASH_SIZE,
+ &hash_ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  }

It seems to me that the patch is always creating a hash table in
pgStatLocalContext?  AFAIU, we need to create it in pgStatLocalContext
when we read stats via backend_read_statsfile so that we can clear it
at the end of the transaction. The db/function stats seems to be doing
the same. Is there a reason why here we need to always create it in
pgStatLocalContext?

-- 
With Regards,
Amit Kapila.

Re: PATCH: Add GSSAPI ccache_name option to libpq

On Tue, Apr 20, 2021 at 08:44:23PM +0100, Daniel Carter wrote:
> The original motivation for investigating this was setting up a web app
> which could authenticate to a database server using a Kerberos ticket. Since
> the web framework already needs to create a connection string (with database
> name etc.) to set up the database connection, having an option here for the
> ccache location makes it much more straightforward to specify than having to
> save data out to environment variables (and makes things cleaner if there
> are potentially multiple database connections going on at once in different
> processes).
> 
> There may well be a better way of going about this -- it's just that I can't
> currently see an obvious way to get this kind of setup working using only
> the environment variable.

The environment variable bit sounds like a fair argument to me.

Please do not forget to add this patch and thread to the next commit
fest:
https://commitfest.postgresql.org/33/
You need a community account, and that's unfortunately too late for
Postgres 14, but the development of 15 will begin at the beginning of
July so it could be included there.
--
Michael


signature.asc
Description: PGP signature

Re: Table refer leak in logical replication

On Wed, Apr 21, 2021 at 11:13:06AM +0900, Amit Langote wrote:
> Agree about adding tests along these lines.  Will post in a bit.

Thanks!
--
Michael


signature.asc
Description: PGP signature

Re: Tiny update to pg_stat_statements documentation

On Wed, Apr 21, 2021 at 09:46:59AM +0800, Julien Rouhaud wrote:
> +1, it looks good to me.

Cool, thanks for confirming.  The top of the docs have IMO enough
details about the requirements around compute_query_id and third-part
modules, so we are done here.
--
Michael


signature.asc
Description: PGP signature

prerequisites of pull_up_sublinks

2021-04-20 Thread Andy Fan

Hi,

I'm reading the pull_up_sublinks, and find the below comments.

 * However, this optimization *only*
 * works at the top level of WHERE or a JOIN/ON clause, because we cannot
 * distinguish whether the ANY ought to return FALSE or NULL in cases
 * involving NULL inputs. Also, in an outer join's ON clause we can only
 * do this if the sublink is degenerate (ie, references only the nullable
 * side of the join).

I tried to write some SQLs but still can't understand the above comments.
Any
help here?

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)

Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?

On Wed, Apr 21, 2021 at 11:32:49AM +0900, Kyotaro Horiguchi wrote:
> On the other hand the patch creates a relation list just for this
> purpose, which is not needed to run VACUUM/ANALYZE, and VACUUM/ANALYE
> works well with duplicates in target relations.

Yeah, I don't think either that this is worth spending cycles on, not
to mention that the current behavior could be handy as VACUUM uses
separate transactions for each relation vacuumed if more than one
relation is listed in the set.
--
Michael


signature.asc
Description: PGP signature

RE: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-04-20 Thread tsunakawa.ta...@fujitsu.com

From: Tom Lane 
> [ raised eyebrow... ]  I find it very hard to understand why that would
> be necessary, or even a good idea.  Not least because there's no spare
> room there; you'd have to incur a substantial enlargement of the
> array to add another flag.  But also, that would indeed lock down
> the value of the parallel-safety flag, and that seems like a fairly
> bad idea.

You're right, FmgrBuiltins is already fully packed (24 bytes on 64-bit 
machines).  Enlarging the frequently accessed fmgr_builtins array may wreak 
unexpectedly large adverse effect on performance.

I wanted to check the parallel safety of functions, which various objects (data 
type, index, trigger, etc.) come down to, in FunctionCallInvoke() and other few 
places.  But maybe we skip the check for built-in functions.  That's a matter 
of where we draw a line between where we check and where we don't.


Regards
Takayuki Tsunakawa

Re: Docs: Move parallel_leader_participation GUC description under relevant category

On Wed, Apr 21, 2021 at 8:00 AM Michael Paquier  wrote:
>
> On Tue, Apr 20, 2021 at 09:16:49PM +0530, Bharath Rupireddy wrote:
> > It looks like even though the commit e5253fdc4f that added the
> > parallel_leader_participation GUC correctly categorized it as
> > RESOURCES_ASYNCHRONOUS parameter in the code, but in the docs it is kept
> > under irrelevant section i.e. "Query Planning/Other Planner Options". This
> > is reported in the bugs list [1], cc-ed the reporter.
> >
> > Attaching a small patch that moves the GUC description to the right place.
> > Thoughts?
>
> I would keep the discussion on the existing thread rather than spawn a
> new one on -hackers for exactly the same problem, so I'll reply there
> in a minute.

I thought we might miss the discussion in the hackers list. I'm sorry
for starting a new thread. I'm closing this thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?

2021-04-20 Thread Kyotaro Horiguchi

At Wed, 21 Apr 2021 07:34:40 +0530, Bharath Rupireddy 
 wrote in 
> On Sat, Apr 10, 2021 at 8:03 PM Tom Lane  wrote:
> >
> > Bharath Rupireddy  writes:
> > > I'm reading the code for vacuum/analyze and it looks like currently we
> > > call vacuum_rel/analyze_rel for each relation specified. Which means
> > > that if a relation is specified more than once, then we simply
> > > vacuum/analyze it that many times. Do we gain any advantage by
> > > vacuuming/analyzing a relation back-to-back within a single command? I
> > > strongly feel no. I'm thinking we could do a simple optimization here,
> >
> > This really is not something to expend cycles and code complexity on.
> > If the user wrote the same table more than once, that's their choice.
> 
> Thanks! I think we could avoid extra processing costs for cases like
> VACUUM/ANALYZE foo, foo; when no explicit columns are specified. The
> avoided costs can be lock acquire, relation open, vacuum/analyze,
> relation close, starting new xact command, command counter increment
> in case of analyze etc. This can be done with a simple patch like the
> attached. When explicit columns are specified along with relations
> i.e. VACUUM/ANALYZE foo(c1), foo(c2); we don't want to do the extra
> complex processing to optimize the cases when c1 = c2.
> 
> Note that the TRUNCATE command currently skips processing repeated
> relations (see ExecuteTruncate). For example, TRUNCATE foo, foo; and
> TRUNCATE foo, ONLY foo, foo; first instance of relation foo is taken
> into consideration for processing and other relation instances
> (options specified if any) are ignored.
> 
> Thoughts?

Although I don't strongly oppose to check that, the check of truncate
is natural and required. The relation list is anyway used afterwards,
and we cannot truncate the same relation twice or more since a
relation under "use" cannot be truncated. (Truncation is one form of
use).  In short, TRUNCATE runs no checking just for the check's own
sake.

On the other hand the patch creates a relation list just for this
purpose, which is not needed to run VACUUM/ANALYZE, and VACUUM/ANALYE
works well with duplicates in target relations.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Docs: Move parallel_leader_participation GUC description under relevant category

On Tue, Apr 20, 2021 at 09:16:49PM +0530, Bharath Rupireddy wrote:
> It looks like even though the commit e5253fdc4f that added the
> parallel_leader_participation GUC correctly categorized it as
> RESOURCES_ASYNCHRONOUS parameter in the code, but in the docs it is kept
> under irrelevant section i.e. "Query Planning/Other Planner Options". This
> is reported in the bugs list [1], cc-ed the reporter.
> 
> Attaching a small patch that moves the GUC description to the right place.
> Thoughts?

I would keep the discussion on the existing thread rather than spawn a
new one on -hackers for exactly the same problem, so I'll reply there
in a minute.
--
Michael


signature.asc
Description: PGP signature

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

"tsunakawa.ta...@fujitsu.com"  writes:
> From: Tom Lane 
>> No.  You'd have to be superuser anyway to do that, and we're not in the
>> habit of trying to put training wheels on superusers.

> Understood.  However, we may add the parallel safety member in 
> fmgr_builtins[] in another thread for parallel INSERT SELECT.  I'd appreciate 
> your comment on this if you see any concern.

[ raised eyebrow... ]  I find it very hard to understand why that would
be necessary, or even a good idea.  Not least because there's no spare
room there; you'd have to incur a substantial enlargement of the
array to add another flag.  But also, that would indeed lock down
the value of the parallel-safety flag, and that seems like a fairly
bad idea.

regards, tom lane

Re: Table refer leak in logical replication

2021-04-20 Thread Amit Langote

On Wed, Apr 21, 2021 at 9:31 AM Michael Paquier  wrote:
> On Tue, Apr 20, 2021 at 06:20:03PM +0530, Amit Kapila wrote:
> > +1. I think it makes sense to add a test case especially because we
> > don't have any existing test in this area.
>
> Yes, let's add add something into 013_partition.pl within both
> subscriber1 and subscriber2.   This will not catch up the relation
> leak, but it is better to make sure that the trigger is fired as we'd
> like to expect.  This will become helpful if this code gets refactored
> or changed in the future.  What about adding an extra table inserted
> into by the trigger itself?  If I were to design that, I would insert
> the following information that gets checked by a simple psql call once
> the changes are applied in the subscriber: relation name, TG_WHEN,
> TG_OP and TG_LEVEL.  So such a table would need at least 4 columns.

Agree about adding tests along these lines.  Will post in a bit.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: terminate called after throwing an instance of 'std::bad_alloc'

2021-04-20 Thread Justin Pryzby

On Tue, Apr 20, 2021 at 06:16:28PM -0700, Andres Freund wrote:
> On 2021-04-20 20:03:13 -0500, Justin Pryzby wrote:
> > That's a query over a 2 day period (midnight to midnight+2), so it's not 
> > hard
> > for me to believe it sometimes exceeds 100k cost units 
> > (jit_inline_above_cost),
> > depending on where we are in that interval, and on planner statistics.  It
> > might've been different when I first reported the problem, too.
> > 
> > I'm not sure what you mean about expensive enough to be jited ?
> > I think our custom functions don't have a COST set.
> 
> What get's JITed is queries. Unless your functions are fully inlinable
> SQL functions that means that query executions in functions are
> separately JITed. And need to separately pass the cost minimums.

I think the queries our plpgsql functions make wouldn't be expensive - mostly
lookups by primary key.  And any functions called by the queries within the
plpgsql function would not have COST set.

What do you think of the test case I sent?
| SELECT fn() FROM generate_series

If the JIT contexts can't be reset for this query, I assume the same reason
applies to our more complicated query.

> Note that this isn't about how many plpsql functions or such are called,
> but about how many JITed functions are generated. Typically there's one
> function per expression, and one for each table accessed.

Thanks for the explanations.  Note that in our config I have set
jit_tuple_deforming=off, so I think the table accesses don't use functions,
right?

Oh...that appears to be relevant somehow:

$ PGOPTIONS='-c client_min_messages=debug -c log_executor_stats=on -c jit=on -c 
jit_above_cost=0 -c jit_inline_above_cost=0' psql ts -c "explain analyze SELECT 
fn() FROM generate_series(1,999)"
!   69872 kB max resident size

$ PGOPTIONS='-c client_min_messages=debug -c log_executor_stats=on -c jit=on -c 
jit_above_cost=0 -c jit_inline_above_cost=0 -c jit_tuple_deforming=on' psql ts 
-c "explain analyze SELECT fn() FROM generate_series(1,999)"
!   36140 kB max resident size


-- 
Justin

Re: Stale description for pg_basebackup

2021-04-20 Thread Kyotaro Horiguchi

At Wed, 21 Apr 2021 10:43:30 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Tue, 20 Apr 2021 13:32:35 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > Hello.
> > 
> > It seems to me that there's a stale description in the documentation
> > of pg_basebackup.
> > 
> > https://www.postgresql.org/docs/13/app-pgbasebackup.html
> > 
> > > Note that there are some limitations in taking a backup from a standby:
> > ...
> > > If you are using -X none, there is no guarantee that all WAL files
> > > required for the backup are archived at the end of backup.
> > 
> > Actually, pg_basebackup waits for the all required files to be
> > archived, which is an established behavior by commit
> > 52f8a59dd9@PG10. However, the same commit seems to have forgot to
> > change the doc for pg_basebackup. (The current description is
> > introduced by 9a4d51077c@PG10)
> > 
> > The attached is a proposal to rewrite it as the following.
> > 
> > + If you are using -X none, pg_basebackup may wait for a long time for
> > + all the required WAL files to be archived. In that case, You may need
> > + to call pg_switch_wal() on the primary to complete it sooner.
> 
> I forgot to preserve the description about *primary*. It should be as
> the following instead.
> 
> + If you are using -X none, there is no guarantee on the primary that
> + all WAL files required for the backup are archived at the end of
> + backup. When the standby is configured as archive_mode=always,
> + pg_basebackup may wait for a long time for all the required WAL files
> + to be archived. In that case, You may need to call pg_switch_wal() on
> + the primary to complete it sooner.

Hmm. Some words need to be qualified. Attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index a3e292d44a..15e48f5a8e 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -83,8 +83,14 @@ PostgreSQL documentation
 
 
  
-  If you are using -X none, there is no guarantee that all
-  WAL files required for the backup are archived at the end of backup.
+   If you are using -X none, there is no guarantee on
+   the primary that all WAL files required for the backup are archived at
+   the end of backup. When archive_mode is set
+   to on on the
+   standby, pg_basebackup may wait for a long
+   time for all the required WAL files to be archived.  In that case, You
+   may need to call pg_switch_wal() on the primary to
+   complete it sooner.

Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?

On Sat, Apr 10, 2021 at 8:03 PM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > I'm reading the code for vacuum/analyze and it looks like currently we
> > call vacuum_rel/analyze_rel for each relation specified. Which means
> > that if a relation is specified more than once, then we simply
> > vacuum/analyze it that many times. Do we gain any advantage by
> > vacuuming/analyzing a relation back-to-back within a single command? I
> > strongly feel no. I'm thinking we could do a simple optimization here,
>
> This really is not something to expend cycles and code complexity on.
> If the user wrote the same table more than once, that's their choice.

Thanks! I think we could avoid extra processing costs for cases like
VACUUM/ANALYZE foo, foo; when no explicit columns are specified. The
avoided costs can be lock acquire, relation open, vacuum/analyze,
relation close, starting new xact command, command counter increment
in case of analyze etc. This can be done with a simple patch like the
attached. When explicit columns are specified along with relations
i.e. VACUUM/ANALYZE foo(c1), foo(c2); we don't want to do the extra
complex processing to optimize the cases when c1 = c2.

Note that the TRUNCATE command currently skips processing repeated
relations (see ExecuteTruncate). For example, TRUNCATE foo, foo; and
TRUNCATE foo, ONLY foo, foo; first instance of relation foo is taken
into consideration for processing and other relation instances
(options specified if any) are ignored.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

v1-0001-Skip-VACUUM-ANALYZE-of-repeated-relations.patch
Description: Binary data

non-blocking delayChkpt

Hi,

During commits, and some other places, there's a short phase at which we
block checkpoints from starting:

/*
 * Mark ourselves as within our "commit critical section".  This
 * forces any concurrent checkpoint to wait until we've updated
 * pg_xact.  Without this, it is possible for the checkpoint to 
set
 * REDO after the XLOG record but fail to flush the pg_xact 
update to
 * disk, leading to loss of the transaction commit if the system
 * crashes a little later.

One problem in the shared memory stats patch was that, to get rid of the
O(N) cost of pgstat_vacuum_stat(), commits/aborts should inform which
stats they drop.

Because we wouldn't do the dropping of stats as part of
RecordTransactionCommit()'s critical section, that would have the danger
of the stats dropping not being executed if we crash after WAL logging
the commit record, but before dropping the stats.

It's worthwhile to note that currently dropping of relfilenodes (e.g. a
committing DROP TABLE or an aborting CREATE TABLE) has the same issue.


An obvious way to address that would be to set delayChkpt not just for
part of RecordTransactionCommit()/Abort(), but also during the
relfilenode/stats dropping. But obviously that'd make it much more
likely that we'd actually prevent checkpoints from starting for a
significant amount of time.

Which lead me to wonder why we need to *block* when starting a
checkpoint, waiting for a moment in which there are no concurrent
commits?

I think we could replace the boolean per-backend delayChkpt with
per-backend LSNs that indicate an LSN that for the backend won't cause
recovery issues. For commits this LSN could e.g. be the current WAL
insert location, just before the XLogInsert() (but I think we could
optimize that a bit, but that's details).  CreateCheckPoint() would then
not loop over HaveVirtualXIDsDelayingChkpt() before completing a
checkpoint, but instead compute the oldest LSN that any backend needs to
be included in the checkpoint.

Moving the redo pointer to before where any backend is in a commit
critical section seems to provide sufficient (and I think sometimes
stronger) protection against the hazards that delayChkpt aims to
prevent? And it could do so without blocking.


I think the blocking by delayChkpt is already an issue in some busy
workloads, although it's hard to tell how much outside of artificial
workloads against modified versions of PG, given that we don't expose
such waits anywhere.  Particularly that we now set delayChkpt in
MarkBufferDirtyHint() seems to make that a lot more likely.


Does this seem like a viable idea, or did I entirely miss the boat?

Greetings,

Andres Freund

RE: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-04-20 Thread tsunakawa.ta...@fujitsu.com

From: Tom Lane 
> Bharath Rupireddy  writes:
> > IMO, the reason for not checking the parallel safety of the support
> > functions is that the functions themselves can have whole lot of other
> > functions (can be nested as well) which might be quite hard to check
> > at the planning time. That is why the job of marking an aggregate as
> > parallel safe is best left to the user.
> 
> Yes.  I think the documentation is perfectly clear that this is
> intentional; I don't see a need to change it.

OK, that's what I expected.  I understood from this that the Postgres's stance 
toward parallel safety is that Postgres does its best effort to check parallel 
safety (as far as it doesn't hurt performance much, and perhaps the core code 
doesn't get very complex), and the user should be responsible for the actual 
parallel safety of ancillary objects (in this case, support functions for an 
aggregate) of the target object that he/she marked as parallel safe.

> >> Should we add a member for parallel safety in fmgr_builtins[], and disallow
> ALTER FUNCTION to change the parallel safety of builtin UDFs?
> 
> No.  You'd have to be superuser anyway to do that, and we're not in the
> habit of trying to put training wheels on superusers.

Understood.  However, we may add the parallel safety member in fmgr_builtins[] 
in another thread for parallel INSERT SELECT.  I'd appreciate your comment on 
this if you see any concern.

> Don't have an opinion about the other points yet.

I'd like to have your comments on them, too.  But I understand you must be so 
busy at least until the beta release of PG 14.

Regards
Takayuki Tsunakawa

Re: An omission of automatic completion in tab-complete.c

On Tue, Apr 20, 2021 at 11:35:13AM +0300, Aleksander Alekseev wrote:
> I invested some time in checking this patch. It passes make
> check-world / make installcheck-world and adds CURRENT_ROLE to the
> automatic completion.

Thanks Aleksander and Wei.  Applied.
--
Michael


signature.asc
Description: PGP signature

Re: Tiny update to pg_stat_statements documentation

2021-04-20 Thread Julien Rouhaud

On Wed, Apr 21, 2021 at 10:27:43AM +0900, Michael Paquier wrote:
> On Wed, Apr 21, 2021 at 11:10:36AM +1000, Greg Nancarrow wrote:
> > However, I don't think the additional comment is really warranted
> > here, as the other typical usage settings are not commented, and all
> > settings are explained in the surrounding documentation.
> 
> Good catch, Greg.

Agreed!

> I agree to keep things simple and just do what you
> are suggesting here.

+1, it looks good to me.

Re: Stale description for pg_basebackup

2021-04-20 Thread Kyotaro Horiguchi

At Tue, 20 Apr 2021 13:32:35 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Hello.
> 
> It seems to me that there's a stale description in the documentation
> of pg_basebackup.
> 
> https://www.postgresql.org/docs/13/app-pgbasebackup.html
> 
> > Note that there are some limitations in taking a backup from a standby:
> ...
> > If you are using -X none, there is no guarantee that all WAL files
> > required for the backup are archived at the end of backup.
> 
> Actually, pg_basebackup waits for the all required files to be
> archived, which is an established behavior by commit
> 52f8a59dd9@PG10. However, the same commit seems to have forgot to
> change the doc for pg_basebackup. (The current description is
> introduced by 9a4d51077c@PG10)
> 
> The attached is a proposal to rewrite it as the following.
> 
> + If you are using -X none, pg_basebackup may wait for a long time for
> + all the required WAL files to be archived. In that case, You may need
> + to call pg_switch_wal() on the primary to complete it sooner.

I forgot to preserve the description about *primary*. It should be as
the following instead.

+ If you are using -X none, there is no guarantee on the primary that
+ all WAL files required for the backup are archived at the end of
+ backup. When the standby is configured as archive_mode=always,
+ pg_basebackup may wait for a long time for all the required WAL files
+ to be archived. In that case, You may need to call pg_switch_wal() on
+ the primary to complete it sooner.

Attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index a3e292d44a..35d7fb605f 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -83,8 +83,13 @@ PostgreSQL documentation
 
 
  
-  If you are using -X none, there is no guarantee that all
-  WAL files required for the backup are archived at the end of backup.
+   If you are using -X none, there is no guarantee on
+   the primary that all WAL files required for the backup are archived at
+   the end of backup. When the standby is configured as
+   archive_mode=always, pg_basebackup may wait for a long time for all the
+   required WAL files to be archived.  In that case, You may need to
+   call pg_switch_wal() on the primary to complete it
+   sooner.

Re: Tiny update to pg_stat_statements documentation

On Wed, Apr 21, 2021 at 6:40 AM Greg Nancarrow  wrote:
>
> On Tue, Apr 20, 2021 at 8:06 PM Bharath Rupireddy
>  wrote:
> > > I've attached a patch for this.
> >
> > +1. How about mentioning something like below?
> >
> > +compute_query_id = on # when in-core query identifier computation is
> > desired, otherwise off.
> >
>
> Hmm, I think that comment is perhaps slightly misleading, as
> compute_query_id wouldn't be set to "off" in settings for "typical
> usage".
> Just saying "use in-core query identifier computation" would be a
> better comment.
> However, I don't think the additional comment is really warranted
> here, as the other typical usage settings are not commented, and all
> settings are explained in the surrounding documentation.

Thanks Greg! I agree with you and withdraw my point.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Tiny update to pg_stat_statements documentation

On Wed, Apr 21, 2021 at 11:10:36AM +1000, Greg Nancarrow wrote:
> However, I don't think the additional comment is really warranted
> here, as the other typical usage settings are not commented, and all
> settings are explained in the surrounding documentation.

Good catch, Greg.  I agree to keep things simple and just do what you
are suggesting here.
--
Michael


signature.asc
Description: PGP signature

Re: `make check` doesn't pass on MacOS Catalina

2021-04-20 Thread Xing GUO

On 4/21/21, Tom Lane  wrote:
> Xing GUO  writes:
>> Thank you very much. I'm facing the same problem yesterday. May I
>> suggest that document it in the installation guide on MacOS platform?
>
> It is documented --- see last para under
>
> https://www.postgresql.org/docs/current/installation-platform-notes.html#INSTALLATION-NOTES-MACOS

Thank you! Sorry for my carelessness...

>
>   regards, tom lane
>


-- 
Best Regards,
Xing

Re: terminate called after throwing an instance of 'std::bad_alloc'

Hi,

On 2021-04-20 20:03:13 -0500, Justin Pryzby wrote:
> That's a query over a 2 day period (midnight to midnight+2), so it's not hard
> for me to believe it sometimes exceeds 100k cost units 
> (jit_inline_above_cost),
> depending on where we are in that interval, and on planner statistics.  It
> might've been different when I first reported the problem, too.
> 
> I'm not sure what you mean about expensive enough to be jited ?
> I think our custom functions don't have a COST set.

What get's JITed is queries. Unless your functions are fully inlinable
SQL functions that means that query executions in functions are
separately JITed. And need to separately pass the cost minimums.

> explain analyze is showing:
> 
> JIT:
>   Functions: 201

Note that this isn't about how many plpsql functions or such are called,
but about how many JITed functions are generated. Typically there's one
function per expression, and one for each table accessed.

Greetings,

Andres Freund

Re: `make check` doesn't pass on MacOS Catalina

Xing GUO  writes:
> Thank you very much. I'm facing the same problem yesterday. May I
> suggest that document it in the installation guide on MacOS platform?

It is documented --- see last para under

https://www.postgresql.org/docs/current/installation-platform-notes.html#INSTALLATION-NOTES-MACOS

regards, tom lane

Re: Tiny update to pg_stat_statements documentation

2021-04-20 Thread Greg Nancarrow

On Tue, Apr 20, 2021 at 8:06 PM Bharath Rupireddy
 wrote:
> > I've attached a patch for this.
>
> +1. How about mentioning something like below?
>
> +compute_query_id = on # when in-core query identifier computation is
> desired, otherwise off.
>

Hmm, I think that comment is perhaps slightly misleading, as
compute_query_id wouldn't be set to "off" in settings for "typical
usage".
Just saying "use in-core query identifier computation" would be a
better comment.
However, I don't think the additional comment is really warranted
here, as the other typical usage settings are not commented, and all
settings are explained in the surrounding documentation.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: terminate called after throwing an instance of 'std::bad_alloc'

2021-04-20 Thread Justin Pryzby

On Tue, Apr 20, 2021 at 05:20:56PM -0700, Andres Freund wrote:
> On 2021-04-20 00:58:21 -0500, Justin Pryzby wrote:
> > On Tue, Apr 20, 2021 at 12:38:26AM -0500, Justin Pryzby wrote:
> > > I don't know if this is related to the other issues, but this seems leaky.
> > 
> > And it explains how the context use counter can exceed its threshold.
> > 
> > create or replace function fn() returns void language plpgsql as $$ declare 
> > rec int; begin SELECT relpages INTO rec FROM pg_class LIMIT 1; end $$;
> Right - at the moment the context can only be recreated when there's no
> JITed queries ongoing. That's why I asked whether your "real" query
> contains function calls, 

The real query has a bunch of function calls, including aggregates and custom
plpgsql functions.  I reduced the whole thing to the above.

ts=# explain analyze SELECT fn() FROM generate_series(1,999);
!   71692 kB max resident size

ts=# explain analyze SELECT fn() FROM generate_series(1,);
!   332852 kB max resident size

Which doesn't seem to leak if I "SELECT 1 INTO rec" instead of "SELECT relpages
INTO rec".

> and whether those functions are expensive enough to be JITed.

The real query has:
| Append  (cost=44.54..192946.35 rows=79 width=741)

That's a query over a 2 day period (midnight to midnight+2), so it's not hard
for me to believe it sometimes exceeds 100k cost units (jit_inline_above_cost),
depending on where we are in that interval, and on planner statistics.  It
might've been different when I first reported the problem, too.

I'm not sure what you mean about expensive enough to be jited ?
I think our custom functions don't have a COST set.

explain analyze is showing:

JIT:
  Functions: 201

-- 
Justin

Re: `make check` doesn't pass on MacOS Catalina

2021-04-20 Thread Xing GUO

Hi hackers,

Thank you very much. I'm facing the same problem yesterday. May I
suggest that document it in the installation guide on MacOS platform?

On 4/21/21, Andrew Dunstan  wrote:
>
> On 4/20/21 11:02 AM, Tom Lane wrote:
>> Aleksander Alekseev  writes:
>>> While trying to build PostgreSQL from source (master branch, 95c3a195) on
>>> a
>>> MacBook I discovered that `make check` fails:
>> This is the usual symptom of not having disabled SIP :-(.
>>
>> If you don't want to do that, do "make install" before "make check".
>>
>>  
>
>
>
>
> FYI the buildfarm client has a '--delay-check' option that does exactly
> this. It's useful on Alpine Linux as well as MacOS
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>
>


-- 
Best Regards,
Xing

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-20 Thread Alvaro Herrera

Actually I had a silly bug in the version that attempted to cache a
partdesc that omits detached partitions.  This one, while not fully
baked, seems to work correctly (on top of the previous one).

The thing that I don't fully understand is why we have to require to
have built the regular one first.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"This is what I like so much about PostgreSQL.  Most of the surprises
are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
>From b45e9db9cd08282f7a7fabcef944880cc3a3d415 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 20 Apr 2021 20:42:30 -0400
Subject: [PATCH] separately cache a partdesc that omits detached partitions,
 too

---
 src/backend/partitioning/partdesc.c | 21 +++--
 src/backend/utils/cache/relcache.c  |  4 
 src/include/utils/rel.h |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 7d019a72be..f245cbd664 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -67,10 +67,25 @@ RelationGetPartitionDesc(Relation rel, bool include_detached)
 {
 	Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
+	/*
+	 * The common case is that there are no partitions pending detach and
+	 * that the appropriate descriptor has already been built; use that.
+	 */
 	if (likely(rel->rd_partdesc &&
 			   (!rel->rd_partdesc->detached_exist || include_detached)))
 		return rel->rd_partdesc;
 
+	/*
+	 * If we're asked for a descriptor with detached partitions omitted, and
+	 * we have already built it, return that.
+	 */
+	if (!include_detached &&
+		rel->rd_partdesc &&
+		rel->rd_partdesc->detached_exist &&
+		rel->rd_partdesc_nodetached)
+		return rel->rd_partdesc_nodetached;
+
+	/* Otherwise build one afresh and store it for next time */
 	return RelationBuildPartitionDesc(rel, include_detached);
 }
 
@@ -278,9 +293,11 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
 		MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt);
 	rel->rd_pdcxt = new_pdcxt;
 
-	/* Store it into relcache, but only if no detached partitions exist */
-	if (!detached_exist)
+	/* Store it into the appropriate member of the relcache entry */
+	if (!detached_exist || include_detached)
 		rel->rd_partdesc = partdesc;
+	else
+		rel->rd_partdesc_nodetached = partdesc;
 
 	return partdesc;
 }
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6eab..87a7613584 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1157,6 +1157,7 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	relation->rd_partkey = NULL;
 	relation->rd_partkeycxt = NULL;
 	relation->rd_partdesc = NULL;
+	relation->rd_partdesc_nodetached = NULL;
 	relation->rd_pdcxt = NULL;
 	relation->rd_partcheck = NIL;
 	relation->rd_partcheckvalid = false;
@@ -2672,12 +2673,14 @@ RelationClearRelation(Relation relation, bool rebuild)
 			 * newrel.
 			 */
 			relation->rd_partdesc = NULL;	/* ensure rd_partdesc is invalid */
+			relation->rd_partdesc_nodetached = NULL;
 			if (relation->rd_pdcxt != NULL) /* probably never happens */
 MemoryContextSetParent(newrel->rd_pdcxt, relation->rd_pdcxt);
 			else
 relation->rd_pdcxt = newrel->rd_pdcxt;
 			/* drop newrel's pointers so we don't destroy it below */
 			newrel->rd_partdesc = NULL;
+			newrel->rd_partdesc_nodetached = NULL;
 			newrel->rd_pdcxt = NULL;
 		}
 
@@ -5942,6 +5945,7 @@ load_relcache_init_file(bool shared)
 		rel->rd_partkey = NULL;
 		rel->rd_partkeycxt = NULL;
 		rel->rd_partdesc = NULL;
+		rel->rd_partdesc_nodetached = NULL;
 		rel->rd_pdcxt = NULL;
 		rel->rd_partcheck = NIL;
 		rel->rd_partcheckvalid = false;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 34e25eb597..3d480a3ab2 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -127,6 +127,7 @@ typedef struct RelationData
 
 	/* data managed by RelationGetPartitionDesc: */
 	PartitionDesc rd_partdesc;	/* partition descriptor, or NULL */
+	PartitionDesc rd_partdesc_nodetached;	/* partition descriptor omitting detached partitions */
 	MemoryContext rd_pdcxt;		/* private context for rd_partdesc, if any */
 
 	/* data managed by RelationGetPartitionQual: */
-- 
2.20.1

Re: Table refer leak in logical replication

On Tue, Apr 20, 2021 at 06:20:03PM +0530, Amit Kapila wrote:
> +1. I think it makes sense to add a test case especially because we
> don't have any existing test in this area.

Yes, let's add add something into 013_partition.pl within both
subscriber1 and subscriber2.   This will not catch up the relation
leak, but it is better to make sure that the trigger is fired as we'd
like to expect.  This will become helpful if this code gets refactored
or changed in the future.  What about adding an extra table inserted
into by the trigger itself?  If I were to design that, I would insert
the following information that gets checked by a simple psql call once
the changes are applied in the subscriber: relation name, TG_WHEN,
TG_OP and TG_LEVEL.  So such a table would need at least 4 columns. 
--
Michael


signature.asc
Description: PGP signature

Re: Privilege boundary between sysadmin and database superuser [Was: Re: pg_amcheck option to install extension]

2021-04-20 Thread Mark Dilger




> On Apr 20, 2021, at 3:19 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>>> On Apr 20, 2021, at 5:54 AM, Robert Haas  wrote:
>>> On Tue, Apr 20, 2021 at 1:31 AM Mark Dilger
>>>  wrote:
 I think you are conflating the concept of an operating system adminstrator 
 with the concept of the database superuser/owner.
> 
>>> You should conflate those things, because there's no meaningful
>>> privilege boundary between them:
> 
>> I understand why you say so, but I think the situation is more nuanced than 
>> that.
> 
> Maybe I too am confused, but I understand "operating system administrator"
> to mean "somebody who has root, or some elevated OS privilege level, on
> the box running Postgres".  That is 100% distinct from the operating
> system user that runs Postgres, which should generally be a bog-standard
> OS user.  (In fact, we try to prevent you from running Postgres as root.)
> 
> What there is not a meaningful privilege boundary between is that standard
> OS user and a within-the-database superuser.  Since we allow superusers to
> trigger file reads and writes, and indeed execute programs, from within
> the DB, a superuser can surely reach anything the OS user can do.

Right.  This is the part that Alice might want to restrict, and we don't have 
an easy way for her to do so.

> The rest of your analysis seems a bit off-point to me, which is what
> makes me think that one of us is confused.  If Alice is storing her
> data in a Postgres database, she had better trust both the Postgres
> superuser and the box's administrators ... otherwise, she should go
> get her own box and her own Postgres installation.

It is the other way around.  Alice is the operating system administrator who 
doesn't trust Bob.  She wants Bob to be able to do any database thing he wants 
within the PostgreSQL environment, but doesn't want that to leak out as an 
ability to run arbitrary stuff on the system, not even if it's just stuff 
running as bog-standard user "postgres".  In my view, Alice can accomplish this 
goal using a very tightly designed container, but it is far from easy to do and 
to get right. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: terminate called after throwing an instance of 'std::bad_alloc'

Hi,

On 2021-04-20 00:58:21 -0500, Justin Pryzby wrote:
> On Tue, Apr 20, 2021 at 12:38:26AM -0500, Justin Pryzby wrote:
> > I don't know if this is related to the other issues, but this seems leaky.
> 
> And it explains how the context use counter can exceed its threshold.
> 
> create or replace function fn() returns void language plpgsql as $$ declare 
> rec int; begin SELECT relpages INTO rec FROM pg_class LIMIT 1; end $$;
> explain analyze
> SELECT fn()
> FROM generate_series(1,99);
> 
> SELECT SUM(a) FROM (VALUES(1))a(a);
> 
> time PGOPTIONS='-cclient_min_messages=debug -clog_executor_stats=off 
> -clog_min_duration_statement=-1 -cjit=on -cjit_above_cost=0 
> -cjit_inline_above_cost=0' psql ts -f jitleak.sql
> ...
> psql:jitleak.sql:6: DEBUG:  recreating LLVM context after 100 uses

Right - at the moment the context can only be recreated when there's no
JITed queries ongoing. That's why I asked whether your "real" query
contains function calls, and whether those functions are expensive
enough to be JITed.


> Question: does the jit context need to be recreated only when inlining is
> enabled?  Or maybe it's better if it's not conditionalized like that..

It'd be sufficient to do it when doing inlining, but currently that's
not tracked separately.

Greetings,

Andres Freund

Re: Free port choosing freezes when PostgresNode::use_tcp is used on BSD systems

2021-04-20 Thread Alexey Kondratov


On 2021-04-20 18:03, Tom Lane wrote:

Andrew Dunstan  writes:

On 4/19/21 7:22 PM, Tom Lane wrote:

I wonder whether we could get away with just replacing the $use_tcp
test with $TestLib::windows_os.  It's not really apparent to me
why we should care about 127.0.0.not-1 on Unix-oid systems.



Yeah
The comment is a bit strange anyway - Cygwin is actually going to use
Unix sockets, not TCP.
I think I would just change the test to this: $use_tcp &&
$TestLib::windows_os.


Works for me, but we need to revise the comment to match.



Then it could be somewhat like that, I guess.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Companydiff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index db47a97d196..f7b488ed464 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1191,19 +1191,19 @@ sub get_free_port
 		# Check to see if anything else is listening on this TCP port.
 		# Seek a port available for all possible listen_addresses values,
 		# so callers can harness this port for the widest range of purposes.
-		# The 0.0.0.0 test achieves that for post-2006 Cygwin, which
-		# automatically sets SO_EXCLUSIVEADDRUSE.  The same holds for MSYS (a
-		# Cygwin fork).  Testing 0.0.0.0 is insufficient for Windows native
-		# Perl (https://stackoverflow.com/a/14388707), so we also test
-		# individual addresses.
+		# The 0.0.0.0 test achieves that for MSYS, which automatically sets
+		# SO_EXCLUSIVEADDRUSE.  Testing 0.0.0.0 is insufficient for Windows
+		# native Perl (https://stackoverflow.com/a/14388707), so we also
+		# have to test individual addresses.  Doing that for 127.0.0/24
+		# addresses other than 127.0.0.1 might fail with EADDRNOTAVAIL on
+		# non-Linux, non-Windows kernels.
 		#
-		# On non-Linux, non-Windows kernels, binding to 127.0.0/24 addresses
-		# other than 127.0.0.1 might fail with EADDRNOTAVAIL.  Binding to
-		# 0.0.0.0 is unnecessary on non-Windows systems.
+		# That way, 0.0.0.0 and individual 127.0.0/24 addresses are tested
+		# only on Windows when TCP usage is requested.
 		if ($found == 1)
 		{
 			foreach my $addr (qw(127.0.0.1),
-$use_tcp ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
+$use_tcp && $TestLib::windows_os ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
 			{
 if (!can_bind($addr, $port))
 {

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-20 Thread Alvaro Herrera

OK so after mulling this over for a long time, here's a patch for this.
It solves the problem by making the partition descriptor no longer be
cached if a detached partition is omitted.  Any transaction that needs a
partition descriptor that excludes detached partitions, will have to
recreate the partdesc from scratch.  To support this, I changed the
output boolean semantics: instead of "does this partdesc include an
detached partitions" as in your patch, it now is "are there any detached
partitions".  But whenever no detached partitions exist, or when all
partitions including detached are requested, then the cached descriptor
is returned (because that necessarily has to be correct).  The main
difference this has to your patch is that we always keep the descriptor
in the cache and don't rebuild anything, unless a detached partition is
present.

I flipped the find_inheritance_children() input boolean, from
"include_detached" to "omit_detached".  This is more natural, given the
internal behavior.  You could argue to propagate that naming change to
the partdesc.h API and PartitionDirectory, but I don't think there's a
need for that.

I ran all the detach-partition-concurrently tests under
debug_invalidate_system_caches_always=1 and everything passes.

I experimented with keeping a separate cached partition descriptor that
omits the detached partition, but that brings back some trouble; I
couldn't find a way to invalidate such a cached entry in a reasonable
way.  I have the patch for that, if somebody wants to play with it.

-- 
Álvaro Herrera   Valdivia, Chile
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)
>From f95149f0fee451b8a2b49861dc813aeddf384f8f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 20 Apr 2021 18:01:18 -0400
Subject: [PATCH] Fix relcache hazard with detached partitions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During queries run by ri_triggers.c queries, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
caching code so that the partition descriptors we keep in relcache are
only those that do not contain any detached partition, *or* those that
are requested to contain all partitions.  When a partdesc-without-
detached-partitions is requested, we create one afresh each time.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote 
Author: Álvaro Herrera 
Discussion: https://postgr.es/m/3269784.1617215...@sss.pgh.pa.us
---
 src/backend/catalog/pg_inherits.c | 64 +++
 src/backend/commands/tablecmds.c  | 22 +++
 src/backend/commands/trigger.c|  4 +-
 src/backend/executor/execPartition.c  | 15 ++---
 src/backend/partitioning/partdesc.c   | 30 +
 src/include/catalog/pg_inherits.h |  4 +-
 src/include/partitioning/partdesc.h   | 10 ++-
 .../detach-partition-concurrently-4.out   |  1 +
 8 files changed, 88 insertions(+), 62 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index bb8b2249b1..bb6b1686ee 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
- * include_detached says to include all partitions, even if they're marked
- * detached.  Passing it as false means they might or might not be included,
- * depending on the visibility of the pg_inherits row for the active snapshot.
+ * If a partition's pg_inherits row is marked "detach pending",
+ * *detached_exist (if not null) is set true, otherwise it is set false.
+ *
+ * If omit_detached is true and there is an active snapshot (not the same as
+ * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
+ * marked "detach pending" is visible to that snapshot, then that partition is
+ * omitted from the output list.  This makes partitions invisible depending on
+ * whether the transaction that marked those partitions as detached appears
+ * committed to the active snapshot.
  */

Re: Privilege boundary between sysadmin and database superuser [Was: Re: pg_amcheck option to install extension]

Mark Dilger  writes:
>> On Apr 20, 2021, at 5:54 AM, Robert Haas  wrote:
>> On Tue, Apr 20, 2021 at 1:31 AM Mark Dilger
>>  wrote:
>>> I think you are conflating the concept of an operating system adminstrator 
>>> with the concept of the database superuser/owner.

>> You should conflate those things, because there's no meaningful
>> privilege boundary between them:

> I understand why you say so, but I think the situation is more nuanced than 
> that.

Maybe I too am confused, but I understand "operating system administrator"
to mean "somebody who has root, or some elevated OS privilege level, on
the box running Postgres".  That is 100% distinct from the operating
system user that runs Postgres, which should generally be a bog-standard
OS user.  (In fact, we try to prevent you from running Postgres as root.)

What there is not a meaningful privilege boundary between is that standard
OS user and a within-the-database superuser.  Since we allow superusers to
trigger file reads and writes, and indeed execute programs, from within
the DB, a superuser can surely reach anything the OS user can do.

The rest of your analysis seems a bit off-point to me, which is what
makes me think that one of us is confused.  If Alice is storing her
data in a Postgres database, she had better trust both the Postgres
superuser and the box's administrators ... otherwise, she should go
get her own box and her own Postgres installation.

regards, tom lane

Privilege boundary between sysadmin and database superuser [Was: Re: pg_amcheck option to install extension]

2021-04-20 Thread Mark Dilger

> On Apr 20, 2021, at 5:54 AM, Robert Haas  wrote:
> 
> On Tue, Apr 20, 2021 at 1:31 AM Mark Dilger
>  wrote:
>> I think you are conflating the concept of an operating system adminstrator 
>> with the concept of the database superuser/owner.
> 
> You should conflate those things, because there's no meaningful
> privilege boundary between them:

I understand why you say so, but I think the situation is more nuanced than 
that.

> http://rhaas.blogspot.com/2020/12/cve-2019-9193.html
> 
> If reading the whole thing is too much, scroll down to the part in
> fixed-width font and behold me trivially compromising the OS account
> using plperlu.

I think the question here is whether PostgreSQL is inherently insecure, meaning 
that it cannot function unless installed in a way that would allow the database 
superuser Bob to compromise the OS administered by Alice.

Magnus seems to object even to this formulation in his blog post, 
https://blog.hagander.net/when-a-vulnerability-is-not-a-vulnerability-244/, 
saying "a common setup is to only allow the postgres OS user itself to act as 
superuser, in which case there is no escalation at all."  He seems to view Bob 
taking over the OS account as nothing more than Alice taking over her own 
account, since nobody but Alice should ever be able to log in as Bob.  At a 
minimum, I think that means that Alice must trust PostgreSQL to contain zero 
exploits.  If database user Charlie can escalate his privileges to the level of 
Bob, then Alice has a big problem.  Assuming Alice is an average prudent system 
administrator, she doesn't really want to trust that PostgreSQL is completely 
exploit free.  She just wants to quarantine it enough that she can sleep at 
night.

I think we have made the situation for Alice a bit difficult.  She needs to 
make sure that whichever user the backend runs as does not have permission to 
access anything beyond the PGDATA directory and a handful of postgres binaries, 
otherwise Bob, and perhaps Charlie, can access them.  She can do this most 
easily with containers, or at least it seems so to me.  The only binaries that 
should be executable from within the container are "postgres", "locale", and 
whichever hardened archive command, recovery command, and restore command Alice 
wants to allow.  The only shell that should be executable from within the 
container should be minimal, maybe something custom written by Alice that only 
works to recognize the very limited set of commands Alice wants to allow and 
then forks/execs those commands without allowing any further shell magic.  
"Copy to program" and "copy from program" internally call popen, which calls 
the shell, and if Alice's custom shell doesn't offer to pipe anything to the 
target program, Bob can't really do anything that way.  "locale -a" doesn't 
seem particularly vulnerable to being fed garbage, and in any event, Alice's 
custom shell doesn't have to implement the pipe stream logic in that direction. 
 She could make it unidirectional from `locale -a` back to postgres.  The 
archive, recovery, and restore commands are internally invoked using system() 
which calls those commands indirectly using Alice's shell.  Once again, she 
could write the shell to not pipe anything in either direction, which pretty 
well prevents Bob from doing anything malicious with them.

Reading and writing postgresql data files seems a much trickier problem.  The 
"copy to file" and "copy from file" implementations don't go through the shell, 
and Alice can't deny the database reading or writing the data directory, so 
there doesn't seem to be any quarantine trick that will work.  Bob can copy 
arbitrary malicious content to or from that directory.  I don't see how this 
gets Bob any closer to compromising the OS account, though.  All Bob is doing 
is messing up his own database.  Even if corrupting these files convinces the 
postgres backend to attempt to write somewhere else in the system, the 
container should be sufficient to prevent it from actually succeeding outside 
its own data directory.

The issue of the pg_read_file() sql function, and similar functions, would seem 
to fall into the same category as "copy to file" and "copy from file".  Bob can 
read and write his own data directory, but not anything else, assuming Alice 
set up the container properly.

> I actually think this is a design error on our part. A lot of people,
> apparently including you, feel that there should be a privilege
> boundary between the PostgreSQL superuser and the OS user, or want
> such a boundary to exist.  

I'm arguing that the boundary does currently (almost) exist, but is violated by 
default, easy to further violate without realizing you are doing so, 
inconvenient and hard to maintain in practice, requires segregating the 
database superuser from whichever adminstrator(s) execute other tools, requires 
being paranoid when running such tools against the database because any content 
found therein could have

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

On Tue, Apr 20, 2021 at 2:29 PM Stefan Keller  wrote:
> Just for the records: A learned index as no more foreknowledge about
> the dataset as other indices.

Maybe. ML models are famously prone to over-interpreting training
data. In any case I am simply not competent to assess how true this
is.

> I'd give learned indexes at least a change to provide a
> proof-of-concept. And I want to learn more about the requirements to
> be accepted as a new index (before undergoing month's of code
> sprints).

I have everything to gain and nothing to lose by giving them a chance
-- I'm not required to do anything to give them a chance, after all. I
just want to be clear that I'm a skeptic now rather than later. I'm
not the one making a big investment of my time here.

> As you may have seen, the "Stonebraker paper" I cited [1] is also
> sceptic requiring full parity on features (like "concurrency control,
> recovery, non main memory,and multi-user settings")! Non main memory
> code I understand.
> => But index read/write operations and multi-user settings are part of
> a separate software (transaction manager), aren't they?

It's easy for me to be a skeptic -- again, what do I have to lose by
freely expressing my opinion? Mostly I'm just saying that I wouldn't
work on this because ISTM that there is significant uncertainty about
the outcome, but much less uncertainty about the outcome of
alternative projects of comparable difficulty. That's fundamentally
how I assess what to work on. There is plenty of uncertainty on my end
-- but that's beside the point.

-- 
Peter Geoghegan

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

Stefan Keller  writes:
> I'd give learned indexes at least a change to provide a
> proof-of-concept. And I want to learn more about the requirements to
> be accepted as a new index (before undergoing month's of code
> sprints).

There's enough support these days that you can build a new index
type as an extension, without touching the core code at all.
I'd advise proceeding that way, at least until you have a pretty
convincing prototype.

regards, tom lane

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2021-04-20 Thread Stefan Keller

Just for the records: A learned index as no more foreknowledge about
the dataset as other indices.

I'd give learned indexes at least a change to provide a
proof-of-concept. And I want to learn more about the requirements to
be accepted as a new index (before undergoing month's of code
sprints).

As you may have seen, the "Stonebraker paper" I cited [1] is also
sceptic requiring full parity on features (like "concurrency control,
recovery, non main memory,and multi-user settings")! Non main memory
code I understand.
=> But index read/write operations and multi-user settings are part of
a separate software (transaction manager), aren't they?

~Stefan

[1] https://www.cs.brandeis.edu/~olga/publications/ieee2021.pdf

Am Di., 20. Apr. 2021 um 22:22 Uhr schrieb Peter Geoghegan :
>
> On Tue, Apr 20, 2021 at 12:51 PM Jonah H. Harris  
> wrote:
> >> Maybe I'll be wrong about learned indexes - who knows? But the burden
> >> of proof is not mine. I prefer to spend my time on things that I am
> >> reasonably confident will work out well ahead of time.
> >
> >
> > Agreed on all of your takes, Peter. In time, they will probably be more 
> > realistic.
>
> A big problem when critically evaluating any complicated top-down
> model in the abstract is that it's too easy for the designer to hide
> *risk* (perhaps inadvertently). If you are allowed to make what
> amounts to an assumption that you have perfect foreknowledge of the
> dataset, then sure, you can do a lot with that certainty. You can
> easily find a way to make things faster or more space efficient by
> some ridiculous multiple that way (like 10x, 100x, whatever).
>
> None of these papers ever get around to explaining why what they've
> come up with is not simply fool's gold. The assumption that you can
> have robust foreknowledge of the dataset seems incredibly fragile,
> even if your model is *almost* miraculously good. I have no idea how
> fair that is. But my job is to make Postgres better, not to judge
> papers. My mindset is very matter of fact and practical.
>
> --
> Peter Geoghegan

Re: RFE: Make statistics robust for unplanned events

On Tue, Apr 20, 2021 at 5:00 AM Patrik Novotny  wrote:
> As far as I've checked, this would have to be implemented.
>
> My question would be whether there is something that would make this 
> impossible to implement, and if there isn't, I'd like this to be considered a 
> feature request.

I agree with you.

Maybe crash safety would require some care in cases where autovacuum
runs very frequently, so that the overhead isn't too high. But
overall, non-crash-safe information that drives autovacuum is penny
wise, pound foolish.

I'm sure that it doesn't matter that much most of the time, but there
are probably workloads and use cases where it causes significant and
persistent problems. That's not the right trade-off IMV.

-- 
Peter Geoghegan

Re: Synchronous commit behavior during network outage

2021-04-20 Thread SATYANARAYANA NARLAPURAM

One idea here is to make the backend ignore query cancellation/backend
termination while waiting for the synchronous commit ACK. This way client
never reads the data that was never flushed remotely. The problem with this
approach is that your backends get stuck until your commit log record is
flushed on the remote side. Also, the client can see the data not flushed
remotely if the server crashes and comes back online. You can prevent the
latter case by making a SyncRepWaitForLSN before opening up the connections
to the non-superusers. I have a working prototype of this logic, if there
is enough interest I can post the patch.





On Tue, Apr 20, 2021 at 11:25 AM Ondřej Žižka 
wrote:

> I am sorry, I forgot mentioned, that in the second situation I added a
> primary key to the table.
>
> Ondrej
>
>
> On 20/04/2021 18:49, Ondřej Žižka wrote:
> > Hello Aleksander,
> >
> > Thank you for the reaction. This was tested on version 13.2.
> >
> > There are also other possible situations with the same setup and
> > similar issue:
> >
> > -
> > When the background process on server fails
> >
> > On postgresql1:
> > tecmint=# select * from a; --> LAN on sync replica is OK
> >  id
> > 
> >   1
> > (1 row)
> >
> > tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN
> > and insert is waiting. During this time kill the background process on
> > the PostgreSQL server for this session
> > WARNING:  canceling the wait for synchronous replication and
> > terminating connection due to administrator command
> > DETAIL:  The transaction has already committed locally, but might not
> > have been replicated to the standby.
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Succeeded.
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> > (2 rows)
> >
> > tecmint=# ---> LAN on sync replica is still DOWN
> >
> > The potgres session will restore after the background process failed.
> > When you run select on master, it still looks OK. But data is still
> > not replicated on the sync replica. If we lost the master now, we
> > would lost this data as well.
> >
> > **
> > Another case
> > **
> >
> > Kill the client process.
> >
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> > (3 rows)
> > tecmint=#--> Disconnect the sync replica now. LAN on
> > replica is DOWN
> > tecmint=# insert into a values (4); --> Kill the client process
> > Terminated
> > xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d
> tecmint
> > Password for user postgres:
> > psql (13.2 (Debian 13.2-1.pgdg100+1))
> > Type "help" for help.
> >
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> > (3 rows)
> >
> > tecmint=# --> Number 4 is not there. Now switch the LAN on sync
> > replica ON.
> >
> > --
> >
> > Result from sync replica after the LAN is again UP:
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> >   4
> > (4 rows)
> >
> >
> > In this situation, try to insert the number 4 again to the table.
> >
> > tecmint=# select * from a;
> >  id
> > 
> >   1
> >   2
> >   3
> > (3 rows)
> >
> > tecmint=# insert into a values (4);
> > ERROR:  duplicate key value violates unique constraint "a_pkey"
> > DETAIL:  Key (id)=(4) already exists.
> > tecmint=#
> >
> > This is really strange... Application can be confused, It is not
> > possible to insert record, which is not there, but some systems which
> > use the sync node as a read replica maybe already read that record
> > from the sync replica database and done some steps which can cause
> > issues and can be hard to track.
> >
> > If I say, that it would be hard to send the CTRL+C to the database
> > from the client, I need to say, that the 2 situations I described here
> > can happen in real.
> >
> > What do you think?
> >
> > Thank you and regards
> > Ondrej
> >
> > On 20/04/2021 17:23, Aleksander Alekseev wrote:
> >> Hi Ondřej,
> >>
> >> Thanks for the report. It seems to be a clear violation of what is
> >> promised in the docs. Although it's unlikely that someone implemented
> >> an application which deals with important data and "pressed Ctr+C" as
> >> it's done in psql. So this might be not such a critical issue after
> >> all. BTW what version of PostgreSQL are you using?
> >>
> >>
> >> On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka
> >>  wrote:
> >>> Hello all,
> >>> I would like to know your opinion on the following behaviour I see
> >>> for PostgreSQL setup with synchronous replication.
> >>>
> >>> This behaviour happens in a special use case. In this use case,
> >>> there are 2 synchronous replicas with the following config (truncated):
> >>>
> >>> - 2 nodes
> >>> - synchronous_standby_names='*'
> >>> - synchronous_commit=remote_apply
> >>>
> >>>
> >>> With thi

Re: when the startup process doesn't

2021-04-20 Thread SATYANARAYANA NARLAPURAM

+1 for both log messages and allowing connections. I believe these two
complement each other.

In the cloud world, we oftentimes want to monitor the progress of the
recovery without connecting to the server as the operators don't
necessarily have the required permissions to connect and query. Secondly,
having this information in the log helps going back in time and understand
where Postgres spent time during recovery.

The ability to query the server provides real time information  and come
handy.

Thanks,
Satya



On Mon, Apr 19, 2021 at 10:55 AM Robert Haas  wrote:

> Hi,
>
> I've noticed that customers not infrequently complain that they start
> postgres and then the system doesn't come up for a while and they have
> no idea what's going on and are (understandably) worried. There are
> probably a number of reasons why this can happen, but the ones that
> seem to come up most often in my experience are (1) SyncDataDirectory
> takes a long time, (b) ResetUnloggedRelations takes a long time, and
> (c) there's a lot of WAL to apply so that takes a long time. It's
> possible to distinguish this last case from the other two by looking
> at the output of 'ps', but that's not super-convenient if your normal
> method of access to the server is via libpq, and it only works if you
> are monitoring it as it's happening rather than looking at the logs
> after-the-fact. I am not sure there's any real way to distinguish the
> other two cases without using strace or gdb or similar.
>
> It seems to me that we could do better. One approach would be to try
> to issue a log message periodically - maybe once per minute, or some
> configurable interval, e.g. perhaps add messages something like this:
>
> LOG:  still syncing data directory, elapsed time %ld.%03d ms, current path
> %s
> LOG:  data directory sync complete after %ld.%03d ms
> LOG:  still resetting unlogged relations, elapsed time %ld.%03d ms,
> current path %s
> LOG:  unlogged relations reset after %ld.%03d ms
> LOG:  still performing crash recovery, elapsed time %ld.%03d ms,
> current LSN %08X/%08X
>
> We already have a message when redo is complete, so there's no need
> for another one. The implementation here doesn't seem too hard either:
> the startup process would set a timer, when the timer expires the
> signal handler sets a flag, at a convenient point we notice the flag
> is set and responding by printing a message and clearing the flag.
>
> Another possible approach would be to accept connections for
> monitoring purposes even during crash recovery. We can't allow access
> to any database at that point, since the system might not be
> consistent, but we could allow something like a replication connection
> (the non-database-associated variant). Maybe it would be precisely a
> replication connection and we'd just refuse all but a subset of
> commands, or maybe it would be some other kinds of thing. But either
> way you'd be able to issue a command in some mini-language saying "so,
> tell me how startup is going" and it would reply with a result set of
> some kind.
>
> If I had to pick one of these two ideas, I'd pick the one the
> log-based solution, since it seems easier to access and simplifies
> retrospective analysis, but I suspect SQL access would be quite useful
> for some users too, especially in cloud environments where "just log
> into the machine and have a look" is not an option.
>
> Thoughts?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
>
>

Re: fix old confusing JSON example

Justin Pryzby  writes:
> On Tue, Apr 20, 2021 at 09:07:52PM +0200, Erik Rijkers wrote:
>> I just happened to use the website-documentation and noticed that there the 
>> change is not done: it still has the erroneous line, in the docs of 13 
>> (current), and 12; the docs of 14devel are apparently updated.
>> 
>> That makes me wonder: is there a regular html-docs-update (dayly? weekly?) 
>> of doc-bugs of this kind in the website-docs of current and earlier releases?

> Looking at the doc "HOME", it says:
> https://www.postgresql.org/docs/13/index.html
> | PostgreSQL 13.2 Documentation
> So this seems to be updated for minor releases.

Yeah.  The website's copy of the devel version of the docs is refreshed
quickly (within a few hours of commit, usually) but released branches
are only updated when there's a release.

regards, tom lane

Re: when the startup process doesn't

2021-04-20 Thread Alvaro Herrera

On 2021-Apr-20, Andres Freund wrote:

> On 2021-04-19 13:55:13 -0400, Robert Haas wrote:
> > Another possible approach would be to accept connections for
> > monitoring purposes even during crash recovery. We can't allow access
> > to any database at that point, since the system might not be
> > consistent, but we could allow something like a replication connection
> > (the non-database-associated variant). Maybe it would be precisely a
> > replication connection and we'd just refuse all but a subset of
> > commands, or maybe it would be some other kinds of thing. But either
> > way you'd be able to issue a command in some mini-language saying "so,
> > tell me how startup is going" and it would reply with a result set of
> > some kind.
> 
> The hard part about this seems to be how to perform authentication -
> obviously we can't do catalog lookups for users at that time.

Maybe a way to do this would involve some sort of monitoring cookie
that's obtained ahead of time (maybe at initdb time?) and is supplied to
the frontend by some OOB means.  Then frontend can present that during
startup to the server, which ascertains its legitimacy without having to
access catalogs.  Perhaps it even requires a specific pg_hba.conf rule.

-- 
Álvaro Herrera   Valdivia, Chile
"La verdad no siempre es bonita, pero el hambre de ella sí"

Re: when the startup process doesn't

Hi,

On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
> I wonder though whether we really need authentication here.  pg_ping
> already exposes whether the database is up, to anyone who can reach the
> postmaster port at all.  Would it be so horrible if the "can't accept
> connections" error message included a detail about "recovery is X%
> done"?

Unfortunately I think something like a percentage is hard to calculate
right now.  Even just looking at crash recovery (vs replication or
PITR), we don't currently know where the WAL ends without reading all
the WAL. The easiest thing to return would be something in LSNs or
bytes and I suspect that we don't want to expose either unauthenticated?

I wonder if we ought to occasionally update something like
ControlFileData->minRecoveryPoint on primaries, similar to what we do on
standbys? Then we could actually calculate a percentage, and it'd have
the added advantage of allowing to detect more cases where the end of
the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
adding a lot of fsyncs, but that seems doable?

Greetings,

Andres Freund

Re: when the startup process doesn't

Hi,

On 2021-04-19 13:55:13 -0400, Robert Haas wrote:
> Another possible approach would be to accept connections for
> monitoring purposes even during crash recovery. We can't allow access
> to any database at that point, since the system might not be
> consistent, but we could allow something like a replication connection
> (the non-database-associated variant). Maybe it would be precisely a
> replication connection and we'd just refuse all but a subset of
> commands, or maybe it would be some other kinds of thing. But either
> way you'd be able to issue a command in some mini-language saying "so,
> tell me how startup is going" and it would reply with a result set of
> some kind.

The hard part about this seems to be how to perform authentication -
obviously we can't do catalog lookups for users at that time.

If that weren't the issue, we could easily do much better than now, by
just providing an errdetail() with recovery progress information. But we
presumably don't want to spray such information to unauthenticated
connection attempts.

I've vaguely wondered before whether it'd be worth having something like
an "admin" socket somewhere in the data directory. Which explicitly
wouldn't require authentication, have the cluster owner as the user,
etc.  That'd not just be useful for monitoring during recovery, but also
make some interactions with the server easier for admin tools I think.

> If I had to pick one of these two ideas, I'd pick the one the
> log-based solution, since it seems easier to access and simplifies
> retrospective analysis, but I suspect SQL access would be quite useful
> for some users too, especially in cloud environments where "just log
> into the machine and have a look" is not an option.

However, leaving aside the implementation effort, the crazy idea
above would not easily address the issue of only being accessible with
local access...

Greetings,

Andres Freund

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

On Tue, Apr 20, 2021 at 12:51 PM Jonah H. Harris  wrote:
>> Maybe I'll be wrong about learned indexes - who knows? But the burden
>> of proof is not mine. I prefer to spend my time on things that I am
>> reasonably confident will work out well ahead of time.
>
>
> Agreed on all of your takes, Peter. In time, they will probably be more 
> realistic.

A big problem when critically evaluating any complicated top-down
model in the abstract is that it's too easy for the designer to hide
*risk* (perhaps inadvertently). If you are allowed to make what
amounts to an assumption that you have perfect foreknowledge of the
dataset, then sure, you can do a lot with that certainty. You can
easily find a way to make things faster or more space efficient by
some ridiculous multiple that way (like 10x, 100x, whatever).

None of these papers ever get around to explaining why what they've
come up with is not simply fool's gold. The assumption that you can
have robust foreknowledge of the dataset seems incredibly fragile,
even if your model is *almost* miraculously good. I have no idea how
fair that is. But my job is to make Postgres better, not to judge
papers. My mindset is very matter of fact and practical.

-- 
Peter Geoghegan

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2021-04-20 Thread Jonah H. Harris

On Tue, Apr 20, 2021 at 3:45 PM Peter Geoghegan  wrote:

> On Tue, Apr 20, 2021 at 12:35 PM Chapman Flack 
> wrote:
> > How would showing that to be true for data structure X be different from
> > making a case for data structure X?
>
> You don't have to understand the theoretical basis of B-Tree indexes
> to see that they work well. In fact, it took at least a decade for
> somebody to formalize how the space utilization works with B-Trees
> containing random data. Of course theory matters, but the fact is that
> B-Trees had been widely used for commercial and scientific
> applications that whole time.
>
> Maybe I'll be wrong about learned indexes - who knows? But the burden
> of proof is not mine. I prefer to spend my time on things that I am
> reasonably confident will work out well ahead of time.
>

Agreed on all of your takes, Peter. In time, they will probably be more
realistic. But, at present, I tend to see the research papers make
comparisons between learned vs. traditional pitching the benefits of the
former without any of the well-known optimizations of the latter - as if
time stood still since the original B-Tree. Similarly, where most academic
research starts to fall apart in practicality is the lack of addressing
realistic write volumes and related concurrency issues. I'm happy to be
disproven on this, though.

-- 
Jonah H. Harris

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

On Tue, Apr 20, 2021 at 12:35 PM Chapman Flack  wrote:
> How would showing that to be true for data structure X be different from
> making a case for data structure X?

You don't have to understand the theoretical basis of B-Tree indexes
to see that they work well. In fact, it took at least a decade for
somebody to formalize how the space utilization works with B-Trees
containing random data. Of course theory matters, but the fact is that
B-Trees had been widely used for commercial and scientific
applications that whole time.

Maybe I'll be wrong about learned indexes - who knows? But the burden
of proof is not mine. I prefer to spend my time on things that I am
reasonably confident will work out well ahead of time.

--
Peter Geoghegan

Re: fix old confusing JSON example

2021-04-20 Thread Justin Pryzby

On Tue, Apr 20, 2021 at 09:07:52PM +0200, Erik Rijkers wrote:
> I just happened to use the website-documentation and noticed that there the 
> change is not done: it still has the erroneous line, in the docs of 13 
> (current), and 12; the docs of 14devel are apparently updated.
> 
> That makes me wonder: is there a regular html-docs-update (dayly? weekly?) of 
> doc-bugs of this kind in the website-docs of current and earlier releases?
> 
> To be clear, I am talking about the lines below:
>   'GIN index supports @@ and @? operators'
> 
> on pages
>   https://www.postgresql.org/docs/13/datatype-json.html
>   https://www.postgresql.org/docs/12/datatype-json.html
> 
> where the change that was pushed was to correct the second example from  @@  
> to  @?

Looking at the doc "HOME", it says:
https://www.postgresql.org/docs/13/index.html
| PostgreSQL 13.2 Documentation

So this seems to be updated for minor releases.

-- 
Justin

Re: PATCH: Add GSSAPI ccache_name option to libpq

2021-04-20 Thread Daniel Carter


Hi Stephen,

On 20/04/2021 20:01, Stephen Frost wrote:

I'm not necessarily against this, but typically the GSSAPI library
provides a way for you to control this using, eg, the KRB5_CCACHE
environment variable.  Is there some reason why that couldn't be used..?


The original motivation for investigating this was setting up a web app 
which could authenticate to a database server using a Kerberos ticket. 
Since the web framework already needs to create a connection string 
(with database name etc.) to set up the database connection, having an 
option here for the ccache location makes it much more straightforward 
to specify than having to save data out to environment variables (and 
makes things cleaner if there are potentially multiple database 
connections going on at once in different processes).


There may well be a better way of going about this -- it's just that I 
can't currently see an obvious way to get this kind of setup working 
using only the environment variable.


Many thanks,
Daniel

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2021-04-20 Thread Chapman Flack

On 04/20/21 15:24, Peter Geoghegan wrote:
> data structures that work well don't need anybody to make a case for them.
> They simply work well for the task they were designed for.

How would showing that to be true for data structure X be different from
making a case for data structure X?

Regards,
-Chap

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

On Tue, Apr 20, 2021 at 11:18 AM Andrey Borodin  wrote:
> BTW take a look into PGM [0]. I'm slowly working on implementing it.
> I think it is kind of straightforward to implement it as extension.
> I've started from forking B-tree[1]. I've removed support of anything that is 
> not int4.
> Then I plan to remove opclass\comparator abstraction layer.
> And then add interpolation search over the page before binsearch.

FWIW I'm a skeptic of learned indexes. There are lots of reasons why I
don't think that they're practical, but it boils down to this: in
general data structures that work well don't need anybody to make a
case for them. They simply work well for the task they were designed
for.

Anything is possible, but I strongly doubt that learned indexes are
the exception to that general rule. Why hasn't anybody been able to
apply the techniques in any real world database system to date? I'm
sure that it's possible to make things much faster if it's possible to
make certain wide-reaching assumptions. They talk about big
improvements in space utilization compared to B-Trees as if there was
some very fixed idea of how that works in B-Trees. Why haven't they
compared their model to grotty heuristics that practical experience
has shown to work, such as the rightmost leaf page split heuristic?
Any paper about learned indexes that I've ever read doesn't even
acknowledge the existence of these heuristics.

-- 
Peter Geoghegan

Re: fix old confusing JSON example

2021-04-20 Thread Erik Rijkers

> On 2021.04.16. 10:00 Michael Paquier  wrote:
> On Sat, Apr 03, 2021 at 02:28:38PM +0200, Erik Rijkers wrote:
> > So, that gives information on two operators, and then gives one
> > example query for each.  Clearly, the second example was meant to
> > illustrate a where-clause with the  @?  operator. 
> > 
> > Small change to prevent great confusion (I'll admit it took me far
> > too long to understand this). 
> 
> Once one guesses the definition of the table to use with the sample
> data at disposal in the docs, it is easy to see that both queries
> should return the same result, but the second one misses the shot and
> is corrected as you say.  So, applied.

Great, thank you.

I just happened to use the website-documentation and noticed that there the 
change is not done: it still has the erroneous line, in the docs of 13 
(current), and 12; the docs of 14devel are apparently updated.

That makes me wonder: is there a regular html-docs-update (dayly? weekly?) of 
doc-bugs of this kind in the website-docs of current and earlier releases?

To be clear, I am talking about the lines below:
  'GIN index supports @@ and @? operators'

on pages
  https://www.postgresql.org/docs/13/datatype-json.html
  https://www.postgresql.org/docs/12/datatype-json.html

where the change that was pushed was to correct the second example from  @@  to 
 @?

thanks,

Erik Rijkers

> 
> My apologies for the delay.
> --
> Michael

Re: when the startup process doesn't

2021-04-20 Thread Stephen Frost

Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Yeah, being able to pick up on this remotely seems like it'd be quite
> > nice.  I'm not really thrilled with the idea, but the best I've got
> > offhand for this would be a new role that's "pg_recovery_login" where an
> > admin can GRANT that role to the roles they'd like to be able to use to
> > login during the recovery process and then, for those roles, we write
> > out flat files to allow authentication without access to pg_authid,
> 
> We got rid of those flat files for good and sufficient reasons.  I really
> really don't want to go back to having such.

Yeah, certainly is part of the reason that I didn't really like that
idea either.

> I wonder though whether we really need authentication here.  pg_ping
> already exposes whether the database is up, to anyone who can reach the
> postmaster port at all.  Would it be so horrible if the "can't accept
> connections" error message included a detail about "recovery is X%
> done"?

Ultimately it seems like it would depend on exactly what we are thinking
of returning there.  A simple percentage of recovery which has been
completed doesn't seem like it'd really be revealing too much
information though.

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: PATCH: Add GSSAPI ccache_name option to libpq

2021-04-20 Thread Stephen Frost

Greetings,

* Daniel Carter (danielchriscarter+postg...@gmail.com) wrote:
> This is a small patch (against master) to allow an application using libpq
> with GSSAPI authentication to specify where to fetch the credential cache
> from -- it effectively consists of a new field in PQconninfoOptions to store
> this data and (where the user has specified a ccache location) a call into
> the gss_krb5_ccache_name function in the GSSAPI library.

I'm not necessarily against this, but typically the GSSAPI library
provides a way for you to control this using, eg, the KRB5_CCACHE
environment variable.  Is there some reason why that couldn't be used..?

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: when the startup process doesn't

Stephen Frost  writes:
> Yeah, being able to pick up on this remotely seems like it'd be quite
> nice.  I'm not really thrilled with the idea, but the best I've got
> offhand for this would be a new role that's "pg_recovery_login" where an
> admin can GRANT that role to the roles they'd like to be able to use to
> login during the recovery process and then, for those roles, we write
> out flat files to allow authentication without access to pg_authid,

We got rid of those flat files for good and sufficient reasons.  I really
really don't want to go back to having such.

I wonder though whether we really need authentication here.  pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all.  Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?

regards, tom lane

Re: when the startup process doesn't

2021-04-20 Thread Stephen Frost

Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Magnus Hagander  writes:
> > On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
> >  wrote:
> >> Two another options:
> >> 1. if this is limited to local access only, outside of the log entries, the
> >> status of the startup could be updated in the controldata file as well. 
> >> This
> >> would allows to watch it without tail-grep'ing logs using eg. 
> >> pg_controldata.
> 
> > I think doing so in controldata would definitely make things
> > complicated for no real reason. Plus controldata has a fixed size (and
> > has to have), whereas something like this would probably want more
> > variation than that makes easy.
> 
> Also, given that pg_control is as critical a bit of data as we have,
> we really don't want to be writing it more often than we absolutely
> have to.

Yeah, don't think pg_control fiddling is what we want.  I do agree with
improving the logging situation around here, certainly.

> > There could be a "startup.status" file I guess which would basically
> > contain the last line of what would otherwise be in the log. But if it
> > remains a textfile, I'm not sure what the gain is -- you'll just have
> > to have the dba look in more places than one to find it? It's not like
> > there's likely to be much other data written to the log during these
> > times?
> 
> Yeah, once you are talking about dumping stuff in a file, it's not
> clear how that's better than progress-messages-in-the-log.  People
> already have a lot of tooling for looking at the postmaster log.

Agreed.

> I think the point of Robert's other proposal is to allow remote
> checks of the restart's progress, so local files aren't much of
> a substitute anyway.

Yeah, being able to pick up on this remotely seems like it'd be quite
nice.  I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,
whenever their password or such changes.  It's certainly a bit grotty
but I do think it'd work.  I definitely don't want to go back to having
all of pg_authid written as a flat file and I'd rather that existing
tools and libraries work with this (meaning using the same port and
speaking the PG protocol and such) rather than inventing some new thing
that listens on some other port, etc.

On the fence about tying this to 'pg_monitor' instead of using a new
predefined role.  Either way, I would definitely prefer to see the admin
have to create a role and then GRANT the predefined role to that role.
I really dislike the idea of having predefined roles that can be used to
directly log into the database.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Synchronous commit behavior during network outage

I am sorry, I forgot mentioned, that in the second situation I added a 
primary key to the table.


Ondrej


On 20/04/2021 18:49, Ondřej Žižka wrote:

Hello Aleksander,

Thank you for the reaction. This was tested on version 13.2.

There are also other possible situations with the same setup and 
similar issue:


-
When the background process on server fails

On postgresql1:
tecmint=# select * from a; --> LAN on sync replica is OK
 id

  1
(1 row)

tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN 
and insert is waiting. During this time kill the background process on 
the PostgreSQL server for this session
WARNING:  canceling the wait for synchronous replication and 
terminating connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
tecmint=# select * from a;
 id

  1
  2
(2 rows)

tecmint=# ---> LAN on sync replica is still DOWN

The potgres session will restore after the background process failed. 
When you run select on master, it still looks OK. But data is still 
not replicated on the sync replica. If we lost the master now, we 
would lost this data as well.


**
Another case
**

Kill the client process.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)
tecmint=#    --> Disconnect the sync replica now. LAN on 
replica is DOWN

tecmint=# insert into a values (4); --> Kill the client process
Terminated
xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d tecmint
Password for user postgres:
psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# --> Number 4 is not there. Now switch the LAN on sync 
replica ON.


--

Result from sync replica after the LAN is again UP:
tecmint=# select * from a;
 id

  1
  2
  3
  4
(4 rows)


In this situation, try to insert the number 4 again to the table.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# insert into a values (4);
ERROR:  duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (id)=(4) already exists.
tecmint=#

This is really strange... Application can be confused, It is not 
possible to insert record, which is not there, but some systems which 
use the sync node as a read replica maybe already read that record 
from the sync replica database and done some steps which can cause 
issues and can be hard to track.


If I say, that it would be hard to send the CTRL+C to the database 
from the client, I need to say, that the 2 situations I described here 
can happen in real.


What do you think?

Thank you and regards
Ondrej

On 20/04/2021 17:23, Aleksander Alekseev wrote:

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka 
 wrote:

Hello all,
I would like to know your opinion on the following behaviour I see 
for PostgreSQL setup with synchronous replication.


This behaviour happens in a special use case. In this use case, 
there are 2 synchronous replicas with the following config (truncated):


- 2 nodes
- synchronous_standby_names='*'
- synchronous_commit=remote_apply


With this setup run the following steps (LAN down - LAN between 
master and replica):

-
postgres=# truncate table a;
TRUNCATE TABLE
postgres=# insert into a values (1); -- LAN up, insert has been 
applied to replica.

INSERT 0 1
Vypnu LAN na serveru se standby:
postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press 
CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user 
request
DETAIL:  The transaction has already committed locally, but might 
not have been replicated to the standby.

INSERT 0 1
There will be warning that commit was performed only locally:
2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for 
synchronous replication due to user request
2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has 
already committed locally, but might not have been replicated to the 
standby.


postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press 
CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user 
request
DETAIL:  The transaction has already committed locally, but might

Re: Synchronous commit behavior during network outage

Hello Maksim,

I know your post [1]. That thread is why there we performed more tests
(see another my email in this thread). We are trying to somehow
implement RPO=0 solution using PostgreSQL. Knowing this... Would be
possible to build RPO=0 solution with PostgreSQL?

Ondrej

On 20/04/2021 18:51, Maksim Milyutin wrote:

Hi!

This is a known issue with synchronous replication [1]. You might
inject into unmodified operation some dummy modification to overcome
the negative sides of such partially committing without source code
patching.

On 20.04.2021 19:23, Aleksander Alekseev wrote:

Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql.

Some client libraries have feature to cancel session that has similar
effect to "Ctrl+C" from psql after specified by client deadline
expiration [2]. Hence, this case might be quite often when application
interacts with database.

On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka
wrote:

From the synchronous_commit=remote_write level and "higher", I would
expect, that when the remote application (doesn't matter if flush,
write or apply) would not be applied I would not receive a
confirmation about the commit (even with a warning). Something like,
if there is no commit from sync replica, there is no commit on
primary and if someone performs the steps above, the whole
transaction will not send a confirmation.

The warning have to be accounted here and performed commit have not to
be treated as *successful*.

1.
https://www.postgresql.org/message-id/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru

2.
https://www.postgresql.org/message-id/CANtu0ogbu%2By6Py963p-zKJ535b8zm5AOq7zkX7wW-tryPYi1DA%40mail.gmail.com

Re: Synchronous commit behavior during network outage


Hello Aleksander,

Thank you for the reaction. This was tested on version 13.2.

There are also other possible situations with the same setup and similar 
issue:


-
When the background process on server fails

On postgresql1:
tecmint=# select * from a; --> LAN on sync replica is OK
 id

  1
(1 row)

tecmint=# insert into a values (2); ---> LAN on sync replica is DOWN and 
insert is waiting. During this time kill the background process on the 
PostgreSQL server for this session
WARNING:  canceling the wait for synchronous replication and terminating 
connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
tecmint=# select * from a;
 id

  1
  2
(2 rows)

tecmint=# ---> LAN on sync replica is still DOWN

The potgres session will restore after the background process failed. 
When you run select on master, it still looks OK. But data is still not 
replicated on the sync replica. If we lost the master now, we would lost 
this data as well.


**
Another case
**

Kill the client process.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)
tecmint=#    --> Disconnect the sync replica now. LAN on 
replica is DOWN

tecmint=# insert into a values (4); --> Kill the client process
Terminated
xzizka@service-vm:~$ psql -U postgres -h 192.168.122.6 -p 5432 -d tecmint
Password for user postgres:
psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# --> Number 4 is not there. Now switch the LAN on sync replica ON.

--

Result from sync replica after the LAN is again UP:
tecmint=# select * from a;
 id

  1
  2
  3
  4
(4 rows)


In this situation, try to insert the number 4 again to the table.

tecmint=# select * from a;
 id

  1
  2
  3
(3 rows)

tecmint=# insert into a values (4);
ERROR:  duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (id)=(4) already exists.
tecmint=#

This is really strange... Application can be confused, It is not 
possible to insert record, which is not there, but some systems which 
use the sync node as a read replica maybe already read that record from 
the sync replica database and done some steps which can cause issues and 
can be hard to track.


If I say, that it would be hard to send the CTRL+C to the database from 
the client, I need to say, that the 2 situations I described here can 
happen in real.


What do you think?

Thank you and regards
Ondrej

On 20/04/2021 17:23, Aleksander Alekseev wrote:

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka  wrote:

Hello all,
I would like to know your opinion on the following behaviour I see for 
PostgreSQL setup with synchronous replication.

This behaviour happens in a special use case. In this use case, there are 2 
synchronous replicas with the following config (truncated):

- 2 nodes
- synchronous_standby_names='*'
- synchronous_commit=remote_apply


With this setup run the following steps (LAN down - LAN between master and 
replica):
-
postgres=# truncate table a;
TRUNCATE TABLE
postgres=# insert into a values (1); -- LAN up, insert has been applied to 
replica.
INSERT 0 1
Vypnu LAN na serveru se standby:
postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
from sync replica. In this situation cancel it (press CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been 
replicated to the standby.
INSERT 0 1
There will be warning that commit was performed only locally:
2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for synchronous 
replication due to user request
2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has already 
committed locally, but might not have been replicated to the standby.

postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
from sync replica. In this situation cancel it (press CTRL+C)
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been 
replicated to the standby.
INSERT 0 1
postgres=# insert into a values (2); --LAN down, waiting for sync replica, 
second attempt, cancel it as well (CT

Re: when the startup process doesn't

Magnus Hagander  writes:
> On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
>  wrote:
>> Two another options:
>> 1. if this is limited to local access only, outside of the log entries, the
>> status of the startup could be updated in the controldata file as well. This
>> would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

> I think doing so in controldata would definitely make things
> complicated for no real reason. Plus controldata has a fixed size (and
> has to have), whereas something like this would probably want more
> variation than that makes easy.

Also, given that pg_control is as critical a bit of data as we have,
we really don't want to be writing it more often than we absolutely
have to.

> There could be a "startup.status" file I guess which would basically
> contain the last line of what would otherwise be in the log. But if it
> remains a textfile, I'm not sure what the gain is -- you'll just have
> to have the dba look in more places than one to find it? It's not like
> there's likely to be much other data written to the log during these
> times?

Yeah, once you are talking about dumping stuff in a file, it's not
clear how that's better than progress-messages-in-the-log.  People
already have a lot of tooling for looking at the postmaster log.

I think the point of Robert's other proposal is to allow remote
checks of the restart's progress, so local files aren't much of
a substitute anyway.

regards, tom lane

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin




On 20.04.2021 19:38, Tomas Vondra wrote:


On 4/20/21 6:23 PM, Aleksander Alekseev wrote:

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


Which part of the docs does this contradict?



I think, Aleksandr refers to the following phrase in docs:

"The guarantee we offer is that the application will not receive 
explicit acknowledgment of the successful commit of a transaction until 
the WAL data is known to be safely received by all the synchronous 
standbys." [1]


And IMO confusing here regards to the notion of `successful commit`. 
Does warning attached to received commit message make it not 
*successful*? I think we have to explicitly mention cases about 
cancellation and termination session in docs to avoid ambiguity in 
understanding of phrase above.



1. 
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA


--
Regards,
Maksim Milyutin

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2021-04-20 Thread Andrey Borodin




> 20 апр. 2021 г., в 22:56, Stefan Keller  написал(а):
> 
> Are there any advances in a learned index for PostgreSQL?

BTW take a look into PGM [0]. I'm slowly working on implementing it.
I think it is kind of straightforward to implement it as extension.
I've started from forking B-tree[1]. I've removed support of anything that is 
not int4.
Then I plan to remove opclass\comparator abstraction layer.
And then add interpolation search over the page before binsearch.
And, maybe, add delta compression. Or, perhaps, fix WAL, which is completely 
broken. Or add some vectorisation.
The concurrency is from a regular B-tree, of cause.

This is my pet project. So, unsurprisingly, I've done only parts of big plan :) 
I would appreciate any help.

Thanks!

Best regards, Andrey Borodin.


[0] https://pgm.di.unipi.it/
[1] https://github.com/x4m/pg2m

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2021-04-20 Thread Stefan Keller

Dear Olegs, dear Nikolay, dear all

Allow me to revive this thread:

Are there any advances in a learned index for PostgreSQL?

Background: I'm trying to benchmark those experimental indices. For
this I did some bibliography work (see below). Fun fact: Not only
Postgres people love high-proof drinks, sorry, high-proof index names:
see "From wisckey to bourbon" :-).

My main discovery is a discussion report - which included Stonebraker
- entitled "ML-In-Databases: Assessment and Prognosis" [1]. The first
two takeaways are "#1: The initial comparisons of learned indices with
optimized traditional indices should be further expanded to include
concurrency control and multi-user settings. (...) #2: A key benefit
of learned indices may come from the learned structures requiring
lower space utilization, rather than a reduction in search time."

Yours, Stefan

[1] Kraska, T., Minhas, U. F., Neumann, T., Papaemmanouil, O., Patel,
J. M., Ré, C., & Stonebraker, M. (2021). ML-In-Databases: Assessment
and Prognosis. Data Engineering, 3. Web access:
https://www.cs.brandeis.edu/~olga/publications/ieee2021.pdf

Bibliography (without any claim for completeness):

Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018,
May). The case for learned index structures. In Proceedings of the
2018 International Conference on Management of Data (pp. 489-504). Web
access https://dl.acm.org/doi/pdf/10.1145/3183713.3196909

"Recursive model index, a learned index structure" (based on Kraska et
al. 2017? 2018). Source code: https://github.com/BenJoyenConseil/rmi
(Go language), https://github.com/learnedsystems/RMI (Rust)

Kaur, T. (2018). Learned Index Structures: Practical Implementations
and Future Directions. Master Thesis. Web access:
https://wwwiti.cs.uni-magdeburg.de/iti_db/publikationen/ps/auto/kaur2018learned.pdf
 (pretends to be open source C++ but none found)

Kipf, A., Marcus, R., van Renen, A., Stoian, M., Kemper, A., Kraska,
T., & Neumann, T. (2020, June). RadixSpline: a single-pass learned
index. In Proceedings of the Third International Workshop on
Exploiting Artificial Intelligence Techniques for Data Management (pp.
1-5). Web access: https://dl.acm.org/doi/10.1145/3401071.3401659),
Source code: https://github.com/learnedsystems/RadixSpline (C++).

Dai, Y., Xu, Y., Ganesan, A., Alagappan, R., Kroth, B.,
Arpaci-Dusseau, A., & Arpaci-Dusseau, R. (2020). From wisckey to
bourbon: A learned index for log-structured merge trees. In 14th
{USENIX} Symposium on Operating Systems Design and Implementation
({OSDI} 20) (pp. 155-171). Web access:
https://www.usenix.org/system/files/osdi20-dai_0.pdf

Ding, J., Minhas, U. F., Yu, J., Wang, C., Do, J., Li, Y., ... &
Kraska, T. (2020, June). ALEX: an updatable adaptive learned index. In
Proceedings of the 2020 ACM SIGMOD International Conference on
Management of Data (pp. 969-984). Web access:
https://doi.org/10.1145/3318464.3389711 /
https://dblp.org/rec/conf/sigmod/DingMYWDLZCGKLK20 . Source code:
https://github.com/microsoft/ALEX (C++, MIT license)

Am Di., 12. Dez. 2017 um 18:02 Uhr schrieb Oleg Ivanov
:
>
> On 12/12/2017 04:33 PM, Oleg Bartunov wrote:
> > On Mon, Dec 11, 2017 at 11:11 PM, Nikolay Samokhvalov
> >  wrote:
> >> Very interesting read: https://arxiv.org/abs/1712.01208
> >>
> >> HN discussion: https://news.ycombinator.com/item?id=15894896
> >>
> >> Some of the comments (from Twitter
> >> https://twitter.com/schrockn/status/940037656494317568): "Jeff Dean and co
> >> at GOOG just released a paper showing how machine-learned indexes can
> >> replace B-Trees, Hash Indexes, and Bloom Filters. Execute 3x faster than
> >> B-Trees, 10-100x less space. Executes on GPU, which are getting faster
> >> unlike CPU. Amazing."
> >>
> >> Can those ideas be applied to Postgres in its current state? Or it's not
> >> really down-to-earth?
> > Oleg made some analysis of the paper.
> If the keys are comparable and the data distribution is simple enough
> (which seems to happen rather often) and the data does not change, we
> can learn the distribution, and so perform the search faster, and save
> the memory also. That is what authors wrote and that is what must work
> in my opinion.
>
> The limitations of the approach, which authors mention in their work:
> + The data does not change. The proposed method can be extended more or
> less straightforward to nearly append-only workloads and to workloads in
> which the data distribution does not change or changes very slowly (the
> data itself or its amount may change, nevertheless). Otherwise it is
> challenging, because the key positions cannot be considered as
> independent values anymore, but it looks still possible. The other
> proposed by the authors option is to store new objects in buffer and
> retrain model in the background, but that does not look as a nice solution.
> + GPU are fast, but the latency of doing operations on the GPU almost
> certainly avoids all speed benefits for such small models. The only case
> in which i

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin


Hi!


This is a known issue with synchronous replication [1]. You might inject 
into unmodified operation some dummy modification to overcome the 
negative sides of such partially committing without source code patching.



On 20.04.2021 19:23, Aleksander Alekseev wrote:

Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql.



Some client libraries have feature to cancel session that has similar 
effect to "Ctrl+C" from psql after specified by client deadline 
expiration [2]. Hence, this case might be quite often when application 
interacts with database.




On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka  wrote:

 From the synchronous_commit=remote_write level and "higher", I would expect, 
that when the remote application (doesn't matter if flush, write or apply) would not be 
applied I would not receive a confirmation about the commit (even with a warning). 
Something like, if there is no commit from sync replica, there is no commit on primary 
and if someone performs the steps above, the whole transaction will not send a 
confirmation.



The warning have to be accounted here and performed commit have not to 
be treated as *successful*.



1. 
https://www.postgresql.org/message-id/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru


2. 
https://www.postgresql.org/message-id/CANtu0ogbu%2By6Py963p-zKJ535b8zm5AOq7zkX7wW-tryPYi1DA%40mail.gmail.com



--
Regards,
Maksim Milyutin

Re: when the startup process doesn't

2021-04-20 Thread Magnus Hagander

On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
 wrote:
>
> On Tue, 20 Apr 2021 15:04:28 +0200
> Magnus Hagander  wrote:
> [...]
> > Yeah, I think we should definitely limit this to local access, one way
> > or another. Realistically using pg_hba is going to require catalog
> > access, isn't it? And we can't just go ignore those rows in pg_hba
> > that for example references role membership (as well as all those auth
> > methods you can't use).
>
> Two another options:
>
> 1. if this is limited to local access only, outside of the log entries, the
> status of the startup could be updated in the controldata file as well. This
> would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.

There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?

> 2. maybe the startup process could ignore update_process_title? As far
> as I understand the doc correctly, this GUC is mostly useful for backends on
> Windows.

You mention Windows -- that would be one excellent reason not to go
for this particular method. Viewing the process title is much harder
on Windows, as there is actually no such thing and we fake it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: multi-install PostgresNode fails with older postgres versions


On 4/19/21 12:37 PM, Andrew Dunstan wrote:
> On 4/19/21 10:43 AM, Mark Dilger wrote:
>>> On Apr 19, 2021, at 5:11 AM, Andrew Dunstan  wrote:
>>>
>>> I think therefore I'm inclined for now to do nothing for old version
>>> compatibility.
>> I agree with waiting until the v15 development cycle.
>>
>>> I would commit the fix for the IPC::Run caching glitch,
>>> and version detection
>> Thank you.


I've committed this piece.



>>
>>> I would add a warning if the module is used with
>>> a version <= 11.
>> Sounds fine for now.


Here's the patch for that.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index b32223f716..d126c1df9f 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -96,6 +96,7 @@ use File::Spec;
 use File::stat qw(stat);
 use File::Temp ();
 use IPC::Run;
+use PostgresVersion;
 use RecursiveCopy;
 use Socket;
 use Test::More;
@@ -1196,9 +1197,62 @@ sub get_new_node
 	# Add node to list of nodes
 	push(@all_nodes, $node);
 
+	$node->_set_pg_version;
+
+	my $v = $node->{_pg_version};
+
+	carp("PostgresNode isn't fully compatible with version " . $v)
+	  if $v < 12;
+
 	return $node;
 }
 
+# Private routine to run the pg_config binary found in our environment (or in
+# our install_path, if we have one), and set the version from it
+#
+sub _set_pg_version
+{
+my ($self) = @_;
+my $inst = $self->{_install_path};
+my $pg_config = "pg_config";
+
+if (defined $inst)
+{
+# If the _install_path is invalid, our PATH variables might find an
+# unrelated pg_config executable elsewhere.  Sanity check the
+# directory.
+BAIL_OUT("directory not found: $inst")
+unless -d $inst;
+
+# If the directory exists but is not the root of a postgresql
+# installation, or if the user configured using
+# --bindir=$SOMEWHERE_ELSE, we're not going to find pg_config, so
+# complain about that, too.
+$pg_config = "$inst/bin/pg_config";
+BAIL_OUT("pg_config not found: $pg_config")
+unless -e $pg_config;
+BAIL_OUT("pg_config not executable: $pg_config")
+unless -x $pg_config;
+
+# Leave $pg_config install_path qualified, to be sure we get the right
+# version information, below, or die trying
+}
+
+local %ENV = $self->_get_env();
+
+# We only want the version field
+open my $fh, "-|", $pg_config, "--version"
+or
+BAIL_OUT("$pg_config failed: $!");
+my $version_line = <$fh>;
+close $fh or die;
+
+$self->{_pg_version} = PostgresVersion->new($version_line);
+
+BAIL_OUT("could not parse pg_config --version output: $version_line")
+	  unless defined $self->{_pg_version};
+}
+
 # Private routine to return a copy of the environment with the PATH and
 # (DY)LD_LIBRARY_PATH correctly set when there is an install path set for
 # the node.
diff --git a/src/test/perl/PostgresVersion.pm b/src/test/perl/PostgresVersion.pm
new file mode 100644
index 00..8f1284c61b
--- /dev/null
+++ b/src/test/perl/PostgresVersion.pm
@@ -0,0 +1,82 @@
+
+#
+# PostgresVersion.pm
+#
+# Module encapsulating Postgres Version numbers
+#
+# Copyright (c) 2021, PostgreSQL Global Development Group
+#
+
+package PostgresVersion;
+
+use strict;
+use warnings;
+
+use Scalar::Util qw(blessed);
+
+use overload
+  '<=>' => \&_version_cmp,
+  'cmp' => \&_version_cmp,
+  '""' => \&_stringify;
+
+
+sub new
+{
+	my $class = shift;
+	my $arg = shift;
+
+	# Accept standard formats, in case caller has handed us the output of a
+	# postgres command line tool
+	$arg = $1
+		if ($arg =~ m/\(?PostgreSQL\)? (\d+(?:\.\d+)*(?:devel)?)/);
+
+	# Split into an array
+	my @result = split(/\./, $arg);
+
+	# Treat development versions as having a minor/micro version one less than
+	# the first released version of that branch.
+	if ($result[$#result] =~ m/^(\d+)devel$/)
+	{
+		pop(@result);
+		push(@result, $1, -1);
+	}
+
+	my $res = [ @result ];
+	bless $res, $class;
+	return $res;
+}
+
+
+# Routine which compares the _pg_version_array obtained for the two
+# arguments and returns -1, 0, or 1, allowing comparison between two
+# PostgresNodes or a PostgresNode and a version string.
+#
+# If the second argument is not a blessed object we call the constructor
+# to make one.
+#
+# Because we're overloading '<=>' and 'cmp' this function supplies us with
+# all the comparison operators ('<' and friends, 'gt' and friends)
+#
+sub _version_cmp
+{
+	my ($a, $b) = @_;
+
+	$b = __PACKAGE__->new($b) unless blessed($b);
+
+	for (my $idx = 0; ; $idx++)
+	{
+		return 0 unless (defined $a->[$idx] && defined $b->[$idx]);
+		return $a->[$idx] <=> $b->[$idx]
+			if ($a->[$idx] <=> $b->[$idx]);
+	}
+}
+
+# render the versi

Re: Problems around compute_query_id

2021-04-20 Thread Bruce Momjian

On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
> > I think we should simply document that %Q is not compatible with
> > log_statements.
> 
> Hearing no objection I documented that limitation.
> 
> > 
> > > While making the feature run on some test server, I have noticed that
> > > %Q would log some garbage query ID for autovacuum workers that's kept
> > > around.  That looks wrong.
> > 
> > I've not been able to reproduce it, do you have some hint on how to do it?
> > 
> > Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could 
> > fix
> > the problem?
> 
> It turns out that the problem was simply that some process can inherit a
> PgBackendStatus for which a previous backend reported a queryid.  For 
> processes
> like autovacuum process, they will never report a new identifier so they
> reported the previous one.  Resetting the field like the other ones in
> pgstat_bestart() fixes the problem for autovacuum and any similar process.

I slightly adjusted the patch and applied it.  Thanks.  I think this
closes all the open issues around query_id.  :-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 776ab1a8c8..dd7ebe7a9d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7139,6 +7139,16 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 
 

+
+   
+
+ The %Q escape always reports a zero identifier
+ for lines output by  because
+ log_statement generates output before an
+ identifier can be calculated, including invalid statements for
+ which an identifier cannot be calculated.
+
+   
   
  
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 787f062f9c..a368101103 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -398,6 +398,7 @@ pgstat_bestart(void)
 	lbeentry.st_state = STATE_UNDEFINED;
 	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
 	lbeentry.st_progress_command_target = InvalidOid;
+	lbeentry.st_query_id = UINT64CONST(0);
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should

Re: pg_amcheck option to install extension

2021-04-20 Thread Robert Haas

On Tue, Apr 20, 2021 at 12:05 PM Tom Lane  wrote:
> > I think the distinction I would draw is between things we would expect
> > to be present in every Postgres installation (e.g. pg_stat_statements,
> > auto_explain, postgres_fdw, hstore) and things we don't for one reason
> > or another (e.g. pgcrypto, adminpack)
>
> I dunno, that division appears quite arbitrary and endlessly
> bikesheddable.

+1. I wouldn't expect those things to be present in every
installation, for sure. I don't know that I've *ever* seen a customer
use hstore. If I have, it wasn't often. There's no way we'll ever get
consensus on which stuff people use, because it's different depending
on what customers you work with.

The stuff I feel bad about is stuff like 'isn' and 'earthdistance' and
'intarray', which are basically useless toys with low code quality.
You'd hate for people to confuse that with stuff like 'dblink' or
'pgcrypto' which might actually be useful. But there's a big, broad
fuzzy area in the middle where everyone is going to have different
opinions. And even things like 'isn' and 'earthdistance' and
'intarray' may well have defenders, either because somebody thinks
it's valuable as a coding example, or because somebody really did use
it in anger and had success.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Tomas Vondra

On 4/20/21 6:23 PM, Aleksander Alekseev wrote:
> Hi Ondřej,
> 
> Thanks for the report. It seems to be a clear violation of what is
> promised in the docs. Although it's unlikely that someone implemented
> an application which deals with important data and "pressed Ctr+C" as
> it's done in psql. So this might be not such a critical issue after
> all. BTW what version of PostgreSQL are you using?
> 

Which part of the docs does this contradict?

With Ctrl+C the application *did not* receive confirmation - the commit
was interrupted before fully completing. In a way, it's about the same
situation as if a regular commit was interrupted randomly. It might have
happened before the commit log got updated, or maybe right after it,
which determines the outcome.

What I find a bit strange is that this inserts 1, 2, 2, 2 locally, and
yet we end up with just two rows with 2 (before the update). I don't see
why a network outage should have such consequence.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PATCH: Add GSSAPI ccache_name option to libpq

2021-04-20 Thread Daniel Carter


Hi Aleksander,

On 20/04/2021 11:30, Aleksander Alekseev wrote:

Hi Daniel,


It's my first go at submitting a patch -- it works as far as I can tell,
but I suspect there will probably still be stuff to fix before it's
ready to use!


You are doing great :)


Thanks for the encouragement!


There are several other things worth checking:
0. Always run `make distclean` before following steps
1. Make sure `make -j4 world && make -j4 check-world` passes
2. Make sure `make install-world` and `make installcheck-world` passes
3. Since you are changing the documentation it's worth checking that
it displays properly. The documentation is in the
$(PGINSTALL)/share/doc/postgresql/html directory

Several years ago I published some scripts that simplify all this a
little: https://github.com/afiskon/pgscripts, especially step 3. They
may require some modifications for your OS of choice. Please read
https://wiki.postgresql.org/wiki/Submitting_a_Patch for more
information.


Thanks for the advice (and the script repository).

One thing this has identified is an implicit declaration error on the 
gss_krb5_ccache_name call (the code was still working so I presume it 
must get included at some point, although I can't see exactly where).


This can be fixed easily enough just by adding a `#include 
` line to libpq-int.h, although I don't know 
whether this wants to be treated differently because (as far as I can 
tell) it's a Kerberos-specific feature rather than something which any 
GSSAPI service could use (hence it being in gssapi_krb5.h rather than 
gssapi.h) and so might end up breaking other things?


(It looks like current versions of both MIT Kerberos and Heimdal use 
 rather than , although Heimdal previously 
had all its GSSAPI functionality, including this gss_krb5_ccache_name 
function, in .)



Generally speaking, it also a good idea to add some test cases for
your code, although I understand why it might be a little complicated
in this particular case. Maybe you could at least tell us how it can
be checked manually that this code actually does what is supposed to?


Something like the following code hopefully demonstrates how it's 
supposed to work:



const char *conninfo = "dbname='test' user='test' host='krb.local' port='5432' 
ccache_name='/home/user/test/krb5cc_1000'";
PGconn *conn;

conn = PQconnectdb(conninfo);

if(PQstatus(conn) != CONNECTION_OK) {
fprintf(stderr, "Connection to database failed: %s\n", 
PQerrorMessage(conn));
} else {
printf("Connection succeeded\n");
}
PQfinish(conn);


Hopefully this example gives some sort of guide to its intended purpose 
-- the ccache_name parameter in the connection string specifies a 
(non-standard) location for the credential cache, which is then used by 
libpq to fetch data from the database via GSSAPI authentication.


Many thanks,
Daniel

Re: Synchronous commit behavior during network outage

2021-04-20 Thread Aleksander Alekseev

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctr+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka  wrote:
>
> Hello all,
> I would like to know your opinion on the following behaviour I see for 
> PostgreSQL setup with synchronous replication.
>
> This behaviour happens in a special use case. In this use case, there are 2 
> synchronous replicas with the following config (truncated):
>
> - 2 nodes
> - synchronous_standby_names='*'
> - synchronous_commit=remote_apply
>
>
> With this setup run the following steps (LAN down - LAN between master and 
> replica):
> -
> postgres=# truncate table a;
> TRUNCATE TABLE
> postgres=# insert into a values (1); -- LAN up, insert has been applied to 
> replica.
> INSERT 0 1
> Vypnu LAN na serveru se standby:
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
> from sync replica. In this situation cancel it (press CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> There will be warning that commit was performed only locally:
> 2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for synchronous 
> replication due to user request
> 2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has already 
> committed locally, but might not have been replicated to the standby.
>
> postgres=# insert into a values (2); --LAN down, waiting for a confirmation 
> from sync replica. In this situation cancel it (press CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> postgres=# insert into a values (2); --LAN down, waiting for sync replica, 
> second attempt, cancel it as well (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> INSERT 0 1
> postgres=# update a set n=3 where n=2; --LAN down, waiting for sync replica, 
> cancel it (CTRL+C)
> ^CCancel request sent
> WARNING:  canceling wait for synchronous replication due to user request
> DETAIL:  The transaction has already committed locally, but might not have 
> been replicated to the standby.
> UPDATE 2
> postgres=# update a set n=3 where n=2; -- run the same update,because data 
> from the previous attempt was commited on master, it is sucessfull, but no 
> changes
> UPDATE 0
> postgres=# select * from a;
>  n
> ---
>  1
>  3
>  3
> (3 rows)
> postgres=#
> 
>
> Now, there is only value 1 in the sync replica table (no other values), data 
> is not in sync. This is expected, after the LAN restore, data will come sync 
> again, but if the main/primary node will fail and we failover to replica 
> before the LAN is back up or the storage for this node would be destroyed and 
> data would not sync to replica before it, we will lose data even if the 
> client received successful commit (with a warning).
> From the synchronous_commit=remote_write level and "higher", I would expect, 
> that when the remote application (doesn't matter if flush, write or apply) 
> would not be applied I would not receive a confirmation about the commit 
> (even with a warning). Something like, if there is no commit from sync 
> replica, there is no commit on primary and if someone performs the steps 
> above, the whole transaction will not send a confirmation.
>
> This can cause issues if the application receives a confirmation about the 
> success and performs some follow-up steps e.g. create a user account and 
> sends a request to the mail system to create an account or create a VPN 
> account. If the scenario above happens, there can exist a VPN account that 
> does not have any presence in the central database and can be a security 
> issue.
>
> I hope I explained it sufficiently. :-)
>
> Do you think, that would be possible to implement a process that would solve 
> this use case?
>
> Thank you
> Ondrej



-- 
Best regards,
Aleksander Alekseev

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-20 Thread Bruce Momjian

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:
> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> > On 2021-Apr-12, Bruce Momjian wrote:
> > 
> > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> > > have not changed any of the APIs which existed before this feature was
> > > added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> > > I assume I should leave those unchanged.  It will also need a catversion
> > > bump.
> > 
> > I think it is fine actually.  These names appear in structs Query and
> > PlannedStmt, and every single member of those already uses camelCase
> > naming.  Changing those to use "query_id" would look out of place.
> > You did change the one in PgBackendStatus to st_query_id, which also
> > matches the naming style in that struct, so that looks fine also.
> > 
> > So I'm -1 on Julien's first proposed change, and +1 on his second
> > proposed change (the name of the first argument of
> > pgstat_report_query_id should be query_id).
> 
> Thanks for your analysis.  Updated patch attached with the change
> suggested above.

Patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: `make check` doesn't pass on MacOS Catalina



On 4/20/21 11:02 AM, Tom Lane wrote:
> Aleksander Alekseev  writes:
>> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
>> MacBook I discovered that `make check` fails:
> This is the usual symptom of not having disabled SIP :-(.
>
> If you don't want to do that, do "make install" before "make check".
>
>   




FYI the buildfarm client has a '--delay-check' option that does exactly
this. It's useful on Alpine Linux as well as MacOS


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: pg_amcheck option to install extension

Andrew Dunstan  writes:
> On 4/20/21 11:09 AM, Tom Lane wrote:
>> Indeed.  But I'm down on this idea of inventing src/extensions,
>> because then there will constantly be questions about whether FOO
>> belongs in contrib/ or src/extensions/.

> I think the distinction I would draw is between things we would expect
> to be present in every Postgres installation (e.g. pg_stat_statements,
> auto_explain, postgres_fdw, hstore) and things we don't for one reason
> or another (e.g. pgcrypto, adminpack)

I dunno, that division appears quite arbitrary and endlessly
bikesheddable.  It's something I'd prefer not to spend time
arguing about, but the only way we won't have such arguments
is if we don't make the distinction in the first place.

regards, tom lane

Re: pg_amcheck option to install extension



On 4/20/21 11:09 AM, Tom Lane wrote:
> Alvaro Herrera  writes:
>> Actually I think the best balance would be to leave things where they
>> are, and move amcheck to src/extensions/ once the next devel cycle
>> opens.  That way, we avoid the (pretty much pointless) laborious task of
>> moving pg_amcheck to contrib only to move it back on the next cycle.
>> What I'm afraid of, if we move pg_amcheck to contrib, is that during the
>> next cycle people will say that they are both perfectly fine in contrib/
>> and so we don't need to move anything anywhere.
> Indeed.  But I'm down on this idea of inventing src/extensions,
> because then there will constantly be questions about whether FOO
> belongs in contrib/ or src/extensions/.  Unless we just move
> everything there, and then the question becomes why bother.  Sure,
> "contrib" is kind of a legacy name, but PG is full of legacy names.
>
>   



I think the distinction I would draw is between things we would expect
to be present in every Postgres installation (e.g. pg_stat_statements,
auto_explain, postgres_fdw, hstore) and things we don't for one reason
or another (e.g. pgcrypto, adminpack)


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: `make check` doesn't pass on MacOS Catalina

2021-04-20 Thread Aleksander Alekseev

Hi Tom,

Many thanks, running "make install" before "make check" helped.


On Tue, Apr 20, 2021 at 6:02 PM Tom Lane  wrote:
>
> Aleksander Alekseev  writes:
> > While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> > MacBook I discovered that `make check` fails:
>
> This is the usual symptom of not having disabled SIP :-(.
>
> If you don't want to do that, do "make install" before "make check".
>
> regards, tom lane



-- 
Best regards,
Aleksander Alekseev

Docs: Move parallel_leader_participation GUC description under relevant category

Hi,

It looks like even though the commit e5253fdc4f that added the
parallel_leader_participation GUC correctly categorized it as
RESOURCES_ASYNCHRONOUS parameter in the code, but in the docs it is kept
under irrelevant section i.e. "Query Planning/Other Planner Options". This
is reported in the bugs list [1], cc-ed the reporter.

Attaching a small patch that moves the GUC description to the right place.
Thoughts?

[1]
https://www.postgresql.org/message-id/16972-42d4b0c15aa1d5f5%40postgresql.org

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From a26f043368173c1b690ed0d170f2d6286ba9f898 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 20 Apr 2021 13:32:02 +0530
Subject: [PATCH v1] Move parallel_leader_participation to Resource Consumption
 Category

parallel_leader_partiticipation GUC is rightly categorized as
Resource Usage/Asynchronous Behavior parameter in the code, but in
the docs, it is still under Query Planning/Other Planner Options
category. Just move it to the correct place i.e. under Resource
Consumption/Asynchronous Behaviour.
---
 doc/src/sgml/config.sgml | 46 
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 776ab1a8c8..a4dcee1fc8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2520,6 +2520,29 @@ include_dir 'conf.d'

   
 
+  
+   
+   parallel_leader_participation (boolean)
+   
+parallel_leader_participation configuration parameter
+   
+   
+   
+
+ Allows the leader process to execute the query plan under
+ Gather and Gather Merge nodes
+ instead of waiting for worker processes.  The default is
+ on.  Setting this value to off
+ reduces the likelihood that workers will become blocked because the
+ leader is not reading tuples fast enough, but requires the leader
+ process to wait for worker processes to start up before the first
+ tuples can be produced.  The degree to which the leader can help or
+ hinder performance depends on the plan type, number of workers and
+ query duration.
+
+   
+ 
+
   
max_parallel_maintenance_workers (integer)

@@ -5889,29 +5912,6 @@ SELECT * FROM parent WHERE key = 2400;
   
  
 
- 
-  
-   parallel_leader_participation (boolean)
-   
-parallel_leader_participation configuration parameter
-   
-  
-  
-   
-Allows the leader process to execute the query plan under
-Gather and Gather Merge nodes
-instead of waiting for worker processes.  The default is
-on.  Setting this value to off
-reduces the likelihood that workers will become blocked because the
-leader is not reading tuples fast enough, but requires the leader
-process to wait for worker processes to start up before the first
-tuples can be produced.  The degree to which the leader can help or
-hinder performance depends on the plan type, number of workers and
-query duration.
-   
-  
- 
-
  
   plan_cache_mode (enum)
   
-- 
2.25.1

Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

James Coleman  writes:
> On Mon, Apr 19, 2021 at 7:10 PM Tom Lane  wrote:
>> After some more testing, that seems like a good thing to do,
>> so here's a v4.

> This all looks good to me.

Pushed, thanks for reviewing!

regards, tom lane

Re: pg_amcheck option to install extension

2021-04-20 Thread Mark Dilger

> On Apr 20, 2021, at 5:54 AM, Robert Haas  wrote:
> 
> On Tue, Apr 20, 2021 at 1:31 AM Mark Dilger
>  wrote:
>> I think you are conflating the concept of an operating system adminstrator 
>> with the concept of the database superuser/owner.
> 
> You should conflate those things, because there's no meaningful
> privilege boundary between them:

This discussion started in response to the idea that pg_amcheck needs to be 
moved into contrib, presumably because that's where amcheck lives.  I am 
arguing against the move.

The actual use case I have in mind is "Postgres as a service", where a company 
(Alice) rents space in the cloud and runs postgres databases which can be 
rented out to a tenant (Bob) who is the owner of his database, but not 
privileged on the underlying system in any way.  Bob has enough privileges to 
run CREATE EXTENSION, but is limited to the extensions that Alice has made 
available.  Alice evaluates packages and chooses not to install most of them, 
including amcheck.  Or perhaps Alice chooses not to install any contrib 
modules.  Either way, the location of amcheck in contrib is useful to Alice 
because it makes her choice not to install it very simple.

Bob, however, is connecting to databases provided by Alice, and is not trying 
to limit himself.  He is happy to have the pg_amcheck client installed.  If 
Alice's databases don't allow him to run amcheck, pg_amcheck is not useful 
relative to those databases, but perhaps Bob is also renting database space 
from Charlie and Charlie's databases have amcheck installed.

Now, the question is, "In which postgres package does Bob think pg_amcheck 
should reside?"  It would be strange to say that Bob needs to install the 
postgresql-contrib rpm in order to get the pg_amcheck client.  That rpm is 
mostly a bunch of modules, and may even have a package dependency on 
postgresql-server.  Bob doesn't want either of those.  He just wants the 
clients.

The discussion about using amcheck as a privilege escalation attack was mostly 
to give some background for why Alice might not want to install amcheck.  I 
think it got a bit out of hand, in no small part because I was being imprecise 
about Bob's exact privilege levels.  I was being imprecise about that part 
because my argument wasn't "here's how to leverage amcheck to exploit 
postgres", but rather, "here's what Alice might rationally be concerned about." 
 To run CREATE EXTENSION does not actually require superuser privileges.  It 
depends on the package.  At the moment, you can't load amcheck without 
superuser privileges, but you can load some others, such as intarray:

bob=> create extension amcheck;
2021-04-20 07:40:46.758 PDT [80340] ERROR:  permission denied to create 
extension "amcheck"
2021-04-20 07:40:46.758 PDT [80340] HINT:  Must be superuser to create this 
extension.
2021-04-20 07:40:46.758 PDT [80340] STATEMENT:  create extension amcheck;
ERROR:  permission denied to create extension "amcheck"
HINT:  Must be superuser to create this extension.
bob=> create extension intarray;
CREATE EXTENSION
bob=> 

Alice might prefer to avoid installing amcheck altogether, not wanting to have 
to evaluate on each upgrade whether the privileges necessary to load amcheck 
have changed.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: when the startup process doesn't

2021-04-20 Thread Jehan-Guillaume de Rorthais

On Tue, 20 Apr 2021 15:04:28 +0200
Magnus Hagander  wrote:
[...]
> Yeah, I think we should definitely limit this to local access, one way
> or another. Realistically using pg_hba is going to require catalog
> access, isn't it? And we can't just go ignore those rows in pg_hba
> that for example references role membership (as well as all those auth
> methods you can't use).

Two another options:

1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

2. maybe the startup process could ignore update_process_title? As far
as I understand the doc correctly, this GUC is mostly useful for backends on
Windows.

Regards,

Re: pg_amcheck option to install extension

Alvaro Herrera  writes:
> Actually I think the best balance would be to leave things where they
> are, and move amcheck to src/extensions/ once the next devel cycle
> opens.  That way, we avoid the (pretty much pointless) laborious task of
> moving pg_amcheck to contrib only to move it back on the next cycle.

> What I'm afraid of, if we move pg_amcheck to contrib, is that during the
> next cycle people will say that they are both perfectly fine in contrib/
> and so we don't need to move anything anywhere.

Indeed.  But I'm down on this idea of inventing src/extensions,
because then there will constantly be questions about whether FOO
belongs in contrib/ or src/extensions/.  Unless we just move
everything there, and then the question becomes why bother.  Sure,
"contrib" is kind of a legacy name, but PG is full of legacy names.

regards, tom lane

Re: pg_amcheck option to install extension



On 4/20/21 8:47 AM, Robert Haas wrote:
> On Mon, Apr 19, 2021 at 2:55 PM Andrew Dunstan  wrote:
>> There are at least two other client side programs in contrib. So this
>> argument doesn't quite hold water from a consistency POV.
> I thought that at first, too. But then I realized that those programs
> are oid2name and vacuumlo. And oid2name, at least, seems like
> something we ought to just consider removing. It's unclear why this is
> something that really deserves a command-line utility rather than just
> some additional psql options or something. Does anyone really use it?
>
> vacuumlo isn't that impressive either, since it makes the very tenuous
> assumption that an oid column is intended to reference a large object,
> and the documentation doesn't even acknowledge what a shaky idea that
> actually is. But I suspect it has much better chances of being useful
> in practice than oid2name. In fact, I've heard of people using it and,
> I think, finding it useful, so we probably don't want to just nuke it.
>
> But the point is, as things stand today, almost everything in contrib
> is an extension, not a binary. And we might want to view the
> exceptions as loose ends to be cleaned up, rather than a pattern to
> emulate.
>
> It's a judgement call, though.
>


Yeah. I'll go along with Alvaro and say let's let sleeping dogs lie at
this stage of the dev process, and pick the discussion up after we branch.


I will just note one thing: the binaries in contrib have one important
function that hasn't been mentioned, namely to test using pgxs to build
binaries. That doesn't have to live in contrib, but we should have
something that does that somewhere in the build process, so if we
remmove oid2name and vacuumlo from contrib we need to look into that.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Free port choosing freezes when PostgresNode::use_tcp is used on BSD systems

Andrew Dunstan  writes:
> On 4/19/21 7:22 PM, Tom Lane wrote:
>> I wonder whether we could get away with just replacing the $use_tcp
>> test with $TestLib::windows_os.  It's not really apparent to me
>> why we should care about 127.0.0.not-1 on Unix-oid systems.

> Yeah
> The comment is a bit strange anyway - Cygwin is actually going to use
> Unix sockets, not TCP.
> I think I would just change the test to this: $use_tcp &&
> $TestLib::windows_os.

Works for me, but we need to revise the comment to match.

regards, tom lane

Re: `make check` doesn't pass on MacOS Catalina

Aleksander Alekseev  writes:
> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> MacBook I discovered that `make check` fails:

This is the usual symptom of not having disabled SIP :-(.

If you don't want to do that, do "make install" before "make check".

regards, tom lane

Re: Free port choosing freezes when PostgresNode::use_tcp is used on BSD systems