Re: Printing LSN made easy

2021-01-19 Thread Ashutosh Bapat
On Wed, Jan 20, 2021 at 11:55 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-11-27 11:40, Ashutosh Bapat wrote:
> > The solution seems to be simple though. In the attached patch, I have
> > added two macros
> > #define LSN_FORMAT "%X/%X"
> > #define LSN_FORMAT_ARGS(lsn) (uint32) ((lsn) >> 32), (uint32) (lsn)
> >
> > which can be used instead.
>
> It looks like we are not getting any consensus on this approach.  One
> reduced version I would consider is just the second part, so you'd write
> something like
>
>  snprintf(lsnchar, sizeof(lsnchar), "%X/%X",
>   LSN_FORMAT_ARGS(lsn));
>
> This would still reduce notational complexity quite a bit but avoid any
> funny business with the format strings.
>

Thanks for looking into this. I would like to keep both LSN_FORMAT and
LSN_FORMAT_ARGS, but with a note that the former cannot be used in elog() or
in messages which require localization. We have many other places doing
snprintf() and similar things, which could use LSN_FORMAT. If we do so, the
functions that output the string representation will no longer be needed, so
they can be removed.
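
To make the intended usage concrete, here is a small self-contained sketch
(the names follow the proposal in this thread; this is only an illustration,
not the final patch):

```c
#include <stdio.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's XLogRecPtr */

/* Proposed helpers, as sketched in this thread */
#define LSN_FORMAT        "%X/%X"
#define LSN_FORMAT_ARGS(lsn)  (uint32_t) ((lsn) >> 32), (uint32_t) (lsn)

int
main(void)
{
	XLogRecPtr	lsn = ((XLogRecPtr) 0x1 << 32) | 0x2468ACE0;
	char		lsnchar[64];

	/* Fine for plain snprintf()/printf() calls ... */
	snprintf(lsnchar, sizeof(lsnchar), LSN_FORMAT, LSN_FORMAT_ARGS(lsn));
	printf("lsn = %s\n", lsnchar);	/* prints "lsn = 1/2468ACE0" */

	/*
	 * ... but a non-literal format like LSN_FORMAT cannot appear in
	 * elog()/ereport() messages that must be extracted for translation,
	 * which is why only LSN_FORMAT_ARGS with a literal "%X/%X" is
	 * uncontroversial.
	 */
	return 0;
}
```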

--
Best Wishes,
Ashutosh


Re: Discarding DISCARD ALL

2021-01-19 Thread Peter Eisentraut

On 2020-12-23 15:33, Simon Riggs wrote:

Poolers such as pgbouncer would then be able to connect transaction
mode pools by setting transaction_cleanup=on at time of connection,
avoiding any need to issue a server_reset_query, removing the DISCARD
ALL command from the normal execution path, while still achieving the
same thing.


PgBouncer does not send DISCARD ALL in transaction mode.  There is a 
separate setting to do that, but it's not the default, and it's more of 
a workaround for bad client code.  So I don't know if this feature would 
be of much use for PgBouncer.  Other connection poolers might have other 
opinions.






Re: Printing LSN made easy

2021-01-19 Thread Michael Paquier
On Wed, Jan 20, 2021 at 07:25:37AM +0100, Peter Eisentraut wrote:
> It looks like we are not getting any consensus on this approach.  One
> reduced version I would consider is just the second part, so you'd write
> something like
> 
> snprintf(lsnchar, sizeof(lsnchar), "%X/%X",
>  LSN_FORMAT_ARGS(lsn));
> 
> This would still reduce notational complexity quite a bit but avoid any
> funny business with the format strings.

That seems reasonable to me.  So +1.
--
Michael


signature.asc
Description: PGP signature


Re: Release SPI plans for referential integrity with DISCARD ALL

2021-01-19 Thread Peter Eisentraut

On 2021-01-13 09:47, yuzuko wrote:

But we are also considering another option to solve this problem, which
lets users release cached SPI plans for referential integrity, as well as
plain cached plans, with DISCARD ALL.  To do that, we added a new
function, RI_DropAllPreparedPlan(), which deletes all plans from
ri_query_cache, and modified DISCARD ALL to execute that function.


I don't have a comment on the memory management issue, but dropping all 
cached plans as part of DISCARD ALL seems a bit too extreme a solution.  
In the context of connection pooling, getting a new session with 
pre-cached plans seems like a good thing, and this change could have a 
performance impact without a practical way to opt out.






Re: a misbehavior of partition row movement (?)

2021-01-19 Thread Peter Eisentraut

On 2021-01-08 09:54, Amit Langote wrote:

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you 
have any ideas in which thread that was handled?

It was discussed here:

https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails.  You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach.  Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)


Could you summarize here what you are trying to do with respect to what 
was decided before?  I'm a bit confused looking through the patches you 
have posted.  The first patch you posted hard-coded FK trigger OIDs 
specifically; other patches talk about foreign key triggers in general, 
special-case internal triggers, or talk about all triggers.





Re: Boundary value check in lazy_tid_reaped()

2021-01-19 Thread Peter Eisentraut

On 2020-10-30 02:43, Masahiko Sawada wrote:

Using the integer set is very memory efficient (5MB vs. 114MB in the
base case) and there is no 1GB limitation. Looking at the execution
time, I had expected that using the integer set would be slower at
recording TIDs than using the simple array, but in this experiment there
is no big difference among the methods. Perhaps the result will vary if
vacuum needs to record many more dead tuple TIDs. From the results, I can
see a good improvement in the integer set case, and it is probably a good
start, but if we want to use it for lazy vacuum as well, we will need to
improve it so that it can be used on DSA for parallel vacuum.

I've attached the patch I used for the experiment that adds xx_vacuum
GUC parameter that controls the method of recording TIDs. Please note
that it doesn't support parallel vacuum.


How do you want to proceed here?  The approach of writing a wrapper for 
bsearch with a bound check sounded like a good start.  All the other ideas 
discussed here seem like larger projects and would probably be out of scope 
for this commit fest.
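
For what it's worth, the idea is roughly the following (my own sketch of the
bound-check wrapper, not the proposed patch):

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Illustration only: a bsearch() wrapper that rejects keys falling outside
 * the sorted array's bounds before doing the full binary search.  Assumes
 * "items" is sorted ascending according to cmp().
 */
static void *
bsearch_with_bound_check(const void *key, const void *items,
						 size_t nitems, size_t size,
						 int (*cmp) (const void *, const void *))
{
	if (nitems == 0)
		return NULL;

	/* key below the smallest or above the largest element: cannot match */
	if (cmp(key, items) < 0 ||
		cmp(key, (const char *) items + (nitems - 1) * size) > 0)
		return NULL;

	return bsearch(key, items, nitems, size, cmp);
}

static int
cmp_int(const void *a, const void *b)
{
	int			ia = *(const int *) a;
	int			ib = *(const int *) b;

	return (ia > ib) - (ia < ib);
}

int
main(void)
{
	int			dead[] = {3, 7, 42, 99};	/* sorted "dead TID" stand-ins */
	int			probe = 1;		/* below the range: rejected cheaply */

	printf("%s\n",
		   bsearch_with_bound_check(&probe, dead, 4, sizeof(int), cmp_int)
		   ? "found" : "not found");
	return 0;
}
```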





Re: list of extended statistics on psql

2021-01-19 Thread Tatsuro Yamada

Hi Tomas,

On 2021/01/20 11:35, Tatsuro Yamada wrote:

Apologies for all the extra work - I hadn't realized this flaw when pushing 
for showing more stuff :-(


Don't worry about it. We didn't notice the problem even when it was reviewed
by multiple people on -hackers. Let's keep moving forward. :-D

I'll include a regression test in the next patch.



I created the patches, and my test results on PG 10, 11, 12, and 14 are fine.

  0001:
- Fix query to use pg_statistic_ext only
- Replace statuses "required" and "built" with "defined"
- Remove the size columns
- Fix document
- Add schema name as a filter condition on the query

  0002:
- Fix all results of \dX
- Add new testcase by non-superuser

Please find attached files. :-D


Regards,
Tatsuro Yamada
From 1aac3df2af2f6c834ffab10ddd1be1dee5970eb3 Mon Sep 17 00:00:00 2001
From: Tatsuro Yamada 
Date: Wed, 20 Jan 2021 15:33:04 +0900
Subject: [PATCH 2/2] psql \dX regression test

Add a test by non-superuser
---
 src/test/regress/expected/stats_ext.out | 116 
 src/test/regress/sql/stats_ext.sql  |  37 ++
 2 files changed, 153 insertions(+)

diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index f094731e32..0ff4e51055 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1727,6 +1727,122 @@ INSERT INTO tststats.priv_test_tbl
 CREATE STATISTICS tststats.priv_test_stats (mcv) ON a, b
   FROM tststats.priv_test_tbl;
 ANALYZE tststats.priv_test_tbl;
+-- Check printing info about extended statistics by \dX
+create table stts_t1 (a int, b int);
+create statistics stts_1 (ndistinct) on a, b from stts_t1;
+create statistics stts_2 (ndistinct, dependencies) on a, b from stts_t1;
+create statistics stts_3 (ndistinct, dependencies, mcv) on a, b from stts_t1;
+create table stts_t2 (a int, b int, c int);
+create statistics stts_4 on b, c from stts_t2;
+create table stts_t3 (col1 int, col2 int, col3 int);
+create statistics stts_hoge on col1, col2, col3 from stts_t3;
+create schema stts_s1;
+create schema stts_s2;
+create statistics stts_s1.stts_foo on col1, col2 from stts_t3;
+create statistics stts_s2.stts_yama (dependencies, mcv) on col1, col3 from stts_t3;
+insert into stts_t1 select i,i from generate_series(1,100) i;
+analyze stts_t1;
+\dX
+                                          List of extended statistics
+  Schema  |          Name          |              Definition               | Ndistinct | Dependencies |   MCV   
+----------+------------------------+---------------------------------------+-----------+--------------+---------
+ public   | func_deps_stat         | a, b, c FROM functional_dependencies |           | defined      | 
+ public   | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays        |           |              | defined
+ public   | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool          |           |              | defined
+ public   | mcv_lists_stats        | a, b, d FROM mcv_lists               |           |              | defined
+ public   | stts_1                 | a, b FROM stts_t1                    | defined   |              | 
+ public   | stts_2                 | a, b FROM stts_t1                    | defined   | defined      | 
+ public   | stts_3                 | a, b FROM stts_t1                    | defined   | defined      | defined
+ public   | stts_4                 | b, c FROM stts_t2                    | defined   | defined      | defined
+ public   | stts_hoge              | col1, col2, col3 FROM stts_t3        | defined   | defined      | defined
+ stts_s1  | stts_foo               | col1, col2 FROM stts_t3              | defined   | defined      | defined
+ stts_s2  | stts_yama              | col1, col3 FROM stts_t3              |           | defined      | defined
+ tststats | priv_test_stats        | a, b FROM tststats.priv_test_tbl     |           |              | defined
+(12 rows)
+
+\dX stts_?
+                       List of extended statistics
+ Schema |  Name  |    Definition     | Ndistinct | Dependencies |   MCV   
+--------+--------+-------------------+-----------+--------------+---------
+ public | stts_1 | a, b FROM stts_t1 | defined   |              | 
+ public | stts_2 | a, b FROM stts_t1 | defined   | defined      | 
+ public | stts_3 | a, b FROM stts_t1 | defined   | defined      | defined
+ public | stts_4 | b, c FROM stts_t2 | defined   | defined      | defined
+(4 rows)
+
+\dX *stts_hoge
+                               List of extended statistics
+ Schema |   Name    |          Definition           | Ndistinct | Dependencies |   MCV   
+--------+-----------+-------------------------------+-----------+--------------+---------
+ public | stts_hoge | col1, col2, col3 FROM stts_t3 | defined   | defined      | defined
+(1 row)
+
+\dX+
+  List of extended statistics
+  Schema  |   

Re: New IndexAM API controlling index vacuum strategies

2021-01-19 Thread Masahiko Sawada
On Wed, Jan 20, 2021 at 9:45 AM Peter Geoghegan  wrote:
>
> On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan  wrote:
> > * Maybe it would be better if you just changed the definition such
> > that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> > no other changes? (Some variant of this suggestion might be better,
> > not sure.)
> >
> > For some reason that feels a bit safer: we still have an "imaginary
> > tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> > much less than the current 3 MAXALIGN() quantums (i.e. what
> > MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> > that this alternative approach will be noticeably less effective
> > within vacuumlazy.c?
>
> BTW, I think that increasing MaxHeapTuplesPerPage will make it
> necessary to consider tidbitmap.c. Comments at the top of that file
> say that it is assumed that MaxHeapTuplesPerPage is about 256. So
> there is a risk of introducing performance regressions affecting
> bitmap scans here.
>
> Apparently some other DB systems make the equivalent of
> MaxHeapTuplesPerPage dynamically configurable at the level of heap
> tables. It usually doesn't matter, but it can matter with on-disk
> bitmap indexes, where the bitmap must be encoded from raw TIDs (this
> must happen before the bitmap is compressed -- there must be a simple
> mapping from every possible TID to some bit in a bitmap first). The
> item offset component of each heap TID is not usually very large, so
> there is a trade-off between keeping the representation of bitmaps
> efficient and not unduly restricting the number of distinct heap
> tuples on each heap page. I think that there might be a similar
> consideration here, in tidbitmap.c (even though it's not concerned
> about on-disk bitmaps).

That's a good point. With the patch, MaxHeapTuplesPerPage increases to
2042 with an 8k page, and to 8186 with a 32k page, whereas it's currently
291 with an 8k page and 1169 with a 32k page. So it is likely to be a
problem as you pointed out. If we change
"MAXALIGN(SizeofHeapTupleHeader)" to "MAXIMUM_ALIGNOF", it's 680 with an
8k page and 2728 with a 32k page, which seems much better.

The purpose of increasing MaxHeapTuplesPerPage in the patch is to let a
heap page accumulate more LP_DEAD line pointers. As I explained before,
because of MaxHeapTuplesPerPage we cannot calculate how many LP_DEAD line
pointers can be accumulated in the space reserved by fillfactor simply as
((the space reserved by fillfactor) / (size of a line pointer)). We need
to consider both how many line pointers are available for LP_DEAD and how
much space is available for LP_DEAD.

For example, suppose the tuple size is 50 bytes and fillfactor is 80.
Each page then has 1633 bytes (= (8192-24)*0.2) of free space reserved by
fillfactor, where 408 line pointers could fit. However, if we store 250
LP_DEAD line pointers in that space, the number of tuples that can be
stored on the page is only 41, even though we have 6534 bytes
(= (8192-24)*0.8) where 121 tuples (plus line pointers) can fit, because
MaxHeapTuplesPerPage is 291. In this case, where the tuple size is 50 and
fillfactor is 80, we can accumulate up to about 170 LP_DEAD line pointers
while storing 121 tuples. Increasing MaxHeapTuplesPerPage raises this 291
limit and lets us ignore it when calculating the maximum number of
LP_DEAD line pointers that can be accumulated on a single page.

An alternative approach would be to calculate it using the average
tuple size. I think if we know the tuple size, the maximum number of
LP_DEAD line pointers that can be accumulated in a single page is the
minimum of the following two formulas:

(1) MaxHeapTuplesPerPage - (((BLCKSZ - SizeOfPageHeaderData) *
(fillfactor/100)) / (sizeof(ItemIdData) + tuple_size)); // how many
line pointers are left over for LP_DEAD?

(2) ((BLCKSZ - SizeOfPageHeaderData) * ((100 - fillfactor)/100)) /
sizeof(ItemIdData); // how much space is available for LP_DEAD?
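
For concreteness, here is a tiny stand-alone sketch (hard-coded stand-ins for
the real PostgreSQL macros) that evaluates both formulas for the
50-byte-tuple, fillfactor-80 example above:

```c
#include <stdio.h>

int
main(void)
{
	const int	blcksz = 8192;
	const int	page_header = 24;	/* SizeOfPageHeaderData */
	const int	lp_size = 4;		/* sizeof(ItemIdData) */
	const int	max_tuples_per_page = 291;	/* MaxHeapTuplesPerPage, 8k page */
	const int	tuple_size = 50;
	const int	fillfactor = 80;

	int			usable = blcksz - page_header;

	/* (1) line pointers left over after the fillfactor-controlled tuples */
	int			by_count = max_tuples_per_page -
		(usable * fillfactor / 100) / (lp_size + tuple_size);

	/* (2) line pointers that fit in the space reserved by fillfactor */
	int			by_space = (usable * (100 - fillfactor) / 100) / lp_size;

	int			max_lp_dead = by_count < by_space ? by_count : by_space;

	/* prints "by_count=170 by_space=408 -> max LP_DEAD ~ 170" */
	printf("by_count=%d by_space=%d -> max LP_DEAD ~ %d\n",
		   by_count, by_space, max_lp_dead);
	return 0;
}
```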

But I'd prefer to increase MaxHeapTuplesPerPage in a way that doesn't
affect the bitmap much, rather than introducing a complex calculation like this.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-19 Thread Tom Lane
Peter Eisentraut  writes:
> This is the first I've heard in years of shared memory limits being a 
> problem on macOS.  What settings or what levels of concurrency do you 
> use to provoke these errors?

I suppose it wouldn't be too hard to run into shmmni with aggressive
parallel testing; the default is just 32.  But AFAIK it still works
to set the shm limits via /etc/sysctl.conf.  (I wonder if you have
to disable SIP to mess with that, though.)
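
(For reference, the knobs in question are the kern.sysv.* sysctls; the
settings our documentation has traditionally suggested for /etc/sysctl.conf
look roughly like the following, with the values here purely illustrative.
Raising kern.sysv.shmmni above its default of 32 would be the relevant part
for running many instances concurrently.)

```
kern.sysv.shmmax=4194304
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=1024
```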

regards, tom lane




Re: Odd, intermittent failure in contrib/pageinspect

2021-01-19 Thread Michael Paquier
On Tue, Jan 19, 2021 at 05:03:49PM -0500, Tom Lane wrote:
> It looks to me like heap_surgery ought to be okay, because it's operating
> on a temp table; if there are any page access conflicts on that, we've
> got BIG trouble ;-)

Bah, of course.  I managed to miss this part.

> Poking around, I found a few other places where it looked like a skipped
> page could produce diffs in the expected output:
> contrib/amcheck/t/001_verify_heapam.pl
> contrib/pg_visibility/sql/pg_visibility.sql
> 
> There are lots of other vacuums of course, but they don't look like
> a missed page would have any effect on the visible results, so I think
> we should leave them alone.

Yeah, I got to wonder a bit about check_btree.sql on a second look,
but that's no big deal to leave it alone either.

> In short I propose the attached patch, which also gets rid of
> that duplicate query.

Agreed, +1.
--
Michael


signature.asc
Description: PGP signature


Re: Printing LSN made easy

2021-01-19 Thread Peter Eisentraut

On 2020-11-27 11:40, Ashutosh Bapat wrote:

The solution seems to be simple though. In the attached patch, I have
added two macros
#define LSN_FORMAT "%X/%X"
#define LSN_FORMAT_ARGS(lsn) (uint32) ((lsn) >> 32), (uint32) (lsn)

which can be used instead.


It looks like we are not getting any consensus on this approach.  One 
reduced version I would consider is just the second part, so you'd write 
something like


snprintf(lsnchar, sizeof(lsnchar), "%X/%X",
 LSN_FORMAT_ARGS(lsn));

This would still reduce notational complexity quite a bit but avoid any 
funny business with the format strings.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-19 Thread Fujii Masao




On 2021/01/19 12:09, Bharath Rupireddy wrote:

On Mon, Jan 18, 2021 at 9:11 PM Fujii Masao  wrote:

Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
function, 0002 is for keep_connections GUC and 0003 is for
keep_connection server level option.


Thanks!



Please review it further.


+   server = GetForeignServerByName(servername, true);
+
+   if (!server)
+   ereport(ERROR,
+   (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+errmsg("foreign server \"%s\" does not exist", servername)));

ISTM we can simplify this code as follows.

  server = GetForeignServerByName(servername, false);


Done.


+   hash_seq_init(, ConnectionHash);
+   while ((entry = (ConnCacheEntry *) hash_seq_search()))

When the server name is specified, even if its connection is successfully
closed, postgres_fdw_disconnect() scans through all the entries to check
whether there are active connections. But if "result" is true and
active_conn_exists is true, we can get out of this loop to avoid unnecessary
scans.


My initial thought was that it's possible to have two entries with the
same foreign server name but with different user mappings, but it looks
like that's not possible. I tried associating a foreign server with two
different user mappings [1]; the cache entry initially gets associated
with the user mapping that comes first in pg_user_mappings, and if that
user mapping is dropped the cache entry gets invalidated, so the second
user mapping is used next time.

Since there's no way we can have two cache entries with the same
foreign server name, we can get out of the loop when we find the cache
entry matching the given server. I made the changes.


So, furthermore, we can use hash_search() to find the target cached
connection, instead of using hash_seq_search(), when the server name
is given. This would simplify the code a bit more? Of course,
hash_seq_search() is necessary when closing all the connections, though.




[1]
postgres=# select * from pg_user_mappings ;
 umid  | srvid |  srvname  | umuser | usename | umoptions
-------+-------+-----------+--------+---------+-----------
 16395 | 16394 | loopback1 |     10 | bharath |            -> cache entry is initially made with this user mapping.
 16399 | 16394 | loopback1 |      0 | public  |            -> if the above user mapping is dropped, then the cache entry is made with this user mapping.


+   /*
+* Destroy the cache if we discarded all active connections i.e. if there
+* is no single active connection, which we can know while scanning the
+* cached entries in the above loop. Destroying the cache is better than to
+* keep it in the memory with all inactive entries in it to save some
+* memory. Cache can get initialized on the subsequent queries to foreign
+* server.

How much memory is assumed to be saved by destroying the cache in
many cases? I'm not sure if it's really worth destroying the cache to save
the memory.


I removed the cache-destroying code; if somebody complains in the
future (after the feature is committed), we can revisit it then.


+  a warning is issued and false is returned. false
+  is returned when there are no open connections. When there are some open
+  connections, but there is no connection for the given foreign server,
+  then false is returned. When no foreign server exists
+  with the given name, an error is emitted. Example usage of the function:

When a non-existent server name is specified, postgres_fdw_disconnect()
emits an error if there is at least one open connection, but just returns
false otherwise. At least for me, this behavior looks inconsistent and strange.
In that case, IMO the function should always emit an error.


Done.

Attaching v13 patch set, please review it further.


Thanks!

+ * 2) If no input argument is provided, then it tries to disconnect all the
+ *    connections.

I'm concerned that users can easily forget to specify the argument and
accidentally discard all the connections. So, IMO, to alleviate this situation,
what about changing the function name (only when closing all the connections)
to something like postgres_fdw_disconnect_all(), as we have
pg_advisory_unlock_all() alongside pg_advisory_unlock()?

+   if (result)
+   {
+   /* We closed at least one connection, others are in use. */
+   ereport(WARNING,
+   (errmsg("cannot close all connections because some of them are still in use")));
+   }

Sorry if this was already discussed upthread. Isn't it more helpful to
emit a warning for every connection that fails to be closed? For example,

WARNING:  cannot close connection for server "loopback1" because it is still in use

Re: [HACKERS] Custom compression methods

2021-01-19 Thread Dilip Kumar
On Wed, Jan 20, 2021 at 12:37 AM Justin Pryzby  wrote:
>
> Thanks for updating the patch.

Thanks for the review

> On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby  wrote:
> > The most recent patch doesn't compile --without-lz4:
> On Tue, Jan 05, 2021 at 11:19:33AM +0530, Dilip Kumar wrote:
> > On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby  wrote:
> > > I think I first saw it on cfbot and I reproduced it locally, too.
> > > http://cfbot.cputube.org/dilip-kumar.html
> > >
> > > I think you'll have to make --without-lz4 the default until the build
> > > environments include it, otherwise the patch checker will show red :(
> >
> > Oh ok,  but if we make by default --without-lz4 then the test cases
> > will start failing which is using lz4 compression.  Am I missing
> > something?
>
> The CIs are failing like this:
>
> http://cfbot.cputube.org/dilip-kumar.html
> |checking for LZ4_compress in -llz4... no
> |configure: error: lz4 library not found
> |If you have lz4 already installed, see config.log for details on the
> |failure.  It is possible the compiler isn't looking in the proper directory.
> |Use --without-lz4 to disable lz4 support.
>
> I thought that used to work (except for windows).  I don't see that anything
> changed in the configure tests...  Is it because the CI moved off travis 2
> weeks ago?  I don't know whether the travis environment had liblz4, and I
> don't remember if the build was passing or if it was failing for some other
> reason.  I'm guessing historic logs from travis are not available, if they 
> ever
> were.
>
> I'm not sure how to deal with that, but maybe you'd need:
> 1) A separate 0001 patch *allowing* LZ4 to be enabled/disabled;
> 2) Current patchset needs to compile with/without LZ4, and pass tests in both
> cases - maybe you can use "alternate test" output [0] to handle the "without"
> case.

Okay, let me think about how to deal with this.

> 3) Eventually, the CI and build environments may have LZ4 installed, and then
> we can have a separate debate about whether to enable it by default.
>
> [0] cp -iv src/test/regress/results/compression.out 
> src/test/regress/expected/compression_1.out
>
> On Tue, Jan 05, 2021 at 02:20:26PM +0530, Dilip Kumar wrote:
> > On Tue, Jan 5, 2021 at 11:19 AM Dilip Kumar  wrote:
> > > On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby  
> > > wrote:
> > > > I see the windows build is failing:
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123730
> > > > |undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at 
> > > > src/tools/msvc/Mkvcbuild.pm line 852.
> > > > This needs to be patched: src/tools/msvc/Solution.pm
> > > > You can see my zstd/pg_dump patch for an example, if needed (actually 
> > > > I'm not
> > > > 100% sure it's working yet, since the windows build failed for another 
> > > > reason).
> > >
> > > Okay, I will check that.
>
> This still needs help.
> perl ./src/tools/msvc/mkvcbuild.pl
> ...
> undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at 
> /home/pryzbyj/src/postgres/src/tools/msvc/Mkvcbuild.pm line 852.
>
> Fix like:
>
> +   HAVE_LIBLZ4 => $self->{options}->{zlib} ? 1 : 
> undef,

I will do that.

> Some more language fixes:
>
> commit 3efafee52414503a87332fa6070541a3311a408c
> Author: dilipkumar 
> Date:   Tue Sep 8 15:24:33 2020 +0530
>
> Built-in compression method
>
> +  If the compression method is not specified for the compressible type 
> then
> +  it will have the default compression method.  The default compression
>
> I think this should say:
> If no compression method is specified, then compressible types will have the
> default compression method (pglz).
>
> + *
> + * Since version 11 TOAST_COMPRESS_SET_RAWSIZE also marks compressed
>
> Should say v14 ??
>
> diff --git a/src/include/catalog/pg_attribute.h 
> b/src/include/catalog/pg_attribute.h
> index 059dec3647..e4df6bc5c1 100644
> --- a/src/include/catalog/pg_attribute.h
> +++ b/src/include/catalog/pg_attribute.h
> @@ -156,6 +156,14 @@ CATALOG(pg_attribute,1249,AttributeRelationId) 
> BKI_BOOTSTRAP BKI_ROWTYPE_OID(75,
> /* attribute's collation */
> Oid attcollation;
>
> +   /*
> +* Oid of the compression method that will be used for compressing 
> the value
> +* for this attribute.  For the compressible atttypid this must 
> always be a
>
> say "For compressible types, ..."
>
> +* valid Oid irrespective of what is the current value of the 
> attstorage.
> +* And for the incompressible atttypid this must always be an invalid 
> Oid.
>
> say "must be InvalidOid"
>
> @@ -685,6 +686,7 @@ typedef enum TableLikeOption
> CREATE_TABLE_LIKE_INDEXES = 1 << 5,
> CREATE_TABLE_LIKE_STATISTICS = 1 << 6,
> CREATE_TABLE_LIKE_STORAGE = 1 << 7,
> +   CREATE_TABLE_LIKE_COMPRESSION = 1 << 8,
>
> This is interesting...
> I have a patch to implement LIKE .. (INCLUDING ACCESS METHOD).
> 

Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-19 Thread Peter Eisentraut

On 2020-11-22 09:19, James Hilliard wrote:

OSX implements sysv shmem support via a mach wrapper, however the mach
sysv shmem wrapper has some severe restrictions that prevent us from
allocating enough memory segments in some cases.

These limits appear to be due to the way the wrapper itself is
implemented and not mach.

For example when running a test suite that spins up many postgres
instances I commonly see this issue:
DETAIL:  Failed system call was shmget(key=5432002, size=56, 03600).


This is the first I've heard in years of shared memory limits being a 
problem on macOS.  What settings or what levels of concurrency do you 
use to provoke these errors?






Re: pg_class.reltype -> pg_type.oid missing for pg_toast table

2021-01-19 Thread Joel Jacobson
On Tue, Jan 19, 2021, at 17:43, Tom Lane wrote:
>I'm too lazy to check the code right now, but my recollection is that we
>do not bother to make composite-type entries for toast tables.  However,
>they should have reltype = 0 if so, so I'm not quite sure where the
>above failure is coming from.

My apologies, false alarm.

The problem turned out to be due to doing

CREATE TABLE catalog_fks.%1$I AS
SELECT * FROM pg_catalog.%1$I

which causes changes to e.g. pg_catalog.pg_class while the command is running.

Solved by instead using COPY ... TO to first copy catalogs to files on disk,
which doesn't cause changes to the catalogs,
and then using COPY .. FROM to copy the data into the replicated table 
structures.

/Joel

Re: Deleting older versions in unique indexes to avoid page splits

2021-01-19 Thread Peter Geoghegan
On Tue, Jan 19, 2021 at 7:54 PM Amit Kapila  wrote:
> The worst cases could be (a) when there is just one such duplicate
> (indexval logically unchanged) on the page and that happens to be the
> last item and others are new insertions, (b) same as (a) and along
> with it let's say there is an open transaction due to which we can't
> remove even that duplicate version. Have we checked the overhead or
> results by simulating such workloads?

There is no such thing as a workload that has page splits caused by
non-HOT updaters, but almost no actual version churn from the same
non-HOT updaters. It's possible that a small number of individual page
splits will work out like that, of course, but they'll be extremely
rare, and impossible to see in any kind of consistent way.

That just leaves long running transactions. Of course it's true that
eventually a long-running transaction will make it impossible to
perform any cleanup, for the usual reasons. And at that point this
mechanism is bound to fail (which costs additional cycles -- the
wasted access to a single heap page, some CPU cycles). But it's still
a bargain to try. Even with a long running transaction there will be
a great many bottom-up deletion passes that still succeed earlier on
(because at least some of the dups are deletable, and we can still
delete those that became garbage right before the long running
snapshot was acquired).

Victor independently came up with a benchmark that ran over several
hours, with cleanup consistently held back by ~5 minutes by a long
running transaction:

https://www.postgresql.org/message-id/cagnebogatzs1mwmvx8fzzhmxzudecb10anvwwhctxtibpg3...@mail.gmail.com

This was actually one of the most favorable cases of all for the patch
-- the patch prevented logically unchanged indexes from growing (this
is a mix of inserts, updates, and deletes, not just updates, so it was
less than perfect -- we did see the indexes grow by a half of one
percent). Whereas without the patch each of the same 3 indexes grew by
30% - 60%.

So yes, I did think about long running transactions, and no, the
possibility of wasting one heap block access here and there when the
database is melting down anyway doesn't seem like a big deal to me.

> I feel unlike LP_DEAD optimization this new bottom-up scheme can cost
> us extra CPU and I/O because there seems to be not much consideration
> given to the fact that we might not be able to delete any item (or
> very few) due to long-standing open transactions except that we limit
> ourselves when we are not able to remove even one tuple from any
> particular heap page.

There was plenty of consideration given to that. It was literally
central to the design, and something I pored over at length. Why
don't you go read some of that now? Or, why don't you demonstrate an
actual regression using a tool like pgbench?

I do not appreciate being accused of having acted carelessly. You
don't have a single shred of evidence.

The design is counter-intuitive. I think that you simply don't understand it.

> Now, say due to open transactions, we are able
> to remove very few tuples (for the sake of argument say there is only
> 'one' such tuple) from the heap page then we will keep on accessing
> the heap pages without much benefit. I feel extending the deletion
> mechanism based on the number of LP_DEAD items sounds more favorable
> than giving preference to duplicate items. Sure, it will give equally
> good or better results if there are no long-standing open
> transactions.

As Andres says, LP_DEAD bits need to be set by index scans. Otherwise
nothing happens. The simple deletion case can do nothing without that
happening. It's good that it's possible to reuse work from index scans
opportunistically, but it's not reliable.

> > I personally will
> > never vote for a theoretical risk with only a theoretical benefit. And
> > right now that's what the idea of doing bottom-up deletions in more
> > marginal cases (the page flag thing) looks like.
> >
>
> I don't think we can say that it is purely theoretical because I have
> shown some basic computation where it can lead to fewer splits.

I'm confused. You realize that this makes it *more* likely that
bottom-up deletion passes will take place, right? It sounds like
you're arguing both sides of the issue at the same time.

-- 
Peter Geoghegan




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-19 Thread Andres Freund
Hi,

On 2021-01-20 09:24:35 +0530, Amit Kapila wrote:
> I feel extending the deletion mechanism based on the number of LP_DEAD
> items sounds more favorable than giving preference to duplicate
> items. Sure, it will give equally good or better results if there are
> no long-standing open transactions.

There's a lot of workloads that never set LP_DEAD because all scans are
bitmap index scans. And there's no obvious way to address that. So I
don't think it's wise to purely rely on LP_DEAD.

Greetings,

Andres Freund




Re: Some coverage for DROP OWNED BY with pg_default_acl

2021-01-19 Thread Michael Paquier
On Tue, Jan 19, 2021 at 05:49:03PM -0300, Alvaro Herrera wrote:
> Heh, interesting case.  Added coverage is good, so +1.

Thanks.  I read through it again and applied the test.

> Since the role regress_priv_user2 is "private" to the privileges.sql
> script, there's no danger of a concurrent test getting the added lines
> in trouble AFAICS.

It seems to me that it could lead to some trouble if a test running in
parallel expects a set of ACLs with no extra noise, as this stuff adds
data to the catalogs for all objects created while the default
permissions are visible.  Perhaps that's an over-defensive position,
but it does not hurt to be careful either, similarly to the test run a
couple of lines above.
--
Michael


signature.asc
Description: PGP signature


Re: Stronger safeguard for archive recovery not to miss data

2021-01-19 Thread Fujii Masao




On 2021/01/20 1:05, Laurenz Albe wrote:

On Mon, 2021-01-18 at 07:34 +, osumi.takami...@fujitsu.com wrote:

I noticed that this message should cover both archive recovery modes,
which means in recovery mode and standby mode. Then, I combined your
suggestion above with this point of view. Have a look at the updated patch.
I also enriched the new tap tests to show this perspective.


Looks good, thanks.

I'll mark this patch as "ready for committer".


+errhint("Run recovery again from a new base backup 
taken after setting wal_level higher than minimal")));

Isn't it impossible to do this in the normal archive recovery case? In that case,
since the server crashed and the database got corrupted, we probably
cannot take a new base backup.

Originally, even when users accidentally set wal_level to minimal, they could
start the server from the backup by disabling hot_standby and salvage the data.
But with the patch, we lose that option. Right? I was wondering whether
WARNING was used there intentionally for that case.


if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
    ereport(ERROR,
            (errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
             errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));

With the patch, we never reach the above code?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-19 Thread Amit Kapila
On Tue, Jan 19, 2021 at 3:03 AM Peter Geoghegan  wrote:
>
> On Mon, Jan 18, 2021 at 1:10 PM Victor Yegorov  wrote:
> > I must admit, that it's a bit difficult to understand you here (at least 
> > for me).
> >
> > I assume that by "bet" you mean flagged tuple, that we marked as such
> > (should we implement the suggested case).
> > As heapam will give up early in case there are no deletable tuples, why do 
> > say,
> > that "bet" is no longer low? At the end, we still decide between page split 
> > (and
> > index bloat) vs a beneficial space cleanup.
>
> Well, as I said, there are various ways in which our inferences (say
> the ones in nbtdedup.c) are likely to be wrong. You understand this
> already. For example, obviously if there are two duplicate index
> tuples pointing to the same heap page then it's unlikely that both
> will be deletable, and there is even a fair chance that neither will
> be (for many reasons). I think that it's important to justify why we
> use stuff like that to drive our decisions -- the reasoning really
> matters. It's very much not like the usual optimization problem thing.
> It's a tricky thing to discuss.
>
> I don't assume that I understand all workloads, or how I might
> introduce regressions. It follows that I should be extremely
> conservative about imposing new costs here. It's good that we
> currently know of no workloads that the patch is likely to regress,
> but absence of evidence isn't evidence of absence.
>

The worst cases could be (a) when there is just one such duplicate
(indexval logically unchanged) on the page and that happens to be the
last item and others are new insertions, (b) same as (a) and along
with it let's say there is an open transaction due to which we can't
remove even that duplicate version. Have we checked the overhead or
results by simulating such workloads?

I feel unlike LP_DEAD optimization this new bottom-up scheme can cost
us extra CPU and I/O because there seems to be not much consideration
given to the fact that we might not be able to delete any item (or
very few) due to long-standing open transactions except that we limit
ourselves when we are not able to remove even one tuple from any
particular heap page. Now, say due to open transactions, we are able
to remove very few tuples (for the sake of argument say there is only
'one' such tuple) from the heap page then we will keep on accessing
the heap pages without much benefit. I feel extending the deletion
mechanism based on the number of LP_DEAD items sounds more favorable
than giving preference to duplicate items. Sure, it will give equally
good or better results if there are no long-standing open
transactions.

> I personally will
> never vote for a theoretical risk with only a theoretical benefit. And
> right now that's what the idea of doing bottom-up deletions in more
> marginal cases (the page flag thing) looks like.
>

I don't think we can say that it is purely theoretical because I have
shown some basic computation where it can lead to fewer splits.

-- 
With Regards,
Amit Kapila.




Re: Add statistics to pg_stat_wal view for wal related parameter tuning

2021-01-19 Thread Masahiro Ikeda

On 2020-12-22 11:16, Masahiro Ikeda wrote:

Thanks for your comments.

On 2020-12-22 09:39, Andres Freund wrote:

Hi,

On 2020-12-21 13:16:50 -0800, Andres Freund wrote:

On 2020-12-02 13:52:43 +0900, Fujii Masao wrote:
> Pushed. Thanks!

Why are wal_records/fpi long, instead of uint64?
    long        wal_records;    /* # of WAL records produced */
    long        wal_fpi;        /* # of WAL full page images produced */
    uint64      wal_bytes;      /* size of WAL records produced */

long is only 4 bytes e.g. on windows, and it is entirely possible to wrap
a 4 byte record counter. It's also somewhat weird that wal_bytes is
unsigned, but the others are signed?

This is made doubly weird because on the SQL level you chose to make
wal_records, wal_fpi bigint. And wal_bytes numeric?


I'm sorry I don't know the reason.

The following thread is related to the patch and the type of wal_bytes
is changed from long to uint64 because XLogRecPtr is uint64.
https://www.postgresql.org/message-id/flat/20200402144438.GF64485%40nol#1f0127c98df430104c63426fdc285c20

I assumed that the reason why the type of wal_records/fpi is long is that
BufferUsage has members (e.g., shared_blks_hit) of the same type.


So, I think that if we change the type of wal_records/fpi from long to
uint64, it's better to change the types of BufferUsage's members too.


I've done a little more research, so I'll share the results.

IIUC, in theory this could lead to the statistics being under-counted,
but in practice it doesn't happen.

The above "wal_records", "wal_fpi" are accumulation values and when 
WalUsageAccumDiff()
is called, we can know how many wals are generated for specific 
executions,
for example, planning/executing a query, processing a utility command, 
and vacuuming one relation.


The following variable accumulates "wal_records" and "wal_fpi" per process.


```
typedef struct WalUsage
{
    long        wal_records;    /* # of WAL records produced */
    long        wal_fpi;        /* # of WAL full page images produced */
    uint64      wal_bytes;      /* size of WAL records produced */
} WalUsage;

WalUsage    pgWalUsage;
```

Although this may overflow, it doesn't affect calculating the difference
in WAL usage between two execution points. If more than 2 billion WAL
records were generated in one execution, 4 bytes would not be enough and
the collected statistics would be lost, but I don't think that happens.
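
As a minimal illustration of the "wraparound cancels out in differences"
point (using unsigned arithmetic here, since unsigned wraparound is well
defined in C; the actual counters are long, which is part of why switching
them to uint64 is being discussed):

```c
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* counter snapshot taken just before a 32-bit counter wraps ... */
	uint32_t	before = UINT32_MAX - 5;

	/* ... and after 10 more WAL records have been counted (wraps to 4) */
	uint32_t	after = before + 10;

	/*
	 * The accumulated value is now "wrong" (4), but the per-execution
	 * difference, computed as a plain subtraction the way
	 * WalUsageAccumDiff() does it, is still 10, because unsigned
	 * subtraction wraps the same way the addition did.
	 */
	printf("after=%u diff=%u\n", after, (uint32_t) (after - before));
	return 0;
}
```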


In addition, "wal_records" and "wal_fpi" values sent by processes are
accumulated in the statistic collector and their types are 
PgStat_Counter(int64).


```
typedef struct PgStat_WalStats
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
uint64  wal_bytes;
PgStat_Counter wal_buffers_full;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
```



Some more things:
- There's both PgStat_MsgWal WalStats; and static PgStat_WalStats walStats;
  that seems *WAY* too confusing. And the former imo shouldn't be
  global.


Sorry for the confusing name.
I referenced the following variable name.

 static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];

How about changing "PgStat_MsgWal WalStats"
to "PgStat_MsgWal MsgWalStats"?

Is it better to change the following name too?
 "PgStat_MsgBgWriter BgWriterStats;"
 "static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];"

Since PgStat_MsgWal is used in xlog.c and pgstat.c,
I thought it should be global.


I've attached a patch that renames the above variables.
What do you think?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONFrom 8bde948e5e91dbfbcf79b091af51f022aa32191a Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Wed, 20 Jan 2021 12:13:24 +0900
Subject: [PATCH] Refactor variable names of global statistics messages

Refactor the variable names of global statistics messages
for bgwriter, wal, and SLRU because the names are too confusing.

Currently, their names are BgWriterStats, WalStats, and SLRUStats. But
there are similar names defined in the statistics collector: walStats
and slruStats. Since this is confusing, this patch renames the message
variables to make it easy to see that they are messages.

Author: Masahiro Ikeda
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20201222003935.47aoxfmokltlr...@alap3.anarazel.de
---
 src/backend/access/transam/xlog.c |  6 ++--
 src/backend/postmaster/checkpointer.c |  8 +++---
 src/backend/postmaster/pgstat.c   | 40 +--
 src/backend/storage/buffer/bufmgr.c   |  8 +++---
 src/include/pgstat.h  |  4 +--
 5 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 470e113b33..a64ad34f21 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ 

Re: Change default of checkpoint_completion_target

2021-01-19 Thread japin


On Wed, 20 Jan 2021 at 03:47, Stephen Frost  wrote:
> Greetings,
>
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost  writes:
>> > Any further comments or thoughts on this one?
>> 
>> This:
>> 
>> +total time between checkpoints. The default is 0.9, which spreads 
>> the
>> +checkpoint across the entire checkpoint timeout period of time,
>> 
>> is confusing because 0.9 is obviously not 1.0; people will wonder
>> whether the scale is something strange or the text is just wrong.
>> They will also wonder why not use 1.0 instead.  So perhaps more like
>> 
>>  ... The default is 0.9, which spreads the checkpoint across almost
>>  all the available interval, providing fairly consistent I/O load
>>  while also leaving some slop for checkpoint completion overhead.
>> 
>> The other chunk of text seems accurate, but there's no reason to let
>> this one be misleading.
>
> Good point, updated along those lines.
>
> In passing, I noticed that we have a lot of documentation like:
>
> This parameter can only be set in the postgresql.conf file or on the
> server command line.
>
> ... which hasn't been true since the introduction of ALTER SYSTEM.  I
> don't really think it's this patch's job to clean that up but it doesn't
> seem quite right that we don't include ALTER SYSTEM in that list either.
> If this was C code, maybe we could get away with just changing such
> references as we find them, but I don't think we'd want the
> documentation to be in an inconsistent state regarding that.
>

I have already mentioned this in [1], however it seems unattractive.

[1] - 
https://www.postgresql.org/message-id/flat/199703E4-A795-4FB8-911C-D0DE9F51519C%40hotmail.com

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: list of extended statistics on psql

2021-01-19 Thread Tatsuro Yamada

Hi Tomas,

On 2021/01/19 11:52, Tomas Vondra wrote:



I'm going to create the WIP patch to use the above queriy.
Any comments welcome. :-D


Yes, I think using this simpler query makes sense. If we decide we need 
something more elaborate, we can improve that in future PostgreSQL versions 
(after adding a view/function to core), but I'd leave that as work for the 
future.



I see, thanks!



Apologies for all the extra work - I hadn't realized this flaw when pushing 
for showing more stuff :-(



Don't worry about it. We didn't notice the problem even when it was reviewed
by multiple people on -hackers. Let's keep moving forward. :-D

I'll include a regression test in the next patch.

Regards,
Tatsuro Yamada





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Julien Rouhaud
Hello Yamada-san,

On Wed, Jan 20, 2021 at 10:06 AM Tatsuro Yamada
 wrote:
>
> Hi Julien,
>
>
> >> Rebase only, thanks to the cfbot!  V16 attached.
> >
> > I tested the v16 patch on a0efda88a by using "make installcheck-parallel", 
> > and
> > my result is the following. Attached file is regression.diffs.
>
>
> Sorry, my environment was not suitable for the test when I sent my previous 
> email.
> I fixed my environment and tested it again, and it was a success. See below:
>
> ===
>   All 202 tests passed.
> ===

No worries, thanks a lot for testing!




Re: Use boolean array for nulls parameters

2021-01-19 Thread japin


On Tue, 19 Jan 2021 at 23:45, Tom Lane  wrote:
> japin  writes:
>> When I reviewed [1], I found that the tuple's nulls array uses the char type.
>> However, there are many places that use a boolean array to represent the nulls
>> array, so I think we can replace the char-type nulls array with a boolean type.
>> This change will break the SPI_xxx API; I'm not sure whether this change causes
>> other problems or not.  Any thoughts?
>
> We have always considered that changing the APIs of published SPI
> interfaces is a non-starter.  The entire reason those calls still
> exist at all is for the benefit of third-party extensions.
>

Thanks for the clarification.  I agree that we should keep the APIs stable; maybe we
can modify this someday when the APIs must be changed.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Tatsuro Yamada

Hi Julien,



Rebase only, thanks to the cfbot!  V16 attached.


I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
my result is the following. Attached file is regression.diffs.



Sorry, my environment was not suitable for the test when I sent my previous 
email.
I fixed my environment and tested it again, and it was a success. See below:

===
 All 202 tests passed.
===

Regards,
Tatsuro Yamada






Re: POC: postgres_fdw insert batching

2021-01-19 Thread Amit Langote
On Wed, Jan 20, 2021 at 1:01 AM Tomas Vondra
 wrote:
> On 1/19/21 7:23 AM, Amit Langote wrote:
> > On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.ta...@fujitsu.com
> >> Actually, I tried to do it (adding the GetModifyBatchSize() call after 
> >> BeginForeignModify()), but it failed.  Because 
> >> postgresfdwGetModifyBatchSize() wants to know if RETURNING is specified, 
> >> and ResultRelInfo->projectReturning is created after the above part.  
> >> Considering the context where GetModifyBatchSize() implementations may 
> >> want to know the environment, I placed the call as late as possible in the 
> >> initialization phase.  As for the future(?) multi-target DML statements, I 
> >> think we can change this together with other many(?) parts that assume a 
> >> single target table.
> >
> > Okay, sometime later then.
> >
> > I wasn't sure if bringing it up here would be appropriate, but there's
> > a patch by me to refactor ModfiyTable result relation allocation that
> > will have to remember to move this code along to an appropriate place
> > [1].  Thanks for the tip about the dependency on how RETURNING is
> > handled.  I will remember it when rebasing my patch over this.
> >
>
> Thanks. The last version (v12) should be addressing all the comments and
> seems fine to me, so barring objections I'll get that pushed shortly.

+1

> One thing that seems a bit annoying is that with the partitioned table
> the explain (verbose) looks like this:
>
>   QUERY PLAN
> -
>   Insert on public.batch_table
> ->  Function Scan on pg_catalog.generate_series i
>   Output: i.i
>   Function Call: generate_series(1, 66)
> (4 rows)
>
> That is, there's no information about the batch size :-( But AFAICS
> that's due to how explain shows (or rather does not) partitions in this
> type of plan.

Yeah.  Partition result relations are always lazily allocated for
INSERT, so EXPLAIN (without ANALYZE) has no idea what to show for
them, nor does it know which partitions will be used in the first
place.  With ANALYZE however, you could get them from
es_tuple_routing_result_relations and maybe list them if you want, but
that sounds like a project on its own.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Odd, intermittent failure in contrib/pageinspect

2021-01-19 Thread Tom Lane
Andres Freund  writes:
> I think you don't even need checkpointer to be involved, normal buffer
> replacement would do the trick. We briefly pin the page in BufferAlloc()
> even if the page is clean. Longer when it's dirty, of course.

True, but it seems unlikely that the pages in question here would be
chosen as replacement victims.  These are non-parallel tests, so
there's little competitive pressure.  I could believe that a background
autovacuum is active, but not that it's dirtied so many pages that
tables the test script just created need to get swapped out.

The checkpointer theory seems good because it requires no assumptions
at all about competing demand for buffers.  If the clock sweep gets
to the table page (which we know is recently dirtied) at just the right
time, we'll see a failure.

regards, tom lane




Re: Odd, intermittent failure in contrib/pageinspect

2021-01-19 Thread Andres Freund
Hi,

On 2021-01-18 19:40:05 -0300, Alvaro Herrera wrote:
> > [ thinks for a bit... ]  Does the checkpointer pin pages it's writing
> > out?  I guess it'd have to ...
> 
> It does, per SyncOneBuffer(), called from BufferSync(), called from
> CheckPointBuffers().

I think you don't even need checkpointer to be involved, normal buffer
replacement would do the trick. We briefly pin the page in BufferAlloc()
even if the page is clean. Longer when it's dirty, of course.

Greetings,

Andres Freund




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread Tom Lane
James Hilliard  writes:
> Actually, this path looks wrong in general; the value for
> "xcrun --sdk macosx --show-sdk-path" should take precedence over
> "xcrun --show-sdk-path" as the latter may be used for iOS potentially.

What is "potentially"?  I've found no direct means to control the
SDK path at all, but so far it appears that "xcrun --show-sdk-path"
agrees with the compiler's default -isysroot path as seen in the
compiler's -v output.  I suspect that this isn't coincidental,
but reflects xcrun actually being used in the compiler launch
process.  If it were to flip over to using a IOS SDK, that would
mean that bare "cc" would generate nonfunctional executables,
which just about any onlooker would agree is broken.

I'm really not excited about trying to make the build work with
a non-native SDK as you are proposing.  I think that's just going
to lead to a continuing stream of problems, because of Apple's
opinions about how cross-version compatibility should work.
It also seems like unnecessary complexity, because there is always
(AFAICS) a native SDK version available.  We just need to find it.

regards, tom lane




RE: Parallel INSERT (INTO ... SELECT ...)

2021-01-19 Thread tsunakawa.ta...@fujitsu.com
From: Tang, Haiying 
> Execute EXPLAIN on Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
>Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
>  Output: test_data1.a, test_data1.b
>  Buffers: shared hit=5

> Execute EXPLAIN on non-Patched:
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
>Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
>  Output: test_data1.a, test_data1.b
>  Buffers: shared hit=5

I don't know if this is related to this issue, but I felt "shared hit=5" for 
the Seq Scan is strange.  This test case reads 1,000 rows from 1,000 partitions, 
one row per partition, so I thought the shared hits should be 1,000 in the Seq 
Scan.  I wonder if the 1,000 is included in the Insert node?


Regards
Takayuki Tsunakawa



RE: Add Nullif case for eval_const_expressions_mutator

2021-01-19 Thread Hou, Zhijie
Hi

Thanks for the review.

> It's a bit unfortunate now that between OpExpr, DistinctExpr, NullIfExpr,
> and to a lesser extent ScalarArrayOpExpr we will now have several different
> implementations of nearly the same thing, without any explanation why one
> approach was chosen here and another there.  We should at least document
> this.

I am not quite sure where to document the difference.
For now, I tried to add some comments for NullIf to explain why this 
one is different.

+   /*
+    * Since NullIf is not common enough to deserve specially
+    * optimized code, use ece_generic_processing and
+    * ece_evaluate_expr to simplify the code as much as possible.
+    */

Any suggestions ?

> Some inconsistencies I found: The code for DistinctExpr calls
> expression_tree_mutator() directly, but your code for NullIfExpr calls
> ece_generic_processing(), even though the explanation in the comment for
> DistinctExpr would apply there as well.
> 
> Your code for NullIfExpr doesn't appear to call set_opfuncid() anywhere.

IMO, set_opfuncid will be called inside ece_evaluate_expr,
via the following flow:

ece_evaluate_expr --> evaluate_expr --> fix_opfuncids --> fix_opfuncids_walker --> set_opfuncid

And we do not need the opfuncid until we call ece_evaluate_expr.
So, to simplify the code as much as possible, I did not call set_opfuncid in
eval_const_expressions_mutator again.


> I would move your new block for NullIfExpr after the block for DistinctExpr.
> That's the order in which these blocks appear elsewhere for generic node
> processing.
> 

Changed.


> Check your whitespace usage:
> 
>  if(!has_nonconst_input)
> 
> should have a space after the "if".  (It's easy to fix of course, but I
> figure I'd point it out here since you have submitted several patches with
> this style, so it's perhaps a habit to break.)

Changed.


> Perhaps add a comment to the tests like this so it's clear what they are
> for:
> 
> diff --git a/src/test/regress/sql/case.sql
> b/src/test/regress/sql/case.sql index 4742e1d0e0..98e3fb8de5 100644
> --- a/src/test/regress/sql/case.sql
> +++ b/src/test/regress/sql/case.sql
> @@ -137,6 +137,7 @@ CREATE TABLE CASE2_TBL (
> FROM CASE_TBL a, CASE2_TBL b
> WHERE COALESCE(f,b.i) = 2;
> 
> +-- Tests for constant subexpression simplification
>   explain (costs off)
>   SELECT * FROM CASE_TBL WHERE NULLIF(1, 2) = 2;

Added.

Attaching the v3 patch; please consider it for further review.

Best regards,
houzj





v3_0001-add-nullif-case-for-eval_const_expressions.patch
Description: v3_0001-add-nullif-case-for-eval_const_expressions.patch


Re: New IndexAM API controlling index vacuum strategies

2021-01-19 Thread Peter Geoghegan
On Tue, Jan 19, 2021 at 4:45 PM Peter Geoghegan  wrote:
> BTW, I think that increasing MaxHeapTuplesPerPage will make it
> necessary to consider tidbitmap.c. Comments at the top of that file
> say that it is assumed that MaxHeapTuplesPerPage is about 256. So
> there is a risk of introducing performance regressions affecting
> bitmap scans here.

More concretely, WORDS_PER_PAGE increases from 5 on the master branch
to 16 with the latest version of the patch series on most platforms
(while WORDS_PER_CHUNK is 4 with or without the patches).
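
For anyone who wants to double-check those numbers, here is a quick standalone
sanity check (assuming 8 kB blocks, 8-byte MAXALIGN, and 64-bit bitmapwords;
this is not code from the patch):

#include <stdio.h>

#define BLCKSZ 8192
#define PAGE_HEADER_SIZE 24					/* SizeOfPageHeaderData on 64-bit builds */
#define MAXALIGN(LEN) (((LEN) + 7) & ~7)	/* assumes MAXIMUM_ALIGNOF == 8 */
#define BITS_PER_BITMAPWORD 64

int
main(void)
{
	/* master: divisor is MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData) */
	int			master = (BLCKSZ - PAGE_HEADER_SIZE) / (MAXALIGN(23) + 4);
	/* patched: divisor is just MAXALIGN(sizeof(ItemIdData)) */
	int			patched = (BLCKSZ - PAGE_HEADER_SIZE) / MAXALIGN(4);

	/* prints 291 -> 5 words and 1021 -> 16 words per exact page */
	printf("master:  MaxHeapTuplesPerPage=%d WORDS_PER_PAGE=%d\n",
		   master, (master - 1) / BITS_PER_BITMAPWORD + 1);
	printf("patched: MaxHeapTuplesPerPage=%d WORDS_PER_PAGE=%d\n",
		   patched, (patched - 1) / BITS_PER_BITMAPWORD + 1);
	return 0;
}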

-- 
Peter Geoghegan




Re: New IndexAM API controlling index vacuum strategies

2021-01-19 Thread Peter Geoghegan
On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan  wrote:
> * Maybe it would be better if you just changed the definition such
> that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> no other changes? (Some variant of this suggestion might be better,
> not sure.)
>
> For some reason that feels a bit safer: we still have an "imaginary
> tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> much less than the current 3 MAXALIGN() quantums (i.e. what
> MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> that this alternative approach will be noticeably less effective
> within vacuumlazy.c?

BTW, I think that increasing MaxHeapTuplesPerPage will make it
necessary to consider tidbitmap.c. Comments at the top of that file
say that it is assumed that MaxHeapTuplesPerPage is about 256. So
there is a risk of introducing performance regressions affecting
bitmap scans here.

Apparently some other DB systems make the equivalent of
MaxHeapTuplesPerPage dynamically configurable at the level of heap
tables. It usually doesn't matter, but it can matter with on-disk
bitmap indexes, where the bitmap must be encoded from raw TIDs (this
must happen before the bitmap is compressed -- there must be a simple
mapping from every possible TID to some bit in a bitmap first). The
item offset component of each heap TID is not usually very large, so
there is a trade-off between keeping the representation of bitmaps
efficient and not unduly restricting the number of distinct heap
tuples on each heap page. I think that there might be a similar
consideration here, in tidbitmap.c (even though it's not concerned
about on-disk bitmaps).

-- 
Peter Geoghegan




Re: Support for NSS as a libpq TLS backend

2021-01-19 Thread Jacob Champion
On Tue, 2021-01-19 at 21:21 +0100, Daniel Gustafsson wrote:
> There is something iffy with these certs (the test fails
> on mismatching ciphers and/or signature algorithms) that I haven't been able 
> to
> pin down, but to get more eyes on this I'm posting the patch with the test
> enabled.

Removing `--keyUsage keyEncipherment` from the native_server-* CSR
generation seems to let the tests pass for me, but I'm wary of just
pushing that as a solution because I don't understand why that would
have anything to do with the failure mode
(SSL_ERROR_NO_SUPPORTED_SIGNATURE_ALGORITHM).

> The NSS toolchain requires interactive input which makes the Makefile
> a bit hacky, ideas on cleaning that up are appreciated.

Hm. I got nothing, short of a feature request to NSS...

--Jacob


Re: Add table access method as an option to pgbench

2021-01-19 Thread David Zhang

On 2021-01-15 1:22 p.m., Andres Freund wrote:


Hi,

On 2020-11-25 12:41:25 +0900, Michael Paquier wrote:

On Tue, Nov 24, 2020 at 03:32:38PM -0800, David Zhang wrote:

But, providing another option for the end user may not be a bad idea, and it
might make the tests easier at some points.

My first thought is that we have no need to complicate pgbench with
this option because there is a GUC able to do that, but we do that for
tablespaces, so...  No objections from here.

I think that objection is right. All that's needed to change this from
the client side is to do something like
PGOPTIONS='-c default_table_access_method=foo' pgbench ...

Yeah, this is a better solution for me too. Thanks a lot for all the
feedback.

I don't think adding pgbench options for individual GUCs really is a
useful exercise?

Greetings,

Andres Freund

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: WIP: BRIN multi-range indexes

2021-01-19 Thread Tomas Vondra

On 1/19/21 9:44 PM, John Naylor wrote:
On Tue, Jan 12, 2021 at 1:42 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:

 > I suspect it's due to minmax having to decide which "ranges" to merge,
 > which requires repeated sorting, etc. I certainly don't dare to claim
 > the current algorithm is perfect. I wouldn't have expected such a big
 > difference, though - so definitely worth investigating.

It seems that monotonically increasing (or decreasing) values in a table 
are a worst case scenario for multi-minmax indexes, or basically, unique 
values within a range. I'm guessing it's because it requires many passes 
to fit all the values into a limited number of ranges. I tried using 
smaller pages_per_range numbers, 32 and 8, and that didn't help.


Now, with a different data distribution, using only 10 values that 
repeat over and over, the results are much more sympathetic to multi-minmax:


insert into iot (num, create_dt)
select random(), '2020-01-01 0:00'::timestamptz + (x % 10 || ' seconds')::interval
from generate_series(1,5*365*24*60*60) x;

create index cd_single on iot using brin(create_dt);
27.2s

create index cd_multi on iot using brin(create_dt timestamptz_minmax_multi_ops);
30.4s

create index cd_bt on iot using btree(create_dt);
61.8s

Circling back to the monotonic case, I tried running a simple perf 
record on a backend creating a multi-minmax index on a timestamptz 
column and these were the highest non-kernel calls:
+   21.98%    21.91%  postgres         postgres            [.] FunctionCall2Coll
+    9.31%     9.29%  postgres         postgres            [.] compare_combine_ranges
+    8.60%     8.58%  postgres         postgres            [.] qsort_arg
+    5.68%     5.66%  postgres         postgres            [.] brin_minmax_multi_add_value
+    5.63%     5.60%  postgres         postgres            [.] timestamp_lt
+    4.73%     4.71%  postgres         postgres            [.] reduce_combine_ranges
+    3.80%     0.00%  postgres         [unknown]           [.] 0x032001680004
+    3.51%     3.50%  postgres         postgres            [.] timestamp_eq

There's no one place that's pathological enough to explain the 4x 
slowness over traditional BRIN and nearly 3x slowness over btree when 
using a large number of unique values per range, so making progress here 
would have to involve a more holistic approach.




Yeah. It very much seems like the primary problem is in how we build
the ranges incrementally - with monotonic sequences, we end up having to
merge the ranges over and over again. I don't know what the structure of
the table was, but I guess it was kinda narrow (very few columns), which
exacerbates the problem further, because the number of rows per range
will be way higher than in the real world.


I do think the solution to this might be to allow more values during 
batch index creation, and only "compress" to the requested number at the 
very end (when serializing to on-disk format).


There are a couple of additional comments about possibly replacing a
sequential scan with a binary search; that could help a bit too.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Tatsuro Yamada

Hi Julien,


Rebase only, thanks to the cfbot!  V16 attached.


I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
the result is as follows. The attached file is regression.diffs.


 1 of 202 tests failed.


The differences that caused some tests to fail can be viewed in the
file "/home/postgres/PG140/src/test/regress/regression.diffs".  A copy of the 
test summary that you see
above is saved in the file 
"/home/postgres/PG140/src/test/regress/regression.out".


src/test/regress/regression.diffs
-
diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out /home/postgres/PG140/src/test/regress/results/rules.out
--- /home/postgres/PG140/src/test/regress/expected/rules.out	2021-01-20 08:41:16.383175559 +0900
+++ /home/postgres/PG140/src/test/regress/results/rules.out	2021-01-20 08:43:46.589171774 +0900
@@ -1760,10 +1760,9 @@
 s.state,
 s.backend_xid,
 s.backend_xmin,
-s.queryid,
 s.query,
 s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
...

Thanks,
Tatsuro Yamada
diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out /home/postgres/PG140/src/test/regress/results/rules.out
--- /home/postgres/PG140/src/test/regress/expected/rules.out	2021-01-20 08:41:16.383175559 +0900
+++ /home/postgres/PG140/src/test/regress/results/rules.out	2021-01-20 08:52:10.891159065 +0900
@@ -1760,10 +1760,9 @@
 s.state,
 s.backend_xid,
 s.backend_xmin,
-s.queryid,
 s.query,
 s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
  LEFT JOIN pg_database d ON ((s.datid = d.oid)))
  LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1875,7 +1874,7 @@
 s.gss_auth AS gss_authenticated,
 s.gss_princ AS principal,
 s.gss_enc AS encrypted
-   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
   WHERE (s.client_port IS NOT NULL);
 pg_stat_progress_analyze| SELECT s.pid,
 s.datid,
@@ -2032,7 +2031,7 @@
 w.sync_priority,
 w.sync_state,
 w.reply_time
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
+   FROM 

Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread James Hilliard
On Tue, Jan 19, 2021 at 3:47 PM James Hilliard
 wrote:
>
> On Tue, Jan 19, 2021 at 1:54 PM Tom Lane  wrote:
> >
> > James Hilliard  writes:
> > > On Tue, Jan 19, 2021 at 10:17 AM Tom Lane  wrote:
> > >> Ah, got it.  So "xcrun --show-sdk-path" tells us the right thing (that
> > >> is, it *does* give us a symlink to a 10.15 SDK) but by refusing to
> > >> believe we've got the right thing, we end up picking MacOSX11.1.sdk.
> > >> Drat.  I suppose we could drop the heuristic about wanting a version
> > >> number in the SDK path, but I really don't want to do that.  Now I'm
> > >> thinking about trying to dereference the symlink after the first step.
> >
> > > The MacOSX11.1.sdk can build for a 10.15 target just fine when passed
> > > an appropriate MACOSX_DEPLOYMENT_TARGET, so that SDK should be
> > > fine.
> >
> > But our out-of-the-box default should be to build for the current
> > platform; we don't want users to have to set MACOSX_DEPLOYMENT_TARGET
> > for that case.  Besides, the problem we're having is exactly that Apple's
> > definition of "builds for a 10.15 target just fine" is different from
> > ours.  They think you should use a run-time test not a compile-time test
> > to discover whether preadv is available, and we don't want to do that.
> The default for MACOSX_DEPLOYMENT_TARGET is always the current
> running OS version from my understanding. So if I build with MacOSX11.1.sdk
> on 10.15 with default settings the binaries will work fine because the
> MACOSX_DEPLOYMENT_TARGET gets set to 10.15 automatically even
> if the same SDK is capable of producing incompatible binaries if you set
> MACOSX_DEPLOYMENT_TARGET to 11.0.
> >
> > In almost all of the cases I've seen so far, Apple's compiler actually
> > does default to using an SDK matching the platform.  The problem we
> > have is that we try to name the SDK explicitly, and the current
> > method is failing to pick the right one in your case.  There are
> > several reasons for using an explicit -isysroot rather than just
> > letting the compiler default:
> No, it's only the MACOSX_DEPLOYMENT_TARGET that matches the
> platform; the SDK can be more or less arbitrary, but it will work fine because
> the autoselected MACOSX_DEPLOYMENT_TARGET will force compatibility
> no matter what SDK version you use. This is always how it has worked
> from what I've seen.
> >
> > * We have seen cases in which the compiler acts as though it has
> > *no* default sysroot, and we have to help it out.
> >
> > * The explicit root reduces version-skew build hazards for extensions
> > that are not built at the same time as the core system.
> The deployment target is effectively entirely separate from SDK version,
> so it really shouldn't make a difference unless the SDK is significantly
> older or newer than the running version from what I can tell.
> >
> > * There are a few tests in configure itself that need to know the
> > sysroot path to check for files there.
> >
> > Anyway, the behavior you're seeing shows that 4823621db is still a
> > bit shy of a load.  I'm thinking about the attached as a further
> > fix --- can you verify it helps for you?
> Best I can tell it provides no change for me (this patch is tested on top of
> it)
> because it does not provide any MACOSX_DEPLOYMENT_TARGET
> based feature detection for pwritev at all.
Actually, this path looks wrong in general: the value from
"xcrun --sdk macosx --show-sdk-path" should take precedence over
"xcrun --show-sdk-path", as the latter may potentially be used for iOS.
On my system "xcodebuild -version -sdk macosx Path" and
"xcrun --sdk macosx --show-sdk-path" both point to the
correct latest MacOSX11.1.sdk SDK while "xcrun --show-sdk-path"
points to the older one.
> >
> > regards, tom lane
> >




Re: New IndexAM API controlling index vacuum strategies

2021-01-19 Thread Peter Geoghegan
On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada  wrote:
> After more thought, I think that ambulkdelete needs to be able to
> refer the answer to amvacuumstrategy. That way, the index can skip
> bulk-deletion when lazy vacuum doesn't vacuum heap and it also doesn’t
> want to do that.

Makes sense.

BTW, your patch has bitrot already. Peter E's recent pageinspect
commit happens to conflict with this patch. It might make sense to
produce a new version that just fixes the bitrot, so that other people
don't have to deal with it each time.

> I’ve attached the updated version patch that includes the following changes:

Looks good. I'll give this version a review now. I will do a lot more
soon. I need to come up with a good benchmark for this, that I can
return to again and again during review as needed.

Some feedback on the first patch:

* Just so you know: I agree with you about handling
VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine. I
think that it's better to do that there, even though this choice may
have some downsides.

* Can you add some "stub" sgml doc changes for this? Doesn't have to
be complete in any way. Just a placeholder for later, that has the
correct general "shape" to orientate the reader of the patch. It can
just be a FIXME comment, plus basic mechanical stuff -- details of the
new amvacuumstrategy_function routine and its signature.

Some feedback on the second patch:

* Why do you move around IndexVacuumStrategy in the second patch?
Looks like a rebasing oversight.

* Actually, do we really need the first and second patches to be
separate patches? I agree that the nbtree patch should be a separate
patch, but dividing the first two sets of changes doesn't seem like it
adds much. Did I miss some something?

* Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
MaxHeapTuplesPerPage appropriate? Here is the relevant section from
the patch:

diff --git a/src/include/access/htup_details.h
b/src/include/access/htup_details.h
index 7c62852e7f..038e7cd580 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -563,17 +563,18 @@ do { \
 /*
  * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
  * fit on one heap page.  (Note that indexes could have more, because they
- * use a smaller tuple header.)  We arrive at the divisor because each tuple
- * must be maxaligned, and it must have an associated line pointer.
+ * use a smaller tuple header.)  We arrive at the divisor because each line
+ * pointer must be maxaligned.
*** SNIP ***
 #define MaxHeapTuplesPerPage\
-((int) ((BLCKSZ - SizeOfPageHeaderData) / \
-(MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData
+((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(ItemIdData)

It's true that ItemIdData structs (line pointers) are aligned, but
they're not MAXALIGN()'d. If they were then the on-disk size of line
pointers would generally be 8 bytes, not 4 bytes.

* Maybe it would be better if you just changed the definition such
that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
no other changes? (Some variant of this suggestion might be better,
not sure.)

For some reason that feels a bit safer: we still have an "imaginary
tuple header", but it's just 1 MAXALIGN() quantum now. This is still
much less than the current 3 MAXALIGN() quantums (i.e. what
MaxHeapTuplesPerPage treats as the tuple header size). Do you think
that this alternative approach will be noticeably less effective
within vacuumlazy.c?
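
Spelled out, the alternative being suggested here would be roughly the
following (just a sketch of the idea, not text from the patch; with 8-byte
alignment it works out to about 680 items per 8 kB page, versus 291 on master
and 1021 with the patch as posted):

#define MaxHeapTuplesPerPage \
	((int) ((BLCKSZ - SizeOfPageHeaderData) / \
			(MAXIMUM_ALIGNOF + sizeof(ItemIdData))))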

Note that you probably understand the issue with MaxHeapTuplesPerPage
for vacuumlazy.c better than I do currently. I'm still trying to
understand your choices, and to understand what is really important
here.

* Maybe add a #define for the value 0.7? (I refer to the value used in
choose_vacuum_strategy() to calculate a "this is the number of LP_DEAD
line pointers that we consider too many" cut off point, which is to be
applied throughout lazy_scan_heap() processing.)
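
(Something as small as the following would do; the name is only a placeholder,
not taken from the patch:

/* placeholder name: fraction of LP_DEAD line pointers considered "too many" */
#define LAZY_VACUUM_DEAD_ITEM_FRACTION	0.7
)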

* I notice that your new lazy_vacuum_table_and_indexes() function is
the only place that calls lazy_vacuum_table_and_indexes(). I think
that you should merge them together -- replace the only remaining call
to lazy_vacuum_table_and_indexes() with the body of the function
itself. Having a separate lazy_vacuum_table_and_indexes() function
doesn't seem useful to me -- it doesn't actually hide complexity, and
might even be harder to maintain.

* I suggest thinking about what the last item will mean for the
reporting that currently takes place in
lazy_vacuum_table_and_indexes(), but will now go in an expanded
lazy_vacuum_table_and_indexes() -- how do we count the total number of
index scans now?

I don't actually believe that the logic needs to change, but some kind
of consolidation and streamlining seems like it might be helpful.
Maybe just a comment that says "note that all index scans might just
be no-ops because..." -- stuff like that.

* Any 

Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread James Hilliard
On Tue, Jan 19, 2021 at 1:54 PM Tom Lane  wrote:
>
> James Hilliard  writes:
> > On Tue, Jan 19, 2021 at 10:17 AM Tom Lane  wrote:
> >> Ah, got it.  So "xcrun --show-sdk-path" tells us the right thing (that
> >> is, it *does* give us a symlink to a 10.15 SDK) but by refusing to
> >> believe we've got the right thing, we end up picking MacOSX11.1.sdk.
> >> Drat.  I suppose we could drop the heuristic about wanting a version
> >> number in the SDK path, but I really don't want to do that.  Now I'm
> >> thinking about trying to dereference the symlink after the first step.
>
> > The MacOSX11.1.sdk can build for a 10.15 target just fine when passed
> > an appropriate MACOSX_DEPLOYMENT_TARGET, so that SDK should be
> > fine.
>
> But our out-of-the-box default should be to build for the current
> platform; we don't want users to have to set MACOSX_DEPLOYMENT_TARGET
> for that case.  Besides, the problem we're having is exactly that Apple's
> definition of "builds for a 10.15 target just fine" is different from
> ours.  They think you should use a run-time test not a compile-time test
> to discover whether preadv is available, and we don't want to do that.
The default for MACOSX_DEPLOYMENT_TARGET is always the current
running OS version from my understanding. So if I build with MacOSX11.1.sdk
on 10.15 with default settings the binaries will work fine because the
MACOSX_DEPLOYMENT_TARGET gets set to 10.15 automatically even
if the same SDK is capable of producing incompatible binaries if you set
MACOSX_DEPLOYMENT_TARGET to 11.0.
>
> In almost all of the cases I've seen so far, Apple's compiler actually
> does default to using an SDK matching the platform.  The problem we
> have is that we try to name the SDK explicitly, and the current
> method is failing to pick the right one in your case.  There are
> several reasons for using an explicit -isysroot rather than just
> letting the compiler default:
No, it's only the MACOSX_DEPLOYMENT_TARGET that matches the
platform; the SDK can be more or less arbitrary, but it will work fine because
the autoselected MACOSX_DEPLOYMENT_TARGET will force compatibility
no matter what SDK version you use. This is always how it has worked
from what I've seen.
>
> * We have seen cases in which the compiler acts as though it has
> *no* default sysroot, and we have to help it out.
>
> * The explicit root reduces version-skew build hazards for extensions
> that are not built at the same time as the core system.
The deployment target is effectively entirely separate from SDK version,
so it really shouldn't make a difference unless the SDK is significantly
older or newer than the running version from what I can tell.
>
> * There are a few tests in configure itself that need to know the
> sysroot path to check for files there.
>
> Anyway, the behavior you're seeing shows that 4823621db is still a
> bit shy of a load.  I'm thinking about the attached as a further
> fix --- can you verify it helps for you?
Best I can tell it provides no change for me (this patch is tested on top of it)
because it does not provide any MACOSX_DEPLOYMENT_TARGET
based feature detection for pwritev at all.
>
> regards, tom lane
>




Re: Add primary keys to system catalogs

2021-01-19 Thread Mark Rofail
I'll post the latest rebased patch tomorrow, with a summary of the issues.
Let's continue this in the foreign key array thread, so as not to clutter this
one.

Regards, Mark Rofail.

On Tue, Jan 19, 2021, 11:00 PM Joel Jacobson  wrote:

> On Tue, Jan 19, 2021, at 18:25, Mark Rofail wrote:
> >Dear Joel,
> >
> >I would love to start working on this again, I tried to revive the patch
> again in 2019, however, I faced the same problems as last time (detailed
> in the thread) and I didn't get the support I needed to continue with this
> patch.
> >
> >Are you willing to help rebase and finally publish this patch?
>
> Willing, sure, but perhaps not able:
> if there are complex problems that need a deep understanding of the code base,
> I'm not sure I will be of much help, but I can for sure do testing and
> reviewing.
>
> /Joel
>


Re: Odd, intermittent failure in contrib/pageinspect

2021-01-19 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jan 18, 2021 at 05:47:40PM -0500, Tom Lane wrote:
>> So, do we have any other tests that are invoking a manual vacuum
>> and assuming it won't skip any pages?  By this theory, they'd
>> all be failures waiting to happen.

> check_heap.sql and heap_surgery.sql have one VACUUM FREEZE each and it
> seems to me that we had better be sure that no pages are skipped for
> their cases?

It looks to me like heap_surgery ought to be okay, because it's operating
on a temp table; if there are any page access conflicts on that, we've
got BIG trouble ;-)

Poking around, I found a few other places where it looked like a skipped
page could produce diffs in the expected output:
contrib/amcheck/t/001_verify_heapam.pl
contrib/pg_visibility/sql/pg_visibility.sql

There are lots of other vacuums of course, but they don't look like
a missed page would have any effect on the visible results, so I think
we should leave them alone.

In short I propose the attached patch, which also gets rid of
that duplicate query.

regards, tom lane

diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
index 882f853d56..1fb3823142 100644
--- a/contrib/amcheck/expected/check_heap.out
+++ b/contrib/amcheck/expected/check_heap.out
@@ -109,7 +109,7 @@ ERROR:  ending block number must be between 0 and 0
 SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 1, endblock := 11000);
 ERROR:  starting block number must be between 0 and 0
 -- Vacuum freeze to change the xids encountered in subsequent tests
-VACUUM FREEZE heaptest;
+VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) heaptest;
 -- Check that valid options are not rejected nor corruption reported
 -- for a non-empty frozen table
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
index c10a25f21c..298de6886a 100644
--- a/contrib/amcheck/sql/check_heap.sql
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -51,7 +51,7 @@ SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock :=
 SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 1, endblock := 11000);
 
 -- Vacuum freeze to change the xids encountered in subsequent tests
-VACUUM FREEZE heaptest;
+VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) heaptest;
 
 -- Check that valid options are not rejected nor corruption reported
 -- for a non-empty frozen table
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 1581e51f3c..a2f65b826d 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -46,7 +46,7 @@ detects_heap_corruption(
 # Check a corrupt table with all-frozen data
 #
 fresh_test_table('test');
-$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$node->safe_psql('postgres', q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
 corrupt_first_page('test');
 detects_heap_corruption("verify_heapam('test')",
 	"all-frozen corrupted table");
diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 4cd0db8018..4da28f0a1d 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,7 +1,7 @@
 CREATE EXTENSION pageinspect;
 CREATE TABLE test1 (a int, b int);
 INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1;  -- set up FSM
+VACUUM (DISABLE_PAGE_SKIPPING) test1;  -- set up FSM
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
 SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
@@ -87,18 +87,8 @@ SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
 (1 row)
 
 -- If we freeze the only tuple on test1, the infomask should
--- always be the same in all test runs. we show raw flags by
--- default: HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID.
-VACUUM FREEZE test1;
-SELECT t_infomask, t_infomask2, raw_flags, combined_flags
-FROM heap_page_items(get_raw_page('test1', 0)),
- LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2);
- t_infomask | t_infomask2 | raw_flags |   combined_flags   
-+-+---+
-   2816 |   2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID} | {HEAP_XMIN_FROZEN}
-(1 row)
-
--- output the decoded flag HEAP_XMIN_FROZEN instead
+-- always be the same in all test runs.
+VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test1;
 SELECT t_infomask, t_infomask2, raw_flags, combined_flags
 FROM heap_page_items(get_raw_page('test1', 0)),
  LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2);
diff --git a/contrib/pageinspect/sql/page.sql b/contrib/pageinspect/sql/page.sql
index 01844cb629..d333b763d7 100644
--- a/contrib/pageinspect/sql/page.sql
+++ b/contrib/pageinspect/sql/page.sql

Re: cfbot building docs - serving results

2021-01-19 Thread Thomas Munro
On Wed, Jan 20, 2021 at 10:22 AM Erik Rijkers  wrote:
> I am wondering if the cfbot at the moment is building the docs
> (html+pdf), for the patches that it tests.  I suppose that it does?  If
> so, what happens with the resulting (doc)files? To /dev/null?   They are
> not available as far as I can see.  Would it be feasible to make them
> available, either serving the html, or to make docs html+pdf a
> downloadable zipfile?

It does build the docs as part of the Linux build.  I picked that
because Cirrus has more Linux horsepower available than the other
OSes, and there's no benefit to doing that on all the OSes.

That's a good idea, and I suspect it could be handled as an
"artifact", though I haven't looked into that:

https://cirrus-ci.org/guide/writing-tasks/#artifacts-instruction

It'd also be nice to (somehow) know which .html pages changed so you
could go straight to the new stuff without the intermediate step of
wondering where .sgml changes come out!

Another good use for artifacts that I used once or twice is the
ability to allow the results of the Windows build to be downloaded in
a .zip file and tested by non-developers without the build tool chain.

> (it would also be useful to be able see at a glance somewhere if the
> patch contains sgml-changes at all...)

True.  Basically you want to be able to find the diffstat output quickly.




Re: compression libraries and CF bot

2021-01-19 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Jan 20, 2021 at 9:56 AM Justin Pryzby  wrote:
>> Also, what's the process for having new libraries installed in the CI
>> environment ?

> I have added lz4 to the FreeBSD and Ubuntu build tasks, so we'll see
> if that helps at the next periodic build or when a new patch is
> posted.  It's failing on Windows because there is no HAVE_LIBLZ4 in
> Solution.pm, and I don't know how to install that on a Mac.  Is this
> patch supposed to be adding a new required dependency, or a new
> optional dependency?

It had better be optional.

regards, tom lane




Re: compression libraries and CF bot

2021-01-19 Thread Thomas Munro
On Wed, Jan 20, 2021 at 9:56 AM Justin Pryzby  wrote:
> Do you know if the old travis build environment had liblz4 installed ?

It sounds like it.

> I'm asking regarding Dilip's patch, which was getting to "check world" 2 weeks
> ago but now failing to even compile, not apparently due to any change in the
> patch.  Also, are the historic logs available somewhere ?
> http://cfbot.cputube.org/dilip-kumar.html

I can find some of them but not that one, because Travis's "branches"
page truncates well before our ~250 active branches, and that one
isn't in there.

https://travis-ci.org/github/postgresql-cfbot/postgresql/branches

> Also, what's the process for having new libraries installed in the CI
> environment ?

I have added lz4 to the FreeBSD and Ubuntu build tasks, so we'll see
if that helps at the next periodic build or when a new patch is
posted.  It's failing on Windows because there is no HAVE_LIBLZ4 in
Solution.pm, and I don't know how to install that on a Mac.  Is this
patch supposed to be adding a new required dependency, or a new
optional dependency?

In general, you could ask for changes here, or send me a pull request for eg:

https://github.com/macdice/cfbot/blob/master/cirrus/.cirrus.yml

If we eventually think the CI control file is good enough, and can get
past the various political discussions required to put CI
vendor-specific material in our tree, it'd be just a regular patch
proposal and could even be tweaked as part of a feature submission.

> There's 3 compression patches going around, so I think eventually we'll ask to
> get libzstd-devel (for libpq and pg_dump) and liblz4-devel (for toast and
> libpq).  Maybe all compression methods would be supported in each place - I
> hope the patches will share common code.

+1, nice to see modern compression coming to PostgreSQL.




Re: Printing backtrace of postgres processes

2021-01-19 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 19, 2021 at 12:50 PM Tom Lane  wrote:
>> I think it's got security hazards as well.  If we restricted the
>> feature to cause a trace of only one process at a time, and required
>> that process to be logged in as the same database user as the
>> requestor (or at least someone with the privs of that user, such
>> as a superuser), that'd be less of an issue.

> I am not sure I see a security hazard here. I think the checks for
> this should have the same structure as pg_terminate_backend() and
> pg_cancel_backend(); whatever is required there should be required
> here, too, but not more, unless we have a real clear reason for such
> an inconsistency.

Yeah, agreed.  The "broadcast" option seems inconsistent with doing
things that way, but I don't have a problem with being able to send
a trace signal to the same processes you could terminate.

>> I share your estimate that there'll be small-fraction-of-a-percent
>> hazards that could still add up to dangerous instability if people
>> go wild with this.

> Right. I was more concerned about whether we could, for example, crash
> while inside the function that generates the backtrace, on some
> platforms or in some scenarios. That would be super-sad. I assume that
> the people who wrote the code tried to make sure that didn't happen
> but I don't really know what's reasonable to expect.

Recursion is scary, but it should (I think) not be possible if this
is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
of those in libbacktrace.

One point here is that it might be a good idea to suppress elog.c's
calls to functions in the error context stack.  As we saw in another
connection recently, allowing that to happen makes for a *very*
large increase in the footprint of code that you are expecting to
work at any random CHECK_FOR_INTERRUPTS call site.

BTW, it also looks like the patch is doing nothing to prevent the
backtrace from being sent to the connected client.  I'm not sure
what I think about whether it'd be okay from a security standpoint
to do that on the connection that requested the trace, but I sure
as heck don't want it to happen on connections that didn't.  Also,
whatever you think about security concerns, it seems like there'd be
pretty substantial reentrancy hazards if the interrupt occurs
anywhere in code dealing with normal client communication.

Maybe, given all of these things, we should forget using elog at
all and just emit the trace with fprintf(stderr).  That seems like
it would decrease the odds of trouble by about an order of magnitude.
It would make it more painful to collect the trace in some logging
setups, of course.
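
For illustration, the dump-straight-to-stderr approach could be about as
simple as the following (a minimal sketch using <execinfo.h>, which glibc and
macOS provide; the patch itself presumably keeps using libbacktrace):

#include <execinfo.h>
#include <unistd.h>

/*
 * Minimal sketch: from the CHECK_FOR_INTERRUPTS-driven handler, dump the
 * raw backtrace straight to stderr, bypassing elog() entirely.
 */
static void
emit_backtrace_to_stderr(void)
{
	void	   *frames[100];
	int			nframes = backtrace(frames, 100);

	/* backtrace_symbols_fd() writes to the fd directly and does not malloc */
	backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
}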

regards, tom lane




cfbot building docs - serving results

2021-01-19 Thread Erik Rijkers

Hi Thomas,

I am wondering if the cfbot at the moment is building the docs
(html+pdf) for the patches that it tests.  I suppose that it does?  If
so, what happens with the resulting (doc) files? Do they go to /dev/null?
They are not available as far as I can see.  Would it be feasible to make
them available, either by serving the html, or by making the docs (html+pdf)
a downloadable zipfile?


(it would also be useful to be able to see at a glance somewhere whether the
patch contains sgml changes at all...)



Thanks,

Erik Rijkers




Re: create table like: ACCESS METHOD

2021-01-19 Thread Justin Pryzby
On Wed, Dec 30, 2020 at 12:33:56PM +, Simon Riggs wrote:
> There are no tests for the new functionality, please could you add some?

Did you look at the most recent patch?

+CREATE ACCESS METHOD heapdup TYPE TABLE HANDLER heap_tableam_handler;
+CREATE TABLE likeam() USING heapdup;
+CREATE TABLE likeamlike(LIKE likeam INCLUDING ALL);

   

Also, I just realized that Dilip's toast compression patch adds "INCLUDING
COMPRESSION", which is stored in pg_am.  That's an implementation detail of
that patch, but it's not intuitive that "including access method" wouldn't
include the compression stored there.  So I think this should use "INCLUDING
TABLE ACCESS METHOD" not just ACCESS METHOD.  

-- 
Justin
>From f27fd6291aa10af1ca0be4bc72a656811c8e0c9f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 15 Nov 2020 16:54:53 -0600
Subject: [PATCH v3] create table (like .. including ACCESS METHOD)

---
 doc/src/sgml/ref/create_table.sgml  | 12 +++-
 src/backend/parser/gram.y   |  1 +
 src/backend/parser/parse_utilcmd.c  | 10 ++
 src/include/nodes/parsenodes.h  |  1 +
 src/test/regress/expected/create_table_like.out | 12 
 src/test/regress/sql/create_table_like.sql  |  8 
 6 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 569f4c9da7..e3c607f6b1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -87,7 +87,7 @@ class="parameter">referential_action ] [ ON UPDATE and like_option is:
 
-{ INCLUDING | EXCLUDING } { COMMENTS | CONSTRAINTS | DEFAULTS | GENERATED | IDENTITY | INDEXES | STATISTICS | STORAGE | ALL }
+{ INCLUDING | EXCLUDING } { COMMENTS | CONSTRAINTS | DEFAULTS | GENERATED | IDENTITY | INDEXES | STATISTICS | STORAGE | TABLE ACCESS METHOD | ALL }
 
 and partition_bound_spec is:
 
@@ -689,6 +689,16 @@ WITH ( MODULUS numeric_literal, REM
 

 
+   
+INCLUDING TABLE ACCESS METHOD
+
+ 
+  The table's access method will be copied.  By default, the
+  default_table_access_method is used.
+ 
+
+   
+

 INCLUDING ALL
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 31c95443a5..719ac838e3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3708,6 +3708,7 @@ TableLikeOption:
 | INDEXES			{ $$ = CREATE_TABLE_LIKE_INDEXES; }
 | STATISTICS		{ $$ = CREATE_TABLE_LIKE_STATISTICS; }
 | STORAGE			{ $$ = CREATE_TABLE_LIKE_STORAGE; }
+| TABLE ACCESS METHOD { $$ = CREATE_TABLE_LIKE_TABLE_ACCESS_METHOD; }
 | ALL{ $$ = CREATE_TABLE_LIKE_ALL; }
 		;
 
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index b31f3afa03..f34f42aae3 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -96,6 +96,7 @@ typedef struct
 	bool		ispartitioned;	/* true if table is partitioned */
 	PartitionBoundSpec *partbound;	/* transformed FOR VALUES */
 	bool		ofType;			/* true if statement contains OF typename */
+	char		*accessMethod;	/* table access method */
 } CreateStmtContext;
 
 /* State shared by transformCreateSchemaStmt and its subroutines */
@@ -252,6 +253,7 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	cxt.ispartitioned = stmt->partspec != NULL;
 	cxt.partbound = stmt->partbound;
 	cxt.ofType = (stmt->ofTypename != NULL);
+	cxt.accessMethod = NULL;
 
 	Assert(!stmt->ofTypename || !stmt->inhRelations);	/* grammar enforces */
 
@@ -346,6 +348,9 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	stmt->tableElts = cxt.columns;
 	stmt->constraints = cxt.ckconstraints;
 
+	if (cxt.accessMethod != NULL)
+		stmt->accessMethod = cxt.accessMethod;
+
 	result = lappend(cxt.blist, stmt);
 	result = list_concat(result, cxt.alist);
 	result = list_concat(result, save_alist);
@@ -1118,6 +1123,11 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 		cxt->likeclauses = lappend(cxt->likeclauses, table_like_clause);
 	}
 
+	/* ACCESS METHOD doesn't apply and isn't copied for partitioned tables */
+	if ((table_like_clause->options & CREATE_TABLE_LIKE_TABLE_ACCESS_METHOD) != 0 &&
+		!cxt->ispartitioned)
+		cxt->accessMethod = get_am_name(relation->rd_rel->relam);
+
 	/*
 	 * We may copy extended statistics if requested, since the representation
 	 * of CreateStatsStmt doesn't depend on column numbers.
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index dc2bb40926..600856c229 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -685,6 +685,7 @@ typedef enum TableLikeOption
 	

Re: Add primary keys to system catalogs

2021-01-19 Thread Joel Jacobson
On Tue, Jan 19, 2021, at 18:25, Mark Rofail wrote:
>Dear Joel,
>
>I would love to start working on this again, I tried to revive the patch again
>in 2019, however, I faced the same problems as last time (detailed in the
>thread) and I didn't get the support I needed to continue with this patch.
>
>Are you willing to help rebase and finally publish this patch?

Willing, sure, but perhaps not able:
if there are complex problems that need a deep understanding of the code base,
I'm not sure I will be of much help, but I can for sure do testing and
reviewing.

/Joel

Re: [PATCH v2 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread Tom Lane
James Hilliard  writes:
> I also don't think this is a weak symbol.

> From the header file it does not have __attribute__((weak_import)):
> ssize_t pwritev(int, const struct iovec *, int, off_t)
> __DARWIN_NOCANCEL(pwritev) __API_AVAILABLE(macos(11.0), ios(14.0),
> watchos(7.0), tvos(14.0));

See the other thread.  I found by looking at the asm output that
what __API_AVAILABLE actually does is cause the compiler to emit
a ".weak_reference" directive when calling a function it thinks
might not be available.  So there's some sort of weak linking
going on, though it's certainly possible that it's not shaped
in a way that'd help us do this the way we'd prefer.

regards, tom lane




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-19 Thread Matthias van de Meent
On Mon, 18 Jan 2021, 21:25 Álvaro Herrera,  wrote:
>
> On 2021-Jan-18, Matthias van de Meent wrote:
>
> > Example:
> >
> > 1.) RI starts
> > 2.) PHASE 2: filling the index:
> > 2.1.) scanning the heap (live tuple is cached)
> > < tuple is deleted
> > < last transaction other than RI commits, only snapshot of RI exists
> > < vacuum drops the tuple, and cannot remove it from the new index
> > because this new index is not yet populated.
> > 2.2.) sorting tuples
> > 2.3.) index filled with tuples, incl. deleted tuple
> > 3.) PHASE 3: wait for transactions
> > 4.) PHASE 4: validate does not remove the tuple from the index,
> > because it is not built to do so: it will only insert new tuples.
> > Tuples that are marked for deletion are removed from the index only
> > through VACUUM (and optimistic ALL_DEAD detection).
> >
> > According to my limited knowledge of RI, it requires VACUUM to not run
> > on the table during the initial index build process (which is
> > currently guaranteed through the use of a snapshot).
>
> VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
> ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
> occur.

Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock.
Are there no other ways that pages are optimistically pruned?

But the base case still stands: ignoring CIC snapshots in  would give
the all_dead semantics to tuples that are actually still considered
alive in some context, and that should not yet be deleted (you're deleting
data from an in-use snapshot). Any local pruning optimizations using
all_dead mechanics now cannot be run on the table unless they hold a
ShareUpdateExclusiveLock, though I'm unaware of any such mechanisms
(other than the one below).

> I do wonder if the problem you suggest (or something similar) can occur
> via HOT pruning, though.

It could not, at least not at the current HEAD, as only one tuple in a
HOT-chain can be alive at one point, and all indexes point to the root
of the HOT-chain, which is never HOT-pruned. See also the
src/backend/access/heap/README.HOT.

Regards,

Matthias van de Meent




Re: Printing backtrace of postgres processes

2021-01-19 Thread Robert Haas
On Tue, Jan 19, 2021 at 12:50 PM Tom Lane  wrote:
> The thing that is scaring me the most is the "broadcast" aspect.
> For starters, I think that that is going to be useless in the
> field because of the likelihood that different backends' stack
> traces will get interleaved in whatever log file the traces are
> going to.  Also, having hundreds of processes spitting dozens of
> lines to the same place at the same time is going to expose any
> little weaknesses in your logging arrangements, such as rsyslog's
> tendency to drop messages under load.

+1. I don't think broadcast is a good idea.

> I think it's got security hazards as well.  If we restricted the
> feature to cause a trace of only one process at a time, and required
> that process to be logged in as the same database user as the
> requestor (or at least someone with the privs of that user, such
> as a superuser), that'd be less of an issue.

I am not sure I see a security hazard here. I think the checks for
this should have the same structure as pg_terminate_backend() and
pg_cancel_backend(); whatever is required there should be required
here, too, but not more, unless we have a real clear reason for such
an inconsistency.

> Beyond that, well, maybe it's all right.  In theory anyplace that
> there's a CHECK_FOR_INTERRUPTS should be okay to call elog from;
> but it's completely untested whether we can do that and then
> continue, as opposed to aborting the transaction or whole session.

I guess that's a theoretical risk but it doesn't seem very likely.
And, if we do have such a problem, I think that'd probably be a case
of bad code that we would want to fix either way.

> I share your estimate that there'll be small-fraction-of-a-percent
> hazards that could still add up to dangerous instability if people
> go wild with this.

Right. I was more concerned about whether we could, for example, crash
while inside the function that generates the backtrace, on some
platforms or in some scenarios. That would be super-sad. I assume that
the people who wrote the code tried to make sure that didn't happen
but I don't really know what's reasonable to expect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




compression libraries and CF bot

2021-01-19 Thread Justin Pryzby
Do you know if the old travis build environment had liblz4 installed ?

I'm asking regarding Dilip's patch, which was getting to "check world" 2 weeks
ago but is now failing to even compile, apparently not due to any change in the
patch.  Also, are the historic logs available somewhere?
http://cfbot.cputube.org/dilip-kumar.html

Also, what's the process for having new libraries installed in the CI
environment ?

There's 3 compression patches going around, so I think eventually we'll ask to
get libzstd-devel (for libpq and pg_dump) and liblz4-devel (for toast and
libpq).  Maybe all compression methods would be supported in each place - I
hope the patches will share common code.

-- 
Justin




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread Tom Lane
James Hilliard  writes:
> On Tue, Jan 19, 2021 at 10:17 AM Tom Lane  wrote:
>> Ah, got it.  So "xcrun --show-sdk-path" tells us the right thing (that
>> is, it *does* give us a symlink to a 10.15 SDK) but by refusing to
>> believe we've got the right thing, we end up picking MacOSX11.1.sdk.
>> Drat.  I suppose we could drop the heuristic about wanting a version
>> number in the SDK path, but I really don't want to do that.  Now I'm
>> thinking about trying to dereference the symlink after the first step.

> The MacOSX11.1.sdk can build for a 10.15 target just fine when passed
> an appropriate MACOSX_DEPLOYMENT_TARGET, so that SDK should be
> fine.

But our out-of-the-box default should be to build for the current
platform; we don't want users to have to set MACOSX_DEPLOYMENT_TARGET
for that case.  Besides, the problem we're having is exactly that Apple's
definition of "builds for a 10.15 target just fine" is different from
ours.  They think you should use a run-time test not a compile-time test
to discover whether preadv is available, and we don't want to do that.

In almost all of the cases I've seen so far, Apple's compiler actually
does default to using an SDK matching the platform.  The problem we
have is that we try to name the SDK explicitly, and the current
method is failing to pick the right one in your case.  There are
several reasons for using an explicit -isysroot rather than just
letting the compiler default:

* We have seen cases in which the compiler acts as though it has
*no* default sysroot, and we have to help it out.

* The explicit root reduces version-skew build hazards for extensions
that are not built at the same time as the core system.

* There are a few tests in configure itself that need to know the
sysroot path to check for files there.

Anyway, the behavior you're seeing shows that 4823621db is still a
bit shy of a load.  I'm thinking about the attached as a further
fix --- can you verify it helps for you?

regards, tom lane

diff --git a/src/template/darwin b/src/template/darwin
index 1868c147cb..e14d53b601 100644
--- a/src/template/darwin
+++ b/src/template/darwin
@@ -7,13 +7,20 @@
 if test x"$PG_SYSROOT" = x"" ; then
   # This is far more complicated than it ought to be.  We first ask
   # "xcrun --show-sdk-path", which seems to match the default -isysroot
-  # setting of Apple's compilers.  However, that may produce no result or
-  # a result that is not version-specific (i.e., just ".../SDKs/MacOSX.sdk").
-  # Using a version-specific sysroot seems desirable, so if there are not
-  # digits in the directory name, try "xcrun --sdk macosx --show-sdk-path";
-  # and if that still doesn't work, fall back to asking xcodebuild,
-  # which is often a good deal slower.
+  # setting of Apple's compilers.
   PG_SYSROOT=`xcrun --show-sdk-path 2>/dev/null`
+  # That may fail, or produce a result that is not version-specific (i.e.,
+  # just ".../SDKs/MacOSX.sdk").  Using a version-specific sysroot seems
+  # desirable, so if the path is a non-version-specific symlink, expand it.
+  if test -L "$PG_SYSROOT"; then
+if expr x"$PG_SYSROOT" : '.*[0-9]\.[0-9][^/]*$' >/dev/null ; then : okay
+else
+  PG_SYSROOT=`expr "$PG_SYSROOT" : '\(.*\)/'`/`readlink "$PG_SYSROOT"`
+fi
+  fi
+  # If there are still not digits in the directory name, try
+  # "xcrun --sdk macosx --show-sdk-path"; and if that still doesn't work,
+  # fall back to asking xcodebuild, which is often a good deal slower.
   if expr x"$PG_SYSROOT" : '.*[0-9]\.[0-9][^/]*$' >/dev/null ; then : okay
   else
 PG_SYSROOT=`xcrun --sdk macosx --show-sdk-path 2>/dev/null`


Re: Some coverage for DROP OWNED BY with pg_default_acl

2021-01-19 Thread Alvaro Herrera
On 2021-Jan-19, Michael Paquier wrote:

> And while reviewing the thing, I have spotted that there is a specific
> path for pg_default_acl in RemoveRoleFromObjectACL() that has zero
> coverage.  This can be triggered with DROP OWNED BY, and it is
> actually safe to run as long as this is done in a separate transaction
> to avoid any interactions with parallel regression sessions.
> privileges.sql already has similar tests, so I'd like to add some
> coverage as per the attached (the duplicated role name is wanted).

Heh, interesting case.  Added coverage is good, so +1.
Since the role regress_priv_user2 is "private" to the privileges.sql
script, there's no danger of a concurrent test getting the added lines
in trouble AFAICS.

> +SELECT count(*) FROM pg_shdepend
> +  WHERE deptype = 'a' AND
> +refobjid = 'regress_priv_user2'::regrole AND
> + classid = 'pg_default_acl'::regclass;
> + count 
> +---
> + 5
> +(1 row)

Shrug.  Seems sufficient.

-- 
Álvaro Herrera   Valdivia, Chile




Re: [Patch] ALTER SYSTEM READ ONLY

2021-01-19 Thread Robert Haas
On Thu, Jan 14, 2021 at 6:29 AM Amul Sul  wrote:
> To move development, testing, and the review forward, I have commented out the
> code acquiring CheckpointLock from CreateCheckPoint() in the 0003 patch and
> added the changes for the checkpointer so that system read-write state change
> request can be processed as soon as possible, as suggested by Robert[1].

I am extremely doubtful about SetWALProhibitState()'s claim that "The
final state can only be requested by the checkpointer or by the
single-user so that there will be no chance that the server is already
in the desired final state." It seems like there is an obvious race
condition: CompleteWALProhibitChange() is called with a cur_state_gen
argument which embeds the last state we saw, but there's nothing to
keep it from changing between the time we saw it and the time that
function calls SetWALProhibitState(), is there? We aren't holding any
lock. It seems to me that SetWALProhibitState() needs to be rewritten
to avoid this assumption.

On a related note, SetWALProhibitState() has only two callers. One
passes is_final_state as true, and the other as false: it's never a
variable. The two cases are handled mostly differently. This doesn't
seem good. A lot of the logic in this function should probably be
moved to the calling sites, especially because it's almost certainly
wrong for this function to be basing what it does on the *current* WAL
prohibit state rather than the WAL prohibit state that was in effect
at the time we made the decision to call this function in the first
place. As I mentioned in the previous paragraph, that's a built-in
race condition. To put that another way, this function should NOT feel
free to call GetWALProhibitStateGen().

I don't really see why we should have both an SQL callable function
pg_alter_wal_prohibit_state() and also a DDL command for this. If
we're going to go with a functional interface, and I guess the idea of
that is to make it so GRANT EXECUTE works, then why not just get rid
of the DDL?

RequestWALProhibitChange() doesn't look very nice. It seems like it's
basically the second half of pg_alter_wal_prohibit_state(), not being
called from anywhere else. It doesn't seem to add anything to separate
it out like this; the interface between the two is not especially
clean.

It seems odd that ProcessWALProhibitStateChangeRequest() returns
without doing anything if !AmCheckpointerProcess(), rather than having
that be an Assert(). Why is it like that?

I think WALProhibitStateShmemInit() would probably look more similar
to other functions if it did if (found) { stuff; } rather than if
(!found) return; stuff; -- but I might be wrong about the existing
precedent.

The SetLastCheckPointSkipped() and LastCheckPointIsSkipped() stuff
seems confusingly-named, because we have other reasons for skipping a
checkpoint that are not what we're talking about here. I think this is
talking about whether we've performed a checkpoint after recovery, and
the naming should reflect that. But I think there's something else
wrong with the design, too: why is this protected by a spinlock? I
have questions in both directions. On the one hand, I wonder why we
need any kind of lock at all. On the other hand, if we do need a lock,
I wonder why a spinlock that protects only the setting and clearing of
the flag and nothing else is sufficient. There are zero comments
explaining what the idea behind this locking regime is, and I can't
understand why it should be correct.

In fact, I think this area needs a broader rethink. Like, the way you
integrated that stuff into StartupXLog(), it sure looks to me like we
might skip the checkpoint but still try to write other WAL records.
Before we reach the offending segment of code, we call
UpdateFullPageWrites(). Afterwards, we call XLogReportParameters().
Both of those are going to potentially write WAL. I guess you could
argue that's OK, on the grounds that neither function is necessarily
going to log anything, but I don't think I believe that. If I make my
server read only, take the OS down, change some GUCs, and then start
it again, I don't expect it to PANIC.

Also, I doubt that it's OK to skip the checkpoint as this code does
and then go ahead and execute recovery_end_command and update the
control file anyway. It sure looks like the existing code is written
with the assumption that the checkpoint happens before those other
things. One idea I just had was: suppose that, if the system is READ
ONLY, we don't actually exit recovery right away, and the startup
process doesn't exit. Instead we just sit there and wait for the
system to be made read-write again before doing anything else. But
then if hot_standby=false, there's no way for someone to execute a
ALTER SYSTEM READ WRITE and/or pg_alter_wal_prohibit_state(), which
seems bad. So perhaps we need to let in regular connections *as if*
the system were read-write while postponing not just the
end-of-recovery checkpoint but also the other associated 

Re: WIP: BRIN multi-range indexes

2021-01-19 Thread John Naylor
On Tue, Jan 12, 2021 at 1:42 PM Tomas Vondra wrote:
> I suspect it'd due to minmax having to decide which "ranges" to merge,
> which requires repeated sorting, etc. I certainly don't dare to claim
> the current algorithm is perfect. I wouldn't have expected such big
> difference, though - so definitely worth investigating.

It seems that monotonically increasing (or decreasing) values in a table
are a worst case scenario for multi-minmax indexes, or basically, unique
values within a range. I'm guessing it's because it requires many passes
to fit all the values into a limited number of ranges. I tried using
smaller pages_per_range numbers, 32 and 8, and that didn't help.

Now, with a different data distribution, using only 10 values that repeat
over and over, the results are much more sympathetic to multi-minmax:

insert into iot (num, create_dt)
select random(), '2020-01-01 0:00'::timestamptz + (x % 10 || '
seconds')::interval
from generate_series(1,5*365*24*60*60) x;

create index cd_single on iot using brin(create_dt);
27.2s

create index cd_multi on iot using brin(create_dt
timestamptz_minmax_multi_ops);
30.4s

create index cd_bt on iot using btree(create_dt);
61.8s

Circling back to the monotonic case, I tried running a simple perf record
on a backend creating a multi-minmax index on a timestamptz column and
these were the highest non-kernel calls:
+   21.98%  21.91%  postgres  postgres   [.] FunctionCall2Coll
+    9.31%   9.29%  postgres  postgres   [.] compare_combine_ranges
+    8.60%   8.58%  postgres  postgres   [.] qsort_arg
+    5.68%   5.66%  postgres  postgres   [.] brin_minmax_multi_add_value
+    5.63%   5.60%  postgres  postgres   [.] timestamp_lt
+    4.73%   4.71%  postgres  postgres   [.] reduce_combine_ranges
+    3.80%   0.00%  postgres  [unknown]  [.] 0x032001680004
+    3.51%   3.50%  postgres  postgres   [.] timestamp_eq

There's no one place that's pathological enough to explain the 4x slowness
over traditional BRIN and nearly 3x slowness over btree when using a large
number of unique values per range, so making progress here would have to
involve a more holistic approach.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Support for NSS as a libpq TLS backend

2021-01-19 Thread Daniel Gustafsson
> On 4 Dec 2020, at 01:57, Jacob Champion  wrote:
> 
> On Nov 17, 2020, at 7:00 AM, Daniel Gustafsson  wrote:
>> 
>> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
>> which also fixes client side error reporting to be more readable.
> 
> I was testing handshake failure modes and noticed that some FATAL
> messages are being sent through to the client in cleartext. The OpenSSL
> implementation doesn't do this, because it logs handshake problems at
> COMMERROR level. Should we switch all those ereport() calls in the NSS
> be_tls_open_server() to COMMERROR as well (and return explicitly), to
> avoid this? Or was there a reason for logging at FATAL/ERROR level?

The ERROR logging made early development easier but then stuck around; I've
changed them to COMMERROR, returning an error instead, in the v21 patch just
sent to the list.

> Related note, at the end of be_tls_open_server():
> 
>>...
>>port->ssl_in_use = true;
>>return 0;
>> 
>> error:
>>return 1;
>> }
> 
> This needs to return -1 in the error case; the only caller of
> secure_open_server() does a direct `result == -1` comparison rather than
> checking `result != 0`.

Fixed.
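
For the archives, the error path now has roughly this shape (a sketch with
placeholder names for the NSS-specific pieces, not the actual patch code):

#include "postgres.h"
#include "libpq/libpq-be.h"

int
be_tls_open_server(Port *port)
{
	bool		handshake_failed = false;	/* placeholder for the NSS status check */
	const char *error_text = "...";			/* placeholder for the NSS error message */

	/* ... set up the NSS context and run the handshake ... */

	if (handshake_failed)
	{
		/* COMMERROR goes to the server log only; nothing is sent in cleartext */
		ereport(COMMERROR,
				(errcode(ERRCODE_PROTOCOL_VIOLATION),
				 errmsg("could not accept SSL connection: %s", error_text)));
		return -1;				/* the value secure_open_server() checks for */
	}

	port->ssl_in_use = true;
	return 0;
}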

cheers ./daniel



Re: Add primary keys to system catalogs

2021-01-19 Thread Tom Lane
Laurenz Albe  writes:
> On Sun, 2021-01-17 at 17:07 -0500, Tom Lane wrote:
>> [...] do we really want to prefer
>> using the OID indexes as the primary keys?  In most cases there's some
>> other index that seems to me to be what a user would think of as the
>> pkey, for example pg_class_relname_nsp_index for pg_class or
>> pg_proc_proname_args_nsp_index for pg_proc.  Preferring OID where it
>> exists is a nice simple rule, which has some attractiveness, but the
>> OID indexes seem to me like a lookup aid rather than the "real" object
>> identity.

> I disagree.  The OID is the real, immutable identity of an object.
> The "relname" of a "pg_class" encatalogtry can change any time.
> Since there are no foreign keys that reference catalogs, that won't cause
> problems, but I still think that primary keys should change as little as
> possible.

Yeah, there's also the point that the OID is what we use for "foreign
key" references from other catalogs.  I don't deny that there are
reasons to think of OID as the pkey.  But as long as these constraints
are basically cosmetic, it seemed to me that we should consider the
other approach.  I'm not dead set on that, just wanted to discuss it.

regards, tom lane




Re: Change default of checkpoint_completion_target

2021-01-19 Thread Tom Lane
Stephen Frost  writes:
> In passing, I noticed that we have a lot of documentation like:

> This parameter can only be set in the postgresql.conf file or on the
> server command line.

> ... which hasn't been true since the introduction of ALTER SYSTEM.

Well, it's still true if you understand "the postgresql.conf file"
to cover whatever's included by postgresql.conf, notably
postgresql.auto.conf (and the include facility existed long before
that, too, so you needed the expanded interpretation even then).
Still, I take your point that it's confusing.

I like your suggestion of shortening all of these to be "can only be set
at server start", or maybe better "cannot be changed after server start".
I'm not sure whether or not we really need new text elsewhere; I think
section 20.1 is pretty long already.

regards, tom lane




Re: Change default of checkpoint_completion_target

2021-01-19 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Any further comments or thoughts on this one?
> 
> This:
> 
> +total time between checkpoints. The default is 0.9, which spreads the
> +checkpoint across the entire checkpoint timeout period of time,
> 
> is confusing because 0.9 is obviously not 1.0; people will wonder
> whether the scale is something strange or the text is just wrong.
> They will also wonder why not use 1.0 instead.  So perhaps more like
> 
>   ... The default is 0.9, which spreads the checkpoint across almost
>   all the available interval, providing fairly consistent I/O load
>   while also leaving some slop for checkpoint completion overhead.
> 
> The other chunk of text seems accurate, but there's no reason to let
> this one be misleading.

Good point, updated along those lines.

In passing, I noticed that we have a lot of documentation like:

This parameter can only be set in the postgresql.conf file or on the
server command line.

... which hasn't been true since the introduction of ALTER SYSTEM.  I
don't really think it's this patch's job to clean that up but it doesn't
seem quite right that we don't include ALTER SYSTEM in that list either.
If this was C code, maybe we could get away with just changing such
references as we find them, but I don't think we'd want the
documentation to be in an inconsistent state regarding that.

Anyone want to opine about what to do with that?  Should we consider
changing those to mention ALTER SYSTEM?  Or perhaps have a way of saying
"at server start" that then links to "how to set options at server
start", perhaps..

Thanks,

Stephen
From 97c24d92e4ae470a257aa2ac9501032aba5edd82 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change the default of checkpoint_completion_target to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml  | 12 ++--
 doc/src/sgml/wal.sgml | 29 ---
 src/backend/utils/misc/guc.c  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl|  1 -
 5 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..666b467eda 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3266,9 +3266,15 @@ include_dir 'conf.d'
   

 Specifies the target of checkpoint completion, as a fraction of
-total time between checkpoints. The default is 0.5.
-This parameter can only be set in the postgresql.conf
-file or on the server command line.
+total time between checkpoints. The default is 0.9, which spreads the
+checkpoint across almost all of the available interval, providing fairly
+consistent I/O load while also leaving some slop for checkpoint
+completion overhead.  Reducing this parameter is not recommended as that
+causes the I/O from the checkpoint to have to complete faster, resulting
+in a higher I/O rate, while then having a period of less I/O between the
+completion of the checkpoint and the start of the next scheduled
+checkpoint.  This parameter can only be set in the
+postgresql.conf file or on the server command line.

   
  
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..733eba22db 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
writing dirty buffers during a checkpoint is spread over a period of time.
That period is controlled by
, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   checkpoint_timeout).
The I/O rate is adjusted so that the checkpoint finishes when the
given fraction of
checkpoint_timeout seconds have elapsed, or before
max_wal_size is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
PostgreSQL can be expected to complete each checkpoint
-   in about half the time before the 

Re: Change default of checkpoint_completion_target

2021-01-19 Thread Tom Lane
Stephen Frost  writes:
> Any further comments or thoughts on this one?

This:

+total time between checkpoints. The default is 0.9, which spreads the
+checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead.  So perhaps more like

... The default is 0.9, which spreads the checkpoint across almost
all the available interval, providing fairly consistent I/O load
while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

regards, tom lane




Re: Change default of checkpoint_completion_target

2021-01-19 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> On 2021-01-13 23:10, Stephen Frost wrote:
> >>Yes, I agree, and am involved in that thread as well- currently waiting
> >>feedback from others about the proposed approach.
> >I've tried to push that forward.  I'm happy to update this patch once
> >we've got agreement to move forward on that, to wit, adding to an
> >'obsolete' section in the documentation information about this
> >particular GUC and how it's been removed due to not being sensible or
> >necessary to continue to have.
> 
> Some discussion a few days ago was arguing that it was still necessary in
> some cases as a way to counteract the possible lack of tuning in the kernel
> flushing behavior.  I think in light of that we should go with your first
> patch that just changes the default, possibly with the documentation updated
> a bit.

Rebased and updated patch attached which moves back to just changing the
default instead of removing the option, with a more explicit call-out of
the '90%', as suggested by Michael on the other patch.

Any further comments or thoughts on this one?

Thanks,

Stephen
From 335b8e630fae6c229f27f70f85847e29dfc1b783 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change the default of checkpoint_completion_target to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml  |  8 -
 doc/src/sgml/wal.sgml | 29 ---
 src/backend/utils/misc/guc.c  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl|  1 -
 5 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 82864bbb24..7e06d0febb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3266,7 +3266,13 @@ include_dir 'conf.d'
   

 Specifies the target of checkpoint completion, as a fraction of
-total time between checkpoints. The default is 0.5.
+total time between checkpoints. The default is 0.9, which spreads the
+checkpoint across the entire checkpoint timeout period of time,
+providing a consistent amount of I/O during the entire checkpoint.
+Reducing this parameter is not recommended as that causes the I/O from
+the checkpoint to have to complete faster, resulting in a higher I/O
+rate, while then having a period of less I/O between the completion of
+the checkpoint and the start of the next scheduled checkpoint.
 This parameter can only be set in the postgresql.conf
 file or on the server command line.

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..733eba22db 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
writing dirty buffers during a checkpoint is spread over a period of time.
That period is controlled by
, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   checkpoint_timeout).
The I/O rate is adjusted so that the checkpoint finishes when the
given fraction of
checkpoint_timeout seconds have elapsed, or before
max_wal_size is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
PostgreSQL can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase checkpoint_completion_target
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   checkpoint_completion_target can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   

Re: [HACKERS] Custom compression methods

2021-01-19 Thread Justin Pryzby
Thanks for updating the patch.

On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby  wrote:
> The most recent patch doesn't compile --without-lz4:
On Tue, Jan 05, 2021 at 11:19:33AM +0530, Dilip Kumar wrote:
> On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby  wrote:
> > I think I first saw it on cfbot and I reproduced it locally, too.
> > http://cfbot.cputube.org/dilip-kumar.html
> >
> > I think you'll have to make --without-lz4 the default until the build
> > environments include it, otherwise the patch checker will show red :(
> 
> Oh ok,  but if we make by default --without-lz4 then the test cases
> will start failing which is using lz4 compression.  Am I missing
> something?

The CIs are failing like this:

http://cfbot.cputube.org/dilip-kumar.html
|checking for LZ4_compress in -llz4... no
|configure: error: lz4 library not found
|If you have lz4 already installed, see config.log for details on the
|failure.  It is possible the compiler isn't looking in the proper directory.
|Use --without-lz4 to disable lz4 support.

I thought that used to work (except for windows).  I don't see that anything
changed in the configure tests...  Is it because the CI moved off travis 2
weeks ago?  I don't know whether the travis environment had liblz4, and I
don't remember if the build was passing or if it was failing for some other
reason.  I'm guessing historic logs from travis are not available, if they ever
were.

I'm not sure how to deal with that, but maybe you'd need:
1) A separate 0001 patch *allowing* LZ4 to be enabled/disabled;
2) Current patchset needs to compile with/without LZ4, and pass tests in both
cases - maybe you can use "alternate test" output [0] to handle the "without"
case.
3) Eventually, the CI and build environments may have LZ4 installed, and then
we can have a separate debate about whether to enable it by default.

[0] cp -iv src/test/regress/results/compression.out 
src/test/regress/expected/compression_1.out

On Tue, Jan 05, 2021 at 02:20:26PM +0530, Dilip Kumar wrote:
> On Tue, Jan 5, 2021 at 11:19 AM Dilip Kumar  wrote:
> > On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby  wrote:
> > > I see the windows build is failing:
> > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123730
> > > |undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at 
> > > src/tools/msvc/Mkvcbuild.pm line 852.
> > > This needs to be patched: src/tools/msvc/Solution.pm
> > > You can see my zstd/pg_dump patch for an example, if needed (actually I'm 
> > > not
> > > 100% sure it's working yet, since the windows build failed for another 
> > > reason).
> >
> > Okay, I will check that.

This still needs help.
perl ./src/tools/msvc/mkvcbuild.pl
...
undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at 
/home/pryzbyj/src/postgres/src/tools/msvc/Mkvcbuild.pm line 852.

Fix like:

+   HAVE_LIBLZ4 => $self->{options}->{lz4} ? 1 : undef,

Some more language fixes:

commit 3efafee52414503a87332fa6070541a3311a408c
Author: dilipkumar 
Date:   Tue Sep 8 15:24:33 2020 +0530

Built-in compression method

+  If the compression method is not specified for the compressible type then
+  it will have the default compression method.  The default compression

I think this should say:
If no compression method is specified, then compressible types will have the
default compression method (pglz).

+ *
+ * Since version 11 TOAST_COMPRESS_SET_RAWSIZE also marks compressed

Should say v14 ??

diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h
index 059dec3647..e4df6bc5c1 100644
--- a/src/include/catalog/pg_attribute.h
+++ b/src/include/catalog/pg_attribute.h
@@ -156,6 +156,14 @@ CATALOG(pg_attribute,1249,AttributeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(75,
/* attribute's collation */
Oid attcollation;
 
+   /*
+* Oid of the compression method that will be used for compressing the value
+* for this attribute.  For the compressible atttypid this must always be a

say "For compressible types, ..."

+* valid Oid irrespective of what is the current value of the attstorage.
+* And for the incompressible atttypid this must always be an invalid Oid.

say "must be InvalidOid"
 
@@ -685,6 +686,7 @@ typedef enum TableLikeOption
CREATE_TABLE_LIKE_INDEXES = 1 << 5,
CREATE_TABLE_LIKE_STATISTICS = 1 << 6,
CREATE_TABLE_LIKE_STORAGE = 1 << 7,
+   CREATE_TABLE_LIKE_COMPRESSION = 1 << 8,

This is interesting...
I have a patch to implement LIKE .. (INCLUDING ACCESS METHOD).
I guess I should change it to say LIKE .. (TABLE ACCESS METHOD), right ?
https://commitfest.postgresql.org/31/2865/

Your first patch is large due to updating a large number of test cases to
include the "compression" column in \d+ output.  Maybe that column should be
hidden when HIDE_TABLEAM is set by pg_regress ?  I think that would allow
testing with alternate, default 

Re: Add docs stub for recovery.conf

2021-01-19 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Craig Ringer (craig.rin...@enterprisedb.com) wrote:
> > On Thu, 14 Jan 2021 at 03:44, Stephen Frost  wrote:
> > > Alright, how does this look?  The new entries are all under the
> > > 'obsolete' section to keep it out of the main line, but should work to
> > > 'fix' the links that currently 404 and provide a bit of a 'softer'
> > > landing for the other cases that currently just forcibly redirect using
> > > the website doc alias capability.
> > 
> > Thanks for expanding the change to other high profile obsoleted or renamed
> > features and tools.
> 
> Thanks for taking the time to review it and comment on it!
> 
> > One minor point. I'm not sure this is quite the best way to spell the index
> > entries:
> > 
> > +   
> > + obsolete
> > + pg_receivexlog
> > +   
> > 
> > as it will produce an index term "obsolete" with a list of various
> > components under it. While that concentrates them nicely, it means people
> > won't actually find them if they're using the index alphabetically.
> 
> Ah, yeah, that's definitely a good point and one that I hadn't really
> spent much time thinking about.
> 
> > I'd slightly prefer
> > 
> > +   
> > + pg_receivexlog
> > + pg_receivewal
> > +   
> > 
> > even though that bulks the index up a little, because then people are a bit
> > more likely to find it.
> 
> Yup, makes sense, updated patch attached which makes that change.
> 
> > > I ended up not actually doing this for the catalog -> view change of
> > > pg_replication_slots simply because I don't really think folks will
> > > misunderstand or be confused by that redirect since it's still the same
> > > relation.  If others disagree though, we could certainly change that
> > > too.
> > 
> > I agree with you.
> 
> Ok, great.
> 
> How does the attached look then?
> 
> Bruce, did you want to review or comment on this as to if it addresses
> your concerns appropriately?  Would be great to get this in as there's
> the follow-on for default roles.

... really attached now, sorry about that. :)

Thanks,

Stephen
diff --git a/doc/src/sgml/appendix-obsolete-pgreceivexlog.sgml b/doc/src/sgml/appendix-obsolete-pgreceivexlog.sgml
new file mode 100644
index 00..391eb5dcb2
--- /dev/null
+++ b/doc/src/sgml/appendix-obsolete-pgreceivexlog.sgml
@@ -0,0 +1,26 @@
+
+
+
+
+  The pg_receivexlog command
+
+   
+ pg_receivexlog
+ pg_receivewal
+   
+
+   
+PostgreSQL 9.6 and below provided a command named
+pg_receivexlog
+pg_receivexlog
+to fetch write-ahead-log (WAL) files.  This command was renamed to pg_receivewal, see
+ for documentation of pg_receivewal and see
+the release notes for PostgreSQL 10 for details
+on this change.
+   
+
+
diff --git a/doc/src/sgml/appendix-obsolete-pgresetxlog.sgml b/doc/src/sgml/appendix-obsolete-pgresetxlog.sgml
new file mode 100644
index 00..44452b5627
--- /dev/null
+++ b/doc/src/sgml/appendix-obsolete-pgresetxlog.sgml
@@ -0,0 +1,26 @@
+
+
+
+
+  The pg_resetxlog command
+
+   
+ pg_resetxlog
+ pg_resetwal
+   
+
+   
+PostgreSQL 9.6 and below provided a command named
+pg_resetxlog
+pg_resetxlog
+to reset the write-ahead-log (WAL) files.  This command was renamed to pg_resetwal, see
+ for documentation of pg_resetwal and see
+the release notes for PostgreSQL 10 for details
+on this change.
+   
+
+
diff --git a/doc/src/sgml/appendix-obsolete-pgxlogdump.sgml b/doc/src/sgml/appendix-obsolete-pgxlogdump.sgml
new file mode 100644
index 00..325316b4e6
--- /dev/null
+++ b/doc/src/sgml/appendix-obsolete-pgxlogdump.sgml
@@ -0,0 +1,26 @@
+
+
+
+
+  The pg_xlogdump command
+
+   
+ pg_xlogdump
+ pg_waldump
+   
+
+   
+PostgreSQL 9.6 and below provided a command named
+pg_xlogdump
+pg_xlogdump
+to read write-ahead-log (WAL) files.  This command was renamed to pg_waldump, see
+ for documentation of pg_waldump and see
+the release notes for PostgreSQL 10 for details
+on this change.
+   
+
+
diff --git a/doc/src/sgml/appendix-obsolete-recovery-config.sgml b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
new file mode 100644
index 00..3f858e5cb0
--- /dev/null
+++ b/doc/src/sgml/appendix-obsolete-recovery-config.sgml
@@ -0,0 +1,60 @@
+
+
+
+
+  The recovery.conf file
+
+   
+ recovery.conf
+   
+
+   
+PostgreSQL 11 and below used a configuration file named
+recovery.conf
+recovery.conf
+to manage replicas and standbys. Support for this file was removed in PostgreSQL 12. See
+the release notes for PostgreSQL 12 for details
+on this change.
+   
+
+   
+On PostgreSQL 12 and above,
+archive recovery, streaming replication, and PITR
+are configured using
+normal server configuration parameters.
+These are set in postgresql.conf or via
+ALTER SYSTEM
+like any other parameter.
+   
+
+   
+The server will not start if a recovery.conf exists.
+   

Re: Add docs stub for recovery.conf

2021-01-19 Thread Stephen Frost
Greetings,

* Craig Ringer (craig.rin...@enterprisedb.com) wrote:
> On Thu, 14 Jan 2021 at 03:44, Stephen Frost  wrote:
> > Alright, how does this look?  The new entries are all under the
> > 'obsolete' section to keep it out of the main line, but should work to
> > 'fix' the links that currently 404 and provide a bit of a 'softer'
> > landing for the other cases that currently just forcibly redirect using
> > the website doc alias capability.
> 
> Thanks for expanding the change to other high profile obsoleted or renamed
> features and tools.

Thanks for taking the time to review it and comment on it!

> One minor point. I'm not sure this is quite the best way to spell the index
> entries:
> 
> +   
> + obsolete
> + pg_receivexlog
> +   
> 
> as it will produce an index term "obsolete" with a list of various
> components under it. While that concentrates them nicely, it means people
> won't actually find them if they're using the index alphabetically.

Ah, yeah, that's definitely a good point and one that I hadn't really
spent much time thinking about.

> I'd slightly prefer
> 
> +   
> + pg_receivexlog
> + pg_receivewal
> +   
> 
> even though that bulks the index up a little, because then people are a bit
> more likely to find it.

Yup, makes sense, updated patch attached which makes that change.

> > I ended up not actually doing this for the catalog -> view change of
> > pg_replication_slots simply because I don't really think folks will
> > misunderstand or be confused by that redirect since it's still the same
> > relation.  If others disagree though, we could certainly change that
> > too.
> 
> I agree with you.

Ok, great.

How does the attached look then?

Bruce, did you want to review or comment on this as to if it addresses
your concerns appropriately?  Would be great to get this in as there's
the follow-on for default roles.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-19 Thread Pavel Stehule
Hi

I found minor issues.

Doc - missing tag

and three whitespace issues

see attached patch

The following sentence is hard to read due to the long nested example:

If the
+   path contradicts structure of modified jsonb for any individual
+   value (e.g. path val['a']['b']['c'] assumes keys
+   'a' and 'b' have object values assigned to them, but if val['a'] or
+   val['b'] is null, a string, or a number, then the path
+   contradicts with the existing structure), an error is raised even if other
+   values do conform.

It could be divided into two sentences: first the general rule, then the example.

Regards

Pavel
commit 7ce4fbe2620a5d8efdff963b2368c3d0fd904c59
Author: ok...@github.com 
Date:   Tue Jan 19 19:37:02 2021 +0100

fix whitespaces

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 924762e128..4e19fe4fb8 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -613,6 +613,7 @@ SELECT jdoc-'guid', jdoc-'name' FROM api WHERE jdoc @ '{"tags": ["qu
e.g. in case of arrays it is a 0-based operation or that negative integers
that appear in path count from the end of JSON arrays.
The result of subscripting expressions is always jsonb data type.
+  
 
   
UPDATE statements may use subscripting in the
diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c
index 306c37b5a6..64979f3a5b 100644
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -342,8 +342,8 @@ jsonb_subscript_fetch_old(ExprState *state,
 	{
 		Jsonb	*jsonbSource = DatumGetJsonbP(*op->resvalue);
 		sbsrefstate->prevvalue = jsonb_get_element(jsonbSource,
-	  			   sbsrefstate->upperindex,
-	  			   sbsrefstate->numupper,
+   sbsrefstate->upperindex,
+   sbsrefstate->numupper,
   &sbsrefstate->prevnull,
    false);
 	}
@@ -366,7 +366,7 @@ jsonb_exec_setup(const SubscriptingRef *sbsref,
 
 	/* Allocate type-specific workspace with space for per-subscript data */
 	workspace = palloc0(MAXALIGN(sizeof(JsonbSubWorkspace)) +
-	nupper * (sizeof(Datum) + sizeof(Oid)));
+		nupper * (sizeof(Datum) + sizeof(Oid)));
 	workspace->expectArray = false;
 	ptr = ((char *) workspace) + MAXALIGN(sizeof(JsonbSubWorkspace));
 	workspace->indexOid = (Oid *) ptr;
@@ -379,7 +379,7 @@ jsonb_exec_setup(const SubscriptingRef *sbsref,
 	foreach(lc, sbsref->refupperindexpr)
 	{
 		Node   *expr = lfirst(lc);
-		int 	i = foreach_current_index(lc);
+		int		i = foreach_current_index(lc);
 
 		workspace->indexOid[i] = exprType(expr);
 	}


Re: Rethinking plpgsql's assignment implementation

2021-01-19 Thread Pavel Stehule
Hi

Now I am testing the jsonb subscripting feature, and I found one case
that is not supported by the parser.

When the target is a scalar, everything is ok. But we can also have a plpgsql
array of jsonb values.

postgres=# do $$
declare j jsonb[];
begin
  j[1] = '{"b":"Ahoj"}';
  raise notice '%', j;
  raise notice '%', (j[1])['b'];
end
$$;
NOTICE:  {"{\"b\": \"Ahoj\"}"}
NOTICE:  "Ahoj"
DO

Parentheses work well in expressions, but are not supported on the left
side of an assignment.

postgres=# do $$
declare j jsonb[];
begin
  (j[1])['b'] = '"Ahoj"';
  raise notice '%', j;
  raise notice '%', j[1]['b'];
end
$$;
ERROR:  syntax error at or near "("
LINE 4:   (j[1])['b'] = '"Ahoj"';
  ^

Regards

Pavel


Re: Add primary keys to system catalogs

2021-01-19 Thread Mark Rofail
Dear Joel,

I would love to start working on this again. I tried to revive the patch
in 2019; however, I faced the same problems as last time (detailed in
the thread) and didn't get the support I needed to continue with this
patch.

Are you willing to help rebase and finally publish this patch?

Best Regards,
Mark Rofail

On Tue, 19 Jan 2021 at 2:22 PM Joel Jacobson  wrote:

> On Mon, Jan 18, 2021, at 18:23, Tom Lane wrote:
> > I realized that there's a stronger roadblock for
> > treating catalog interrelationships as SQL foreign keys.  Namely,
> > that we always represent no-reference situations with a zero OID,
> > whereas it'd have to be NULL to look like a valid foreign-key case.
>
> Another roadblock is perhaps the lack of foreign keys on arrays,
> which would be needed by the following references:
>
> SELECT * FROM oidjoins WHERE column_type ~ '(vector|\[\])$' ORDER BY 1,2,3;
>       table_name      |  column_name   | column_type | ref_table_name | ref_column_name
> ----------------------+----------------+-------------+----------------+-----------------
>  pg_constraint        | conexclop      | oid[]       | pg_operator    | oid
>  pg_constraint        | conffeqop      | oid[]       | pg_operator    | oid
>  pg_constraint        | confkey        | int2[]      | pg_attribute   | attnum
>  pg_constraint        | conkey         | int2[]      | pg_attribute   | attnum
>  pg_constraint        | conpfeqop      | oid[]       | pg_operator    | oid
>  pg_constraint        | conppeqop      | oid[]       | pg_operator    | oid
>  pg_extension         | extconfig      | oid[]       | pg_class       | oid
>  pg_index             | indclass       | oidvector   | pg_opclass     | oid
>  pg_index             | indcollation   | oidvector   | pg_collation   | oid
>  pg_index             | indkey         | int2vector  | pg_attribute   | attnum
>  pg_partitioned_table | partattrs      | int2vector  | pg_attribute   | attnum
>  pg_partitioned_table | partclass      | oidvector   | pg_opclass     | oid
>  pg_partitioned_table | partcollation  | oidvector   | pg_collation   | oid
>  pg_policy            | polroles       | oid[]       | pg_authid      | oid
>  pg_proc              | proallargtypes | oid[]       | pg_type        | oid
>  pg_proc              | proargtypes    | oidvector   | pg_type        | oid
>  pg_statistic_ext     | stxkeys        | int2vector  | pg_attribute   | attnum
>  pg_trigger           | tgattr         | int2vector  | pg_attribute   | attnum
> (18 rows)
>
> I see there is an old thread with work in this area,  "Re: GSoC 2017:
> Foreign Key Arrays":
>
>
> https://www.postgresql.org/message-id/flat/76a8d3d8-a22e-3299-7c4e-6e115dbf3762%40proxel.se#a3ddc34863465ef83adbd26022cdbcc0
>
> The last message in the thread is from 2018-10-02:
> >On Tue, 2 Oct, 2018 at 05:13:26AM +0200, Michael Paquier wrote:
> >>On Sat, Aug 11, 2018 at 05:20:57AM +0200, Mark Rofail wrote:
> >> I am still having problems rebasing this patch. I can not figure it out
> on
> >> my own.
> >Okay, it's been a couple of months since this last email, and nothing
> >has happened, so I am marking it as returned with feedback.
> >--
> >Michael
>
> Personally, I would absolutely *love* FKs on array columns.
> I always feel shameful when adding array columns to tables,
> when the elements are PKs in some other table.
> It's convenient and often the best design,
> but it feels dirty knowing there are no FKs to help detect application
> bugs.
>
> Let's hope the current FKs-on-catalog-discussion can ignite the old
> Foreign Key Arrays thread.
>
>
> /Joel
>


Re: [PATCH v2 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread James Hilliard
On Tue, Jan 19, 2021 at 10:29 AM Tom Lane  wrote:
>
> James Hilliard  writes:
> > Fixes:
> > fd.c:3661:10: warning: 'pwritev' is only available on macOS 11.0 or newer 
> > [-Wunguarded-availability-new]
>
> It's still missing preadv, and it still has nonzero chance of breaking
> successful detection of pwritev on platforms other than yours, and it's
> still really ugly.
Setting -Werror=unguarded-availability-new should in theory always
ensure that configure checks fail if the symbol is unavailable or marked
as requiring a target newer than the MACOSX_DEPLOYMENT_TARGET.
>
> But the main reason I don't want to go this way is that I don't think
> it'll stop with preadv/pwritev.  If we make it our job to build
> successfully even when using the wrong SDK version for the target
> platform, we're going to be in for more and more pain with other
> kernel APIs.
This issue really has nothing to do with the SDK version at all; it's the
MACOSX_DEPLOYMENT_TARGET that matters, and it must be taken into account
during configure in some way. This is what my patch does, by triggering the
pwritev compile test error via setting
-Werror=unguarded-availability-new.

It's expected that MACOSX_DEPLOYMENT_TARGET=10.15 with a
MacOSX11.1.sdk will produce a binary that can run on OSX 10.15.

The MacOSX11.1.sdk is not the wrong SDK for a 10.15 target and
is fully capable of producing 10.15 compatible binaries.
>
> We could, of course, do what Apple wants us to do and try to build
> executables that work across versions.  I do not intend to put up
> with the sort of invasive, error-prone source-code-level runtime test
> they recommend ... but given that there is weak linking involved here,
> I wonder if there is a way to silently sub in src/port/pwritev.c
> when executing on a pre-11 macOS, by dint of marking it a weak
> symbol?
The check I added is strictly a compile time check still, not runtime.

I also don't think this is a weak symbol.

From the header file, it does not have __attribute__((weak_import)):
ssize_t pwritev(int, const struct iovec *, int, off_t)
__DARWIN_NOCANCEL(pwritev) __API_AVAILABLE(macos(11.0), ios(14.0),
watchos(7.0), tvos(14.0));
>
> regards, tom lane




Re: simplifying foreign key/RI checks

2021-01-19 Thread Corey Huinker
>
> I decided not to deviate from pk_ terminology so that the new code
> doesn't look too different from the other code in the file.  Although,
> I guess we can at least call the main function
> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
> changed that.
>

I agree with leaving the existing terminology where it is for this patch.
Changing the function name is probably enough to alert the reader that the
things that are called pks may not be precisely that.


Re: Printing backtrace of postgres processes

2021-01-19 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jan 16, 2021 at 3:21 PM Tom Lane  wrote:
>> (Personally, I think this whole patch fails the safety-vs-usefulness
>> tradeoff, but I expect I'll get shouted down.)

> What do you see as the main safety risks here?

The thing that is scaring me the most is the "broadcast" aspect.
For starters, I think that that is going to be useless in the
field because of the likelihood that different backends' stack
traces will get interleaved in whatever log file the traces are
going to.  Also, having hundreds of processes spitting dozens of
lines to the same place at the same time is going to expose any
little weaknesses in your logging arrangements, such as rsyslog's
tendency to drop messages under load.

I think it's got security hazards as well.  If we restricted the
feature to cause a trace of only one process at a time, and required
that process to be logged in as the same database user as the
requestor (or at least someone with the privs of that user, such
as a superuser), that'd be less of an issue.
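
A per-target permission gate along those lines could follow what the
pg_signal_backend() machinery already does; roughly (the wrapper function
name here is made up):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/acl.h"

/* Hypothetical permission check for a single-target backtrace request. */
static void
check_backtrace_target(int target_pid)
{
	PGPROC	   *proc = BackendPidGetProc(target_pid);

	if (proc == NULL)
		ereport(ERROR,
				(errmsg("PID %d is not a PostgreSQL server process",
						target_pid)));

	if (!superuser() && !has_privs_of_role(GetUserId(), proc->roleId))
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("must be a member of the role whose backend is being traced")));
}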

Beyond that, well, maybe it's all right.  In theory anyplace that
there's a CHECK_FOR_INTERRUPTS should be okay to call elog from;
but it's completely untested whether we can do that and then
continue, as opposed to aborting the transaction or whole session.
I share your estimate that there'll be small-fraction-of-a-percent
hazards that could still add up to dangerous instability if people
go wild with this.
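
For context, the mechanism under discussion is the usual procsignal shape:
the signal handler only sets a flag, and the actual reporting happens at the
next CHECK_FOR_INTERRUPTS().  A rough sketch (flag and function names are
made up, not the patch's):

#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "storage/latch.h"

/* Hypothetical flag set from the procsignal handler. */
static volatile sig_atomic_t PrintBacktraceRequested = false;

/* Signal handler side: only set a flag and poke the latch. */
void
HandlePrintBacktraceInterrupt(void)
{
	PrintBacktraceRequested = true;
	SetLatch(MyLatch);
}

/* Called from ProcessInterrupts(), i.e. at the next CHECK_FOR_INTERRUPTS(). */
void
ProcessPrintBacktraceRequest(void)
{
	if (!PrintBacktraceRequested)
		return;
	PrintBacktraceRequested = false;

	ereport(LOG_SERVER_ONLY,
			(errmsg("logging backtrace of PID %d", MyProcPid),
			 errbacktrace()));
}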

regards, tom lane




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread James Hilliard
On Tue, Jan 19, 2021 at 10:17 AM Tom Lane  wrote:
>
> James Hilliard  writes:
> > On Tue, Jan 19, 2021 at 8:57 AM Tom Lane  wrote:
> >> It worked for me and for Sergey, so we need to figure out what's different
> >> about your setup.  What do you get from "xcrun --show-sdk-path" and
> >> "xcrun --sdk macosx --show-sdk-path"?  What have you got under
> >> /Library/Developer/CommandLineTools/SDKs ?
>
> > $ xcrun --show-sdk-path
> > /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
> > $ xcrun --sdk macosx --show-sdk-path
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
> > $ ls -laht /Library/Developer/CommandLineTools/SDKs
> > total 0
> > drwxr-xr-x  5 root  wheel   160B Jan 14  2020 .
> > drwxr-xr-x  8 root  wheel   256B Jan 14  2020 MacOSX10.15.sdk
> > drwxr-xr-x  7 root  wheel   224B Jan 14  2020 MacOSX10.14.sdk
> > lrwxr-xr-x  1 root  wheel15B Jan 14  2020 MacOSX.sdk -> MacOSX10.15.sdk
>
> Ah, got it.  So "xcrun --show-sdk-path" tells us the right thing (that
> is, it *does* give us a symlink to a 10.15 SDK) but by refusing to
> believe we've got the right thing, we end up picking MacOSX11.1.sdk.
> Drat.  I suppose we could drop the heuristic about wanting a version
> number in the SDK path, but I really don't want to do that.  Now I'm
> thinking about trying to dereference the symlink after the first step.
The MacOSX11.1.sdk can build for a 10.15 target just fine when passed
an appropriate MACOSX_DEPLOYMENT_TARGET, so that SDK should be
fine.
>
> BTW, it's curious that you get a reference to the MacOSX.sdk symlink
> where both Sergey and I got references to the actual directory.
> Do you happen to recall the order in which you installed/upgraded
> Xcode and its command line tools?
I generally just upgrade to the latest as it becomes available.
>
> >> I don't think I believe that argument.  As a counterexample, supposing
> >> that somebody were intentionally cross-compiling on an older OSX platform
> >> but using a newer SDK, shouldn't they get an executable suited to the
> >> SDK's target version?
>
> > Yep, that's exactly what this should fix:
> > MACOSX_DEPLOYMENT_TARGET=11.0 ./configure
> > checking for pwritev... yes
> > Which fails at runtime on 10.15:
>
> Well yeah, exactly.  It should fail at run-time, because you
> cross-compiled an executable that's not built for the machine
> you're on.  What we need is to prevent configure from setting up
> a cross-compile situation by default.
The toolchain already selects the correct deployment target by default; the
issue is just that the configure test for pwritev was being done in a way
that ignored the deployment target version. I fixed that.
>
> regards, tom lane




Re: New Table Access Methods for Multi and Single Inserts

2021-01-19 Thread Jeff Davis
On Mon, 2021-01-18 at 08:58 +0100, Luc Vlaming wrote:
> You mean because the table modification API uses
> table_tuple_insert_speculative?  Just wondering if you think it
> generally cannot work, or whether you would like to see that path / more
> paths integrated into the patch.

I think the patch should support INSERT INTO ... SELECT, and it will be
easier to tell if we have the right API when that's integrated.

Regards,
Jeff Davis






Re: Printing backtrace of postgres processes

2021-01-19 Thread Robert Haas
On Sat, Jan 16, 2021 at 3:21 PM Tom Lane  wrote:
> I'd argue that backtraces for those processes aren't really essential,
> and indeed that trying to make the syslogger report its own backtrace
> is damn dangerous.

I agree. Ideally I'd like to be able to use the same mechanism
everywhere and include those processes too, but surely regular
backends and parallel workers are going to be the things that come up
most often.

> (Personally, I think this whole patch fails the safety-vs-usefulness
> tradeoff, but I expect I'll get shouted down.)

You and I are frequently on opposite sides of these kinds of
questions, but I think this is a closer call than many cases. I'm
convinced that it's useful, but I'm not sure whether it's safe. On the
usefulness side, backtraces are often the only way to troubleshoot
problems that occur on production systems. I wish we had better
logging and tracing tools instead of having to ask for this sort of
thing, but we don't. EDB support today frequently asks customers to
attach gdb and take a backtrace that way, and that has risks which
this implementation does not: for example, suppose you were unlucky
enough to attach during a spinlock protected critical section, and
suppose you didn't continue the stopped process before the 60 second
timeout expired and some other process caused a PANIC. Even if this
implementation were to end up emitting a backtrace with a spinlock
held, it would remove the risk of leaving the process stopped while
holding a critical lock, and would in that sense be safer. However, as
soon as you make something like this accessible via an SQL callable
function, some people are going to start spamming it. And, as soon as
they do that, any risks inherent in the implementation are multiplied.
If it carries an 0.01% chance of crashing the system, we'll have
people taking production systems down with this all the time. At that
point I wouldn't want the feature, even if the gdb approach had the
same risk (which I don't think it does).

What do you see as the main safety risks here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH v2 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread Tom Lane
James Hilliard  writes:
> Fixes:
> fd.c:3661:10: warning: 'pwritev' is only available on macOS 11.0 or newer 
> [-Wunguarded-availability-new]

It's still missing preadv, and it still has nonzero chance of breaking
successful detection of pwritev on platforms other than yours, and it's
still really ugly.

But the main reason I don't want to go this way is that I don't think
it'll stop with preadv/pwritev.  If we make it our job to build
successfully even when using the wrong SDK version for the target
platform, we're going to be in for more and more pain with other
kernel APIs.

We could, of course, do what Apple wants us to do and try to build
executables that work across versions.  I do not intend to put up
with the sort of invasive, error-prone source-code-level runtime test
they recommend ... but given that there is weak linking involved here,
I wonder if there is a way to silently sub in src/port/pwritev.c
when executing on a pre-11 macOS, by dint of marking it a weak
symbol?
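
Roughly, that sort of run-time substitution might look like the sketch below
(pg_pwritev_fallback is a stand-in name for the src/port implementation, not
a real symbol, and the availability check is the one clang suggests):

#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical: the src/port emulation under a made-up name. */
extern ssize_t pg_pwritev_fallback(int fd, const struct iovec *iov,
								   int iovcnt, off_t offset);

ssize_t
pg_pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
#if defined(__APPLE__)
	/* Use the libc pwritev only when the running OS actually has it. */
	if (__builtin_available(macOS 11.0, *))
		return pwritev(fd, iov, iovcnt, offset);
	return pg_pwritev_fallback(fd, iov, iovcnt, offset);
#else
	return pwritev(fd, iov, iovcnt, offset);
#endif
}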

regards, tom lane




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread Tom Lane
James Hilliard  writes:
> On Tue, Jan 19, 2021 at 8:57 AM Tom Lane  wrote:
>> It worked for me and for Sergey, so we need to figure out what's different
>> about your setup.  What do you get from "xcrun --show-sdk-path" and
>> "xcrun --sdk macosx --show-sdk-path"?  What have you got under
>> /Library/Developer/CommandLineTools/SDKs ?

> $ xcrun --show-sdk-path
> /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
> $ xcrun --sdk macosx --show-sdk-path
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
> $ ls -laht /Library/Developer/CommandLineTools/SDKs
> total 0
> drwxr-xr-x  5 root  wheel   160B Jan 14  2020 .
> drwxr-xr-x  8 root  wheel   256B Jan 14  2020 MacOSX10.15.sdk
> drwxr-xr-x  7 root  wheel   224B Jan 14  2020 MacOSX10.14.sdk
> lrwxr-xr-x  1 root  wheel15B Jan 14  2020 MacOSX.sdk -> MacOSX10.15.sdk

Ah, got it.  So "xcrun --show-sdk-path" tells us the right thing (that
is, it *does* give us a symlink to a 10.15 SDK) but by refusing to
believe we've got the right thing, we end up picking MacOSX11.1.sdk.
Drat.  I suppose we could drop the heuristic about wanting a version
number in the SDK path, but I really don't want to do that.  Now I'm
thinking about trying to dereference the symlink after the first step.

BTW, it's curious that you get a reference to the MacOSX.sdk symlink
where both Sergey and I got references to the actual directory.
Do you happen to recall the order in which you installed/upgraded
Xcode and its command line tools?

>> I don't think I believe that argument.  As a counterexample, supposing
>> that somebody were intentionally cross-compiling on an older OSX platform
>> but using a newer SDK, shouldn't they get an executable suited to the
>> SDK's target version?

> Yep, that's exactly what this should fix:
> MACOSX_DEPLOYMENT_TARGET=11.0 ./configure
> checking for pwritev... yes
> Which fails at runtime on 10.15:

Well yeah, exactly.  It should fail at run-time, because you
cross-compiled an executable that's not built for the machine
you're on.  What we need is to prevent configure from setting up
a cross-compile situation by default.

regards, tom lane




Re: psql \df choose functions by their arguments

2021-01-19 Thread Greg Sabino Mullane
Ha ha ha, my bad, I am not sure why I left those out. Here is a new patch
with int2, int4, and int8. Thanks for the email.

Cheers,
Greg


v6-psql-df-pick-function-by-type.patch
Description: Binary data


[PATCH v2 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread James Hilliard
Fixes:
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 
-I../../../../src/include  -isysroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
-c -o fd.o fd.c
fd.c:3661:10: warning: 'pwritev' is only available on macOS 11.0 or newer 
[-Wunguarded-availability-new]
part = pg_pwritev(fd, iov, iovcnt, offset);
   ^~
../../../../src/include/port/pg_iovec.h:49:20: note: expanded from macro 
'pg_pwritev'
   ^~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include/sys/uio.h:104:9:
 note: 'pwritev' has been marked as being introduced in macOS 11.0
  here, but the deployment target is macOS 10.15.0
ssize_t pwritev(int, const struct iovec *, int, off_t) 
__DARWIN_NOCANCEL(pwritev) __API_AVAILABLE(macos(11.0), ios(14.0), 
watchos(7.0), tvos(14.0));
^
fd.c:3661:10: note: enclose 'pwritev' in a __builtin_available check to silence 
this warning
part = pg_pwritev(fd, iov, iovcnt, offset);
   ^~
../../../../src/include/port/pg_iovec.h:49:20: note: expanded from macro 
'pg_pwritev'
   ^~~
1 warning generated.

This results in a runtime error:
running bootstrap script ... dyld: lazy symbol binding failed: Symbol not 
found: _pwritev
  Referenced from: /usr/local/pgsql/bin/postgres
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: _pwritev
  Referenced from: /usr/local/pgsql/bin/postgres
  Expected in: /usr/lib/libSystem.B.dylib

child process was terminated by signal 6: Abort trap: 6

To fix this we set -Werror=unguarded-availability-new so that a compile
test for pwritev will fail if the symbol is unavailable on the requested
SDK version.
---
Changes v1 -> v2:
  - Add AC_LIBOBJ(pwritev) when pwritev not available
  - set -Werror=unguarded-availability-new for CXX flags as well
---
 configure| 145 ++-
 configure.ac |  21 +++-
 2 files changed, 152 insertions(+), 14 deletions(-)

diff --git a/configure b/configure
index 8af4b99021..662b0ae9ce 100755
--- a/configure
+++ b/configure
@@ -5373,6 +5373,98 @@ if test x"$pgac_cv_prog_CC_cflags__Werror_vla" = x"yes"; 
then
 fi
 
 
+  # Prevent usage of symbols marked as newer than our target.
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports 
-Werror=unguarded-availability-new, for CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports 
-Werror=unguarded-availability-new, for CFLAGS... " >&6; }
+if ${pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${CFLAGS} -Werror=unguarded-availability-new"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new=yes
+else
+  pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" = 
x"yes"; then
+  CFLAGS="${CFLAGS} -Werror=unguarded-availability-new"
+fi
+
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports 
-Werror=unguarded-availability-new, for CXXFLAGS" >&5
+$as_echo_n "checking whether ${CXX} supports 
-Werror=unguarded-availability-new, for CXXFLAGS... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__Werror_unguarded_availability_new+:} false; 
then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CXXFLAGS=$CXXFLAGS
+pgac_save_CXX=$CXX
+CXX=${CXX}
+CXXFLAGS="${CXXFLAGS} -Werror=unguarded-availability-new"
+ac_save_cxx_werror_flag=$ac_cxx_werror_flag
+ac_cxx_werror_flag=yes
+ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS 
conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  pgac_cv_prog_CXX_cxxflags__Werror_unguarded_availability_new=yes
+else
+  pgac_cv_prog_CXX_cxxflags__Werror_unguarded_availability_new=no
+fi
+rm -f core 

Re: search_plan_tree(): handling of non-leaf CustomScanState nodes causes segfault

2021-01-19 Thread Tom Lane
I wrote:
> Now that I look at this, I strongly wonder whether whoever added
> MergeAppend support here understood what they were doing.  That
> looks broken, because child nodes will typically be positioned on
> tuples, whether or not the current top-level output came from them.
> So I fear we could get a false-positive confirmation that some
> tuple matches WHERE CURRENT OF.

Urgh, indeed it's buggy.  With the attached test script I get

...
BEGIN
DECLARE CURSOR
 f1 | f2  
+-
  1 | one
(1 row)

UPDATE 1
UPDATE 1
UPDATE 1
COMMIT
 f1 | f2  
+-
  1 | one updated
(1 row)

 f1 | f2  
+-
  2 | two updated
(1 row)

 f1 |  f2   
+---
  3 | three updated
(1 row)

where clearly only the row with f1=1 should have updated
(and if you leave off ORDER BY, so as to get a Merge not
MergeAppend plan, indeed only that row updates).

I shall go fix this, and try to improve the evidently-inadequate
comments in search_plan_tree.

regards, tom lane

drop table if exists t1,t2,t3;

create table t1 (f1 int unique, f2 text);
insert into t1 values (1, 'one');
create table t2 (f1 int unique, f2 text);
insert into t2 values (2, 'two');
create table t3 (f1 int unique, f2 text);
insert into t3 values (3, 'three');

explain
select * from t1 union all select * from t2 union all select * from t3
order by 1;

begin;

declare c cursor for
select * from t1 union all select * from t2 union all select * from t3
order by 1;

fetch 1 from c;

update t1 set f2 = f2 || ' updated' where current of c;
update t2 set f2 = f2 || ' updated' where current of c;
update t3 set f2 = f2 || ' updated' where current of c;

commit;

table t1;
table t2;
table t3;


Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread James Hilliard
On Tue, Jan 19, 2021 at 8:57 AM Tom Lane  wrote:
>
> James Hilliard  writes:
> > On Tue, Jan 19, 2021 at 8:27 AM Tom Lane  wrote:
> >> We already dealt with that by not selecting an SDK newer than the
> >> underlying OS (see 4823621db).
>
> > Tried that, doesn't work; I'm not even sure how it could possibly fix this
> > issue at all.
>
> It worked for me and for Sergey, so we need to figure out what's different
> about your setup.  What do you get from "xcrun --show-sdk-path" and
> "xcrun --sdk macosx --show-sdk-path"?  What have you got under
> /Library/Developer/CommandLineTools/SDKs ?
$ xcrun --show-sdk-path
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
$ xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
$ ls -laht /Library/Developer/CommandLineTools/SDKs
total 0
drwxr-xr-x  5 root  wheel   160B Jan 14  2020 .
drwxr-xr-x  8 root  wheel   256B Jan 14  2020 MacOSX10.15.sdk
drwxr-xr-x  7 root  wheel   224B Jan 14  2020 MacOSX10.14.sdk
lrwxr-xr-x  1 root  wheel15B Jan 14  2020 MacOSX.sdk -> MacOSX10.15.sdk
>
> > This cannot be fixed properly by selecting a specific SDK version
> > alone; it's the symbols valid for a specific target deployment version
> > that matter here.
>
> I don't think I believe that argument.  As a counterexample, supposing
> that somebody were intentionally cross-compiling on an older OSX platform
> but using a newer SDK, shouldn't they get an executable suited to the
> SDK's target version?
Yep, that's exactly what this should fix:

MACOSX_DEPLOYMENT_TARGET=11.0 ./configure
checking for pwritev... yes

Which fails at runtime on 10.15:
dyld: lazy symbol binding failed: Symbol not found: _pwritev
  Referenced from: /usr/local/pgsql/bin/postgres (which was built for
Mac OS X 11.0)
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: _pwritev
  Referenced from: /usr/local/pgsql/bin/postgres (which was built for
Mac OS X 11.0)
  Expected in: /usr/lib/libSystem.B.dylib

child process was terminated by signal 6: Abort trap: 6

MACOSX_DEPLOYMENT_TARGET=10.15 ./configure
checking for pwritev... no

Noticed a couple small issues, I'll send a v2.
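
Purely to illustrate the point (this is not what the patch does -- the patch
instead makes the configure probe honor the deployment target via
-Werror=unguarded-availability-new): availability is a property of
MACOSX_DEPLOYMENT_TARGET, not of the SDK, so a hand-written compile-time
guard would have to look roughly like this.  PG_HAVE_PWRITEV is a made-up
name for the sketch; only Apple's Availability.h macros are real.

#if defined(__APPLE__)
#include <Availability.h>
#include <sys/uio.h>

/*
 * Sketch only: the macOS 11 SDK declares pwritev(), but calling it is only
 * safe when the deployment target is at least 11.0; on a 10.15 system it
 * otherwise fails at runtime with the dyld errors shown above.
 */
#if defined(__MAC_OS_X_VERSION_MIN_REQUIRED) && \
	__MAC_OS_X_VERSION_MIN_REQUIRED >= 110000
#define PG_HAVE_PWRITEV 1		/* hypothetical macro, illustration only */
#endif
#endif							/* defined(__APPLE__) */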
>
> (I realize that Apple thinks we ought to handle that through run-time
> not compile-time adaptation, but I have no interest in going there.)
>
> regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-19 Thread Julien Rouhaud
On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud  wrote:
>
> v15 that fixes recent conflicts.

Rebase only, thanks to the cfbot!  V16 attached.
From a0388c53d9755cfd706513f7f02a08b31a48aacb Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 18 Mar 2019 18:55:50 +0100
Subject: [PATCH v16 2/3] Expose queryid in pg_stat_activity and
 log_line_prefix

Similarly to other fields in pg_stat_activity, only the queryid from the top
level statements are exposed, and if the backends status isn't active then the
queryid from the last executed statements is displayed.

Also add a %Q placeholder to include the queryid in the log_line_prefix, which
will also only expose top level statements.

Author: Julien Rouhaud
Reviewed-by: Evgeny Efimkin, Michael Paquier, Yamada Tatsuro, Atsushi Torikoshi
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 112 +++---
 doc/src/sgml/config.sgml  |  29 +++--
 doc/src/sgml/monitoring.sgml  |  16 +++
 src/backend/catalog/system_views.sql  |   1 +
 src/backend/executor/execMain.c   |   8 ++
 src/backend/executor/execParallel.c   |  14 ++-
 src/backend/executor/nodeGather.c |   3 +-
 src/backend/executor/nodeGatherMerge.c|   4 +-
 src/backend/parser/analyze.c  |   5 +
 src/backend/postmaster/pgstat.c   |  65 ++
 src/backend/tcop/postgres.c   |   5 +
 src/backend/utils/adt/pgstatfuncs.c   |   7 +-
 src/backend/utils/error/elog.c|   9 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  |  29 +++--
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/executor/execParallel.h   |   3 +-
 src/include/pgstat.h  |   5 +
 src/include/utils/queryjumble.h   |   2 +-
 src/test/regress/expected/rules.out   |   9 +-
 20 files changed, 223 insertions(+), 110 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 3db4fa2f7a..ce166f417e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -65,6 +65,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/queryjumble.h"
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
@@ -99,6 +100,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
+/*
+ * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
+ * ignores.
+ */
+#define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
+	!IsA(n, PrepareStmt) && \
+	!IsA(n, DeallocateStmt))
+
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -307,7 +316,6 @@ static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
-static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 	   int query_location, int query_len,
 	   pgssStoreKind kind,
@@ -804,16 +812,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		return;
 
 	/*
-	 * Utility statements get queryId zero.  We do this even in cases where
-	 * the statement contains an optimizable statement for which a queryId
-	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
-	 * runtime control will first go through ProcessUtility and then the
-	 * executor, and we don't want the executor hooks to do anything, since we
-	 * are already measuring the statement's costs at the utility level.
+	 * Clear queryId for prepared statements related utility, as those will
+	 * inherit from the underlying statement's one (except DEALLOCATE which is
+	 * entirely untracked).
 	 */
 	if (query->utilityStmt)
 	{
-		query->queryId = UINT64CONST(0);
+		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+			query->queryId = UINT64CONST(0);
 		return;
 	}
 
@@ -1055,6 +1061,23 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	DestReceiver *dest, QueryCompletion *qc)
 {
 	Node	   *parsetree = pstmt->utilityStmt;
+	uint64		saved_queryId = pstmt->queryId;
+
+	/*
+	 * Force utility statements to get queryId zero.  We do this even in cases
+	 * where the statement contains an optimizable statement for which a
+	 * queryId could be derived (such as EXPLAIN or DECLARE CURSOR).  For such
+	 * cases, runtime control will first go through ProcessUtility and then 

Re: pg_class.reltype -> pg_type.oid missing for pg_toast table

2021-01-19 Thread Tom Lane
"Joel Jacobson"  writes:
> When copying all tables in pg_catalog, to a separate schema with the purpose
> of testing if foreign keys could be added for all oid columns, I got an error 
> for a toast table:
> ERROR:  insert or update on table "pg_class" violates foreign key constraint 
> "pg_class_reltype_fkey"
> DETAIL:  Key (reltype)=(86987582) is not present in table "pg_type".

I'm too lazy to check the code right now, but my recollection is that we
do not bother to make composite-type entries for toast tables.  However,
they should have reltype = 0 if so, so I'm not quite sure where the
above failure is coming from.

regards, tom lane




Re: popcount

2021-01-19 Thread Isaac Morland
On Tue, 19 Jan 2021 at 11:38, David Fetter  wrote:

> You bring up an excellent point, which is that our builtin functions
> could use a lot more documentation directly to hand than they now
> have.  For example, there's a lot of needless ambiguity created by
> function comments which leave it up in the air as to which positional
> argument does what in functions like string_to_array, which take
> multiple arguments. I'll try to get a patch in for the next CF with a
> fix for that, and a separate one that doesn't put it on people to use
> \df+ to find the comments we do provide. There have been proposals for
> including an optional space for COMMENT ON in DDL, but I suspect that
> those won't fly unless and until they make their way into the
> standard.
>

Since you mention \df+, I wonder if this is the time to consider removing
the source code from \df+ (since we have \sf) and adding in the proacl
instead?


Re: popcount

2021-01-19 Thread David Fetter
On Tue, Jan 19, 2021 at 07:58:12AM -0500, Robert Haas wrote:
> On Tue, Jan 19, 2021 at 3:06 AM Peter Eisentraut
>  wrote:
> > On 2021-01-18 16:34, Tom Lane wrote:
> > > Peter Eisentraut  writes:
> > >> [ assorted nits ]
> > >
> > > At the level of bikeshedding ... I quite dislike using the name "popcount"
> > > for these functions.  I'm aware that some C compilers provide primitives
> > > of that name, but I wouldn't expect a SQL programmer to know that;
> > > without that context the name seems pretty random and unintuitive.
> > > Moreover, it invites confusion with SQL's use of "pop" to abbreviate
> > > "population" in the statistical aggregates, such as var_pop().
> >
> > I was thinking about that too, but according to
> > , popcount is an accepted
> > high-level term, with "pop" also standing for "population".
> 
> Yeah, I am not sure that it's going to be good to invent our own
> name for this, although maybe. But at least I think we should make
> sure there are some good comments in an easily discoverable place.
> Some people seem to think every programmer in the universe should
> know what things like popcount() and fls() and ffs() and stuff like
> that are, but it's far from obvious and I often have to refresh my
> memory.  Let's make it easy for someone to figure out, if they don't
> know already.

I went with count_set_bits() for the next version, and I hope that
helps clarify what it does.
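
In case a concrete definition helps: the semantics are just "how many 1 bits
does the value contain", e.g. 13 (binary 1101) has three.  A minimal,
self-contained C illustration of that definition (this is not the code from
the patch, just the classic bit-clearing loop):

#include <stdint.h>

/*
 * Count the 1 bits in x, e.g. count_set_bits64(13) == 3, since 13 is 0b1101.
 * Each iteration of the loop clears the lowest set bit.
 */
static int
count_set_bits64(uint64_t x)
{
	int			n = 0;

	while (x != 0)
	{
		x &= x - 1;
		n++;
	}
	return n;
}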

> Like just a comment that says "this returns the number of 1 bits in
> the integer supplied as an argument" or something can save somebody
> a lot of trouble.

You bring up an excellent point, which is that our builtin functions
could use a lot more documentation directly to hand than they now
have.  For example, there's a lot of needless ambiguity created by
function comments which leave it up in the air as to which positional
argument does what in functions like string_to_array, which take
multiple arguments. I'll try to get a patch in for the next CF with a
fix for that, and a separate one that doesn't put it on people to use
\df+ to find the comments we do provide. There have been proposals for
including an optional space for COMMENT ON in DDL, but I suspect that
those won't fly unless and until they make their way into the
standard.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




pg_class.reltype -> pg_type.oid missing for pg_toast table

2021-01-19 Thread Joel Jacobson
I have a memory of the catalog not being MVCC,
so maybe this is normal and expected,
but I wanted to report it in case it's not.

When copying all tables in pg_catalog, to a separate schema with the purpose
of testing if foreign keys could be added for all oid columns, I got an error 
for a toast table:

ERROR:  insert or update on table "pg_class" violates foreign key constraint 
"pg_class_reltype_fkey"
DETAIL:  Key (reltype)=(86987582) is not present in table "pg_type".
CONTEXT:  SQL statement "
ALTER TABLE catalog_fks.pg_class ADD FOREIGN KEY (reltype) REFERENCES 
catalog_fks.pg_type (oid)
  "

The copies of pg_catalog were executed in one and the same transaction,
but as separate queries in a PL/pgSQL function using EXECUTE.
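
For reference, this is roughly the kind of check I would expect to come up
empty if toast tables really always get reltype = 0 (illustrative query
written after the fact, not part of the function I ran):

SELECT oid, relname, reltype
FROM pg_catalog.pg_class
WHERE relkind = 't'   -- toast tables
  AND reltype <> 0;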

/Joel

Re: Stronger safeguard for archive recovery not to miss data

2021-01-19 Thread Laurenz Albe
On Mon, 2021-01-18 at 07:34 +, osumi.takami...@fujitsu.com wrote:
> I noticed that this message should cover both archive recovery modes,
> that is, plain archive recovery as well as standby mode. Then, I combined your
> suggestion above with this point of view. Have a look at the updated patch.
> I also extended the new TAP tests to show this perspective.

Looks good, thanks.

I'll mark this patch as "ready for committer".

Yours,
Laurenz Albe





Re: Is Recovery actually paused?

2021-01-19 Thread Dilip Kumar
On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar  wrote:
>
> On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA  wrote:
>>
>> On Sun, 17 Jan 2021 11:33:52 +0530
>> Dilip Kumar  wrote:
>>
>> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA  wrote:
>> > >
>> > > On Wed, 13 Jan 2021 17:49:43 +0530
>> > > Dilip Kumar  wrote:
>> > >
>> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  
>> > > > wrote:
>> > > > >
>> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  
>> > > > > wrote:
>> > > > > >
>> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530
>> > > > > > Dilip Kumar  wrote:
>> > > > > >
>> > > > > > > > > However, I suspect users don't expect pg_is_wal_replay_paused 
>> > > > > > > > > to wait.
>> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will 
>> > > > > > > > > be blocked forever,
>> > > > > > > > > although this setting may not be usual. In addition, some 
>> > > > > > > > > users may set
>> > > > > > > > > recovery_min_apply_delay to a large value.  If such users call 
>> > > > > > > > > pg_is_wal_replay_paused,
>> > > > > > > > > it could wait for a long time.
>> > > > > > > > >
>> > > > > > > > > At least, I think we need some descriptions on document to 
>> > > > > > > > > explain
>> > > > > > > > > pg_is_wal_replay_paused could wait while a time.
>> > > > > > > >
>> > > > > > > > Ok
>> > > > > > >
>> > > > > > > Fixed this, added some comments in .sgml as well as in function 
>> > > > > > > header
>> > > > > >
>> > > > > > Thank you for fixing this.
>> > > > > >
>> > > > > > Also, is it better to fix the description of pg_wal_replay_pause 
>> > > > > > from
>> > > > > > "Pauses recovery." to "Request to pause recovery." in according 
>> > > > > > with
>> > > > > > pg_is_wal_replay_paused?
>> > > > >
>> > > > > Okay
>> > > > >
>> > > > > >
>> > > > > > > > > Also, how about adding a new boolean argument to 
>> > > > > > > > > pg_is_wal_replay_paused to
>> > > > > > > > > control whether this waits for recovery to get paused or 
>> > > > > > > > > not? By setting its
>> > > > > > > > > default value to true or false, users can use the old format 
>> > > > > > > > > for calling this
>> > > > > > > > > and the backward compatibility can be maintained.
>> > > > > > > >
>> > > > > > > > So basically, if the wait_recovery_pause flag is false then we 
>> > > > > > > > will
>> > > > > > > > immediately return true if the pause is requested?  I agree 
>> > > > > > > > that it is
>> > > > > > > > good to have an API to know whether the recovery pause is 
>> > > > > > > > requested or
>> > > > > > > > not but I am not sure is it good idea to make this API serve 
>> > > > > > > > both the
>> > > > > > > > purpose?  Anyone else have any thoughts on this?
>> > > > > > > >
>> > > > > >
>> > > > > > I think the current pg_is_wal_replay_paused() already has another 
>> > > > > > purpose;
>> > > > > > this waits recovery to actually get paused. If we want to limit 
>> > > > > > this API's
>> > > > > > purpose only to return the pause state, it seems better to fix 
>> > > > > > this to return
>> > > > > > the actual state at the cost of lacking the backward 
>> > > > > > compatibility. If we want
>> > > > > > to know whether pause is requested, we may add a new API like
>> > > > > > pg_is_wal_replay_pause_requested(). Also, if we want to wait 
>> > > > > > recovery to actually
>> > > > > > get paused, we may add an option to pg_wal_replay_pause() for this 
>> > > > > > purpose.
>> > > > > >
>> > > > > > However, this might be a bikeshedding. If anyone don't care that
>> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I 
>> > > > > > don't care either.
>> > > > >
>> > > > > I don't think that it will be blocked ever, because
>> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
>> > > > > recovery process will not be stuck on waiting for the WAL.
>> > >
>> > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck 
>> > > during resolving
>> > > a recovery conflict. The process could wait for 
>> > > max_standby_streaming_delay or
>> > > max_standby_archive_delay at most before recovery get completely paused.
>> >
>> > Okay, I agree that it is possible so for handling this we have a
>> > couple of options
>> > 1. pg_is_wal_replay_paused(): the interface will wait for recovery to
>> > actually get paused, but the user has an option to cancel that.  So I
>> > agree that there is currently no option to just know that a recovery
>> > pause has been requested without waiting for it to actually get paused.
>> > So one option is to provide another interface, as you mentioned,
>> > pg_is_wal_replay_pause_requested(), which can just
>> > return the request status.  I am not sure how useful it is.
>>
>> If it is acceptable that pg_is_wal_replay_paused() makes users wait,
>> I'm OK with the current interface. I don't feel the need for
>> pg_is_wal_replay_pause_requested().
>>
>> >
>> > 2. Pass an option to pg_is_wal_replay_paused whether to 

Re: POC: postgres_fdw insert batching

2021-01-19 Thread Tomas Vondra




On 1/19/21 7:23 AM, Amit Langote wrote:

On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.ta...@fujitsu.com
 wrote:

From: Amit Langote 

I apologize in advance for being maybe overly pedantic, but I noticed
that, in ExecInitModifyTable(), you decided to place the call outside
the loop that goes over resultRelations (shown below), although my
intent was to ask to place it next to the BeginForeignModify() in that
loop.


Actually, I tried to do it (adding the GetModifyBatchSize() call after 
BeginForeignModify()), but it failed.  Because postgresfdwGetModifyBatchSize() 
wants to know if RETURNING is specified, and ResultRelInfo->projectReturning is 
created after the above part.  Considering the context where GetModifyBatchSize() 
implementations may want to know the environment, I placed the call as late as 
possible in the initialization phase.  As for the future(?) multi-target DML 
statements, I think we can change this together with other many(?) parts that 
assume a single target table.


Okay, sometime later then.

I wasn't sure if bringing it up here would be appropriate, but there's
a patch by me to refactor ModfiyTable result relation allocation that
will have to remember to move this code along to an appropriate place
[1].  Thanks for the tip about the dependency on how RETURNING is
handled.  I will remember it when rebasing my patch over this.



Thanks. The last version (v12) should be addressing all the comments and 
seems fine to me, so barring objections I'll get that pushed shortly.


One thing that seems a bit annoying is that with the partitioned table 
the explain (verbose) looks like this:


 QUERY PLAN
-
 Insert on public.batch_table
   ->  Function Scan on pg_catalog.generate_series i
 Output: i.i
 Function Call: generate_series(1, 66)
(4 rows)

That is, there's no information about the batch size :-( But AFAICS 
that's due to how explain shows (or rather does not show) partitions in 
this type of plan.
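
For anyone following along, batching is switched on through the batch_size
option this patch series adds; something like the following (the server and
table names are made up for the example):

-- enable batching for everything on one foreign server
ALTER SERVER loopback OPTIONS (ADD batch_size '100');

-- or override it for an individual foreign table
ALTER FOREIGN TABLE batch_table_p0 OPTIONS (ADD batch_size '66');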


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread Tom Lane
James Hilliard  writes:
> On Tue, Jan 19, 2021 at 8:27 AM Tom Lane  wrote:
>> We already dealt with that by not selecting an SDK newer than the
>> underlying OS (see 4823621db).

> Tried that, doesn't work; I'm not even sure how it could possibly fix this
> issue at all.

It worked for me and for Sergey, so we need to figure out what's different
about your setup.  What do you get from "xcrun --show-sdk-path" and
"xcrun --sdk macosx --show-sdk-path"?  What have you got under
/Library/Developer/CommandLineTools/SDKs ?

> This cannot be fixed properly by selecting a specific SDK version
> alone; it's the symbols valid for a specific target deployment version
> that matter here.

I don't think I believe that argument.  As a counterexample, supposing
that somebody were intentionally cross-compiling on an older OSX platform
but using a newer SDK, shouldn't they get an executable suited to the
SDK's target version?

(I realize that Apple thinks we ought to handle that through run-time
not compile-time adaptation, but I have no interest in going there.)

regards, tom lane




Re: Use boolean array for nulls parameters

2021-01-19 Thread Tom Lane
japin  writes:
> While reviewing [1], I noticed that the tuple's nulls array uses the char type.
> However, many places use a boolean array to represent the nulls array,
> so I think we can replace the char-type nulls array with a boolean type.  This
> change will break the SPI_xxx API; I'm not sure whether this change causes
> other problems or not.  Any thoughts?

We have always considered that changing the APIs of published SPI
interfaces is a non-starter.  The entire reason those calls still
exist at all is for the benefit of third-party extensions.

regards, tom lane




Re: TOAST condition for column size

2021-01-19 Thread Tom Lane
Dilip Kumar  writes:
> On Tue, 19 Jan 2021 at 6:28 PM, Amit Kapila  wrote:
>> Won't it be safe because we don't align individual attrs of type
>> varchar where length is less than equal to 127?

> Yeah right,  I just missed that point.

Yeah, the minimum on biggest_size has nothing to do with alignment
decisions.  It's just a filter to decide whether it's worth trying
to toast anything.

Having said that, I'm pretty skeptical of this patch: I think its
most likely real-world effect is going to be to waste cycles (and
create TOAST-table bloat) on the way to failing anyway.  I do not
think that toasting a 20-byte field down to 18 bytes is likely to be
a productive thing to do in typical situations.  The given example
looks like a cherry-picked edge case rather than a useful case to
worry about.

IOW, if I were asked to review whether the current minimum is
well-chosen, I'd be wondering if we should increase it not
decrease it.

regards, tom lane




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread James Hilliard
On Tue, Jan 19, 2021 at 8:27 AM Tom Lane  wrote:
>
> James Hilliard  writes:
> > Fixes:
> > gcc -Wall -Wmissing-prototypes -Wpointer-arith 
> > -Wdeclaration-after-statement -Werror=vla -Wendif-labels 
> > -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv 
> > -Wno-unused-command-line-argument -O2 -I../../../../src/include  -isysroot 
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
> > -c -o fd.o fd.c
> > fd.c:3661:10: warning: 'pwritev' is only available on macOS 11.0 or newer 
> > [-Wunguarded-availability-new]
>
> We already dealt with that by not selecting an SDK newer than the
> underlying OS (see 4823621db).
Tried that, doesn't work; I'm not even sure how it could possibly fix this
issue at all. This cannot be fixed properly by selecting a specific SDK
version alone; it's the symbols valid for a specific target deployment
version that matter here.
> I do not believe that your proposal
> is more reliable than that approach, and it's surely uglier.  Are
> we really going to abandon Autoconf's built-in checking method every
> time Apple adds an API they should have had ten years ago?  If so,
> you forgot preadv ...
I didn't run into an issue there for some reason...but this was the cleanest fix
I could come up with that seemed to work.
>
> regards, tom lane




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-19 Thread Tom Lane
James Hilliard  writes:
> Fixes:
> gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
> -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security 
> -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 
> -I../../../../src/include  -isysroot 
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
> -c -o fd.o fd.c
> fd.c:3661:10: warning: 'pwritev' is only available on macOS 11.0 or newer 
> [-Wunguarded-availability-new]

We already dealt with that by not selecting an SDK newer than the
underlying OS (see 4823621db).  I do not believe that your proposal
is more reliable than that approach, and it's surely uglier.  Are
we really going to abandon Autoconf's built-in checking method every
time Apple adds an API they should have had ten years ago?  If so,
you forgot preadv ...

regards, tom lane




Re: search_plan_tree(): handling of non-leaf CustomScanState nodes causes segfault

2021-01-19 Thread Tom Lane
David Geier  writes:
> On 18.01.21 23:42, Tom Lane wrote:
>> OK, cool.  I was afraid you'd argue that you really needed your CustomScan
>> node to be transparent in such cases.  We could imagine inventing an
>> additional custom-scan-provider callback to embed the necessary knowledge,
>> but I'd rather not add the complexity until someone has a use-case.

> I was thinking about that. Generally, having such possibility would be 
> very useful. Unfortunately, I believe that in my specific case even 
> having such functionality wouldn't allow for the query to work with 
> CURRENT OF, because my CSP behaves similarly to a materialize node.
> My understanding is only such plans are supported which consist of nodes 
> that guarantee that the tuple returned by the plan is the unmodified 
> tuple a scan leaf node is currently positioned on?

Doesn't have to be *unmodified* --- a projection is fine, for example.
But we have to be sure that the current output tuple of the plan tree
is based on the current output tuple of the bottom-level table scan
node.  As an example of the hazards here, it's currently safe for
search_plan_tree to descend through a Limit node, but it did not use to
be, because the old implementation of Limit was such that it could return
a different tuple from the one the underlying scan node thinks it is
positioned on.

As another example, descending through Append is OK because only one
of the child scans will be positioned-on-a-tuple at all; the rest
will be at EOF or not yet started, so they can't produce a match
to whatever tuple ID the WHERE CURRENT OF is asking about.

Now that I look at this, I strongly wonder whether whoever added
MergeAppend support here understood what they were doing.  That
looks broken, because child nodes will typically be positioned on
tuples, whether or not the current top-level output came from them.
So I fear we could get a false-positive confirmation that some
tuple matches WHERE CURRENT OF.

Anyway, it seems clearly possible that some nonleaf CustomScans
would operate in a manner that would allow descending through them
while others wouldn't.  But I don't really want to write the docs
explaining what a callback for this should do ;-)

regards, tom lane




Re: Use boolean array for nulls parameters

2021-01-19 Thread Hamid Akhtar
I personally don't see any benefit in this change. The focus shouldn't be
on fixing things that aren't broken. Perhaps there is more value in using a
bitmap data type to keep track of NULL values, which is the typical
storage-vs-performance debate, and IMHO it's better to err on the side of
using slightly more storage for much better performance. IIRC, the bitmap
idea has been discussed and rejected before, too.

On Tue, Jan 19, 2021 at 7:07 PM japin  wrote:

>
> Hi,
>
> While reviewing [1], I noticed that the tuple's nulls array uses the char type.
> However, many places use a boolean array to represent the nulls array,
> so I think we can replace the char-type nulls array with a boolean type.  This
> change will break the SPI_xxx API; I'm not sure whether this change causes
> other problems or not.  Any thoughts?
>
> [1] -
> https://www.postgresql.org/message-id/flat/ca+hiwqgkfjfydeq5vhph6eqpkjsbfpddy+j-kxyfepqedts...@mail.gmail.com
>
> --
> Regrads,
> Japin Li.
> ChengDu WenWu Information Technology Co.,Ltd.
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: ResourceOwner refactoring

2021-01-19 Thread Heikki Linnakangas

On 18/01/2021 16:34, Alvaro Herrera wrote:

On 2021-Jan-18, Heikki Linnakangas wrote:


+static ResourceOwnerFuncs jit_funcs =
+{
+   /* relcache references */
+   .name = "LLVM JIT context",
+   .phase = RESOURCE_RELEASE_BEFORE_LOCKS,
+   .ReleaseResource = ResOwnerReleaseJitContext,
+   .PrintLeakWarning = ResOwnerPrintJitContextLeakWarning
+};


I think you mean jit_resowner_funcs here; "jit_funcs" is a bit
excessively vague.  Also, why did you choose not to define
ResourceOwnerRememberJIT?  You do that in other modules and it seems
better.


I did it in modules that had more than one ResourceOwnerRemember/Forget 
call. Didn't seem worth it in functions like IncrTupleDescRefCount(), 
for example.


Hayato Kuroda also pointed that out, though. So perhaps it's better to 
be consistent, to avoid the confusion. I'll add the missing wrappers.
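
Roughly like this, presumably -- just a sketch, assuming the
ResourceOwnerRemember/Forget(owner, value, kind) form used elsewhere in the
patch and Alvaro's suggested jit_resowner_funcs name:

static inline void
ResourceOwnerRememberJIT(ResourceOwner owner, LLVMJitContext *handle)
{
	ResourceOwnerRemember(owner, PointerGetDatum(handle), &jit_resowner_funcs);
}

static inline void
ResourceOwnerForgetJIT(ResourceOwner owner, LLVMJitContext *handle)
{
	ResourceOwnerForget(owner, PointerGetDatum(handle), &jit_resowner_funcs);
}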


- Heikki




Re: [PATCH] More docs on what to do and not do in extension code

2021-01-19 Thread Bharath Rupireddy
On Mon, Jan 18, 2021 at 1:27 PM Craig Ringer
 wrote:
> Hi folks
>
> The attached patch expands the xfunc docs and bgworker docs a little, 
> providing a starting point for developers to learn how to do some common 
> tasks the Postgres Way.
>
> It mentions in brief these topics:
>
> * longjmp() based exception handling with elog(ERROR), PG_CATCH() and 
> PG_RE_THROW() etc
> * Latches, spinlocks, LWLocks, heavyweight locks, condition variables
> * shm, DSM, DSA, shm_mq
> * syscache, relcache, relation_open(), invalidations
> * deferred signal handling, CHECK_FOR_INTERRUPTS()
> * Resource cleanup hooks and callbacks like on_exit, before_shmem_exit, the 
> resowner callbacks, etc
> * signal handling in bgworkers
>
> All very superficial, but all things I really wish I'd known a little about, 
> or even that I needed to learn about, when I started working on postgres.
>
> I'm not sure it's in quite the right place. I wonder if there should be a 
> separate part of xfunc.sgml that covers the slightly more advanced bits of 
> postgres backend and function coding like this, lists relevant README files 
> in the source tree, etc.
>
> I avoided going into details like how resource owners work. I don't want the 
> docs to have to cover all that in detail; what I hope to do is start 
> providing people with clear references to the right place in the code, 
> READMEs, etc to look when they need to understand specific topics.

Thanks for the patch.

Here are some comments:

[1]
   background worker's main function, and must be unblocked by it; this is to
allow the process to customize its signal handlers, if necessary.
-   Signals can be unblocked in the new process by calling
-   BackgroundWorkerUnblockSignals and blocked by calling
-   BackgroundWorkerBlockSignals.
+   It is important that all background workers set up and unblock signal
+   handling before they enter their main loops. Signal handling in background
+   workers is discussed separately in .
   

IMO, we can retain the statement about BackgroundWorkerUnblockSignals
and BackgroundWorkerBlockSignals, but mention the link to
"bgworker-signals" for more details and move the statement "it's
important to unblock signals before enter their main loop" to
"bgworker-signals" section and we can also reason there the
consequences if not done.

[2]
+   interupt-aware APIs for the purpose. Do not
usleep(),
+   system(), make blocking system calls, etc.
+  

Is it "Do not use usleep(),
system() or make blocking system calls etc." ?

[3] IMO, we can remove following from "bgworker-signals" if we retain
it where currently it is, as discussed in [1].
+Signals can be unblocked in the new process by calling
+BackgroundWorkerUnblockSignals and blocked by calling
+BackgroundWorkerBlockSignals.

[4] Can we say
+The default signal handlers set up for background workers do
+default background worker signal handlers, it should call

instead of
+The default signal handlers installed for background workers do
+default background worker signal handling it should call

[5] IMO, we can have something like below
+request, etc. Set up these handlers before unblocking signals as
+shown below:

instead of
+request, etc. To install these handlers, before unblocking interrupts
+run:

[6] I think either elog() or ereport() can be used for logs and errors, so how about
+Use elog() or ereport() for
+logging output or raising errors instead of any direct stdio calls.

instead of
+Use elog() and ereport() for
+logging output and raising errors instead of any direct stdio calls.

[7] Can we use "child processes" instead of "subprocesses"? If okay, in
other places in the patch as well.
+and should only use the main thread. PostgreSQL generally
uses child processes
+that coordinate over shared memory instead of threads - for
instance, see
+.

instead of
+and should only use the main thread. PostgreSQL generally
uses subprocesses
+that coordinate over shared memory instead of threads - see
+.

[8] Why should file descriptor manager API be used to execute
subprocesses/child processes?
+To execute subprocesses and open files, use the routines provided by
+the file descriptor manager like BasicOpenFile
+and OpenPipeStream instead of a direct

[9] "should always be"? even if it's a blocking extesion, does it
work? If our intention is to recommend the developers, maybe we should
avoid using the term "should" in the patch in other places as well.
+Extension code should always be structured as a non-blocking

[10] I think it is
+you should avoid using sleep() or
usleep()

instead of
+you should sleep() or
usleep()


[11] I think it is
+block if this happens. So cleanup of resources is not
entirely managed by PostgreSQL, it
+   must be handled using appropriate callbacks provided by PostgreSQL

instead of
+block if this 

Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-19 Thread Dmitry Dolgov
> On Thu, Jan 14, 2021 at 12:02:42PM -0500, Dian M Fay wrote:
> > Other than that, since I've already posted the patch for returning an
> > error option, it seems that the only thing left is to decide with which
> > version to go.
>
> The trigger issue (which I did verify) makes the "no update" option
> unworkable imo, JavaScript's behavior notwithstanding. But it should be
> called out very clearly in the documentation, since it does depart from
> what people more familiar with that behavior may expect. Here's a quick
> draft, based on your v44 patch:
>
> 
>  jsonb data type supports array-style subscripting expressions
>  to extract or update particular elements. It's possible to use multiple
>  subscripting expressions to extract nested values. In this case, a chain of
>  subscripting expressions follows the same rules as the
>  path argument in jsonb_set function,
>  e.g. in case of arrays it is a 0-based operation or that negative integers
>  that appear in path count from the end of JSON arrays.
>  The result of subscripting expressions is always of the jsonb data type.
> 
> 
>  UPDATE statements may use subscripting in the
>  SET clause to modify jsonb values. Every
>  affected value must conform to the path defined by the subscript(s). If the
>  path cannot be followed to its end for any individual value (e.g.
>  val['a']['b']['c'] where val['a'] or
>  val['b'] is null, a string, or a number), an error is
>  raised even if other values do conform.
> 
> 
>  An example of subscripting syntax:

Yes, makes sense. I've incorporated your suggestion into the last patch,
thanks.
From c9143a620497dac5615c4de1d9349684e9af95b5 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 18 Dec 2020 17:19:51 +0100
Subject: [PATCH v45 1/3] Subscripting for jsonb

Subscripting implementation for jsonb. It does not support slices, does
not have a limit for number of subscripts and for assignment expects a
replace value to be of jsonb type. There is also one functional
difference in assignment via subscripting from jsonb_set, when an
original jsonb container is NULL, subscripting replaces it with an empty
jsonb and proceed with assignment.

For the sake of code reuse, some parts of jsonb functionality were
rearranged to allow use the same functions for jsonb_set and assign
subscripting operation.

The original idea belongs to Oleg Bartunov.

Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule
---
 doc/src/sgml/json.sgml  |  48 
 src/backend/utils/adt/Makefile  |   1 +
 src/backend/utils/adt/jsonb_util.c  |  76 -
 src/backend/utils/adt/jsonbsubs.c   | 413 
 src/backend/utils/adt/jsonfuncs.c   | 180 ++--
 src/include/catalog/pg_proc.dat |   4 +
 src/include/catalog/pg_type.dat |   3 +-
 src/include/utils/jsonb.h   |   6 +-
 src/test/regress/expected/jsonb.out | 272 +-
 src/test/regress/sql/jsonb.sql  |  84 +-
 10 files changed, 982 insertions(+), 105 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonbsubs.c

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 5b9a5557a4..100d1a60f4 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -602,6 +602,54 @@ SELECT jdoc-'guid', jdoc-'name' FROM api WHERE jdoc @ '{"tags": ["qu
   
  
 
+ 
+  jsonb Subscripting
+  
+   jsonb data type supports array-style subscripting expressions
+   to extract or update particular elements. It's possible to use multiple
+   subscripting expressions to extract nested values. In this case, a chain of
+   subscripting expressions follows the same rules as the
+   path argument in jsonb_set function,
+   e.g. in case of arrays it is a 0-based operation or that negative integers
+   that appear in path count from the end of JSON arrays.
+   The result of subscripting expressions is always jsonb data type. An
+   example of subscripting syntax:
+
+-- Extract value by key
+SELECT ('{"a": 1}'::jsonb)['a'];
+
+-- Extract nested value by key path
+SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
+
+-- Extract element by index
+SELECT ('[1, "2", null]'::jsonb)[1];
+
+-- Update value by key, note the single quotes - the assigned value
+-- needs to be of jsonb type as well
+UPDATE table_name SET jsonb_field['key'] = '1';
+
+-- Select records using where clause with subscripting. Since the result of
+-- subscripting is jsonb and we basically want to compare two jsonb objects, we
+-- need to put the value in double quotes to be able to convert it to jsonb.
+SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"';
+
+
+  Subscripting for jsonb does not support slice expressions,
+  even if it contains an array.
+
+  In case if source jsonb is NULL, assignment
+  via subscripting will proceed as if it was an empty JSON object:
+
+-- If jsonb_field here is NULL, the result is {"a": 1}
+UPDATE table_name SET jsonb_field['a'] = '1';
+
+-- If jsonb_field here is NULL, the result is [1]
