Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2024-05-21 Thread Ajin Cherian
On Tue, Mar 19, 2024 at 1:49 PM Ajin Cherian  wrote:

>
>
>> Of course you can, but this will only convert disk space into memory
>> space.
>>  For details, please see the case in Email [1].
>>
>> [1]
>> https://www.postgresql.org/message-id/CAGfChW51P944nM5h0HTV9HistvVfwBxNaMt_s-OZ9t%3DuXz%2BZbg%40mail.gmail.com
>>
>> Regards, lijie
>>
>>
>
In some testing, I see a crash:
(gdb) bt
#0  0x7fa5bcbfd277 in raise () from /lib64/libc.so.6
#1  0x7fa5bcbfe968 in abort () from /lib64/libc.so.6
#2  0x009e0940 in ExceptionalCondition (conditionName=conditionName@entry=0x7fa5ab8b9842 "RelationSyncCache != NULL", fileName=fileName@entry=0x7fa5ab8b9820 "pgoutput.c", lineNumber=lineNumber@entry=1991) at assert.c:66
#3  0x7fa5ab8b7804 in get_rel_sync_entry (data=data@entry=0x2492288, relation=relation@entry=0x7fa5be30a768) at pgoutput.c:1991
#4  0x7fa5ab8b7cda in pgoutput_table_filter (ctx=, relation=0x7fa5be30a768, change=0x24c5c20) at pgoutput.c:1671
#5  0x00813761 in filter_by_table_cb_wrapper (ctx=ctx@entry=0x2491fd0, relation=relation@entry=0x7fa5be30a768, change=change@entry=0x24c5c20) at logical.c:1268
#6  0x0080e20f in FilterByTable (ctx=ctx@entry=0x2491fd0, change=change@entry=0x24c5c20) at decode.c:690
#7  0x0080e8e3 in DecodeInsert (ctx=ctx@entry=0x2491fd0, buf=buf@entry=0x7fff0db92550) at decode.c:1070
#8  0x0080f43d in heap_decode (ctx=ctx@entry=0x2491fd0, buf=buf@entry=0x7fff0db92550) at decode.c:485
#9  0x0080eca6 in LogicalDecodingProcessRecord (ctx=ctx@entry=0x2491fd0, record=0x2492368) at decode.c:118
#10 0x0081338f in DecodingContextFindStartpoint (ctx=ctx@entry=0x2491fd0) at logical.c:672
#11 0x0083c650 in CreateReplicationSlot (cmd=cmd@entry=0x2490970) at walsender.c:1323
#12 0x0083fd48 in exec_replication_command (cmd_string=cmd_string@entry=0x239c880 "CREATE_REPLICATION_SLOT \"pg_16387_sync_16384_7371301304766135621\" LOGICAL pgoutput (SNAPSHOT 'use')") at walsender.c:2116

The reason for the crash is that the RelationSyncCache was NULL prior to
reaching a consistent point.
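
For illustration, a guard of roughly this shape avoids the assertion
failure. This is only a sketch (the callback's return-value semantics and
the filter_change_by_entry() helper are assumptions of mine); the actual
fix is in the attached v2 patch:

/* Sketch only -- the real fix is in the attached v2 patch. */
static bool
pgoutput_table_filter(LogicalDecodingContext *ctx, Relation relation,
                      ReorderBufferChange *change)
{
    PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;

    /*
     * Before a consistent point is reached, the relation sync cache has
     * not been initialized, so don't attempt to filter anything yet.
     */
    if (RelationSyncCache == NULL)
        return false;           /* assumed semantics: keep the change */

    return filter_change_by_entry(get_rel_sync_entry(data, relation),
                                  change);  /* hypothetical helper */
}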
Hi li jie, I see that you created a new thread with an updated version of
this patch [1]. I used that patch and addressed the crash seen above,
rebased the patch and addressed a few other comments.
I'm happy to help you with this patch and address comments if you are not
available.

regards,
Ajin Cherian
Fujitsu Australia
[1] -
https://www.postgresql.org/message-id/CAGfChW7%2BZMN4_NHPgz24MM42HVO83ecr9TLfpihJ%3DM0s1GkXFw%40mail.gmail.com


v2-0001-Reduce-useless-changes-before-reassemble-during-l.patch
Description: Binary data


Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

2024-05-21 Thread David Rowley
(Thanks for your review. I'm sorry I didn't have time and energy to
respond properly until now)

On Tue, 21 May 2024 at 23:48, Heikki Linnakangas  wrote:
> BTW, could the same machinery be used for INTERSECT as well? There was a
> brief mention of that in the original thread, but I didn't understand
> the details. Not for v17, but I'm curious. I was wondering if
> build_setop_child_paths() should be named build_union_child_paths(),
> since it's only used with UNIONs, but I guess it could be used as is for
> INTERSECT too.

I'd previously thought about that, but at the time I'd considered
getting rid of the SetOp Intersect and replacing it with a join.  My
conclusion was that to do that we'd first need to improve joins using
IS NOT DISTINCT FROM, as that's the behaviour we need for correct
setop NULL handling.  However, on looking again, I see that we
could still use SetOp Intersect with the flags injected into the
targetlist and get sorted results to it via Merge Append rather than
Append.  That might require better Const handling than what's in the
patch today due to the 1/0 flag that gets added to the subquery tlist.
I was unsure how much trouble to go to for INTERSECT. I spent about 7
years in a job writing queries and don't recall ever feeling the need
to use INTERSECT.  I did use EXCEPT, however... like at least once.
I'll probably circle back to it one day. People maybe don't use it
because it's so terribly optimised.

> # Testing
>
> postgres=# begin; create table foo as select i from generate_series(1,
> 100) i; create index on foo (i); commit;
> BEGIN
> SELECT 100
> CREATE INDEX
> COMMIT
> postgres=# set enable_seqscan=off;
> SET
> postgres=# explain (select 1 as i union select i from foo) order by i;
>QUERY PLAN
>
> --
>   Unique  (cost=144370.89..149370.89 rows=101 width=4)
> ->  Sort  (cost=144370.89..146870.89 rows=101 width=4)
>   Sort Key: (1)
>   ->  Append  (cost=0.00..31038.44 rows=101 width=4)
> ->  Result  (cost=0.00..0.01 rows=1 width=4)
> ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=100 width=4)
> (6 rows)
>
> I'm disappointed it couldn't produce a MergeAppend plan. If you replace
> the "union" with "union all" you do get a MergeAppend.
>
> Some more cases where I hoped for a MergeAppend:

I've not looked again in detail, but there was some discussion along
these lines in [1].  I think the problem is down to how we remove
redundant PathKeys when the EquivalenceClass has a Const. There can
only be 1 value, so no need for a PathKey to represent that. The
problem with that comes with lack of equivalence visibility through
subqueries.  The following demonstrates:

create table ab(a int, b int, primary key(a,b));
set enable_seqscan=0;
set enable_bitmapscan=0;

explain (costs off) select * from (select * from ab where a=1 order by
b) order by a,b;
QUERY PLAN
---
 Sort
   Sort Key: ab.a, ab.b
   ->  Index Only Scan using ab_pkey on ab
 Index Cond: (a = 1)
(4 rows)

explain (costs off) select * from (select * from ab where a=1 order by
b) order by b;
 QUERY PLAN
-
 Index Only Scan using ab_pkey on ab
   Index Cond: (a = 1)
(2 rows)

Because the subquery only publishes that it's ordered by "b", the
outer query thinks it needs to sort on "a,b".  That's a wasted effort
since the subquery has an equivalence class for "a" with a constant.
The outer query doesn't know that.

> postgres=# explain (select i, 'foo' from foo union select i, 'foo' from
> foo) order by 1;
>   QUERY PLAN
>
> -
>   Unique  (cost=380767.54..395767.54 rows=200 width=36)
> ->  Sort  (cost=380767.54..385767.54 rows=200 width=36)
>   Sort Key: foo.i, ('foo'::text)
>   ->  Append  (cost=0.42..62076.85 rows=200 width=36)
> ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=100 width=36)
> ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=100 width=36)
> (6 rows)
>
>
> postgres=# explain (select 'foo', i from foo union select 'bar', i from
> foo) order by 1;
>   QUERY PLAN
>
> -
>   Unique  (cost=380767.54..395767.54 rows=200 width=36)
> ->  Sort  (cost=380767.54..385767.54 rows=200 width=36)
>   Sort Key: ('foo'::text), foo.i
>   ->  Append  (cost=0.42..62076.85 rows=200 width=36)
> ->  Index Only 

Re: Shared detoast Datum proposal

2024-05-21 Thread Andy Fan


Andy Fan  writes:

> Hi Robert, 
>
>> Andy Fan asked me off-list for some feedback about this proposal. I
>> have hesitated to comment on it for lack of having studied the matter
>> in any detail, but since I've been asked for my input, here goes:
>
> Thanks for doing this! Since we have two totally different designs
> between Tomas and me and I think we both understand the two sides of
> each design, just that the following things are really puzzled to me.

my emacs got stuck and somehow this incomplete message was sent out.
Please ignore this post for now. Sorry about that.

-- 
Best Regards
Andy Fan





Re: First draft of PG 17 release notes

2024-05-21 Thread Masahiko Sawada
Hi,

On Thu, May 9, 2024 at 1:03 PM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>
> It will be improved until the final release.  The item count is 188,
> which is similar to recent releases:
>

I found a typo:

s/pg_statstatement/pg_stat_statement/

I've attached a patch to fix it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_pg_stat_statements.patch
Description: Binary data


Re: Shared detoast Datum proposal

2024-05-21 Thread Andy Fan


Hi Robert, 

> Andy Fan asked me off-list for some feedback about this proposal. I
> have hesitated to comment on it for lack of having studied the matter
> in any detail, but since I've been asked for my input, here goes:

Thanks for doing this! Tomas and I have two totally different designs,
and I think we both understand the two sides of each design; it's just
that the following things still puzzle me:

- which factors we should care about more.
- the possibility of improving design A/B where it falls short.

But for point 2, we probably need to settle on a preferred design first.

For now, let's highlight that there are two totally different methods of
handling this problem. Let's call them:

design a:  detoast the datum into slot->tts_values (and manage the
memory with the lifespan of the *slot*). From me.

design b:  detoast the datum into a CACHE (and manage the memory with
CACHE eviction, released at end-of-query). From Tomas.

Currently I prefer design a. I'm not sure whether that's because design
a is really good or just because it is my own design, so I will
summarize the current state to make the discussion easier. I summarize
it so that you don't need to go through the long discussion, and I give
a link to the original post for each point, since I may have
misunderstood something and the original post can be used as a double
check.

> I agree that there's a problem here, but I suspect this is not the
> right way to solve it. Let's compare this to something like the
> syscache. Specifically, let's think about the syscache on
> pg_class.relname.

OK.

> In the case of the syscache on pg_class.relname, there are two reasons
> why we might repeatedly look up the same values in the cache. One is
> that the code might do multiple name lookups when it really ought to
> do only one. Doing multiple lookups is bad for security and bad for
> performance, but we have code like that in some places and some of it
> is hard to fix. The other reason is that it is likely that the user
> will mention the same table names repeatedly; each time they do, they
> will trigger a lookup on the same name. By caching the result of the
> lookup, we can make it much faster. An important point to note here is
> that the queries that the user will issue in the future are unknown to
> us. In a certain sense, we cannot even know whether the same table
> name will be mentioned more than once in the same query: when we reach
> the first instance of the table name and look it up, we do not have
> any good way of knowing whether it will be mentioned again later, say
> due to a self-join. Hence, the pattern of cache lookups is
> fundamentally unknowable.

True. 

> But that's not the case in the motivating example that started this
> thread. In that case, the target list includes three separate
> references to the same column. We can therefore predict that if the
> column value is toasted, we're going to de-TOAST it exactly three
> times.

True.

> If, on the other hand, the column value were mentioned only
> once, it would be detoasted just once. In that latter case, which is
> probably quite common, this whole cache idea seems like it ought to
> just waste cycles, which makes me suspect that no matter what is done
> to this patch, there will always be cases where it causes significant
> regressions.

I agree.

> In the former case, where the column reference is
> repeated, it will win, but it will also hold onto the detoasted value
> after the third instance of detoasting, even though there will never
> be any more cache hits.

True.

> Here, unlike the syscache case, the cache is
> thrown away at the end of the query, so future queries can't benefit.
> Even if we could find a way to preserve the cache in some cases, it's
> not very likely to pay off. People are much more likely to hit the
> same table more than once than to hit the same values in the same
> table more than once in the same session.

I agree.

One more thing I want to highlight: "syscache" is used for metadata
while the *detoast cache* is used for user data. User data is likely to
be bigger than metadata, so cache size control (and hence running the
eviction logic more often) is more necessary in this case. The first
graph in [1] (including the quoted text) highlights this idea.

> Suppose we had a design where, when a value got detoasted, the
> detoasted value went into the place where the toasted value had come
> from. Then this problem would be handled pretty much perfectly: the
> detoasted value would last until the scan advanced to the next row,
> and then it would be thrown out. So there would be no query-lifespan
> memory leak.

I think that is the same idea as design a.
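
To make design a concrete, here is a minimal sketch (illustrative only,
not the actual patch; slot_detoast_cxt and attnum stand in for whatever
per-slot memory context and attribute index the real code uses):

/*
 * Sketch of design a: detoast directly into slot->tts_values, so the
 * detoasted copy lives exactly as long as the slot's current row and
 * is freed when the slot is cleared.
 */
Datum       val = slot->tts_values[attnum];

if (!slot->tts_isnull[attnum] &&
    VARATT_IS_EXTENDED(DatumGetPointer(val)))
{
    /* assumed: a memory context whose lifespan is tied to the slot */
    MemoryContext oldcxt = MemoryContextSwitchTo(slot_detoast_cxt);

    slot->tts_values[attnum] =
        PointerGetDatum(detoast_attr((struct varlena *) DatumGetPointer(val)));

    MemoryContextSwitchTo(oldcxt);
}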

> Now, I don't really know how this would work, because the
> TOAST pointer is coming out of a slot and we can't necessarily write
> one attribute of any arbitrary slot type without a bunch of extra
> overhead, but maybe there's some way.

I think the key points include:

1. Where to put the *detoast 

Re: Skip adding row-marks for non target tables when result relation is foreign table.

2024-05-21 Thread Jeff Davis
On Mon, 2024-05-06 at 23:10 +0100, SAIKIRAN AVULA wrote:
> I would like to bring to your attention an observation regarding the
> planner's behavior for foreign table update/delete operations. It
> appears that the planner adds rowmarks (ROW_MARK_COPY) for non-target
> tables, which I believe is unnecessary when using the postgres-fdw.
> This is because postgres-fdw performs early locking on tuples
> belonging to the target foreign table by utilizing the SELECT FOR
> UPDATE clause.

I agree with your reasoning here. If it reads the row with SELECT FOR
UPDATE, what's the purpose of row marks?

The cost of ROW_MARK_COPY is that it brings the whole tuple along
rather than a reference. I assume you are concerned about wide tables
involved in the join or is there another concern?

> In an attempt to address this, I tried implementing late locking.

For others in the thread, see:

https://www.postgresql.org/docs/current/fdw-row-locking.html

> However, this approach still doesn't work as intended because the API
> assumes that foreign table rows can be re-fetched using TID (ctid).
> This assumption is invalid for partitioned tables on the foreign
> server.

It looks like it's a "Datum rowid", but is currently only allowed to be
a ctid, which can't identify the partition. I wonder how much work it
would be to fix this?

>  Additionally, the commit afb9249d06f47d7a6d4a89fea0c3625fe43c5a5d,
> which introduced late locking for foreign tables, mentions that the
> benefits of late locking against a remote server are unclear, as the
> extra round trips required are likely to outweigh any potential
> concurrency improvements.

The extra round trip only happens when EPQ finds a newer version of the
tuple, which should be the exceptional case. I'm not sure how this
balances out, but to me late locking still seems preferable. Early
locking is a huge performance hit in some cases (locking many more rows
than necessary).

Early locking is also a violation of the documentation here:

"When a locking clause appears at the top level of a SELECT query, the
rows that are locked are exactly those that are returned by the query;
in the case of a join query, the rows locked are those that contribute
to returned join rows."

https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE

> To address this issue, I have taken the initiative to create a patch
> that prevents the addition of rowmarks for non-target tables when the
> target table is using early locking. I would greatly appreciate it if
> you could review the patch and provide any feedback or insights I may
> be overlooking.

A couple comments:

* You're using GetFdwRoutineForRelation() with makecopy=false, and then
closing the relation. If the rd_fdwroutine was already set previously,
then the returned pointer will point into the relcache, which may be
invalid after closing the relation. I'd probably pass makecopy=true and
then free it. (Weirdly, if you pass makecopy=false, you may or may not
get a copy, so there's no way to know whether to free it or not.) A
sketch of this pattern follows these comments.

* Core postgres doesn't really choose early locking. If
RefetchForeignRow is not defined, then late locking is impossible, so
it assumes that early locking is happening. That assumption is true for
postgres_fdw, but might not be for other FDWs. What if an FDW doesn't
do early locking and also doesn't define RefetchForeignRow?
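
Here is the sketch mentioned in the first comment: a hedged illustration
of the copy-then-free pattern (not the patch's code; "rte" stands in for
however the patch obtains the relation's OID, and the lock is assumed to
be held already):

Relation    rel = table_open(rte->relid, NoLock);
FdwRoutine *fdwroutine = GetFdwRoutineForRelation(rel, true);   /* makecopy=true */
bool        late_locking_possible = (fdwroutine->RefetchForeignRow != NULL);

table_close(rel, NoLock);

/* ... use late_locking_possible to decide whether a row mark is needed ... */
pfree(fdwroutine);      /* safe: makecopy=true guarantees we own this copy */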

Regards,
Jeff Davis





Re: problems with "Shared Memory and Semaphores" section of docs

2024-05-21 Thread Imseih (AWS), Sami
> Any concerns with doing something like this [0] for the back-branches? The
> constant would be 6 instead of 7 on v14 through v16.

As far as backpatching the present inconsistencies in the docs,
[0] looks good to me.

[0] 
https://postgr.es/m/attachment/160360/v1-0001-fix-kernel-resources-docs-on-back-branches.patch
 



Regards,

Sami




Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Tristan Partin

On Tue May 21, 2024 at 10:04 AM CDT, Andres Freund wrote:

Hi,

On 2024-05-20 11:58:05 +0100, Dave Page wrote:
> I have very little experience with Meson, and even less interpreting it's
> logs, but it seems to me that it's not including the extra lib and include
> directories when it runs the test compile, given the command line it's
> reporting:
> 
> cl C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c

> /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od /Oi-
> 
> Bug, or am I doing something silly?


It's a buglet. We rely on meson's internal fallback detection of zlib, if it's
not provided via pkg-config or cmake. But it doesn't know about our
extra_include_dirs parameter. We should probably fix that...


Here is the relevant Meson code for finding zlib in the Postgres tree:

postgres_inc_d = ['src/include']
postgres_inc_d += get_option('extra_include_dirs')
...
postgres_inc = [include_directories(postgres_inc_d)]
...
zlibopt = get_option('zlib')
zlib = not_found_dep
if not zlibopt.disabled()
  zlib_t = dependency('zlib', required: zlibopt)

  if zlib_t.type_name() == 'internal'
    # if fallback was used, we don't need to test if headers are present (they
    # aren't built yet, so we can't test)
    zlib = zlib_t
  elif not zlib_t.found()
    warning('did not find zlib')
  elif not cc.has_header('zlib.h',
      args: test_c_args, include_directories: postgres_inc,
      dependencies: [zlib_t], required: zlibopt)
    warning('zlib header not found')
  elif not cc.has_type('z_streamp',
      dependencies: [zlib_t], prefix: '#include <zlib.h>',
      args: test_c_args, include_directories: postgres_inc)
    if zlibopt.enabled()
      error('zlib version is too old')
    else
      warning('zlib version is too old')
    endif
  else
    zlib = zlib_t
  endif

  if zlib.found()
    cdata.set('HAVE_LIBZ', 1)
  endif
endif

You can see that we do pass the include dirs to the has_header check. 
Something seems to be going wrong here since your extra_include_dirs 
isn't being properly translated to include arguments.


--
Tristan Partin
https://tristan.partin.io




Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

2024-05-21 Thread David Rowley
On Wed, 22 May 2024 at 05:36, Robert Haas  wrote:
> The consensus on pgsql-release was to unrevert this patch and commit
> the fix now, rather than waiting for the next beta. However, the
> consensus was also to push the un-revert as a separate commit from the
> bug fix, rather than together as suggested by Álvaro. Since time is
> short due to the impending release and it's very late where you are,
> I've taken care of this. Hope that's OK.

Thanks for handling that. It's much appreciated.

David




Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Andrew Dunstan



On 2024-05-21 Tu 11:04, Andres Freund wrote:

Hi,

On 2024-05-20 11:58:05 +0100, Dave Page wrote:

I have very little experience with Meson, and even less interpreting it's
logs, but it seems to me that it's not including the extra lib and include
directories when it runs the test compile, given the command line it's
reporting:

cl C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c
/nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od /Oi-

Bug, or am I doing something silly?

It's a buglet. We rely on meson's internal fallback detection of zlib, if it's
not provided via pkg-config or cmake. But it doesn't know about our
extra_include_dirs parameter. We should probably fix that...



Yeah. Meanwhile, what I got working on a totally fresh Windows + VS
install was, instead of using extra_include_dirs etc., to add the
relevant directories to the environment LIB and INCLUDE settings before
calling `meson setup`.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL:2011 application time

2024-05-21 Thread Jeff Davis
On Tue, 2024-05-21 at 13:57 -0400, Robert Haas wrote:
> What I think is less clear is what that means for temporal primary
> keys.

Right.

My message was specifically a response to the concern that there was
some kind of design flaw in the range types or exclusion constraints
mechanisms.

I don't believe that empty ranges represent a design flaw. If they
don't make sense for temporal constraints, then temporal constraints
should forbid them.

Regards,
Jeff Davis





Re: libpq compression (part 3)

2024-05-21 Thread Jacob Champion
On Tue, May 21, 2024 at 12:08 PM Jacob Burroughs
 wrote:
> I think I thought I was writing about something else when I wrote that
> :sigh:.  I think what I really should have written was a version of
> the part below, which is that we use streaming decompression, only
> decompress 8kb at a time, and for pre-auth messages limit them to
> `PG_MAX_AUTH_TOKEN_LENGTH` (65535 bytes), which isn't really enough
> data to actually cause any real-world pain by needing to decompress vs
> the equivalent pain of sending invalid uncompressed auth packets.

Okay. So it sounds like your position is similar to Robert's from
earlier: prefer allowing unauthenticated compressed packets for
simplicity, as long as we think it's safe for the server. (Personally
I still think a client that compresses its password packets is doing
it wrong, and we could help them out by refusing that.)

> We own both the canonical client and server, so those are both covered
> here.  I would think it would be the responsibility of any other
> system that maintains its own implementation of the postgres protocol
> and chooses to support the compression protocol to perform its own
> mitigations against potential compression security issues.

Sure, but if our official documentation is "here's an extremely
security-sensitive feature, figure it out!" then we've done a
disservice to the community.

> Should we
> put the fixed message size limits (that have de facto been part of the
> protocol since 2021, even if they weren't documented as such) into the
> protocol documentation?

Possibly? I don't know if the other PG-compatible implementations use
the same limits. It might be better to say "limits must exist".

> ( I don't really see how one could implement other tooling that used
> pg compression without using streaming compression, as the protocol
> never hands over a standalone blob of compressed data: all compressed
> data is always part of a stream, but even with streaming decompression
> you still need some kind of limits or you will just chew up memory.)

Well, that's a good point; I wasn't thinking about the streaming APIs
themselves. If the easiest way to implement decompression requires the
use of an API that shouts "hey, give me guardrails!", then that helps
quite a bit. I really need to look into the attack surface of the
three algorithms.

--Jacob




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Tue, May 21, 2024 at 1:24 PM Jacob Champion
 wrote:
>
> We absolutely have to document the risks and allow clients to be
> written safely. But I think server-side controls on risky behavior
> have proven to be generally more valuable, because the server
> administrator is often in a better spot to see the overall risks to
> the system. ("No, you will not use deprecated ciphersuites. No, you
> will not access this URL over plaintext. No, I will not compress this
> response containing customer credit card numbers, no matter how nicely
> you ask.") There are many more clients than servers, so it's less
> risky for the server to enforce safety than to hope that every client
> is safe.
>
> Does your database and access pattern regularly mingle secrets with
> public data? Would auditing correct client use of compression be a
> logistical nightmare? Do your app developers keep indicating in
> conversations that they don't understand the risks at all? Cool, just
> set `encrypted_compression = nope_nope_nope` on the server and sleep
> soundly at night. (Ideally we would default to that.)

Thinking about this more (and about adding an encrypted_compression GUC
or whatever), my inclination would be for the server side to enable
compression by default for insecure connections but disable it by
default for encrypted connections, with both being config parameters
that can be changed as desired.

> The concept of stream reset seems necessary but insufficient at the
> application level, which bleeds over into Jelte's compression_restart
> proposal. (At the protocol level, I think it may be sufficient?)
>
> If I write a query where one of the WHERE clauses is
> attacker-controlled and the other is a secret, I would really like to
> not compress that query on the client side. If I join a table of user
> IDs against a table of user-provided addresses and a table of
> application tokens for that user, compressing even a single row leaks
> information about those tokens -- at a _very_ granular level -- and I
> would really like the server not to do that.
>
> So if I'm building sand castles... I think maybe it'd be nice to mark
> tables (and/or individual columns?) as safe for compression under
> encryption, whether by row or in aggregate. And maybe libpq and psql
> should be able to turn outgoing compression on and off at will.
>
> And I understand those would balloon the scope of the feature. I'm
> worried I'm doing the security-person thing and sucking all the air
> out of the room. I know not everybody uses transport encryption; for
> those people, compress-it-all is probably a pretty winning strategy,
> and there's no need to reset the compression context ever. And the
> pg_dump-style, "give me everything" use case seems like it could maybe
> be okay, but I really don't know how to assess the risk there, at all.

I would imagine that a large volume of postgres use is in contexts
(e.g. internal networks) where either no encryption is used, or, even
when encryption is used, the benefit of compression relative to the
risk of someone being in a position to perform a BREACH-style
side-channel attack against DB traffic is sufficiently favourable that
compress-it-all would be quite useful in many cases.  Would some sort
of per-table/column marking be useful for some cases?  Probably, but
that doesn't seem to me like it needs to be in v1 of this feature, as
long as the protocol layer itself is designed such that parties can
arbitrarily alternate between transmitting compressed and uncompressed
data.  Then if we build such a feature down the road, we just add logic
around *when* we compress, but the protocol layer doesn't change.

> Well... working out the security minutiae _after_ changing the
> protocol is not historically a winning strategy, I think. Better to do
> it as a vertical stack.

Thinking about it more, I agree that we probably should work out the
protocol level mechanism for resetting compression
context/enabling/disabling/reconfiguring compression as part of this
work.  I don't think that we need to have all the ways that the
application layer might choose to use such things done here, but we
should have all the necessary primitives worked out.

-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: problems with "Shared Memory and Semaphores" section of docs

2024-05-21 Thread Nathan Bossart
On Fri, May 17, 2024 at 02:21:23PM -0500, Nathan Bossart wrote:
> On Fri, May 17, 2024 at 12:48:37PM -0500, Nathan Bossart wrote:
>> Cool.  I'll at least fix the back-branches as-is, but I'll see about
>> revamping this stuff for v18.
> 
> Attached is probably the absolute least we should do for the back-branches.

Any concerns with doing something like this [0] for the back-branches?  The
constant would be 6 instead of 7 on v14 through v16.

I wrote a quick sketch for what a runtime-computed GUC might look like for
v18.  We don't have agreement on this approach, but I figured I'd post
something while we search for a better one.

[0] 
https://postgr.es/m/attachment/160360/v1-0001-fix-kernel-resources-docs-on-back-branches.patch

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From f5abd8150c317a0eaf4fb2aff79164ce929d2d96 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 21 May 2024 14:02:22 -0500
Subject: [PATCH v1 1/1] add semaphores_required GUC

---
 doc/src/sgml/config.sgml| 14 ++
 doc/src/sgml/runtime.sgml   | 28 +---
 src/backend/storage/ipc/ipci.c  |  6 +-
 src/backend/utils/misc/guc_tables.c | 12 
 4 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 698169afdb..f6afc941df 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11215,6 +11215,20 @@ dynamic_library_path = 
'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  semaphores_required (integer)
+  
+   semaphores_required configuration 
parameter
+  
+  
+  
+   
+Reports the number of semaphores that are needed for the server based
+on the number of allowed connections, worker processes, etc.
+   
+  
+ 
+
  
   ssl_library (string)
   
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 6047b8171d..e5ebfb8a8b 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -781,13 +781,13 @@ psql: error: connection to server on socket 
"/tmp/.s.PGSQL.5432" failed: No such

 SEMMNI
 Maximum number of semaphore identifiers (i.e., sets)
-at least ceil((max_connections + 
autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 
16) plus room for other applications
+at least ceil(semaphores_required / 16) plus 
room for other applications

 

 SEMMNS
 Maximum number of semaphores system-wide
-ceil((max_connections + autovacuum_max_workers + 
max_wal_senders + max_worker_processes + 5) / 16) * 17 plus room for 
other applications
+ceil(semaphores_required / 16) * 17 plus 
room for other applications

 

@@ -836,27 +836,33 @@ psql: error: connection to server on socket 
"/tmp/.s.PGSQL.5432" failed: No such
 

 When using System V semaphores,
-PostgreSQL uses one semaphore per allowed 
connection
-(), allowed autovacuum worker process
-() and allowed background
-process (), in sets of 16.
+PostgreSQL uses one semaphore per allowed 
connection,
+worker process, etc., in sets of 16.
 Each such set will
 also contain a 17th semaphore which contains a magic
 number, to detect collision with semaphore sets used by
 other applications. The maximum number of semaphores in the system
 is set by SEMMNS, which consequently must be at least
-as high as max_connections plus
-autovacuum_max_workers plus 
max_wal_senders,
-plus max_worker_processes, plus one extra for each 16
-allowed connections plus workers (see the formula in  plus one extra for
+each set of 16 required semaphores (see the formula in ).  The parameter SEMMNI
 determines the limit on the number of semaphore sets that can
 exist on the system at one time.  Hence this parameter must be at
-least ceil((max_connections + autovacuum_max_workers + 
max_wal_senders + max_worker_processes + 5) / 16).
+least ceil(semaphores_required / 16).
 Lowering the number
 of allowed connections is a temporary workaround for failures,
 which are usually confusingly worded No space
 left on device, from the function semget.
+The number of semaphores required by PostgreSQL
+is provided by the runtime-computed parameter
+semaphores_required, which can be determined before
+starting the server with a postgres command like:
+
+$ postgres -D $PGDATA -C semaphores_required
+
+The value of semaphores_required should be input into
+the aforementioned formulas to determine appropriate values for
+SEMMNI and SEMMNS.

 

diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 521ed5418c..3e030accac 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -372,11 +372,12 @@ InitializeShmemGUCs(void)
Size

Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Tue, May 21, 2024 at 1:39 PM Jacob Champion
 wrote:
>
> If the server doesn't reject compressed packets pre-authentication,
> then case 2 isn't mitigated. (I haven't proven how risky that case is
> yet, to be clear.) In other words: if the threat model is that a
> client can attack us, we shouldn't assume that it will attack us
> politely.

I think I thought I was writing about something else when I wrote that
:sigh:.  I think what I really should have written was a version of
the part below, which is that we use streaming decompression, only
decompress 8kb at a time, and for pre-auth messages limit them to
`PG_MAX_AUTH_TOKEN_LENGTH` (65535 bytes), which isn't really enough
data to actually cause any real-world pain by needing to decompress vs
the equivalent pain of sending invalid uncompressed auth packets.
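
For illustration, a minimal sketch of that kind of guardrail using
zlib's streaming API (the cap, names, and buffer management here are
assumptions for the example, not the patch's code; the output buffer is
assumed to be at least PRE_AUTH_EXPANSION_CAP bytes):

#include <stdlib.h>
#include <zlib.h>

#define PRE_AUTH_EXPANSION_CAP 65535    /* e.g. PG_MAX_AUTH_TOKEN_LENGTH */

/* Decompress one chunk of a stream, refusing to let a single pre-auth
 * message expand beyond the cap.  Returns 0 on success, -1 on rejection. */
static int
inflate_with_cap(z_stream *strm, unsigned char *in, size_t in_len,
                 unsigned char *out, size_t *out_len)
{
    strm->next_in = in;
    strm->avail_in = (uInt) in_len;
    strm->next_out = out;
    strm->avail_out = PRE_AUTH_EXPANSION_CAP;

    int         rc = inflate(strm, Z_NO_FLUSH);

    if (rc != Z_OK && rc != Z_STREAM_END)
        return -1;              /* corrupt or unexpected stream */
    if (strm->avail_out == 0 && strm->avail_in > 0)
        return -1;              /* would exceed the cap: reject as a zip bomb */

    *out_len = PRE_AUTH_EXPANSION_CAP - strm->avail_out;
    return 0;
}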

> > Because we are using streaming decompression, this is much less of an
> > issue than for things that decompress wholesale onto disk/into memory.
>
> (I agree in general, but since you're designing a protocol extension,
> IMO it's not enough that your implementation happens to mitigate
> risks. We more or less have to bake those mitigations into the
> specification of the extension, because things that aren't servers
> have to decompress now. Similar to RFC security considerations.)

We own both the canonical client and server, so those are both covered
here.  I would think it would be the responsibility of any other
system that maintains its own implementation of the postgres protocol
and chooses to support the compression protocol to perform its own
mitigations against potential compression security issues.  Should we
put the fixed message size limits (that have de facto been part of the
protocol since 2021, even if they weren't documented as such) into the
protocol documentation?  That would give implementers of the protocol
numbers that they could actually rely on when implementing the
appropriate safeguards because they would be able to actually have
explicit guarantees about the size of messages. I think it would make
more sense to put the limits on the underlying messages rather than
adding an additional limit that only applies to compressed messages.
( I don't really see how one could implement other tooling that used
pg compression without using streaming compression, as the protocol
never hands over a standalone blob of compressed data: all compressed
data is always part of a stream, but even with streaming decompression
you still need some kind of limits or you will just chew up memory.)

-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: First draft of PG 17 release notes

2024-05-21 Thread Robert Haas
On Tue, May 21, 2024 at 2:26 PM Melanie Plageman
 wrote:
> For the vacuum WAL volume reduction, there were a bunch of smaller
> projects throughout the last development year that I worked on that
> were committed by different people and with different individual
> benefits. Some changes caused vacuum to do less visibility checks (so
> less CPU usage), some changed WAL format in a way that saves some
> space, and some, like the commit you mention, make vacuum emit less
> WAL. That commit by itself doesn't contain all of the user benefits of
> the whole project. I couldn't think of a good place to list all of the
> commits together that were part of the same project. Perhaps you could
> argue that they were not in fact part of the same project and instead
> were just small individual changes -- none of which are individually
> worth including in the release notes.

Yeah, I think a lot of projects have this problem in one way or
another, but I think it may be worse for performance-related projects.

I wasn't intending to knock that particular commit, just to be clear,
or the commit message. I'm just saying that sometimes summarizing the
commit log may not be as easy as we'd hope.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Convert node test compile-time settings into run-time parameters

2024-05-21 Thread Andres Freund
Hi,

On 2024-05-20 09:28:39 +0200, Peter Eisentraut wrote:
> - Performance?  Looking for example at pg_parse_query() and its siblings,
> they also check for other debugging settings like log_parser_stats in the
> main code path, so it doesn't seem to be a concern.

I don't think we can conclude that. Just because we've not been that careful
about performance in a few spots doesn't mean we shouldn't be careful in other
areas. And I think something like log_parser_stats is a lot more generally
useful than debug_copy_parse_plan_trees.

The branch itself isn't necessarily the issue, the branch predictor can handle
that to a good degree. The reduction in code density is a bigger concern - and
also very hard to measure, because the cost is very incremental and
distributed.

At the very least I'd add unlikely() to all of the branches, so the debug code
can be placed separately from the "normal" portions.
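
For example, a sketch of that shape, modeled on what the existing
COPY_PARSE_PLAN_TREES block in pg_parse_query() does today
(debug_copy_parse_plan_trees being the run-time GUC discussed here):

if (unlikely(debug_copy_parse_plan_trees))
{
    /* Pass the raw parse trees through copyObject() and use the copies,
     * to catch copy/equal omissions in the node support functions. */
    List   *new_list = copyObject(raw_parsetree_list);

    if (!equal(new_list, raw_parsetree_list))
        elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
    else
        raw_parsetree_list = new_list;
}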


Where I'd be more concerned about performance is the added branch in
READ_LOCATION_FIELD. There are a lot of calls to that; adding runtime
branches to each, with external function calls inside, is somewhat
likely to be measurable.


> - Access control?  I have these settings as PGC_USERSET for now. Maybe they
> should be PGC_SUSET?

That probably would be right.


> Another thought:  Do we really need three separate settings?

Maybe not three settings, but a single setting, with multiple values, like
debug_io_direct?


Greetings,

Andres Freund




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Champion
On Tue, May 21, 2024 at 9:14 AM Jacob Burroughs
 wrote:
> On Tue, May 21, 2024 at 10:43 AM Jelte Fennema-Nio  wrote:
> > To help get everyone on the same page I wanted to list all the
> > security concerns in one place:
> >
> > 1. Triggering excessive CPU usage before authentication, by asking for
> > very high compression levels
> > 2. Triggering excessive memory/CPU usage before authentication, by
> > sending a client sending a zipbomb
> > 3. Triggering excessive CPU after authentication, by asking for a very
> > high compression level
> > 4. Triggering excessive memory/CPU after authentication due to
> > zipbombs (i.e. small amount of data extracting to lots of data)
> > 5. CRIME style leakage of information about encrypted data
> >
> > 1 & 2 can easily be solved by not allowing any authentication packets
> > to be compressed. This also has benefits for 5.
>
> This is already addressed by only compressing certain message types.
> If we think it is important that the server reject compressed packets
> of other types I can add that, but it seemed reasonable to just make
> the client never send such packets compressed.

If the server doesn't reject compressed packets pre-authentication,
then case 2 isn't mitigated. (I haven't proven how risky that case is
yet, to be clear.) In other words: if the threat model is that a
client can attack us, we shouldn't assume that it will attack us
politely.

> > 4 would require some safety limits on the amount of data that a
> > (small) compressed message can be decompressed to, and stop
> > decompression of that message once that limit is hit. What that limit
> > should be seems hard to choose though. A few ideas:
> > a. The size of the message reported by the uncompressed header. This
> > would mean that at most the 4GB will be uncompressed, since maximum
> > message length is 4GB (limited by 32bit message length field)
> > b. Allow servers to specify maximum client decompressed message length
> > lower than this 4GB, e.g. messages of more than 100MB of uncompressed
> > size should not be allowed.
>
> Because we are using streaming decompression, this is much less of an
> issue than for things that decompress wholesale onto disk/into memory.

(I agree in general, but since you're designing a protocol extension,
IMO it's not enough that your implementation happens to mitigate
risks. We more or less have to bake those mitigations into the
specification of the extension, because things that aren't servers
have to decompress now. Similar to RFC security considerations.)

--Jacob




Re: SQL:2011 application time

2024-05-21 Thread Isaac Morland
On Tue, 21 May 2024 at 13:57, Robert Haas  wrote:

What I think is less clear is what that means for temporal primary
> keys. As Paul pointed out upthread, in every other case, a temporal
> primary key is at least as unique as a regular primary key, but in
> this case, it isn't. And someone might reasonably think that a
> temporal primary key should exclude empty ranges just as all primary
> keys exclude nulls. Or they might think the opposite.
>

Fascinating. I think you're absolutely right that it's clear that two empty
intervals don't conflict. If somebody wants to claim two intervals
conflict, they need to point to at least one instant in time that is common
between them.

But a major point of a primary key, it seems to me, is that it uniquely
identifies a row. If items are identified by a time range, non-overlapping
or not, then the empty range can only identify one item (per value of
whatever other columns are in the primary key). I think for a unique key
the non-overlapping restriction has to be considered an additional
restriction on top of the usual uniqueness restriction.

I suspect in many applications there will be a non-empty constraint; for
example, it seems quite reasonable to me for a meeting booking system to
forbid empty meetings. But when they are allowed they should behave in the
mathematically appropriate way.


Re: First draft of PG 17 release notes

2024-05-21 Thread Melanie Plageman
On Tue, May 21, 2024 at 1:51 PM Robert Haas  wrote:
>
> On Tue, May 21, 2024 at 12:27 PM Andres Freund  wrote:
> > To me that's the "General Performance" section. If somebody reading the
> > release notes doesn't care about performance, they can just skip that 
> > section
> > ([1]).  I don't see why we wouldn't want to include the same level of detail
> > as for other changes.
>
> I'm relatively sure that we've had this argument in previous years and
> essentially everyone but Bruce has agreed with the idea that
> performance changes ought to be treated the same as any other kind of
> improvement. The difficulty is that Bruce is the one doing the release
> notes. I think it might help if someone were willing to prepare a
> patch showing what they think specifically should be changed. Or maybe
> Bruce would be willing to provide a list of all of the performance
> improvements he doesn't think are worth release-noting or isn't sure
> how to release-note, and someone else can then have a go at them.
>
> Personally, I suspect that a part of the problem, other than the
> inevitable fact that the person doing the work has a perspective on
> how the work should be done with which not everyone will agree, is
> that a lot of performance changes have commit messages that don't
> really explain what the user impact is. For instance, consider
> 6dbb490261a6170a3fc3e326c6983ad63e795047 ("Combine freezing and
> pruning steps in VACUUM"). It does actually say what the benefit is
> ("That reduces the overall amount of WAL generated") but the reader
> could easily be left wondering whether that is really the selling
> point. Does it also reduce CPU consumption? Is that more or less
> important than the WAL reduction? Was the WAL reduction the motivation
> for the work? Is the WAL reduction significant enough that this is a
> feature in its own right, or is this just preparatory to some other
> work? These kinds of ambiguities can exist for any commit, not just
> performance commits, but I bet that on average the problem is worse
> for performance-related commits.

In Postgres development, we break larger projects into smaller ones
and then those smaller projects into multiple individual commits. Each
commit needs to stand alone and each subproject needs to have a
defensible benefit. One thing that is harder with performance-related
work than non-performance feature work is that there isn't always a
final "turn it on" commit. For example, let's say you are adding a new
view that tracks new stats of some kind. You do a bunch of refactoring
and small subprojects to make it possible to add the view. Then the
final commit that actually creates the view has obvious user value to
whoever is reading the log. For performance features, it doesn't
always work like this.

For the vacuum WAL volume reduction, there were a bunch of smaller
projects throughout the last development year that I worked on that
were committed by different people and with different individual
benefits. Some changes caused vacuum to do less visibility checks (so
less CPU usage), some changed WAL format in a way that saves some
space, and some, like the commit you mention, make vacuum emit less
WAL. That commit by itself doesn't contain all of the user benefits of
the whole project. I couldn't think of a good place to list all of the
commits together that were part of the same project. Perhaps you could
argue that they were not in fact part of the same project and instead
were just small individual changes -- none of which are individually
worth including in the release notes.

- Melanie




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Champion
On Tue, May 21, 2024 at 8:23 AM Jacob Burroughs
 wrote:
> As currently implemented, the compression only applies to
> CopyData/DataRow/Query messages, none of which should be involved in
> authentication, unless I've really missed something in my
> understanding.

Right, but Robert has argued that we should compress it all, and I'm
responding to that proposal.

Sorry for introducing threads within threads. But I think it's
valuable to pin down both 1) the desired behavior, and 2) how the
current proposal behaves, as two separate things. I'll try to do a
better job of communicating which I'm talking about.

> > Right, I think it's reasonable to let a sufficiently
> > determined/informed user lift the guardrails, but first we have to
> > choose to put guardrails in place... and then we have to somehow
> > sufficiently inform the users when it's okay to lift them.
>
> My thought would be that compression should be opt-in on the client
> side, with documentation around the potential security pitfalls. (I
> could be convinced it should be opt-in on the server side, but overall
> I think opt-in on the client side generally protects against footguns
> without excessively getting in the way

We absolutely have to document the risks and allow clients to be
written safely. But I think server-side controls on risky behavior
have proven to be generally more valuable, because the server
administrator is often in a better spot to see the overall risks to
the system. ("No, you will not use deprecated ciphersuites. No, you
will not access this URL over plaintext. No, I will not compress this
response containing customer credit card numbers, no matter how nicely
you ask.") There are many more clients than servers, so it's less
risky for the server to enforce safety than to hope that every client
is safe.

Does your database and access pattern regularly mingle secrets with
public data? Would auditing correct client use of compression be a
logistical nightmare? Do your app developers keep indicating in
conversations that they don't understand the risks at all? Cool, just
set `encrypted_compression = nope_nope_nope` on the server and sleep
soundly at night. (Ideally we would default to that.)

> and if an attacker controls the
> client, they can just get the information they want directly-they
> don't need compression sidechannels to get that information.)

Sure, but I don't think that's relevant to the threats being discussed.

> Within SQL-level things, I don't think we can reasonably differentiate
> between private and attacker-controlled information at the
> libpq/server level.

And by the IETF line of argument -- or at least the argument I quoted
above -- that implies that we really have no business introducing
compression when confidentiality is requested. A stronger approach
would require us to prove, or the user to indicate, safety before
compressing.

Take a look at the security notes for QPACK [1] -- keeping in mind
that they know _more_ about what's going on at the protocol level than
we do, due to the header design. And they still say things like "an
encoder might choose not to index values with low entropy" and "these
criteria ... will evolve over time as new attacks are discovered." A
huge amount is left as an exercise for the reader. This stuff is
really hard.

> We can reasonably differentiate between message
> types that *definitely* are private and ones that could have
> either/both data in them, but that's not nearly as useful.  I think
> not compressing auth-related packets plus giving a mechanism to reset
> the compression stream for clients (plus guidance on the tradeoffs
> involved in turning on compression) is about as good as we can get.

The concept of stream reset seems necessary but insufficient at the
application level, which bleeds over into Jelte's compression_restart
proposal. (At the protocol level, I think it may be sufficient?)

If I write a query where one of the WHERE clauses is
attacker-controlled and the other is a secret, I would really like to
not compress that query on the client side. If I join a table of user
IDs against a table of user-provided addresses and a table of
application tokens for that user, compressing even a single row leaks
information about those tokens -- at a _very_ granular level -- and I
would really like the server not to do that.

So if I'm building sand castles... I think maybe it'd be nice to mark
tables (and/or individual columns?) as safe for compression under
encryption, whether by row or in aggregate. And maybe libpq and psql
should be able to turn outgoing compression on and off at will.

And I understand those would balloon the scope of the feature. I'm
worried I'm doing the security-person thing and sucking all the air
out of the room. I know not everybody uses transport encryption; for
those people, compress-it-all is probably a pretty winning strategy,
and there's no need to reset the compression context ever. And the
pg_dump-style, "give me 

Re: SQL:2011 application time

2024-05-21 Thread Robert Haas
On Thu, May 16, 2024 at 7:22 PM Jeff Davis  wrote:
> An empty range does not "bypass" the an exclusion constraint. The
> exclusion constraint has a documented meaning and it's enforced.
>
> Of course there are situations where an empty range doesn't make a lot
> of sense. For many domains zero doesn't make any sense, either.
> Consider receiving an email saying "thank you for purchasing 0
> widgets!". Check constraints seem like a reasonable way to prevent
> those kinds of problems.

I think that's true. Having infinitely many zero-length events
scheduled at the same point in time isn't necessarily a problem: I can
attend an infinite number of simultaneous meetings if I only need to
attend them for exactly zero time.

What I think is less clear is what that means for temporal primary
keys. As Paul pointed out upthread, in every other case, a temporal
primary key is at least as unique as a regular primary key, but in
this case, it isn't. And someone might reasonably think that a
temporal primary key should exclude empty ranges just as all primary
keys exclude nulls. Or they might think the opposite.

At least, so it seems to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: First draft of PG 17 release notes

2024-05-21 Thread Robert Haas
On Tue, May 21, 2024 at 12:27 PM Andres Freund  wrote:
> To me that's the "General Performance" section. If somebody reading the
> release notes doesn't care about performance, they can just skip that section
> ([1]).  I don't see why we wouldn't want to include the same level of detail
> as for other changes.

I'm relatively sure that we've had this argument in previous years and
essentially everyone but Bruce has agreed with the idea that
performance changes ought to be treated the same as any other kind of
improvement. The difficulty is that Bruce is the one doing the release
notes. I think it might help if someone were willing to prepare a
patch showing what they think specifically should be changed. Or maybe
Bruce would be willing to provide a list of all of the performance
improvements he doesn't think are worth release-noting or isn't sure
how to release-note, and someone else can then have a go at them.

Personally, I suspect that a part of the problem, other than the
inevitable fact that the person doing the work has a perspective on
how the work should be done with which not everyone will agree, is
that a lot of performance changes have commit messages that don't
really explain what the user impact is. For instance, consider
6dbb490261a6170a3fc3e326c6983ad63e795047 ("Combine freezing and
pruning steps in VACUUM"). It does actually say what the benefit is
("That reduces the overall amount of WAL generated") but the reader
could easily be left wondering whether that is really the selling
point. Does it also reduce CPU consumption? Is that more or less
important than the WAL reduction? Was the WAL reduction the motivation
for the work? Is the WAL reduction significant enough that this is a
feature in its own right, or is this just preparatory to some other
work? These kinds of ambiguities can exist for any commit, not just
performance commits, but I bet that on average the problem is worse
for performance-related commits.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

2024-05-21 Thread Robert Haas
On Tue, May 21, 2024 at 8:44 AM David Rowley  wrote:
> Thanks for having a look.  I was planning to have the commit message
> as per attached. I'd only split the patch for ease of review per
> request of Tom. I should have mentioned that here.
>
> I would adjust the exact wording in the final paragraph as required
> depending on what plan materialises.
>
> This also fixes up the comment stuff that Heikki mentioned.

The consensus on pgsql-release was to unrevert this patch and commit
the fix now, rather than waiting for the next beta. However, the
consensus was also to push the un-revert as a separate commit from the
bug fix, rather than together as suggested by Álvaro. Since time is
short due to the impending release and it's very late where you are,
I've taken care of this. Hope that's OK.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: First draft of PG 17 release notes

2024-05-21 Thread Peter Geoghegan
On Tue, May 21, 2024 at 12:27 PM Andres Freund  wrote:
> > I agree the impact of performance improvements are often greater than
> > the average release note item.  However, if people expect Postgres to be
> > faster, is it important for them to know _why_ it is faster?
>
> Yes, it very often is.

Is it important for them to even read the release notes?

Bruce's arguments against listing performance items more often/with
greater prominence could just as easily be applied to other types of
features, in other areas. Performance is a feature (or a feature
category) -- no better or worse than any other category of feature.

> Performance improvements typically aren't "make
> everything 3% faster", they're more "make this special thing 20%
> faster". Without know what got faster, users don't know if
> a) the upgrade will improve their production situation
> b) they need to change something to take advantage of the improvement

Another important category of performance improvement is "make the
thing that was just unusable usable, for the first time ever".

Sometimes the baseline is unreasonably slow, so an improvement
effectively allows you as a user to do something that just wasn't
possible on previous versions. Other times it's addressed at something
that was very scary, like VACUUMs that need multiple rounds of index
vacuuming. Multiple rounds of index vacuuming are just woefully,
horribly inefficient, and are the single individual thing that can
make things far worse. Even if you didn't technically have that
problem before now, you did have the problem of having to worry about
it. So the work in question has sanded-down this really nasty sharp
edge. That's a substantial quality of life improvement for many users.

In short, many individual performance improvements are best thought of
as qualitative improvements, rather than quantitative improvements.

It doesn't help that there is a kind of pressure to present them as
quantitative improvements. For example, I was recently encouraged to
present my own Postgres 17 B-Tree work internally using some kind of
headline grabbing measure like "6x faster". That just seems silly to
me. I can contrive a case where it's faster by an arbitrarily large
amount. Much like how a selective index scan can be arbitrarily faster
than a sequential scan. Again, a qualitative improvement.

-- 
Peter Geoghegan




Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Andres Freund
Hi,

On 2024-05-20 11:58:05 +0100, Dave Page wrote:
> I then attempt to build PostgreSQL:
> 
>  meson setup build
> -Dextra_include_dirs=C:/build64/openssl/include,C:/build64/zlib/include
> -Dextra_lib_dirs=C:/build64/openssl/lib,C:/build64/zlib/lib -Dssl=openssl
> -Dzlib=enabled --prefix=c:/build64/pgsql
> 
> Which results in the output in output.txt, indicating that OpenSSL was
> correctly found, but zlib was not. I've also attached the meson log.

I forgot to mention that earlier: Assuming you're building something to be
distributed, I'd recommend --auto-features=enabled/disabled and specifying
specifically which dependencies you want to be used.

Greetings,

Andres Freund




Re: First draft of PG 17 release notes

2024-05-21 Thread Andres Freund
Hi,

On 2024-05-21 09:27:20 -0700, Andres Freund wrote:
> Also, the release notes are also not just important to users. I often go back
> and look in the release notes to see when some important change was made,
> because sometimes it's harder to find it in the git log, due to sheer
> volume. And even just keeping up with all the changes between two releases is
> hard, it's useful to be able to read the release notes and see what happened.
>
> [...]
>
> [1] I've wondered if we should have one more level of TOC on the release note
> page, so it's easier to jump to specific sections.

Which reminds me: Eventually I'd like to add links to the most important
commits related to release note entries. We already do much of the work of
building that list of commits for each entry. That'd allow a reader to find
more details if interested.

Right now one either has to open the sgml file (which no user will know to do),
or find the git entries manually. The latter is often hard, because the
git commits will often use different wording etc.

Admittedly doing so within the constraints of docbook and not wanting to
overly decrease density (both in .sgml and the resulting html) isn't a trivial
task.


That's entirely independent of my concern around noting performance
improvements in the release notes, of course.

Greetings,

Andres Freund




Re: First draft of PG 17 release notes

2024-05-21 Thread Andres Freund
Hi,

On 2024-05-18 11:13:54 -0400, Bruce Momjian wrote:
> Please see the email I just posted.  There are three goals we have to
> adjust for:
> 
> 1.  short release notes so they are readable
> 2.  giving people credit for performance improvements
> 3.  showing people Postgres cares about performance
> 
> I would like to achieve 2 & 3 without harming #1.  My experience is if I
> am reading a long document, and I get to a section where I start to
> wonder, "Why should I care about this?", I start to skim the rest of
> the document.

I agree keeping things reasonably short is important. But I don't think you're
evenly applying it as a goal.

Just skimming the notes from the end, I see
- an 8 entries long pg_stat_statements section
- multiple entries about "Create custom wait events for ..."
- three entries about adding --all to {reindexdb,vacuumdb,clusterdb}.
- an entry about adding long options to pg_archivecleanup
- two entries about grantable maintenance rights, once via pg_maintain, once
  per-table
- separate entries about pg_stat_reset_slru(), pg_stat_reset_shared("slru"),

If you're concerned about brevity, we can make things shorter without skipping
over most performance improvements.


> I am particularly critical if I start to wonder, "Why
> does the author _think_ I should care about this?" because it feels like
> the author is writing for him/herself and not the audience.

FWIW, I think it's a good thing for somebody other than the author to have a
hand in writing a release notes entry for a change. The primary author(s) are
often too deep into some issue to have a good view of the right level of
detail and understandability.

Greetings,

Andres Freund




Re: First draft of PG 17 release notes

2024-05-21 Thread Andres Freund
Hi,

On 2024-05-18 10:59:47 -0400, Bruce Momjian wrote:
> On Wed, May 15, 2024 at 08:48:02PM -0700, Andres Freund wrote:
> > +many.
> >
> > We're having this debate every release. I think the ongoing reticence to 
> > note
> > performance improvements in the release notes is hurting Postgres.
> >
> > For one, performance improvements are one of the prime reason users
> > upgrade. Without them being noted anywhere more dense than the commit log,
> > it's very hard to figure out what improved for users. A halfway widely
> > applicable performance improvement is far more impactful than many of the
> > feature changes we do list in the release notes.
>
> I agree the impact of performance improvements are often greater than
> the average release note item.  However, if people expect Postgres to be
> faster, is it important for them to know _why_ it is faster?

Yes, it very often is. Performance improvements typically aren't "make
everything 3% faster", they're more "make this special thing 20%
faster". Without know what got faster, users don't know if
a) the upgrade will improve their production situation
b) they need to change something to take advantage of the improvement


> On the flip side, a performance improvement that makes everything 10%
> faster has little behavioral change for users, and in fact I think we
> get ~6% faster in every major release.

I cannot recall many "make everything faster" improvements, if any.

And even if it's "make everything faster" - that's useful for users to know,
it might solve their production problem!  It's also good for PR.


Given how expensive postgres upgrades still are, we can't expect production
workloads to upgrade to every major version. The set of performance
improvements and feature additions between major versions can help users make
an informed decision.


Also, the release notes are also not just important to users. I often go back
and look in the release notes to see when some important change was made,
because sometimes it's harder to find it in the git log, due to sheer
volume. And even just keeping up with all the changes between two releases is
hard, it's useful to be able to read the release notes and see what happened.


> > For another, it's also very frustrating for developers that focus on
> > performance. The reticence to note their work, while noting other, far
> > smaller, things in the release notes, pretty much tells us that our work 
> > isn't
> > valued.
>
> Yes, but are we willing to add text that every user will have to read
> just for this purpose?

Of course it's a tradeoff. We shouldn't verbosely note down every small
change just because of the recognition; that'd make the release notes
unreadable. And it'd just duplicate the commit log. But that's not the same as
defaulting to not noting performance improvements, even if the performance
improvement is more impactful than many other features that are noted.


> One thing we _could_ do is to create a generic performance release note
> item saying performance has been improved in the following areas, with
> no more details, but we can list the authors on the item.

To me that's the "General Performance" section. If somebody reading the
release notes doesn't care about performance, they can just skip that section
([1]).  I don't see why we wouldn't want to include the same level of detail
as for other changes.

Greetings,

Andres Freund

[1] I've wondered if we should have one more level of TOC on the release note
page, so it's easier to jump to specific sections.




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Tue, May 21, 2024 at 10:43 AM Jelte Fennema-Nio  wrote:
>
> To help get everyone on the same page I wanted to list all the
> security concerns in one place:
>
> 1. Triggering excessive CPU usage before authentication, by asking for
> very high compression levels
> 2. Triggering excessive memory/CPU usage before authentication, by
> a client sending a zipbomb
> 3. Triggering excessive CPU after authentication, by asking for a very
> high compression level
> 4. Triggering excessive memory/CPU after authentication due to
> zipbombs (i.e. small amount of data extracting to lots of data)
> 5. CRIME style leakage of information about encrypted data
>
> 1 & 2 can easily be solved by not allowing any authentication packets
> to be compressed. This also has benefits for 5.

This is already addressed by only compressing certain message types.
If we think it is important that the server reject compressed packets
of other types I can add that, but it seemed reasonable to just make
the client never send such packets compressed.

> 3 & 4 are less of a concern than 1&2 imho. Once authenticated a client
> deserves some level of trust. But having knobs to limit impact
> definitely seems useful.
>
> 3 can be solved in two ways afaict:
> a. Allow the server to choose the maximum compression level for each
> compression method (using some GUC), and downgrade the level
> transparently when a higher level is requested
> b. Don't allow the client to choose the compression level that the server 
> uses.
>
> I'd prefer option a

3a would seem preferable given the discussion upthread. It would probably
be worth doing some measurement to check how much of a difference in
compute effort the maximum level makes versus the default for all 3
algorithms, because I would really prefer to avoid adding even more
configuration knobs if the maximum compression level is sufficiently
performant for the streaming data use case.

> 4 would require some safety limits on the amount of data that a
> (small) compressed message can be decompressed to, and stop
> decompression of that message once that limit is hit. What that limit
> should be seems hard to choose though. A few ideas:
> a. The size of the message reported by the uncompressed header. This
> would mean that at most the 4GB will be uncompressed, since maximum
> message length is 4GB (limited by 32bit message length field)
> b. Allow servers to specify maximum client decompressed message length
> lower than this 4GB, e.g. messages of more than 100MB of uncompressed
> size should not be allowed.

Because we are using streaming decompression, this is much less of an
issue than for things that decompress wholesale onto disk/into memory.
We only read PQ_RECV_BUFFER_SIZE (8k) bytes off the stream at once,
and when reading a packet we already have a `maxmsglen` of
PQ_LARGE_MESSAGE_LIMIT (1gb), and "We abort the connection (by
returning EOF) if client tries to send more than that.".  Therefore,
we effectively already have a limit of 1gb that applies to regular
messages too, and I think we should rely on this mechanism for
compressed data too (if we really think we need to make that number
configurable we probably could, but again the fewer new knobs we need
to add the better).
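
For illustration, here is a minimal sketch of that kind of guardrail, using
zlib's one-shot inflate with a fixed output cap (this is not the patch's
code, and the function name is hypothetical):

/* Reject any single message whose inflated size would exceed max_out. */
#include <zlib.h>
#include <stdbool.h>
#include <string.h>

static bool
inflate_with_cap(unsigned char *in, size_t in_len,
                 unsigned char *out, size_t max_out, size_t *out_len)
{
    z_stream    zs;
    int         rc;

    memset(&zs, 0, sizeof(zs));
    if (inflateInit(&zs) != Z_OK)
        return false;

    zs.next_in = in;
    zs.avail_in = (uInt) in_len;
    zs.next_out = out;
    zs.avail_out = (uInt) max_out;

    rc = inflate(&zs, Z_FINISH);
    inflateEnd(&zs);

    /* Anything other than Z_STREAM_END means corrupt input or cap exceeded. */
    if (rc != Z_STREAM_END)
        return false;

    *out_len = zs.total_out;
    return true;
}

The streaming case works chunk by chunk instead, but the same principle
applies: track the running decompressed total and bail out once it crosses
the existing message-size limit.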


> I think 5 is the most complicated to deal with, especially as it
> depends on the actual usage to know what is safe. I believe we should
> let users have the freedom to make their own security tradeoffs, but
> we should protect them against some of the most glaring issues
> (especially ones that benefit little from compression anyway). As
> already shown by Andrey, sending LDAP passwords in a compressed way
> seems extremely dangerous. So I think we should disallow compressing
> any authentication related packets. To reduce similar risks further we
> can choose to compress only the message types that we expect to
> benefit most from compression. IMHO those are the following (marked
> with (B)ackend or (F)rontend to show who sends them):
> - Query (F)
> - Parse (F)
> - Describe (F)
> - Bind (F)
> - RowDescription (B)
> - DataRow (B)
> - CopyData (B/F)

That seems like a reasonable list (current implementation is just
CopyData/DataRow/Query, but I really just copied that fairly blindly
from the previous incarnation of this effort.) See also my comment
below 1&2 for if we think we need to block decompressing them too.

> Then I think we should let users choose how they want to compress and
> where they want their compression stream to restart. Something like
> this:
> a. compression_restart=query: Restart the stream after every query.
> Recommended if queries across the same connection are triggered by
> different end-users. I think this would be a sane default
> b. compression_restart=message: Restart the stream for every message.
> Recommended if the amount of correlation between rows of the same
> query is a security concern.
> c. compression_restart=manual: Don't restart the stream automatically,
> but 

Re: libpq compression (part 3)

2024-05-21 Thread Jelte Fennema-Nio
On Mon, 20 May 2024 at 21:42, Jacob Champion
 wrote:
> As Andrey points out, there was prior work done that started to take
> this into account. I haven't reviewed it to see how good it is -- and
> I think there are probably many use cases in which queries and tables
> contain both private and attacker-controlled information -- but if we
> agree that they have to be separated, then the strategy can at least
> be improved upon.


To help get everyone on the same page I wanted to list all the
security concerns in one place:

1. Triggering excessive CPU usage before authentication, by asking for
very high compression levels
2. Triggering excessive memory/CPU usage before authentication, by
a client sending a zipbomb
3. Triggering excessive CPU after authentication, by asking for a very
high compression level
4. Triggering excessive memory/CPU after authentication due to
zipbombs (i.e. small amount of data extracting to lots of data)
5. CRIME style leakage of information about encrypted data

1 & 2 can easily be solved by not allowing any authentication packets
to be compressed. This also has benefits for 5.

3 & 4 are less of a concern than 1&2 imho. Once authenticated a client
deserves some level of trust. But having knobs to limit impact
definitely seems useful.

3 can be solved in two ways afaict:
a. Allow the server to choose the maximum compression level for each
compression method (using some GUC), and downgrade the level
transparently when a higher level is requested
b. Don't allow the client to choose the compression level that the server uses.

I'd prefer option a

4 would require some safety limits on the amount of data that a
(small) compressed message can be decompressed to, and stop
decompression of that message once that limit is hit. What that limit
should be seems hard to choose though. A few ideas:
a. The size of the message reported by the uncompressed header. This
would mean that at most the 4GB will be uncompressed, since maximum
message length is 4GB (limited by 32bit message length field)
b. Allow servers to specify maximum client decompressed message length
lower than this 4GB, e.g. messages of more than 100MB of uncompressed
size should not be allowed.

I think 5 is the most complicated to deal with, especially as it
depends on the actual usage to know what is safe. I believe we should
let users have the freedom to make their own security tradeoffs, but
we should protect them against some of the most glaring issues
(especially ones that benefit little from compression anyway). As
already shown by Andrey, sending LDAP passwords in a compressed way
seems extremely dangerous. So I think we should disallow compressing
any authentication related packets. To reduce similar risks further we
can choose to compress only the message types that we expect to
benefit most from compression. IMHO those are the following (marked
with (B)ackend or (F)rontend to show who sends them):
- Query (F)
- Parse (F)
- Describe (F)
- Bind (F)
- RowDescription (B)
- DataRow (B)
- CopyData (B/F)

Then I think we should let users choose how they want to compress and
where they want their compression stream to restart. Something like
this:
a. compression_restart=query: Restart the stream after every query.
Recommended if queries across the same connection are triggered by
different end-users. I think this would be a sane default
b. compression_restart=message: Restart the stream for every message.
Recommended if the amount of correlation between rows of the same
query is a security concern.
c. compression_restart=manual: Don't restart the stream automatically,
but only when the client user calls a specific function. Recommended
only if the user can make trade-offs, or if no encryption is used
anyway.




Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-21 Thread Andres Freund
On 2024-05-17 16:03:09 -0400, Peter Geoghegan wrote:
> On Fri, May 17, 2024 at 3:50 PM Andres Freund  wrote:
> > You're saying that the data is correctly accessible on primaries, but broken
> > on standbys? Is there any difference in how the page looks like on the 
> > primary
> > vs standby?
> 
> There clearly is. The relevant infomask bits are different. I didn't
> examine it much closer than that, though.

That could also just be because of a different replay position, hence my
question about that somewhere else in the email...




Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Dave Page
On Tue, 21 May 2024 at 16:04, Andres Freund  wrote:

> Hi,
>
> On 2024-05-20 11:58:05 +0100, Dave Page wrote:
> > I have very little experience with Meson, and even less interpreting it's
> > logs, but it seems to me that it's not including the extra lib and
> include
> > directories when it runs the test compile, given the command line it's
> > reporting:
> >
> > cl
> C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c
> > /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od
> /Oi-
> >
> > Bug, or am I doing something silly?
>
> It's a buglet. We rely on meson's internal fallback detection of zlib, if
> it's
> not provided via pkg-config or cmake. But it doesn't know about our
> extra_include_dirs parameter. We should probably fix that...
>

Oh good, then I'm not going bonkers. I'm still curious about how it works
for Andrew but not me; however, fixing that buglet should solve my issue,
and would be sensible behaviour.

Thanks!

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com


Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Mon, May 20, 2024 at 2:42 PM Jacob Champion
 wrote:
>
> I mean... you said it, not me. I'm trying not to rain on the parade
> too much, because compression is clearly very valuable. But it makes
> me really uncomfortable that we're reintroducing the compression
> oracle (especially over the authentication exchange, which is
> generally more secret than the rest of the traffic).

As currently implemented, the compression only applies to
CopyData/DataRow/Query messages, none of which should be involved in
authentication, unless I've really missed something in my
understanding.

> Right, I think it's reasonable to let a sufficiently
> determined/informed user lift the guardrails, but first we have to
> choose to put guardrails in place... and then we have to somehow
> sufficiently inform the users when it's okay to lift them.

My thought would be that compression should be opt-in on the client
side, with documentation around the potential security pitfalls. (I
could be convinced it should be opt-in on the server side, but overall
I think opt-in on the client side generally protects against footguns
without excessively getting in the way; and if an attacker controls the
client, they can just get the information they want directly, so they
don't need compression side channels to get that information.)

> But for SQL, where's the dividing line between attacker-chosen and
> attacker-sought? To me, it seems like only the user knows; the server
> has no clue. I think that puts us "lower" in Alyssa's model than HTTP
> is.
>
> As Andrey points out, there was prior work done that started to take
> this into account. I haven't reviewed it to see how good it is -- and
> I think there are probably many use cases in which queries and tables
> contain both private and attacker-controlled information -- but if we
> agree that they have to be separated, then the strategy can at least
> be improved upon.

Within SQL-level things, I don't think we can reasonably differentiate
between private and attacker-controlled information at the
libpq/server level.  We can reasonably differentiate between message
types that *definitely* are private and ones that could have
either/both data in them, but that's not nearly as useful.  I think
not compressing auth-related packets plus giving a mechanism to reset
the compression stream for clients (plus guidance on the tradeoffs
involved in turning on compression) is about as good as we can get.
That said, I *think* the feature is reasonable to be
reviewed/committed without the reset functionality as long as the
compressed data already has the mechanism built in (as it does) to
signal when a decompressor should restart its streaming.  The actual
signaling protocol mechanism/necessary libpq API can happen in
followon work.


-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: Possible Bug in relation_open

2024-05-21 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 21, 2024 at 9:58 AM Pradeep Kumar  
> wrote:
>> If the user tries to open the relation in RangeVar and NoLock mode calling 
>> table_openrv(relation, NoLock), it will internally call 
>> relation_openrv()-->relation_open(). In relation_open() we checking the 
>> Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES); , here we expecting 
>> the lockmode is NoLock or greater than that, but in same function again we 
>> checking this assert case Assert(lockmode != NoLock || 
>> IsBootstrapProcessingMode() || CheckRelationLockedByMe(r, AccessShareLock, 
>> true)); , here we are expecting (lockmode != NoLock) , so why are there two 
>> cases that contradict?  and What if the user tries to open the relation in 
>> NoLock mode? and that will definitely cause the assert failure, Suppose the 
>> user who writes some extension and reads some relation oid that is constant, 
>> and wants to acquire NoLock?, need some clarification on this.

> You need to acquire a lock. Otherwise, the relcache entry could change
> underneath you while you're accessing it, which would result in
> PostgreSQL crashing.

To clarify: the rule is that it's only allowed to pass NoLock if you
know for certain that some suitable lock on that relation is already
held by the current query.  That's why these conditions are complicated.
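
For illustration, a minimal sketch of that pattern, assuming an extension
that only knows the relation's OID (the function name is hypothetical):

#include "postgres.h"
#include "access/table.h"
#include "utils/rel.h"

static void
peek_at_relation(Oid relid)
{
    /* AccessShareLock is the weakest lock that makes this safe to read. */
    Relation    rel = table_open(relid, AccessShareLock);

    /* ... inspect rel->rd_rel, the tuple descriptor, etc. ... */

    table_close(rel, AccessShareLock);
}

Passing NoLock here would only be legitimate if the surrounding query were
already known to hold a lock on the same relation.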

regards, tom lane




Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Andres Freund
Hi,

On 2024-05-20 11:58:05 +0100, Dave Page wrote:
> I have very little experience with Meson, and even less interpreting it's
> logs, but it seems to me that it's not including the extra lib and include
> directories when it runs the test compile, given the command line it's
> reporting:
> 
> cl C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c
> /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od /Oi-
> 
> Bug, or am I doing something silly?

It's a buglet. We rely on meson's internal fallback detection of zlib, if it's
not provided via pkg-config or cmake. But it doesn't know about our
extra_include_dirs parameter. We should probably fix that...

Greetings,

Andres Freund




Re: Possible Bug in relation_open

2024-05-21 Thread Robert Haas
On Tue, May 21, 2024 at 9:58 AM Pradeep Kumar  wrote:
> If the user tries to open the relation in RangeVar and NoLock mode calling 
> table_openrv(relation, NoLock), it will internally call 
> relation_openrv()-->relation_open(). In relation_open() we checking the 
> Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES); , here we expecting 
> the lockmode is NoLock or greater than that, but in same function again we 
> checking this assert case Assert(lockmode != NoLock || 
> IsBootstrapProcessingMode() || CheckRelationLockedByMe(r, AccessShareLock, 
> true)); , here we are expecting (lockmode != NoLock) , so why are there two 
> cases that contradict?  and What if the user tries to open the relation in 
> NoLock mode? and that will definitely cause the assert failure, Suppose the 
> user who writes some extension and reads some relation oid that is constant, 
> and wants to acquire NoLock?, need some clarification on this.

You need to acquire a lock. Otherwise, the relcache entry could change
underneath you while you're accessing it, which would result in
PostgreSQL crashing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Dave Page
Hi

On Tue, 21 May 2024 at 15:12, Sandeep Thakkar <
sandeep.thak...@enterprisedb.com> wrote:

> Hi Dave,
>
>
> On Tue, May 21, 2024 at 3:12 PM Dave Page  wrote:
>
>> Hi Sandeep, Nazir,
>>
>> On Tue, 21 May 2024 at 10:14, Nazir Bilal Yavuz 
>> wrote:
>>
>>> Hi,
>>>
>>> On Tue, 21 May 2024 at 10:20, Sandeep Thakkar
>>>  wrote:
>>> >
>>> > Hi Dave,
>>> >
>>> > Is the .pc file generated after the successful build of zlib? If yes,
>>> then meson should be able to detect the installation ideally
>>>
>>> If meson is not able to find the .pc file automatically, using 'meson
>>> setup ... --pkg-config-path $ZLIB_PC_PATH' might help.
>>>
>>
>> The problem is that on Windows there are no standard locations for a
>> Unix-style development library installation such as this, so the chances
>> are that the .pc file will point to entirely the wrong location.
>>
>> For example, please see
>> https://github.com/dpage/winpgbuild/actions/runs/9172187335 which is a
>> Github action that builds a completely vanilla zlib using VC++. If you look
>> at the uploaded artefact containing the build output and example the .pc
>> file, you'll see it references /zlib as the location, which is simply where
>> I built it in that action. On a developer's machine that's almost certainly
>> not going to be where it actually ends up. For example, on the pgAdmin
>> build farm, the dependencies all end up in C:\build64\[whatever]. On the
>> similar Github action I'm building for PostgreSQL, that artefact will be
>> unpacked into /build/zlib.
>>
>>
> The above link returned 404. But I found a successful build at
> https://github.com/dpage/winpgbuild/actions/runs/9175426807. I downloaded
> the artifact but didn't find .pc file as I wanted to look into the content
> of that file.
>

Yeah, sorry - that was an old one.

Following some offline discussion with Andrew I realised I was building
zlib incorrectly - using cmake/msbuild instead of nmake and some manual
copying. Whilst the former does create a working library and a sane looking
installation, it's not the recommended method, which is documented in a
very non-obvious way - far less obvious than "oh look, there's a cmake
config - let's use that".

The new build is done using the recommended method, which works with PG16
and below out of the box with no need to rename any files.


>
> I had a word with Murali, who mentioned he encountered a similar issue
> while building PG17 on Windows. His workaround is to use a template .pc
> file that includes these lines:
> --
> prefix=${pcfiledir}/../..
> exec_prefix=${prefix}
> libdir=${prefix}/lib
> sharedlibdir=${prefix}/lib
> includedir=${prefix}/include
> --
>

The issue here is that there is no .pc file created with the correct way of
building zlib, and even if there were (or I created a dummy one),
pkg-config isn't really a thing on Windows.

I'd also note that from what Andrew has shown me of the zlib installation
on the buildfarm member drongo, there is no .pc file there either, and yet
seems to work fine (and my zlib installation now has the exact same set of
files as his does).


>
> But in general I agree with you on the issue of Meson's dependency on
> pkgconfig files to detect the third party libraries.
>
> Of course, for my own builds I can easily make everything use consistent
>> directories, however most people who are likely to want to build PostgreSQL
>> may not want to also build all the dependencies themselves as well, as some
>> are a lot more difficult than zlib. So what tends to happen is people find
>> third party builds or upstream official builds.
>>
>> I would therefore argue that if the .pc file that's found doesn't provide
>> correct paths for us, then Meson should fall back to searching in the paths
>> specified on its command line for the appropriate libraries/headers (which
>> is what it does for OpenSSL for example, as that doesn't include a .pc
>> file). This is also what happens with PG16 and earlier.
>>
>> One other thing I will note is that PG16 and earlier try to use the wrong
>> filename for the import library. For years, it's been a requirement to do
>> something like this: "copy \zlib\lib\zlib.lib \zlib\lib\zdll.lib" to make a
>> build succeed against a "vanilla" zlib build. I haven't got as far as
>> figuring out if the same is true with Meson yet.
>>
>> --
>> Dave Page
>> pgAdmin: https://www.pgadmin.org
>> PostgreSQL: https://www.postgresql.org
>> EDB: https://www.enterprisedb.com
>>
>>
>
> --
> Sandeep Thakkar
>
>
>

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com


Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Sandeep Thakkar
Hi Dave,


On Tue, May 21, 2024 at 3:12 PM Dave Page  wrote:

> Hi Sandeep, Nazir,
>
> On Tue, 21 May 2024 at 10:14, Nazir Bilal Yavuz 
> wrote:
>
>> Hi,
>>
>> On Tue, 21 May 2024 at 10:20, Sandeep Thakkar
>>  wrote:
>> >
>> > Hi Dave,
>> >
>> > Is the .pc file generated after the successful build of zlib? If yes,
>> then meson should be able to detect the installation ideally
>>
>> If meson is not able to find the .pc file automatically, using 'meson
>> setup ... --pkg-config-path $ZLIB_PC_PATH' might help.
>>
>
> The problem is that on Windows there are no standard locations for a
> Unix-style development library installation such as this, so the chances
> are that the .pc file will point to entirely the wrong location.
>
> For example, please see
> https://github.com/dpage/winpgbuild/actions/runs/9172187335 which is a
> Github action that builds a completely vanilla zlib using VC++. If you look
> at the uploaded artefact containing the build output and example the .pc
> file, you'll see it references /zlib as the location, which is simply where
> I built it in that action. On a developer's machine that's almost certainly
> not going to be where it actually ends up. For example, on the pgAdmin
> build farm, the dependencies all end up in C:\build64\[whatever]. On the
> similar Github action I'm building for PostgreSQL, that artefact will be
> unpacked into /build/zlib.
>
>
The above link returned 404. But I found a successful build at
https://github.com/dpage/winpgbuild/actions/runs/9175426807. I downloaded
the artifact but didn't find .pc file as I wanted to look into the content
of that file.

I had a word with Murali, who mentioned he encountered a similar issue while
building PG17 on Windows. His workaround is to use a template .pc file
that includes these lines:
--
prefix=${pcfiledir}/../..
exec_prefix=${prefix}
libdir=${prefix}/lib
sharedlibdir=${prefix}/lib
includedir=${prefix}/include
--

But in general I agree with you on the issue of Meson's dependency on
pkgconfig files to detect the third party libraries.

Of course, for my own builds I can easily make everything use consistent
> directories, however most people who are likely to want to build PostgreSQL
> may not want to also build all the dependencies themselves as well, as some
> are a lot more difficult than zlib. So what tends to happen is people find
> third party builds or upstream official builds.
>
> I would therefore argue that if the .pc file that's found doesn't provide
> correct paths for us, then Meson should fall back to searching in the paths
> specified on its command line for the appropriate libraries/headers (which
> is what it does for OpenSSL for example, as that doesn't include a .pc
> file). This is also what happens with PG16 and earlier.
>
> One other thing I will note is that PG16 and earlier try to use the wrong
> filename for the import library. For years, it's been a requirement to do
> something like this: "copy \zlib\lib\zlib.lib \zlib\lib\zdll.lib" to make a
> build succeed against a "vanilla" zlib build. I haven't got as far as
> figuring out if the same is true with Meson yet.
>
> --
> Dave Page
> pgAdmin: https://www.pgadmin.org
> PostgreSQL: https://www.postgresql.org
> EDB: https://www.enterprisedb.com
>
>

-- 
Sandeep Thakkar


Possible Bug in relation_open

2024-05-21 Thread Pradeep Kumar
Hello Hackers,

If the user tries to open a relation by RangeVar in NoLock mode by calling
table_openrv(relation, NoLock), it will internally call
relation_openrv() --> relation_open(). In relation_open() we check
Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES), which expects the
lockmode to be NoLock or greater, but in the same function we also check
Assert(lockmode != NoLock || IsBootstrapProcessingMode() ||
CheckRelationLockedByMe(r, AccessShareLock, true)), which expects lockmode
!= NoLock. Why are there two checks that contradict each other? And what if
the user tries to open the relation in NoLock mode? That will definitely
cause an assertion failure. Suppose a user writing an extension reads some
relation OID that is constant and wants to acquire NoLock; I need some
clarification on this.

Thanks & Regards
Pradeep


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-05-21 Thread Justin Pryzby
It occurred to me that psql \dP+ should show the AM of partitioned
tables (and other partitioned rels).
Arguably, this could've been done when \dP was introduced in v12, but at
that point it would've shown the AM only for partitioned indexes.
It makes a lot of sense to do it now that partitioned tables support
AMs.  I suggest considering this for v17.

regression=# \dP+
                                              List of partitioned relations
 Schema |         Name         |  Owner  |       Type        |     Table      | Access method | Total size | Description
--------+----------------------+---------+-------------------+----------------+---------------+------------+-------------
 public | mlparted             | pryzbyj | partitioned table |                | heap2         | 104 kB     |
 public | tableam_parted_heap2 | pryzbyj | partitioned table |                |               | 32 kB      |
 public | trigger_parted       | pryzbyj | partitioned table |                |               | 0 bytes    |
 public | upsert_test          | pryzbyj | partitioned table |                |               | 8192 bytes |
 public | trigger_parted_pkey  | pryzbyj | partitioned index | trigger_parted | btree         | 16 kB      |
 public | upsert_test_pkey     | pryzbyj | partitioned index | upsert_test    | btree         | 8192 bytes |
---
 src/bin/psql/describe.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index f67bf0b8925..22a668409e7 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4113,7 +4113,7 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 	PQExpBufferData title;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	bool		translate_columns[] = {false, false, false, false, false, false, false, false, false};
+	bool		translate_columns[] = {false, false, false, false, false, false, false, false, false, false};
 	const char *tabletitle;
 	bool		mixed_output = false;
 
@@ -4181,6 +4181,14 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 
 	if (verbose)
 	{
+		/*
+		 * Table access methods were introduced in v12, and can be set on
+		 * partitioned tables since v17.
+		 */
+		appendPQExpBuffer(&buf,
+						  ",\n  am.amname as \"%s\"",
+						  gettext_noop("Access method"));
+
 		if (showNested)
 		{
 			appendPQExpBuffer(&buf,
@@ -4216,6 +4224,9 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 
 	if (verbose)
 	{
+		appendPQExpBufferStr(&buf,
+							 "\n     LEFT JOIN pg_catalog.pg_am am ON c.relam = am.oid");
+
 		if (pset.sversion < 12)
 		{
 			appendPQExpBufferStr(&buf,
-- 
2.42.0





Re: Reading timestamp values from Datums gives garbage values

2024-05-21 Thread Sushrut Shivaswamy
Thank you everyone for your responses.

I was a bit thrown off the first time I printed the timestamp value by how
small it was. The revelation that Postgres TimestampTz uses an epoch (time
zero) of 2000-01-01 helped clarify that the value would indeed be smaller
than a regular UNIX epoch timestamp.

In my case I was trying to convert the difference of two timestamps into
epoch seconds, which explains why the value was just 1.5 hours. My issue is
now resolved.
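
For anyone hitting the same confusion, a minimal sketch of the arithmetic,
using the constants from datatype/timestamp.h (the helper names are
hypothetical):

#include "postgres.h"
#include "datatype/timestamp.h"     /* TimestampTz, USECS_PER_SEC, ... */

/* A difference of two timestamps is already a plain microsecond count. */
static int64
elapsed_seconds(TimestampTz start_ts, TimestampTz end_ts)
{
    return (end_ts - start_ts) / USECS_PER_SEC;
}

/* An absolute timestamp needs the 2000-01-01 to 1970-01-01 epoch shift. */
static int64
to_unix_seconds(TimestampTz ts)
{
    return ts / USECS_PER_SEC +
        (int64) (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY;
}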

Thanks again
 - Sushrut


Re: Avoid orphaned objects dependencies, take 3

2024-05-21 Thread Robert Haas
On Wed, May 15, 2024 at 6:31 AM Bertrand Drouvot
 wrote:
> Please find attached v6 (only diff with v5 is moving the tests as suggested
> above).

I don't immediately know what to think about this patch. I've known
about this issue for a long time, but I didn't think this was how we
would fix it.

I think the problem here is basically that we don't lock namespaces
(schemas) when we're adding and removing things from the schema. So I
assumed that if we ever did something about this, what we would do
would be add a bunch of calls to lock schemas to the appropriate parts
of the code. What you've done here instead is add locking at a much
lower level - whenever we are adding a dependency on an object, we
lock the object. The advantage of that approach is that we definitely
won't miss anything. The disadvantage of that approach is that it
means we have some very low-level code taking locks, which means it's
not obvious which operations are taking what locks. Maybe it could
even result in some redundancy, like the higher-level code taking a
lock also (possibly in a different mode) and then this code taking
another one.

I haven't gone through the previous threads; it sounds like there's
already been some discussion of this, but I'm just telling you how it
strikes me on first look.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

2024-05-21 Thread David Rowley
On Wed, 22 May 2024 at 00:35, Alvaro Herrera  wrote:
>
> On 2024-May-21, David Rowley wrote:
>
> > I've attached 2 patches.
> >
> > 0001 is a simple revert of Tom's revert (7204f3591).
> > 0002 fixes the issue reported by Hubert.
>
> I would like to request that you don't keep 0001's message as you have
> it here.  It'd be more readable to take 66c0185a3d14's whole commit
> message with a small suffix like "try 2" in the commit title, and add an
> additional second paragraph stating it was transiently reverted by
> 7204f35919b7.  Otherwise it's harder to make sense of the commit on its
> own later.

Thanks for having a look.  I was planning to have the commit message
as per attached. I'd only split the patch for ease of review per
request of Tom. I should have mentioned that here.

I would adjust the exact wording in the final paragraph as required
depending on what plan materialises.

This also fixes up the comment stuff that Heikki mentioned.

David


v2-0001-Allow-planner-to-use-Merge-Append-to-efficiently-.patch
Description: Binary data


Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

2024-05-21 Thread Alvaro Herrera
On 2024-May-21, David Rowley wrote:

> I've attached 2 patches.
> 
> 0001 is a simple revert of Tom's revert (7204f3591).
> 0002 fixes the issue reported by Hubert.

I would like to request that you don't keep 0001's message as you have
it here.  It'd be more readable to take 66c0185a3d14's whole commit
message with a small suffix like "try 2" in the commit title, and add an
additional second paragraph stating it was transiently reverted by
7204f35919b7.  Otherwise it's harder to make sense of the commit on its
own later.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: libpq compression (part 3)

2024-05-21 Thread Robert Haas
On Mon, May 20, 2024 at 4:12 PM Magnus Hagander  wrote:
> That used to be the case in HTTP/1. But header compression was one of the 
> headline features of HTTP/2, which isn't exactly new anymore. But there's a 
> special algorithm, HPACK, for it. And then http/3 uses QPACK. Cloudflare has 
> a pretty decent blog post explaining why and how: 
> https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/, or 
> rfc7541 for all the details.
>
> tl;dr; is yes, let's be careful not to expose headers to a CRIME-style 
> attack. And I doubt our connections have as much to gain by compressing 
> "header style" fields as http, so we are probably better off just not 
> compressing those.

What do you think constitutes a header in the context of the
PostgreSQL wire protocol?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Convert node test compile-time settings into run-time parameters

2024-05-21 Thread Ranier Vilela
On Tue, May 21, 2024 at 09:25, Peter Eisentraut  wrote:

> On 20.05.24 15:59, Tom Lane wrote:
> > Peter Eisentraut  writes:
> >> This patch converts the compile-time settings
> >>   COPY_PARSE_PLAN_TREES
> >>   WRITE_READ_PARSE_PLAN_TREES
> >>   RAW_EXPRESSION_COVERAGE_TEST
> >
> >> into run-time parameters
> >
> >>   debug_copy_parse_plan_trees
> >>   debug_write_read_parse_plan_trees
> >>   debug_raw_expression_coverage_test
> >
> > I'm kind of down on this.  It seems like forcing a bunch of
> > useless-in-production debug support into the standard build.
> > What of this would be of any use to any non-developer?
>
> We have a bunch of other debug_* settings that are available in
> production builds, such as
>
> debug_print_parse
> debug_print_rewritten
> debug_print_plan
> debug_pretty_print
> debug_discard_caches
> debug_io_direct
> debug_parallel_query
> debug_logical_replication_streaming
>
If some of these are useful for non-developer users,
they shouldn't be called debug_* or be grouped in this category.


> Maybe we could hide all of them behind some #ifdef DEBUG_OPTIONS, but in
> any case, I don't think the ones being proposed here are substantially
> different from those existing ones that they would require a separate
> treatment.
>
> My goal is to make these facilities easier to use, avoiding hand-editing
> pg_config_manual.h and having to recompile.
>
Although there are some developer users, I believe that anything that is not
useful for common users and is not used in production should not become a
run-time option.

best regards,
Ranier Vilela


Re: Convert node test compile-time settings into run-time parameters

2024-05-21 Thread Peter Eisentraut

On 20.05.24 15:59, Tom Lane wrote:

Peter Eisentraut  writes:

This patch converts the compile-time settings
  COPY_PARSE_PLAN_TREES
  WRITE_READ_PARSE_PLAN_TREES
  RAW_EXPRESSION_COVERAGE_TEST



into run-time parameters



  debug_copy_parse_plan_trees
  debug_write_read_parse_plan_trees
  debug_raw_expression_coverage_test


I'm kind of down on this.  It seems like forcing a bunch of
useless-in-production debug support into the standard build.
What of this would be of any use to any non-developer?


We have a bunch of other debug_* settings that are available in 
production builds, such as


debug_print_parse
debug_print_rewritten
debug_print_plan
debug_pretty_print
debug_discard_caches
debug_io_direct
debug_parallel_query
debug_logical_replication_streaming

Maybe we could hide all of them behind some #ifdef DEBUG_OPTIONS, but in 
any case, I don't think the ones being proposed here are substantially 
different from those existing ones that they would require a separate 
treatment.


My goal is to make these facilities easier to use, avoiding hand-editing 
pg_config_manual.h and having to recompile.






Re: CREATE DATABASE with filesystem cloning

2024-05-21 Thread Ranier Vilela
On Tue, May 21, 2024 at 05:18, Nazir Bilal Yavuz  wrote:

> Hi,
>
> On Thu, 16 May 2024 at 17:47, Robert Haas  wrote:
> >
> > On Thu, May 16, 2024 at 9:43 AM Nazir Bilal Yavuz 
> wrote:
> > > Actually, the documentation for the file_copy_method was mentioning
> > > the things it controls; I converted it to an itemized list now. Also,
> > > changed the comment to: 'Further uses of this function requires
> > > updates to the list that GUC controls in its documentation.'. v7 is
> > > attached.
> >
> > I think the comments need some wordsmithing.
>
> I changed it to 'Uses of this function must be documented in the list
> of places affected by this GUC.', I am open to any suggestions.
>
> > I don't see why this parameter should be PGC_POSTMASTER.
>
> I changed it to 'PGC_USERSET' now. My initial point was the database
> or tablespace to be copied with the same method. I thought copying
> some portion of the database with the copy and rest with the clone
> could cause potential problems. After a bit of searching, I could not
> find any problems related to that.
>
> v8 is attached.
>
Hi,
I did some research on the subject, and despite this being an improvement,
I believe some terminology in this patch should be changed.
Although the new function is called clone_file, I'm not sure it really is
a "clone": macOS added an API called clonefile [1], and Linux has another
called FICLONE [2]. So perhaps it should be treated here as a copy and not
a clone? That would leave open the possibility of implementing a true clone
API later. Thoughts?

best regards,
Ranier Vilela

[1] clonefile 
[2] ioctl_ficlonerange
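
For illustration, a minimal sketch of the two kernel primitives being
referenced (hypothetical helper functions, not code from the patch):

#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FICLONE */

/* Shares extents on copy-on-write filesystems (e.g. Btrfs, XFS with reflink). */
static int
clone_fd_linux(int src_fd, int dst_fd)
{
    return ioctl(dst_fd, FICLONE, src_fd);
}
#endif

#ifdef __APPLE__
#include <sys/clonefile.h>      /* clonefile(), APFS only */

static int
clone_path_macos(const char *src, const char *dst)
{
    return clonefile(src, dst, 0);
}
#endif

Both produce a new file that shares storage with the source, which is what
distinguishes a clone from a byte-for-byte copy.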



>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>


Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

2024-05-21 Thread Heikki Linnakangas

On 21/05/2024 05:58, David Rowley wrote:

Let this thread be for at least the coding portion of this or be my
thread for this patch for the v18 cycle if the RMT rules in favour of
keeping that code reverted for v17.

I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

If anyone wants to have a look, I'd be grateful for that.  Tom did
call for further review after this being the 4th issue reported for
66c0185a3.


My planner experience is a bit rusty, but I took a quick look. Looks 
generally OK to me. Some comments below:



+   /* For for UNIONs (not UNION ALL), try sorting, if sorting is possible */


Duplicated word: "For for"


/*
 * build_setop_child_paths
 *  Build paths for the set op child relation denoted by 'rel'.
 *
 * interesting_pathkeys: if not NIL, also include paths that suit these
 * pathkeys, sorting any unsorted paths as required.
 * *pNumGroups: if not NULL, we estimate the number of distinct groups
 *  in the result, and store it there


The indentation on 'interesting_pathkeys' and '*pNumGroups' is inconsistent.

I have a vague feeling that this comment deserves to be longer. The 
function does a lot of things. How is 'child_tlist' different from 
rel->reltarget for example?


'interesting_pathkeys' is modified by the call to 
add_setop_child_rel_equivalences(): it adds members to the 
EquivalenceClasses of the pathkeys. Is that worth mentioning here, or is 
that obvious to someone who know more about the planner?



/*
 * Create paths to suit final sort order required for setop_pathkeys.
 * Here we'll sort the cheapest input path (if not sorted already) and
 * incremental sort any paths which are partially sorted.
 */
is_sorted = pathkeys_count_contained_in(setop_pathkeys,
                                        subpath->pathkeys,
                                        &presorted_keys);

if (!is_sorted)
{


Maybe also mention that if it's already sorted, it's used as is.

BTW, could the same machinery be used for INTERSECT as well? There was a 
brief mention of that in the original thread, but I didn't understand 
the details. Not for v17, but I'm curious. I was wondering if 
build_setop_child_paths() should be named build_union_child_paths(), 
since it's only used with UNIONs, but I guess it could be used as is for 
INTERSECT too.



# Testing

postgres=# begin; create table foo as select i from generate_series(1, 
100) i; create index on foo (i); commit;

BEGIN
SELECT 100
CREATE INDEX
COMMIT
postgres=# set enable_seqscan=off;
SET
postgres=# explain (select 1 as i union select i from foo) order by i;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Unique  (cost=144370.89..149370.89 rows=101 width=4)
   ->  Sort  (cost=144370.89..146870.89 rows=101 width=4)
         Sort Key: (1)
         ->  Append  (cost=0.00..31038.44 rows=101 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=100 width=4)

(6 rows)

I'm disappointed it couldn't produce a MergeAppend plan. If you replace 
the "union" with "union all" you do get a MergeAppend.


Some more cases where I hoped for a MergeAppend:

postgres=# explain (select i, 'foo' from foo union select i, 'foo' from foo) order by 1;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=200 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=200 width=36)
         Sort Key: foo.i, ('foo'::text)
         ->  Append  (cost=0.42..62076.85 rows=200 width=36)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=100 width=36)
               ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=100 width=36)

(6 rows)


postgres=# explain (select 'foo', i from foo union select 'bar', i from foo) order by 1;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=200 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=200 width=36)
         Sort Key: ('foo'::text), foo.i
         ->  Append  (cost=0.42..62076.85 rows=200 width=36)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=100 width=36)
               ->  Index Only Scan using foo_i_idx on 

Re: Injection points: preloading and runtime arguments

2024-05-21 Thread Andrey M. Borodin



> On 21 May 2024, at 06:31, Michael Paquier  wrote:
> 
> So I agree that 0002 ought to call injection_init_shmem() when calling
> injection_points_preload(), but it also seems to me that the test is
> missing the fact that it should heat the backend cache to avoid the
> allocations in the critical sections.
> 
> Note that I disagree with taking a shortcut in the backend-side
> injection point code where we would bypass CritSectionCount or
> allowInCritSection.  These states should stay consistent for the sake
> of the callbacks registered so as these can rely on the same stack and
> conditions as the code where they are called.

Currently I'm working on the test using this
$creator->query_until(qr/start/, q(
\echo start
select injection_points_wakeup('');
select test_create_multixact();
));

I'm fine if instead of injection_points_wakeup('') I'll have to use select 
injection_points_preload('point name');.

Thanks!


Best regards, Andrey Borodin.





Feature request: limiting isolation level choices

2024-05-21 Thread Tihrd Reed
Hello,

Apologies if this is the wrong place to make such a request.

It would be useful to have the ability to prevent clients from setting whatever 
isolation levels they choose. Specifically, it would be desirable to enforce 
SERIALIZABLE for all transactions, since the serializable guarantee only holds 
if other transactions also use that isolation level.

EdgeDB which is built on Postgres only allows the SERIALIZABLE isolation level, 
for example.

Thanks




Re: I have an exporting need...

2024-05-21 Thread Juan Hernández
Hi team!

I read all your comments, and they give me more to learn.

For me and my use case it would be useful, even though there are other ways
to solve this; but I may be wrong and may just have to learn more about
maintenance, backup and recovery tasks.

What if, when the --separatetables option is used, the table definition and
data are exported, and the index, foreign key and relation declarations are
exported too, but commented out, with an advisory note? Just an idea.

Thank you all and best regards!

Juan de Jesús



On Tue, May 14, 2024 at 10:54, Tom Lane () wrote:

> Heikki Linnakangas  writes:
> > On 13/05/2024 16:01, Juan Hernández wrote:
> >> Do you consider useful to add a parameter (for example,
> >> --separatetables) so when used the exporting file process can create a
> >> different tablename.sql file for each table in database automatically?
>
> > It'd be tricky to restore from, as you need to restore the tables in the
> > right order. I think you'd still need a "main" sql file that includes
> > all the other files in the right order. And using the table names as
> > filenames gets tricky if the table names contain any funny characters.
>
> It's a lot worse than that, as it's entirely possible to have circular
> FK dependencies, meaning there is no "right order" if you think of
> each table file as self-contained DDL plus data.  Other sorts of
> circularities are possible too.
>
> pg_dump deals with that hazard by splitting things up: first create
> all the tables, then load all the data, then create all the indexes
> and foreign keys.  You can tell it to just emit the parts relevant to
> a particular table, but it's on your head whether that's actually
> going to be useful in your context.  I doubt that it's widely enough
> useful to justify creating a special mode beyond what we already
> have.
>
> regards, tom lane
>


Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-21 Thread John Naylor
On Mon, May 20, 2024 at 8:41 PM Masahiko Sawada  wrote:
>
> On Mon, May 20, 2024 at 8:47 PM Jonathan S. Katz  wrote:
> >
> > On 5/20/24 2:58 AM, John Naylor wrote:
> > > Hi Jon,
> > >
> > > Regarding vacuum "has shown up to a 6x improvement in overall time to
> > > complete its work" -- I believe I've seen reported numbers close to
> > > that only 1) when measuring the index phase in isolation or maybe 2)
> > > the entire vacuum of unlogged tables with one, perfectly-correlated
> > > index (testing has less variance with WAL out of the picture). I
> > > believe tables with many indexes would show a lot of improvement, but
> > > I'm not aware of testing that case specifically. Can you clarify where
> > > 6x came from?
> >
> > Sawada-san showed me the original context, but I can't rapidly find it
> > in the thread. Sawada-san, can you please share the numbers behind this?
> >
>
> I referenced the numbers that I measured during the development[1]
> (test scripts are here[2]). IIRC I used unlogged tables and indexes,
> and these numbers were the entire vacuum execution time including heap
> scanning, index vacuuming and heap vacuuming.

Thanks for confirming.

The wording "has a new internal data structure that reduces memory
usage and has shown up to a 6x improvement in overall time to complete
its work" is specific for runtime, and the memory use is less
specific. Unlogged tables are not the norm, so I'd be cautious of
reporting numbers specifically designed (for testing) to isolate the
thing that changed.

I'm wondering if it might be both more impressive-sounding and more
realistic for the average user experience to reverse that: specific on
memory, and less specific on speed. The best-case memory reduction
occurs for table update patterns that are highly localized, such as
the most recently inserted records, and I'd say those are a lot more
common than the use of unlogged tables.

Maybe something like "has a new internal data structure that reduces
overall time to complete its work and can use up to 20x less memory."

Now, it is true that when dead tuples are sparse and evenly spaced
(e.g. 1 every 100 pages), vacuum can now actually use more memory than
v16. However, the nature of that scenario also means that the number
of TIDs just can't get very big to begin with. In contrast, while the
runtime improvement for normal (logged) tables is likely not
earth-shattering, I believe it will always be at least somewhat
faster, and never slower.




Re: Reading timestamp values from Datums gives garbage values

2024-05-21 Thread Aleksander Alekseev
Hi,

> When trying to read the query response from the Datum, I get garbage values.
> I've tried various types and none of them read the correct value.
> ```
>
> Datum current_timestamp = SPI_getbinval(SPI_tuptable->vals[i], 
> SPI_tuptable->tupdesc, 5, );
>
> double current_time = DatumGetFloat8(current_timestamp); // prints 0
>
> int64 time = DatumGetUint64(current_timestamp); // prints 5293917674
>
> ```
>
> Can you help me out with the correct way to read EPOCH values from datums?

I don't entirely understand why you are using DatumGetFloat8() /
DatumGetUint64() and double / int64 types. There are
DatumGetTimestamp() / DatumGetTimestampTz() and Timestamp /
TimestampTz.
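
For illustration only, a minimal sketch along those lines (this assumes the
fifth column really is a timestamptz; the epoch-shift constants come from
datatype/timestamp.h):

```
/* needs executor/spi.h, utils/timestamp.h and datatype/timestamp.h */
bool        isnull;
Datum       d = SPI_getbinval(SPI_tuptable->vals[i],
                              SPI_tuptable->tupdesc, 5, &isnull);

if (!isnull)
{
    /* timestamptz is int64 microseconds since 2000-01-01 00:00:00 UTC */
    TimestampTz ts = DatumGetTimestampTz(d);

    /* shift to the Unix epoch if you really need an epoch value */
    int64 epoch_us = ts +
        ((int64) (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY) * USECS_PER_SEC;

    elog(NOTICE, "epoch seconds: " INT64_FORMAT, epoch_us / USECS_PER_SEC);
}
```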

I recommend using the PostgreSQL code as a source of more examples of
how to deal with the given types. The file pg_proc.dat is a good entry
point. See also commit 260a1f18 [1] and PostgreSQL documentation [2].

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=260a1f18
[2]: https://www.postgresql.org/docs/16/xfunc-c.html

-- 
Best regards,
Aleksander Alekseev




Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Dave Page
Hi Sandeep, Nazir,

On Tue, 21 May 2024 at 10:14, Nazir Bilal Yavuz  wrote:

> Hi,
>
> On Tue, 21 May 2024 at 10:20, Sandeep Thakkar
>  wrote:
> >
> > Hi Dave,
> >
> > Is the .pc file generated after the successful build of zlib? If yes,
> then meson should be able to detect the installation ideally
>
> If meson is not able to find the .pc file automatically, using 'meson
> setup ... --pkg-config-path $ZLIB_PC_PATH' might help.
>

The problem is that on Windows there are no standard locations for a
Unix-style development library installation such as this, so the chances
are that the .pc file will point to entirely the wrong location.

For example, please see
https://github.com/dpage/winpgbuild/actions/runs/9172187335 which is a
Github action that builds a completely vanilla zlib using VC++. If you look
at the uploaded artefact containing the build output and examine the .pc
file, you'll see it references /zlib as the location, which is simply where
I built it in that action. On a developer's machine that's almost certainly
not going to be where it actually ends up. For example, on the pgAdmin
build farm, the dependencies all end up in C:\build64\[whatever]. On the
similar Github action I'm building for PostgreSQL, that artefact will be
unpacked into /build/zlib.

Of course, for my own builds I can easily make everything use consistent
directories; however, most people who are likely to want to build PostgreSQL
may not want to build all the dependencies themselves as well, as some are a
lot more difficult than zlib. So what tends to happen is that people find
third-party builds or upstream official builds.

I would therefore argue that if the .pc file that's found doesn't provide
correct paths for us, then Meson should fall back to searching in the paths
specified on its command line for the appropriate libraries/headers (which
is what it does for OpenSSL for example, as that doesn't include a .pc
file). This is also what happens with PG16 and earlier.

One other thing I will note is that PG16 and earlier try to use the wrong
filename for the import library. For years, it's been a requirement to do
something like this: "copy \zlib\lib\zlib.lib \zlib\lib\zdll.lib" to make a
build succeed against a "vanilla" zlib build. I haven't got as far as
figuring out if the same is true with Meson yet.

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com


How to declare GIN index on array type column when bootstrap?

2024-05-21 Thread Zhang Mingli
 Hi, all

I want to add some columns of int (or Oid) array type and declare a GIN index
on them in a catalog during bootstrap.

But current catalogs all use btree. I tried to declare a GIN index but it
failed, e.g.:
pg_class.h
```
CATALOG(pg_class
...
Int32   my_new_column[1] BKI_DEFAULT(_null_);
...
} FormData_pg_class;

DECLARE_INDEX(pg_class_my_index, 7200, on pg_class using gin(my_new_column 
array_ops));
#define ClassMYIndexId 7200
```
But this failed during bootstrap initialization, in heap_form_tuple().

I could use SQL to create a GIN index on such a column for user tables.
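
For example (the table and column names here are only illustrative):

```
CREATE TABLE t (vals int[]);
-- array_ops is the default GIN operator class for array types
CREATE INDEX t_vals_gin ON t USING gin (vals);
```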

But is it possible to declare an array column with a GIN index in a catalog
during bootstrap?

Thanks.


Zhang Mingli
www.hashdata.xyz


Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Nazir Bilal Yavuz
Hi,

On Tue, 21 May 2024 at 10:20, Sandeep Thakkar
 wrote:
>
> Hi Dave,
>
> Is the .pc file generated after the successful build of zlib? If yes, then 
> meson should be able to detect the installation ideally

If meson is not able to find the .pc file automatically, using 'meson
setup ... --pkg-config-path $ZLIB_PC_PATH' might help.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: POC: GROUP BY optimization

2024-05-21 Thread Andrei Lepikhov

On 20/5/2024 15:54, jian he wrote:

> As mentioned previously,
> both A and B can use Incremental Sort, set_cheapest will choose the A
> instead of B,
> because A step generated path **first** satisfies increment sort.

Yeah, I agree with your analysis.
Looking into cost_incremental_sort, I see that we end up with ten
group_tuples. This value doesn't depend on how many presorted keys (1 or
2) we use; it comes from the default estimation.
Given the current circumstances, it seems unlikely that we can make any 
significant changes without introducing a new sort cost model that 
accounts for the number of sorted columns.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: CREATE DATABASE with filesystem cloning

2024-05-21 Thread Nazir Bilal Yavuz
Hi,

On Thu, 16 May 2024 at 17:47, Robert Haas  wrote:
>
> On Thu, May 16, 2024 at 9:43 AM Nazir Bilal Yavuz  wrote:
> > Actually, the documentation for the file_copy_method was mentioning
> > the things it controls; I converted it to an itemized list now. Also,
> > changed the comment to: 'Further uses of this function requires
> > updates to the list that GUC controls in its documentation.'. v7 is
> > attached.
>
> I think the comments need some wordsmithing.

I changed it to 'Uses of this function must be documented in the list
of places affected by this GUC.' I am open to any suggestions.

> I don't see why this parameter should be PGC_POSTMASTER.

I changed it to 'PGC_USERSET' now. My initial thought was that the whole
database or tablespace should be copied with a single method; I was worried
that copying some portion of the database with copy and the rest with clone
could cause problems. After a bit of searching, I could not find any
problems related to that.

v8 is attached.
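
For reference, with v8 applied the new GUC is exercised roughly like this
(a sketch; 'clone' is only accepted on platforms where copyfile() with
COPYFILE_CLONE_FORCE or copy_file_range() is available):

-- file_copy_method only affects code paths that go through copydir(),
-- e.g. CREATE DATABASE ... STRATEGY = FILE_COPY
SET file_copy_method = 'clone';
CREATE DATABASE clonedb TEMPLATE template0 STRATEGY = FILE_COPY;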

--
Regards,
Nazir Bilal Yavuz
Microsoft
From 54dac96e493f482581c09ec14e67196e73e371ca Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Tue, 21 May 2024 10:19:29 +0300
Subject: [PATCH v8] Introduce file_copy_method GUC

This GUC can be set to either COPY (default) or CLONE (if system
supports it).

If CLONE method is chosen, similar to COPY; but attempting to use
efficient file copying system calls.  The kernel has the opportunity to
share block ranges in copy-on-write file systems, or maybe push down
the copy to network file systems and storage devices.

Currently works on Linux, FreeBSD and macOS.  More systems could be
supported.

Author: Thomas Munro 
Author: Nazir Bilal Yavuz 
Reviewed-by: Robert Haas 
Reviewed-by: Ranier Vilela 
Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com
---
 src/include/storage/copydir.h |  9 ++
 src/backend/storage/file/copydir.c| 98 ++-
 .../utils/activity/wait_event_names.txt   |  1 +
 src/backend/utils/misc/guc_tables.c   | 12 +++
 src/backend/utils/misc/postgresql.conf.sample |  4 +
 doc/src/sgml/config.sgml  | 46 +
 doc/src/sgml/ref/alter_database.sgml  |  3 +-
 doc/src/sgml/ref/create_database.sgml |  4 +-
 src/tools/pgindent/typedefs.list  |  1 +
 9 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index a25e258f479..6edc3ea4f69 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -13,6 +13,15 @@
 #ifndef COPYDIR_H
 #define COPYDIR_H
 
+typedef enum FileCopyMethod
+{
+	FILE_COPY_METHOD_COPY,
+	FILE_COPY_METHOD_CLONE,
+} FileCopyMethod;
+
+/* GUC parameters */
+extern PGDLLIMPORT int file_copy_method;
+
 extern void copydir(const char *fromdir, const char *todir, bool recurse);
 extern void copy_file(const char *fromfile, const char *tofile);
 
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index d4fbe542077..6d8f05199ca 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -21,17 +21,42 @@
 #include 
 #include 
 
+#ifdef HAVE_COPYFILE_H
+#include 
+#endif
+
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/copydir.h"
 #include "storage/fd.h"
+#include "utils/guc.h"
+
+/* GUCs */
+int			file_copy_method = FILE_COPY_METHOD_COPY;
+
+/*
+ * GUC support
+ */
+const struct config_enum_entry file_copy_method_options[] = {
+	{"copy", FILE_COPY_METHOD_COPY, false},
+#if defined(HAVE_COPYFILE) && defined(COPYFILE_CLONE_FORCE) || defined(HAVE_COPY_FILE_RANGE)
+	{"clone", FILE_COPY_METHOD_CLONE, false},
+#endif
+	{NULL, 0, false}
+};
+
+static void clone_file(const char *fromfile, const char *tofile);
 
 /*
  * copydir: copy a directory
  *
  * If recurse is false, subdirectories are ignored.  Anything that's not
  * a directory or a regular file is ignored.
+ *
+ * This function uses a file_copy_method GUC to determine copy method.
+ * Uses of this function must be documented in the list of places
+ * affected by this GUC.
  */
 void
 copydir(const char *fromdir, const char *todir, bool recurse)
@@ -71,7 +96,12 @@ copydir(const char *fromdir, const char *todir, bool recurse)
 copydir(fromfile, tofile, true);
 		}
 		else if (xlde_type == PGFILETYPE_REG)
-			copy_file(fromfile, tofile);
+		{
+			if (file_copy_method == FILE_COPY_METHOD_CLONE)
+clone_file(fromfile, tofile);
+			else
+copy_file(fromfile, tofile);
+		}
 	}
 	FreeDir(xldir);
 
@@ -214,3 +244,69 @@ copy_file(const char *fromfile, const char *tofile)
 
 	pfree(buffer);
 }
+
+/*
+ * clone one file
+ */
+static void
+clone_file(const char *fromfile, const char *tofile)
+{
+#if defined(HAVE_COPYFILE) && defined(COPYFILE_CLONE_FORCE)
+	if (copyfile(fromfile, tofile, NULL, COPYFILE_CLONE_FORCE) < 0)
+		ereport(ERROR,
+(errcode_for_file_access(),

Re: Avoid orphaned objects dependencies, take 3

2024-05-21 Thread Bertrand Drouvot
Hi,

On Sun, May 19, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:
> Hello Bertrand,
> 
> Probably, it's worth to avoid ERRCODE_INTERNAL_ERROR here in light of [1]
> and [2], as these errors are not that abnormal (not Assert-like).
> 
> [1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
> [2] https://commitfest.postgresql.org/48/4735/
> 

Thanks for mentioning the above examples; I agree that it's worth avoiding
ERRCODE_INTERNAL_ERROR here. Please find attached v7, which makes use of a new
ERRCODE: ERRCODE_DEPENDENT_OBJECTS_DOES_NOT_EXIST.

I chose this name as it is close to the already existing
"ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST", but I'm open to suggestions too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 79809f88311ef1cf4e8f3250e1508fe0b7d86602 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v7] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a non
existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a non
existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still exists:
with this in place session 2 in scenario 1 would be locked and would report an
error once session 1 committs (that would not be the case should session 1 abort
the transaction).

If the object is dropped before the new lock attempt is triggered then the patch
would also report an error (but with less details).

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/dependency.c  |  58 
 src/backend/catalog/objectaddress.c   |  70 ++
 src/backend/catalog/pg_depend.c   |  12 ++
 src/backend/utils/errcodes.txt|   1 +
 src/include/catalog/dependency.h  |   1 +
 src/include/catalog/objectaddress.h   |   1 +
 .../expected/test_dependencies_locks.out  | 129 ++
 src/test/isolation/isolation_schedule |   1 +
 .../specs/test_dependencies_locks.spec|  89 
 9 files changed, 362 insertions(+)
  24.6% src/backend/catalog/
  45.1% src/test/isolation/expected/
  28.1% src/test/isolation/specs/

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ade1..89b2f18f98 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1517,6 +1517,64 @@ AcquireDeletionLock(const ObjectAddress *object, int flags)
 	}
 }
 
+/*
+ * depLockAndCheckObject
+ *
+ * Lock the object that we are about to record a dependency on.
+ * After it's locked, verify that it hasn't been dropped while we
+ * weren't looking.  If the object has been dropped, this function
+ * does not return!
+ */
+void
+depLockAndCheckObject(const ObjectAddress *object)
+{
+	char	   *object_description;
+
+	/*
+	 * Those don't rely on LockDatabaseObject() when being dropped (see
+	 * AcquireDeletionLock()). Also it looks like they can not produce
+	 * orphaned dependent objects when being dropped.
+	 */
+	if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
+		return;
+
+	object_description = getObjectDescription(object, true);
+
+	/* assume we should lock the whole object not a sub-object */
+	LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);
+
+	/* check if object still exists */
+	if (!ObjectByIdExist(object, false))
+	{
+		/*
+		 * It might be possible that we are creating it (for example creating
+		 * a composite type while creating a relation), so bypass the syscache
+		 * lookup and use a dirty snaphot instead to cover this scenario.
+		 */
+		if (!ObjectByIdExist(object, true))
+		{
+			/*
+			 * If the object has been dropped before we get a chance to get
+			 * its description, then emit a generic error message. That looks
+			 * like a good compromise over extra complexity.
+			 */
+			if (object_description)
+ereport(ERROR,
+		(errcode(ERRCODE_DEPENDENT_OBJECTS_DOES_NOT_EXIST),
+		 errmsg("%s 

Re: zlib detection in Meson on Windows broken?

2024-05-21 Thread Sandeep Thakkar
Hi Dave,

Is the .pc file generated after the successful build of zlib? If yes, then
meson should be able to detect the installation ideally

On Mon, May 20, 2024 at 4:28 PM Dave Page  wrote:

> Hi
>
> I'm working on updating the build of PostgreSQL that pgAdmin uses in its
> Windows installers to use Meson ready for the v17 release. I'm using Visual
> Studio 2022, on Windows Server 2022.
>
> I've been unable to persuade Meson to detect zlib, whilst OpenSSL seems to
> be fine.
>
> The dependencies have been built and installed as follows:
>
>  mkdir c:\build64
>
>  wget https://zlib.net/zlib-1.3.2.tar.gz
>  tar -zxvf zlib-1.3.2.tar.gz
>  cd zlib-1.3.2
>  cmake -DCMAKE_INSTALL_PREFIX=C:/build64/zlib -G "Visual Studio 17 2022" .
>  msbuild ALL_BUILD.vcxproj /p:Configuration=Release
>  msbuild RUN_TESTS.vcxproj /p:Configuration=Release
>  msbuild INSTALL.vcxproj /p:Configuration=Release
>  cd ..
>
>  wget https://www.openssl.org/source/openssl-3.0.13.tar.gz
>  tar -zxvf openssl-3.0.13.tar.gz
>  cd openssl-3.0.013
>  perl Configure VC-WIN64A no-asm --prefix=C:\build64\openssl no-ssl3
> no-comp
>  nmake
>  nmake test
>  nmake install
>  cd ..
>
> This results in the following headers and libraries being installed for
> zlib:
>
> C:\Users\dpage\git\postgresql>dir C:\build64\zlib\include
>  Volume in drive C has no label.
>  Volume Serial Number is 3AAD-5864
>
>  Directory of C:\build64\zlib\include
>
> 17/05/2024  15:56  .
> 17/05/2024  15:56  ..
> 17/05/2024  15:5417,096 zconf.h
> 22/01/2024  19:3296,829 zlib.h
>2 File(s)113,925 bytes
>2 Dir(s)  98,842,726,400 bytes free
>
> C:\Users\dpage\git\postgresql>dir C:\build64\zlib\lib
>  Volume in drive C has no label.
>  Volume Serial Number is 3AAD-5864
>
>  Directory of C:\build64\zlib\lib
>
> 17/05/2024  17:01  .
> 17/05/2024  15:56  ..
> 17/05/2024  15:5516,638 zlib.lib
> 17/05/2024  15:55   184,458 zlibstatic.lib
>2 File(s)201,096 bytes
>2 Dir(s)  98,842,726,400 bytes free
>
> I then attempt to build PostgreSQL:
>
>  meson setup build
> -Dextra_include_dirs=C:/build64/openssl/include,C:/build64/zlib/include
> -Dextra_lib_dirs=C:/build64/openssl/lib,C:/build64/zlib/lib -Dssl=openssl
> -Dzlib=enabled --prefix=c:/build64/pgsql
>
> Which results in the output in output.txt, indicating that OpenSSL was
> correctly found, but zlib was not. I've also attached the meson log.
>
> I have very little experience with Meson, and even less interpreting it's
> logs, but it seems to me that it's not including the extra lib and include
> directories when it runs the test compile, given the command line it's
> reporting:
>
> cl
> C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c
> /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od /Oi-
>
> Bug, or am I doing something silly?
>
>
> --
> Dave Page
> pgAdmin: https://www.pgadmin.org
> PostgreSQL: https://www.postgresql.org
> EDB: https://www.enterprisedb.com
>
>

-- 
Sandeep Thakkar


Re: Pgoutput not capturing the generated columns

2024-05-21 Thread Peter Smith
Hi,

AFAICT the difference between this v2-0001 patch and v1 is mostly the
addition of the new CREATE SUBSCRIPTION option. Specifically, I don't think
it addresses any of my previous review comments for patch v1 [1]. So the
comments below are limited to the new option code; all my previous review
comments probably still apply.

==
Commit message

1. (General)
The commit message is seriously lacking background explanation to describe:
- What is the current behaviour w.r.t. generated columns
- What is the problem with the current behaviour?
- What exactly is this patch doing to address that problem?

~

2.
New option generated_option is added in create subscription. Now if this
option is specified as 'true' during create subscription, generated
columns in the tables, present in publisher (to which this subscription is
subscribed) can also be replicated.

-

2A.
"generated_option" is not the name of the new option.

~

2B.
"create subscription" stmt should be UPPERCASE; will also be more
readable if the option name is quoted.

~

2C.
Needs more information like under what condition is this option ignored etc.

==
doc/src/sgml/ref/create_subscription.sgml

3.
+   
+generated-column (boolean)
+
+ 
+  Specifies whether the generated columns present in the tables
+  associated with the subscription should be replicated. The default is
+  false.
+ 
+
+ 
+  This parameter can only be set true if copy_data is set to false.
+  This option works fine when a generated column (in
publisher) is replicated to a
+  non-generated column (in subscriber). Else if it is
replicated to a generated
+  column, it will ignore the replicated data and fill the
column with computed or
+  default data.
+ 
+
+   

3A.
There is a typo in the name "generated-column" because we should use
underscores (not hyphens) for the option names.

~

3B.
This is not a good option name because there is no verb, so it
doesn't mean anything to set it true/false -- actually there IS a verb
"generate" but we are not saying generate = true/false, so this name
is also quite confusing.

I think "include_generated_columns" would be much better, but if
others think that name is too long then maybe "include_generated_cols"
or "include_gen_cols" or similar. Of course, whatever if the final
decision should be propagated same thru all the code comments, params,
fields, etc.

~

3C.
copy_data and false should be marked up as  fonts in the sgml

~

3D.

Suggest re-word this part. Don't need to explain when it "works fine".

BEFORE
This option works fine when a generated column (in publisher) is
replicated to a non-generated column (in subscriber). Else if it is
replicated to a generated column, it will ignore the replicated data
and fill the column with computed or default data.

SUGGESTION
If the subscriber-side column is also a generated column then this
option has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

==
src/backend/commands/subscriptioncmds.c

4. AlterSubscription
SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
SUBOPT_PASSWORD_REQUIRED |
SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-   SUBOPT_ORIGIN);
+   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);

Hmm. Is this correct? If ALTER is not allowed (later in this patch
there is a message "toggling generated_column option is not allowed.")
then why are we even saying that SUBOPT_GENERATED_COLUMN is a
support_opt for ALTER?

~~~

5.
+ if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("toggling generated_column option is not allowed.")));
+ }

5A.
I suspect this is not even needed if the 'supported_opt' is fixed per
the previous comment.

~

5B.
But if this message is still needed then I think it should say "ALTER
is not allowed" (not "toggling is not allowed") and also the option
name should be quoted as per the new guidelines for error messages.

==
src/backend/replication/logical/proto.c


6. logicalrep_write_tuple

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+

Calling column_in_column_list() might be a more expensive operation
than checking just the generated-column flag, so maybe reverse the order
and check the generated-column flag first for a tiny performance gain.
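
In other words, something like this ordering (sketch only, using the same
variables as the quoted hunk):

  if (att->attisdropped)
      continue;

  /* cheap flag test first ... */
  if (att->attgenerated && !publish_generated_column)
      continue;

  /* ... then the potentially more expensive column-list lookup */
  if (!column_in_column_list(att->attnum, columns))
      continue;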

~~

7.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

~~

8. logicalrep_write_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if 

Re: using extended statistics to improve join estimates

2024-05-21 Thread Andrei Lepikhov

On 5/20/24 16:40, Andrei Lepikhov wrote:

> On 20/5/2024 15:52, Andy Fan wrote:
>> +    if (clauselist_selectivity_hook)
>> +    *return* clauselist_selectivity_hook(root, clauses, ..)
> Of course - library may estimate not all the clauses - it is a reason,
> why I added input/output parameter 'estimatedclauses' by analogy with
> statext_clauselist_selectivity.

Here is a polished and slightly modified version of the proposed hook.
Additionally, I propose exporting the statext_mcv_clauselist_selectivity
routine, like dependencies_clauselist_selectivity. This could potentially
enhance the functionality of the PostgreSQL estimation code.


To clarify the purpose: I want an optional, more conservative estimation
based on distinct statistics, loaded as a library. Let me provide a
(somewhat degenerate) example:


CREATE TABLE is_test(x1 integer, x2 integer, x3 integer, x4 integer);
INSERT INTO is_test (x1,x2,x3,x4)
   SELECT x%5,x%7,x%11,x%13 FROM generate_series(1,1E3) AS x;
INSERT INTO is_test (x1,x2,x3,x4)
   SELECT 14,14,14,14 FROM generate_series(1,100) AS x;
CREATE STATISTICS ist_stat (dependencies,ndistinct)
  ON x1,x2,x3,x4 FROM is_test;
ANALYZE is_test;
EXPLAIN (ANALYZE, COSTS ON, SUMMARY OFF, TIMING OFF)
SELECT * FROM is_test WHERE x1=14 AND x2=14 AND x3=14 AND x4=14;
DROP TABLE is_test CASCADE;

I see:
(cost=0.00..15.17 rows=3 width=16) (actual rows=100 loops=1)

Dependency statistics work great if the dependency is the same for all the
data in the columns. But we get underestimations if different subsets of
rows follow different laws. So, if we don't have MCV statistics, sometimes
we need to pass over dependency statistics and use ndistinct instead.
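
For illustration, an extension would attach to the proposed hook roughly
like this (just a sketch; see the typedef in the diff below for the exact
signature, the parameter list here mirrors the call site in
clauselist_selectivity_ext()):

#include "postgres.h"
#include "fmgr.h"
#include "utils/selfuncs.h"

PG_MODULE_MAGIC;

static Selectivity
my_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
                          JoinType jointype, SpecialJoinInfo *sjinfo,
                          Bitmapset **estimatedclauses, bool use_extended_stats)
{
    /*
     * Estimate only the clauses this module understands, mark them in
     * *estimatedclauses, and return their combined selectivity; anything
     * left unmarked still goes through the in-core estimation after the
     * hook call.
     */
    return 1.0;             /* placeholder: nothing estimated here */
}

void
_PG_init(void)
{
    clauselist_selectivity_hook = my_clauselist_selectivity;
}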


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 0ab021c1e8..1508a1beea 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -128,6 +128,11 @@ clauselist_selectivity_ext(PlannerInfo *root,
 	ListCell   *l;
 	int			listidx;
 
+	if (clauselist_selectivity_hook)
+		s1 = clauselist_selectivity_hook(root, clauses, varRelid, jointype,
+		 sjinfo, ,
+		 use_extended_stats);
+
 	/*
 	 * If there's exactly one clause, just go directly to
 	 * clause_selectivity_ext(). None of what we might do below is relevant.
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 99fdf208db..b1722f5a60 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1712,7 +1712,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
  * 0-based 'clauses' indexes we estimate for and also skip clause items that
  * already have a bit set.
  */
-static Selectivity
+Selectivity
 statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
    JoinType jointype, SpecialJoinInfo *sjinfo,
    RelOptInfo *rel, Bitmapset **estimatedclauses,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..ff98fda08c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -146,6 +146,7 @@
 /* Hooks for plugins to get control when we ask for stats */
 get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;
+clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL;
 
 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
 static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h
index 7f2bf18716..436f30bdde 100644
--- a/src/include/statistics/statistics.h
+++ b/src/include/statistics/statistics.h
@@ -104,6 +104,14 @@ extern void BuildRelationExtStatistics(Relation onerel, bool inh, double totalro
 extern int	ComputeExtStatisticsRows(Relation onerel,
 	 int natts, VacAttrStats **vacattrstats);
 extern bool statext_is_kind_built(HeapTuple htup, char type);
+extern Selectivity statext_mcv_clauselist_selectivity(PlannerInfo *root,
+	  List *clauses,
+	  int varRelid,
+	  JoinType jointype,
+	  SpecialJoinInfo *sjinfo,
+	   RelOptInfo *rel,
+	   Bitmapset **estimatedclauses,
+	   bool is_or);
 extern Selectivity dependencies_clauselist_selectivity(PlannerInfo *root,
 	   List *clauses,
 	   int varRelid,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f2563ad1cb..253f584c65 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -148,6 +148,17 @@ typedef bool (*get_index_stats_hook_type) (PlannerInfo *root,
 		   VariableStatData *vardata);
 extern PGDLLIMPORT get_index_stats_hook_type get_index_stats_hook;
 
+/* Hooks for plugins to get control when we ask for selectivity estimation */
+typedef Selectivity