Re: [PATCH] Allow Postgres to pick an unused port to listen
Stephen,

> > You could just drop another file into the data directory that just contains
> > the port number ($PGDATA/port). However, if we ever do multiple ports, that
> > would still require a change in the format of that file, so I don't know if
> > that's actually better than a).

I find it difficult to get anything done under the restriction of "what if we ever need to change X?" as it is difficult to address something that doesn't exist or hasn't been planned. Finding the delicate balance between anticipating what may happen theoretically and what's more likely to happen is an art.

It's also important to consider the impact of a breaking change. It's one thing if we have to break, say, an SQL function signature or SQL syntax itself, and another if it is a relatively small feature related to the administration of a server (in this case, more like scripting a test bench).

> If we did a port per line then it wouldn't be changing the format of the
> first line, so that might not be all that bad.

If we consider this path, then (if we assume the format of the file is still to be private), we can make the port line accept multiple ports using a delimiter like `:` so that the next line still remains the same. That being said, if the format is private to Postgres, these are all minor considerations.

> > I don't think involving pg_ctl is necessary or desirable, since it would
> > make any future changes like that even more complicated.
>
> I'm a bit confused by this- if pg_ctl is invoked then we have
> more-or-less full control over parsing and reporting out the answer, so
> while it might be a bit more complicated for us, it seems surely simpler
> for the end user. Or maybe you're referring to something here that I'm
> not thinking of?

I would love to learn about this as well.

> Independent of the above though ... this hand-wringing about what we
> might do in the relative near-term when we haven't done much in the past
> many-many years regarding listen_addresses or port strikes me as
> unlikely to be necessary. Let's pick something and get it done and
> accept that we may have to change it at some point in the future, but
> that's kinda what major releases are for, imv anyway.

That's how I see it, too. I tried to make this change as small as possible in recognition of the fact that all of this may change one day if or when that portion of Postgres is due for a major redesign. I'd be happy to contribute to that process, but in the meantime, I am looking for the simplest reasonable way to achieve a relatively specific use case.

Personally, I am fine with reading the `.pid` file and accepting that it _may_ change in the future; I am also fine with amending the patch to add functionality to pg_ctl or adding a new file. To keep everybody's cognitive load low, I'd rather not flood the thread with multiple alternative implementations (unless that's desirable) and just go for something we can agree on.

(I consider this feature so small that it doesn't deserve such a lengthy discussion. However, I also get Tom's point about how we document this feature's use, which is very valid and valuable. If it were up to me entirely, I'd probably just document `postmaster.pid` and call it a day. If it ever breaks, that's major release territory. Otherwise, amending `pg_ctl` to access information like this in a uniform way is also a good approach if we want to keep the format of the pid file private.)

-- 
Y.
Re: [PATCH] Allow Postgres to pick an unused port to listen
Greetings,

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> On 13.04.23 04:45, Yurii Rashkovskii wrote:
> > But getting your agreement is important to get this in; I am willing to
> > play along and resolve both (1) and (2) in one go. As for the
> > implementation approach for (2), which of the following options would
> > you prefer?
> >
> > a) Document postmaster.pid as it stands today
> > b) Expose the port number through pg_ctl (*my personal favorite)
> > c) Redesign its content below line 1 to make it extensible (make unnamed
> > lines named, for example)
> >
> > If none of the above options suit you, do you have a strategy you'd prefer?
>
> You could just drop another file into the data directory that just contains
> the port number ($PGDATA/port). However, if we ever do multiple ports, that
> would still require a change in the format of that file, so I don't know if
> that's actually better than a).

If we did a port per line then it wouldn't be changing the format of the first line, so that might not be all that bad.

> I don't think involving pg_ctl is necessary or desirable, since it would
> make any future changes like that even more complicated.

I'm a bit confused by this- if pg_ctl is invoked then we have more-or-less full control over parsing and reporting out the answer, so while it might be a bit more complicated for us, it seems surely simpler for the end user. Or maybe you're referring to something here that I'm not thinking of?

Independent of the above though ... this hand-wringing about what we might do in the relative near-term when we haven't done much in the past many-many years regarding listen_addresses or port strikes me as unlikely to be necessary. Let's pick something and get it done and accept that we may have to change it at some point in the future, but that's kinda what major releases are for, imv anyway.

Thanks!

Stephen
Re: [PATCH] Allow Postgres to pick an unused port to listen
On 13.04.23 04:45, Yurii Rashkovskii wrote:
> But getting your agreement is important to get this in; I am willing to
> play along and resolve both (1) and (2) in one go. As for the
> implementation approach for (2), which of the following options would
> you prefer?
>
> a) Document postmaster.pid as it stands today
> b) Expose the port number through pg_ctl (*my personal favorite)
> c) Redesign its content below line 1 to make it extensible (make unnamed
> lines named, for example)
>
> If none of the above options suit you, do you have a strategy you'd prefer?

You could just drop another file into the data directory that just contains the port number ($PGDATA/port). However, if we ever do multiple ports, that would still require a change in the format of that file, so I don't know if that's actually better than a).

I don't think involving pg_ctl is necessary or desirable, since it would make any future changes like that even more complicated.
Re: Use INT_MAX for wal size related gucs's max value
These gucs are always used with ConvertToXSegs, to calculate the count of wal segments (see the following code snippet), and wal_segment_size can be configured by initdb as a value of a power of 2 between 1 and 1024 (megabytes), so I think INT_MAX should be safe here.

/*
 * Convert values of GUCs measured in megabytes to equiv. segment count.
 * Rounds down.
 */
#define ConvertToXSegs(x, segsize) XLogMBVarToSegs((x), (segsize))

/*
 * Convert values of GUCs measured in megabytes to equiv. segment count.
 * Rounds down.
 */
#define XLogMBVarToSegs(mbvar, wal_segsz_bytes) \
	((mbvar) / ((wal_segsz_bytes) / (1024 * 1024)))

On Wed, Apr 19, 2023 at 11:33 AM Tom Lane wrote:
> Junwang Zhao writes:
> > The wal size related gucs use the MB unit, so we should just use
> > INT_MAX instead of MAX_KILOBYTES as the max value.
>
> The point of MAX_KILOBYTES is to avoid overflow when the value
> is multiplied by 1kB. It does seem like that might not be
> appropriate for these values, but that doesn't mean that we can
> blithely go to INT_MAX. Have you chased down how they are used?
>
> regards, tom lane

-- 
Regards
Junwang Zhao
Re: Use INT_MAX for wal size related gucs's max value
Junwang Zhao writes: > The wal size related gucs use the MB unit, so we should just use > INT_MAX instead of MAX_KILOBYTES as the max value. The point of MAX_KILOBYTES is to avoid overflow when the value is multiplied by 1kB. It does seem like that might not be appropriate for these values, but that doesn't mean that we can blithely go to INT_MAX. Have you chased down how they are used? regards, tom lane
Use INT_MAX for wal size related gucs's max value
The wal size related gucs use the MB unit, so we should just use INT_MAX instead of MAX_KILOBYTES as the max value.

-- 
Regards
Junwang Zhao

0001-use-INT_MAX-for-wal-size-related-max-value.patch
Re: Allowing parallel-safe initplans
On Tue, Apr 18, 2023 at 9:33 PM Tom Lane wrote: > Richard Guo writes: > > It seems that in this case the top_plan does not have any extParam, so > > the Gather node that is added atop the top_plan does not have a chance > > to get its initParam filled in set_param_references(). > > Oh, so maybe we'd need to copy up extParam as well? But it's largely > moot, since I don't see a good way to avoid breaking the EXPLAIN > output. Yeah, seems breaking the EXPLAIN output is inevitable if we move the initPlans to the Gather node. So maybe we need to keep the logic as in v1 patch, i.e. avoid adding a Gather node when top_plan has initPlans. If we do so, I wonder if we need to explain the potential wrong results issue in the comments. Thanks Richard
Re: check_strxfrm_bug()
On 4/18/23 9:19 PM, Thomas Munro wrote:
> On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier wrote:
> > On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
> > > +1 for getting rid of TRUST_STRXFRM.
> >
> > +1
>
> The situation is not improving fast, and requires hard work to follow on
> each OS. Clearly, mainstreaming ICU is the way to go. libc support will
> always have niche uses, to be compatible with other software on the box,
> but trusting strxfrm doesn't seem to be on the cards any time soon.

[RMT hat, personal opinion on RMT]

To be clear, is the proposal to remove both "check_strxfrm_bug" and "TRUST_STRXFRM"?

Given a bunch of folks who have expertise in this area of code all agree with removing the above as part of the collation cleanups targeted for v16, I'm inclined to agree. I don't really see the need for an explicit RMT action, but based on the consensus this seems OK to add as an open item.

Thanks,

Jonathan
Re: pg_collation.collversion for C.UTF-8
On Wed, Apr 19, 2023 at 1:30 PM Jeff Davis wrote: > On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote: > > Many OSes have a locale with this name. I don't know this history, > > who did it first etc, but now I am wondering if they all took the > > "obvious" interpretation, that it should be code-point based, > > extrapolating from "C" (really memcmp order): > > memcmp() is not the same as code-point order in all encodings, right? Right. I wasn't trying to suggest that *we* should assume that, I was just thinking out loud about how a libc implementor would surely think that a "C.encoding" should work, in the spirit of "C", given that the standard doesn't tell us IIUC. It looks like for technical reasons inside glibc, that couldn't be done before 2.35: https://sourceware.org/bugzilla/show_bug.cgi?id=17318 That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied by the glibc project) isn't supposed to be versioned, but it's extremely unfortunate that a bunch of OSes (Debian and maybe more) have been sorting text in some other order under that name for years. > I've been thinking that we should have a "provider=none" for the > special cases that use memcmp(). It's not using libc as a collation > provider; it's really postgres in control of the semantics. Yeah, interesting idea.
Re: pg_collation.collversion for C.UTF-8
On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote: > Many OSes have a locale with this name. I don't know this history, > who did it first etc, but now I am wondering if they all took the > "obvious" interpretation, that it should be code-point based, > extrapolating from "C" (really memcmp order): memcmp() is not the same as code-point order in all encodings, right? I've been thinking that we should have a "provider=none" for the special cases that use memcmp(). It's not using libc as a collation provider; it's really postgres in control of the semantics. That would clean up the documentation and the code a bit, and make it more clear which locales are being passed to the provider and which ones aren't. If we are passing it to a provider (e.g. "C.UTF-8"), we shouldn't make unnecessary assumptions about what the provider will do with it. For what it's worth, in my recent ICU language tag work, I special- cased ICU locales with language "C" or "POSIX" to map to "en-US-u-va- posix", disregarding everything else (collation attributes, etc.). I believe that's the right thing based on the behavior I observed: for the POSIX variant of en-US, ICU seems to disregard other things such as case insensitivity. But it still ultimately goes to the provider and ICU has particular rules for that locale -- I don't assume memcpy-like semantics or code point order. Regards, Jeff Davis
Re: check_strxfrm_bug()
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier wrote: > On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote: > > +1 for getting rid of TRUST_STRXFRM. +1 The situation is not improving fast, and requires hard work to follow on each OS. Clearly, mainstreaming ICU is the way to go. libc support will always have niche uses, to be compatible with other software on the box, but trusting strxfrm doesn't seem to be on the cards any time soon.
Re: Should we put command options in alphabetical order in the doc?
On Tue, Apr 18, 2023 at 4:30 PM Peter Geoghegan wrote: > > I'd be interested to know if you could tell me if SKIP_LOCKED has more > > importance than INDEX_CLEANUP, for example. If you can, it would seem > > like trying to say apples are more important than oranges, or > > vice-versa. > > I don't accept your premise that the only thing that matters (or the > most important thing) is adherence to some unambiguous and consistent > order. In the case of VACUUM, the current devel order is: FULL, FREEZE, VERBOSE, ANALYZE, DISABLE_PAGE_SKIPPING, SKIP_LOCKED, INDEX_CLEANUP, PROCESS_MAIN, PROCESS_TOAST, TRUNCATE, PARALLEL, SKIP_DATABASE_STATS, ONLY_DATABASE_STATS, BUFFER_USAGE_LIMIT I think that this order is far superior to alphabetical order, which is tantamount to random order. The first 4 items are indeed the really important ones to users, in my experience. I do have some minor quibbles beyond that, though. These are: * PARALLEL deserves to be at the start, maybe 4th or 5th overall. * DISABLE_PAGE_SKIPPING should be later, since it's really only a testing option that probably never proved useful in production. In particular, it has little business being before SKIP_LOCKED, which is much more important and relevant. * TRUNCATE and INDEX_CLEANUP are similar options, and ought to be side by side. I would put PROCESS_MAIN and PROCESS_TOAST after those two for the same reason. While I'm certain that nobody will agree with me on every little detail, I have to imagine that most would find my preferred ordering quite understandable and unsurprising, at a high level -- this is not a hopelessly idiosyncratic ranking, that could just as easily have been generated by a PRNG. People may not easily agree that "apples are more important than oranges, or vice-versa", but what does it matter? I've really only put each option into buckets of items with *roughly* the same importance. All of the details beyond that don't matter to me, at all. -- Peter Geoghegan
Re: allow_in_place_tablespaces vs. pg_basebackup
On Tue, Apr 18, 2023 at 11:35:41AM -0400, Robert Haas wrote:
> On Mon, Apr 17, 2023 at 1:30 AM Michael Paquier wrote:
>> FWIW, doing that now rather than the beginning of July is OK for me
>> for this stuff.
>
> OK, committed.

Thanks!
-- 
Michael
Re: Should we put command options in alphabetical order in the doc?
On Tue, Apr 18, 2023 at 4:18 PM David Rowley wrote: > "Importance order" just seems horribly subjective to me. Alphabetical order seems objectively bad. At least to me. > I'd be interested to know if you could tell me if SKIP_LOCKED has more > importance than INDEX_CLEANUP, for example. If you can, it would seem > like trying to say apples are more important than oranges, or > vice-versa. I don't accept your premise that the only thing that matters (or the most important thing) is adherence to some unambiguous and consistent order. -- Peter Geoghegan
Re: Should we put command options in alphabetical order in the doc?
On Tue, 18 Apr 2023 at 18:53, Peter Geoghegan wrote: > Take the VACUUM command. Right now FULL, FREEZE, and VERBOSE all come > first. Those options are approximately the most important options -- > especially VERBOSE. But your patch places VERBOSE dead last. hmm, how can we verify that the options are kept in order of importance? What guidance can we provide to developers adding options about where they should slot in the new option to the docs? "Importance order" just seems horribly subjective to me. I'd be interested to know if you could tell me if SKIP_LOCKED has more importance than INDEX_CLEANUP, for example. If you can, it would seem like trying to say apples are more important than oranges, or vice-versa. David
Enhanced rmgr desc routines test !has_image, not has_data
Recent commits that enhanced rmgr desc routines (commits 7d8219a4 and 1c453cfd) dealt with records that lack relevant block data (and so lack anything to give a more detailed summary of) by testing !DecodedBkpBlock.has_image -- that is the gating condition that determines if we want to (say) output a textual array representation of the page offset number from a given nbtree VACUUM WAL record.

Strictly speaking, this isn't the correct gating condition to test. We should be testing the *presence* of the relevant block data instead. Why test an inexact proxy for the condition that we care about, when we can just as easily test the precise condition we care about instead?

This isn't just a theoretical issue. Currently, we won't display detailed descriptions of block data whenever wal_consistency_checking happens to be in use. At least for those records with relevant block data available to summarize that also happen to have an FPI that the REDO routine isn't supposed to apply (i.e. an FPI that is included in the record purely so that verifyBackupPageConsistency can verify that the REDO routine produces a matching image).

Attached patch fixes this bug.

-- 
Peter Geoghegan

v1-0001-Test-has_data-in-rmgr-desc-routines.patch
Re: Request for comment on setting binary format output per session
On Mon, 17 Apr 2023 at 16:22, Tom Lane wrote: > > I tend to agree with the proposition that we aren't going to add new > message types very often, as long as we're careful to make them general > purpose. Don't forget that adding a new message type isn't just a matter > of writing some spec text --- there has to be code backing it up. We > will never introduce thousands of new message types, or if we do, > somebody factored it wrong and put data into the type code. Well the way I understood Robert's proposal would be that you would set a protocol option which could be some name like SuperDuperExtension and then later send an extended message like X SuperDuper Extension ... The point being not so much that it saves on message types but that it becomes possible for the wire protocol code to recognize the message type and know which extension's code to call back to. Presumably a callback was registered when the option was negotiated. > The fact that we've gotten away without adding *any* new message types > for about twenty years suggests to me that the growth rate isn't such > that we need sub-message-types yet. I'd keep the structure the same > until such time as we can't choose a plausible code value for a new > message, and then maybe add the "x-and-subtype" convention Jeff suggests. Fwiw I've had at least two miniprojects that would eventually have led to protocol extensions. Like most of my projects they're not finished but one day... Progress reporting on queries in progress -- I had things hacked to send the progress report in an elog but eventually it would have made sense to put it in a dedicated message type that the client would know the structure of the content of. Distributed tracing -- to pass the trace span id for each query and any other baggage. Currently people either stuff it in application_id or in SQL comments but they're both pretty awful. -- greg
Re: pg_collation.collversion for C.UTF-8
On Wed, Apr 19, 2023 at 12:36 AM Daniel Verite wrote:
> This seems to be based on the idea that C.* collations provide an
> immutable sort like "C", but it appears that it's not the case.

Hmm. It seems I added that exemption initially for FreeBSD only in ca051d8b101, and then merged the cases for several OSes in beb4480c853. It's extremely surprising to me that the sort order changed. I expected the sort order to be code point order: https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

One interesting thing is that it seems that it might have been independently invented by Debian (?) and then harmonised with glibc 2.35: https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1871363.html

Was the earlier Debian version buggy, or did it simply have a different idea of what the sort order should be, intentionally? Ugh.

From your examples, we can see that the older Debian system did not have A < [some 4 digit code point], while the later version did (as expected). If so then it might be tempting to *not* do what you're suggesting, since the stated goal of the thing is to be stable from now on. But it changed once in the early years of its existence! Annoying.

Many OSes have a locale with this name. I don't know this history, who did it first etc, but now I am wondering if they all took the "obvious" interpretation, that it should be code-point based, extrapolating from "C" (really memcmp order): https://unix.stackexchange.com/questions/597962/how-widespread-is-the-c-utf-8-locale
Re: Direct I/O
On Mon, 17 Apr 2023 at 17:45, Thomas Munro wrote: > > Reasons: (1) There will always be a > few file systems that refuse O_DIRECT (Linux tmpfs is one such, as we > learned in this thread; if fails with EINVAL at open() time), and So why wouldn't we just automatically turn it off (globally or for that tablespace) and keep operating without it afterward? > (2) without a page cache, you really need to size your shared_buffers > adequately and we can't do that automatically. Well I'm more optimistic... That may not always be impossible. We've already added the ability to add more shared memory after startup. We could implement the ability to add or remove shared buffer segments after startup. And it wouldn't be crazy to imagine a kernel interface that lets us judge whether the kernel memory pressure makes it reasonable for us to take more shared buffers or makes it necessary to release shared memory to the kernel. You could hack something together using /proc/meminfo today but I imagine an interface intended for this kind of thing would be better. > It's something you'd > opt into for a dedicated database server along with other carefully > considered settings. It seems acceptable to me that if you set > io_direct to a non-default setting on an unusual-for-a-database-server > filesystem you might get errors screaming about inability to open > files -- you'll just have to turn it back off again if it doesn't work > for you. If the only solution is to turn it off perhaps the server should just turn it off? I guess the problem is that the shared_buffers might be set assuming it would be on? -- greg
Re: Fix typos and inconsistencies for v16
Justin Pryzby writes: > On Tue, Apr 18, 2023 at 02:06:43PM +1200, David Rowley wrote: >> On Tue, 18 Apr 2023 at 10:10, Justin Pryzby wrote: >>> and s/evade/avoid/ >> I didn't touch this. You'll need to provide more justification for why >> you think it's more correct than what's there. > I'd noticed that it's a substitution/mistake that's been made in the > past. "Evade" doesn't seem like le mot juste there; it's got negative connotations. But the code around it is just horrible. Some offenses: * No documentation in the function header comment of what the usersetArray parameter is or does. Which is bad enough in itself, but what the parameter actually does is falsify the header comment's principal claim that the passed context is what is used. So I don't find that omission acceptable. * Non-obvious, and quite unnecessary, dependency on the isnull variable having been left in a particular state by previous code. * For me, at least, it'd read better if the if/else arms were swapped, allowing removal of the negation in the if-condition and bringing the code this comment comments on closer to said comment. As for the comment text, maybe say * If the value was USER SET, then apply it at PGC_USERSET context * rather than the caller-supplied context, to prevent any more-restricted * GUCs being set. Also pass InvalidOid for the role, to ensure any * special privileges of the current user aren't applied. I hesitate to go look at the rest of this commit, but I guess somebody had better. regards, tom lane
Re: Fix typos and inconsistencies for v16
Hi Justin and David,

18.04.2023 01:10, Justin Pryzby wrote:
> Well done.

Thank you for reviewing!

> On Mon, Apr 17, 2023 at 09:00:00PM +0300, Alexander Lakhin wrote:
>> Hello hackers,
>> Please consider fixing the following unique words/identifiers introduced in v16:
> Note that your patches are overlapping: ... It'd make sense if the changes
> to each file were isolated to a single patch (especially 004_load and acl.c).

I'd hoped that most of the proposed fixes would be accepted, so conflicts due to skipping of some changes seemed unlikely to me. So if you don't strongly disagree, I would continue presenting my findings the same way.

> ... You missed "boostrap" :)

Yes, that's because "boostrap" was not unique, but my semi-automatic approach is based on `uniq -u`, so I'm sure that there are typos that can't be found this way. But I hadn't yet convinced myself to start the process of defending each one of the fixes.

> Attached some others that I found.

Yeah, those are good catches too, but e.g. "privilges" is not new in v16, so it's fallen out of my "hot errors" category. If we're going to fix not-so-hot ones too now, please look at the similar list for v15+ (596b5af1d..HEAD):

1. abbrevated -> abbreviated
2. ArchiveModeRequested -> ArchiveRecoveryRequested
3. BufFileOpenShared -> BufFileOpenFileSet // see dcac5e7ac
4. check_publication_columns -> pub_collist_contains_invalid_column // see note 1
5. configuation -> configuration
6. copyAclUserName -> dequoteAclUserName // see 0c9d84427
7. EndWalRecovery -> FinishWalRecovery
8. HaveRegisteredOrActiveSnapshots -> HaveRegisteredOrActiveSnapshot
9. idiosyncracies -> idiosyncrasies
10. iif -> iff
11. initpriv -> initprivs
12. inserted_destrel -> insert_destrel
13. Intialize -> Initialize
14. invtrans -> invtransfn
15. isolation-level -> isolation level
16. lefthasheqoperator -> left_hasheqoperator + righthasheqoperator -> right_hasheqoperator
17. LRQ_NO_IO -> LRQ_NEXT_NO_IO
18. minRecovery point -> minRecoveryPoint
19. multidimensional-aware -> multidimension-aware // sync with gistbuild.c
20. ParalleVacuumState -> ParallelVacuumState
21. PgStatShm_Stat*Entry -> PgStatShared_* // see note 2
22. plpython_call_handler -> plpython3_call_handler // see 9b7e24a2c
23. pulications -> publications
24. ReadCheckPointRecord -> ReadCheckpointRecord
25. relkkinds -> relkinds
26. separare -> separate // though perhaps it's not the most suitable word here
27. setup_formatted_log_time -> get_formatted_log_time // see ac7c80758
28. SPI_abort -> SPI_rollback
29. ssup_datum_int32_compare -> ssup_datum_int32_cmp
30. ssup_datum_signed_compare -> ssup_datum_signed_cmp
31. ssup_datum_unsigned_compare -> ssup_datum_unsigned_cmp
32. SUBSCRITPION -> SUBSCRIPTION
33. tabelspaces -> tablespaces
34. table_state_not_ready -> table_states_not_ready
35. underling -> underlying
36. WalRecoveryResult -> EndOfWalRecoveryInfo

Also, I'd like to note that the following entities/references are orphaned now, so maybe some of them could be removed too:

1. gen-rtab (in pgcrypto/Makefile) // orphaned since db7d1a7b0
2. pgstat_temp_directory // was left by b3abca681 for v15, but maybe it's time to remove it for v16
3. pgstat_write_statsfiles (in valgrind.supp)
4. quote_system_arg (in vcregress.pl) // unused since d2a2ce418
5. standard_initdb (in vcregress.pl) // unused since 322becb60 /* though maybe vcregress.pl will be removed completely soon */
6. int pstat; /* mcrypt uses it */ (in contrib/pgcrypto/px.h) /* "mcrypt" became unique after abe81ee08, support for libmcrypt was removed at 2005 (3cc866123) */

Note 1. A check that was located in check_publication_columns() in v13-0003-Add-column-filtering-to-logical-replication.patch [1], can be found in pub_collist_contains_invalid_column() now (see also [2]).

Note 2. The inconsistency appeared in [3], v67-0007-pgstat-store-statistics-in-shared-memory.patch was correct in this aspect.
18.04.2023 04:35, David Rowley wrote:
>> Please consider fixing the following unique words/identifiers introduced in v16:
> Thanks, I've pushed all of these apart from the following 2.

Thank you!

[1] https://www.postgresql.org/message-id/202112302021.ca7ihogysgh3%40alvherre.pgsql
[2] https://www.postgresql.org/message-id/CAA4eK1K5pkrPT9z5TByUPptExian5c18g6GnfNf9Cr97QdPbjw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20220404041516.cctrvpadhuriawlq%40alap3.anarazel.de

Best regards,
Alexander
Re: Fix typos and inconsistencies for v16
On Tue, Apr 18, 2023 at 02:06:43PM +1200, David Rowley wrote: > On Tue, 18 Apr 2023 at 10:10, Justin Pryzby wrote: > > > - * USER SET values are appliciable only for PGC_USERSET > > > parameters. We > > > + * USER SET values are applicable only for PGC_USERSET > > > parameters. We > > >* use InvalidOid as role in order to evade possible > > > privileges of the > > > > and s/evade/avoid/ > > I didn't touch this. You'll need to provide more justification for why > you think it's more correct than what's there. I'd noticed that it's a substitution/mistake that's been made in the past. I dug up: 9436041ed848debb3d64fb5fbff6cdb35bc46d04 8e12f4a250d250a89153da2eb9b91c31bb80c483 cd9479af2af25d7fa9bfd24dd4dcf976b360f077 6df7a9698bb036610c1e8c6d375e1be38cb26d5f 911e70207703799605f5a0e8aad9f06cff067c63 > It might not be worth too much discussion, however. +many. I may resend the patch at some later date. > > Attached some others that I found. > > Pushed the rest. Thanks Thanks! -- Justin
Re: [PATCH] Compression dictionaries for JSONB
Matthias, Nikita, Many thanks for the feedback! > Any type with typlen < 0 should work, right? Right. > The use of dictionaries should be dependent on only the use of a > compression method that supports pre-computed compression > dictionaries. I think storage=MAIN + compression dictionaries should > be supported, to make sure there is no expensive TOAST lookup for the > attributes of the tuple; but that doesn't seem to be an option with > that design. > I don't think it's a good idea to interfere with the storage strategies. > Dictionary > should be a kind of storage option, like a compression, but not the strategy > declining all others. My reasoning behind this proposal was as follows. Let's not forget that MAIN attributes *can* be stored in a TOAST table as a final resort, and also that EXTENDED attributes are compressed in-place first, and are stored in a TOAST table *only* if this is needed to fit a tuple in toast_tuple_target bytes (which additionally user can change). So whether in practice it's going to be advantageous to distinguish MAIN+dict.compressed and EXTENDED+dict.compressed attributes seems to be debatable. Basically the only difference between MAIN and EXTENDED is the priority the four-stage TOASTing algorithm gives to the corresponding attributes. I would assume if the user wants dictionary compression, the attribute should be highly compressible and thus always EXTENDED. (We seem to use MAIN for types that are not that well compressible.) This being said, if the majority believes we should introduce a new entity and keep storage strategies as is, I'm fine with that. This perhaps is not going to be the most convenient interface for the user. On the flip side it's going to be flexible. It's all about compromise. > I think "AT_AC SET COMPRESSION lz4 {[WITH | WITHOUT] DICTIONARY}", > "AT_AC SET COMPRESSION lz4-dictionary", or "AT_AC SET > compression_dictionary = on" would be better from a design > perspective. > Agree with Matthias on above. 
OK, unless someone objects, we have a consensus here.

> Didn't we get zstd support recently as well?

Unfortunately, it is not used for TOAST. In fact, I vaguely recall that ZSTD support for TOAST may have been explicitly rejected. Don't quote me on that, however...

I think it's going to be awkward to support PGLZ/LZ4 for COMPRESSION and LZ4/ZSTD for dictionary compression. As a user, I would personally prefer having one set of compression algorithms that can be used with TOAST. Perhaps for a PoC we could focus on LZ4, and maybe PGLZ, if we choose to use PGLZ for compression dictionaries too. We can always discuss ZSTD separately.

> Can we specify a default compression method for each postgresql type,
> just like how we specify the default storage? If not, then the setting
> could realistically be in conflict with a default_toast_compression
> setting, assuming that dictionary support is not a requirement for
> column compression methods.

No, only STORAGE can be specified [1].

> The toast pointer must store enough info about the compression used to
> decompress the datum, which implies it needs to store the compression
> algorithm used, and a reference to the compression dictionary (if
> any). I think the idea about introducing a new toast pointer type (in
> the custom toast patch) wasn't bad per se, and that change would allow
> us to carry more or different info in the header.

> The Pluggable TOAST was rejected, but we have a lot of improvements
> based on changing the TOAST pointer structure.

Interestingly, it looks like we ended up working on TOAST improvements after all. I'm almost certain that we will have to modify TOAST pointers to a certain degree in order to make it work. Hopefully it's not going to be too invasive.

[1]: https://www.postgresql.org/docs/current/sql-createtype.html

--
Best regards,
Aleksander Alekseev
Re: constants for tar header offsets
Robert Haas writes:

> On Tue, Apr 18, 2023 at 12:06 PM Tom Lane wrote:
>> Hmm, you're right: I checked the POSIX.1-2018 spec as well, and
>> it agrees that the prefix field is 155 bytes long. Perhaps just
>> add another comment line indicating that 12 bytes remain unassigned?
>
> OK. Here's v2, with that change and a few others.

It still has magic numbers for the sizes of the fields; should those also be named constants?

- ilmari
Re: constants for tar header offsets
Robert Haas writes: > OK. Here's v2, with that change and a few others. LGTM. regards, tom lane
Re: Request for comment on setting binary format output per session
On Tue, 18 Apr 2023 at 12:24, Robert Haas wrote:

> On Tue, Apr 18, 2023 at 11:51 AM Tom Lane wrote:
>> Robert Haas writes:
>>> One thing I think we should do in this area is introduce #defines for
>>> all the message type codes and use those instead of having hard-coded
>>> constants everywhere.
>>
>> +1, but I wonder where we should put those exactly. My first thought
>> was postgres_ext.h, but the charter for that is
>>
>> * This file contains declarations of things that are visible everywhere
>> * in PostgreSQL *and* are visible to clients of frontend interface libraries.
>> * For example, the Oid type is part of the API of libpq and other libraries.
>>
>> so picayune details of the wire protocol probably don't belong there.
>> Maybe we need a new header concerned with the wire protocol?
>
> Yeah. I sort of thought maybe one of the files in src/include/libpq
> would be the right place, but it doesn't look like it.

If we at least created the defines and replaced occurrences with them, we can litigate where to put them later. I think I'd prefer this in a different patch, but I'd be willing to take a run at it.

Dave
Re: constants for tar header offsets
On Tue, Apr 18, 2023 at 12:06 PM Tom Lane wrote: > Hmm, you're right: I checked the POSIX.1-2018 spec as well, and > it agrees that the prefix field is 155 bytes long. Perhaps just > add another comment line indicating that 12 bytes remain unassigned? OK. Here's v2, with that change and a few others. -- Robert Haas EDB: http://www.enterprisedb.com v2-0001-Add-and-use-symbolic-constants-for-tar-header-off.patch Description: Binary data
Re: Request for comment on setting binary format output per session
On Tue, Apr 18, 2023 at 11:51 AM Tom Lane wrote: > Robert Haas writes: > > One thing I think we should do in this area is introduce #defines for > > all the message type codes and use those instead of having hard-coded > > constants everywhere. > > +1, but I wonder where we should put those exactly. My first thought > was postgres_ext.h, but the charter for that is > > * This file contains declarations of things that are visible everywhere > * in PostgreSQL *and* are visible to clients of frontend interface > libraries. > * For example, the Oid type is part of the API of libpq and other libraries. > > so picayune details of the wire protocol probably don't belong there. > Maybe we need a new header concerned with the wire protocol? Yeah. I sort of thought maybe one of the files in src/include/libpq would be the right place, but it doesn't look like it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Compression dictionaries for JSONB
Hi, I don't think it's a good idea to interfere with the storage strategies. Dictionary should be a kind of storage option, like a compression, but not the strategy declining all others. >> While thinking about how a user interface could look like it occured >> to me that what we are discussing could be merely a new STORAGE >> strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. >> Let's call a new strategy DICTIONARY, with typstorage = d. >I think "AT_AC SET COMPRESSION lz4 {[WITH | WITHOUT] DICTIONARY}", >"AT_AC SET COMPRESSION lz4-dictionary", or "AT_AC SET >compression_dictionary = on" would be better from a design >perspective. Agree with Matthias on above. About the TOAST pointer: >The toast pointer must store enough info about the compression used to >decompress the datum, which implies it needs to store the compression >algorithm used, and a reference to the compression dictionary (if >any). I think the idea about introducing a new toast pointer type (in >the custom toast patch) wasn't bad per se, and that change would allow >us to carry more or different info in the header. The External TOAST pointer is very limited to the amount of service data it could keep, that's why we introduced the Custom TOAST pointers in the Pluggable TOAST. But keep in mind that changing the TOAST pointer structure requires a lot of quite heavy modifications in the core - along with some obvious places like insert/update/delete datum there is very serious issue with logical replication. The Pluggable TOAST was rejected, but we have a lot of improvements based on changing the TOAST pointer structure. On Tue, Apr 18, 2023 at 6:40 PM Matthias van de Meent < boekewurm+postg...@gmail.com> wrote: > On Tue, 18 Apr 2023 at 17:28, Aleksander Alekseev > wrote: > > > > Hi Andres, > > > > > As I said, I don't think we should extend dictionaries. 
For this to > work we'll > > > likely need a new / extended compressed toast datum header of some > form, with > > > a reference to the dictionary. That'd likely be needed even with > updatable > > > dictionaries, as we IIRC don't know which column a toasted datum is > for, but > > > we need to know, to identify the dictionary. As we need that field > anyway, we > > > can easily have multiple dictionaries. > > > > So I summarized the requirements we agreed on so far and ended up with > > the following list: > > > > * This is going to be a PostgreSQL feature, not an extension, not a > > bunch of hooks, etc; > > * We are not going to support lazy/partial decompression since this is > > too complicated in a general case and Postgres is not a specialized > > document-oriented DBMS (there is a normalization after all); > > * This should be a relation-level optimization option, not something > > visible to every user of the table (not a custom type, etc); > > * This is going to be an attribute-level compression; > > * The dictionaries should be created automatically (maybe not in a PoC > > but in the final implementation) since people are not good at it; > > * We are going to be using the existing compression algorithms like > > LZ4/ZSTD, not to invent new ones; > > * When created, a dictionary version is immutable, i.e. no new entries > > can be added. New version of a dictionary can be created when the data > > evolves. The compressed data stores the dictionary version used for > > compression. A dictionary version can't be deleted while data exists > > that uses this version of a dictionary; > > * Dictionaries are created automatically from sampled data during > > ANALIZE. 
We compare the efficiency of a new dictionary vs the > > efficiency of the old one (or the lack of such) on sampled data and > > depending on the results decide whether it's worth creating a new > > version of a dictionary; > > * This is going to work for multiple types: TEXT, JSON, JSONB, XML, > > BYTEA etc. Ideally for user-defined types too; > > Any type with typlen < 0 should work, right? > > > Hopefully I didn't miss anything. > > > > While thinking about how a user interface could look like it occured > > to me that what we are discussing could be merely a new STORAGE > > strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. > > Let's call a new strategy DICTIONARY, with typstorage = d. > > The use of dictionaries should be dependent on only the use of a > compression method that supports pre-computed compression > dictionaries. I think storage=MAIN + compression dictionaries should > be supported, to make sure there is no expensive TOAST lookup for the > attributes of the tuple; but that doesn't seem to be an option with > that design. > > > When user wants a given attribute to be compressed, he/she says: > > > > ALTER TABLE foo ALTER COLUMN bar SET STORAGE DICTIONARY; > > > > And the compression algorithms is chosen as usual: > > > > ALTER TABLE foo ALTER COLUMN bar SET COMPRESSION lz4; > > > > When there are no dictionaries yet, DICTIONARY works the same as > > EXTENDED. When a
Re: constants for tar header offsets
Robert Haas writes: > On Tue, Apr 18, 2023 at 11:38 AM Tom Lane wrote: >> 2. The header size is defined as 512 bytes, but this doesn't sum to 512: >> + TAR_OFFSET_PREFIX = 345 /* 155 byte string */ > I think that what happened is that whoever designed the original tar > format decided on 512 byte blocks. And the header did not take up the > whole block. The USTAR format is an extension of the original format > which uses more of the block, but still not all of it. Hmm, you're right: I checked the POSIX.1-2018 spec as well, and it agrees that the prefix field is 155 bytes long. Perhaps just add another comment line indicating that 12 bytes remain unassigned? regards, tom lane
Re: constants for tar header offsets
On Tue, Apr 18, 2023 at 11:38 AM Tom Lane wrote: > Robert Haas writes: > > We have a few different places in the code where we generate or modify > > tar headers or just read data out of them. The code in question uses > > one of my less-favorite programming things: magic numbers. The offsets > > of the various fields within the tar header are just hard-coded in > > each relevant place in our code. I think we should clean that up, as > > in the attached patch. > > Generally +1, with a couple of additional thoughts: > > 1. Is it worth inventing macros for the values of the file type, > rather than just writing the comment you did? Might be. > 2. The header size is defined as 512 bytes, but this doesn't sum to 512: > > + TAR_OFFSET_PREFIX = 345 /* 155 byte string */ > > Either that field's length is really 167 bytes, or there's some other > field you didn't document. (It looks like you may have copied "155" > from an incorrect existing comment?) According to my research, it is neither of those, e.g. see https://www.subspacefield.org/~vax/tar_format.html https://www.ibm.com/docs/en/zos/2.4.0?topic=formats-tar-format-tar-archives https://wiki.osdev.org/USTAR I think that what happened is that whoever designed the original tar format decided on 512 byte blocks. And the header did not take up the whole block. The USTAR format is an extension of the original format which uses more of the block, but still not all of it. > Yeah, this is adding greppability (a good thing!) but little more. > However, I'm not convinced it's worth doing more. It's not like > this data structure will change anytime soon. Right. greppability is a major concern for me here, and also bug surface. Right now, to use the functions in pgtar.h, you need to know all the header offsets as well as the format and length of each header field. This centralizes constants for the header offsets, and at least provides some centralized documentation of the rest. 
It's not great, though, because it seems like there's some risk of someone writing new code and getting confused about whether the length of a certain field is 8 or 12, for example. A thicker abstraction layer might be able to avoid or minimize such risks better than what we have, but I don't really know how to design it, whereas this seems like an obvious improvement. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Request for comment on setting binary format output per session
Robert Haas writes: > One thing I think we should do in this area is introduce #defines for > all the message type codes and use those instead of having hard-coded > constants everywhere. +1, but I wonder where we should put those exactly. My first thought was postgres_ext.h, but the charter for that is * This file contains declarations of things that are visible everywhere * in PostgreSQL *and* are visible to clients of frontend interface libraries. * For example, the Oid type is part of the API of libpq and other libraries. so picayune details of the wire protocol probably don't belong there. Maybe we need a new header concerned with the wire protocol? regards, tom lane
Re: Request for comment on setting binary format output per session
On Mon, Apr 17, 2023 at 4:22 PM Tom Lane wrote: > The fact that we've gotten away without adding *any* new message types > for about twenty years suggests to me that the growth rate isn't such > that we need sub-message-types yet. I'd keep the structure the same > until such time as we can't choose a plausible code value for a new > message, and then maybe add the "x-and-subtype" convention Jeff suggests. One thing I think we should do in this area is introduce #defines for all the message type codes and use those instead of having hard-coded constants everywhere. I'm not brave enough to tackle that day, but the only reason the current situation isn't a disaster is because every place we use e.g. 'Z' we generally also have a comment that mentions ReadyForQuery. If it weren't for that, this would be pretty un-greppable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Compression dictionaries for JSONB
On Tue, 18 Apr 2023 at 17:28, Aleksander Alekseev wrote: > > Hi Andres, > > > As I said, I don't think we should extend dictionaries. For this to work > > we'll > > likely need a new / extended compressed toast datum header of some form, > > with > > a reference to the dictionary. That'd likely be needed even with updatable > > dictionaries, as we IIRC don't know which column a toasted datum is for, but > > we need to know, to identify the dictionary. As we need that field anyway, > > we > > can easily have multiple dictionaries. > > So I summarized the requirements we agreed on so far and ended up with > the following list: > > * This is going to be a PostgreSQL feature, not an extension, not a > bunch of hooks, etc; > * We are not going to support lazy/partial decompression since this is > too complicated in a general case and Postgres is not a specialized > document-oriented DBMS (there is a normalization after all); > * This should be a relation-level optimization option, not something > visible to every user of the table (not a custom type, etc); > * This is going to be an attribute-level compression; > * The dictionaries should be created automatically (maybe not in a PoC > but in the final implementation) since people are not good at it; > * We are going to be using the existing compression algorithms like > LZ4/ZSTD, not to invent new ones; > * When created, a dictionary version is immutable, i.e. no new entries > can be added. New version of a dictionary can be created when the data > evolves. The compressed data stores the dictionary version used for > compression. A dictionary version can't be deleted while data exists > that uses this version of a dictionary; > * Dictionaries are created automatically from sampled data during > ANALIZE. 
We compare the efficiency of a new dictionary vs the > efficiency of the old one (or the lack of such) on sampled data and > depending on the results decide whether it's worth creating a new > version of a dictionary; > * This is going to work for multiple types: TEXT, JSON, JSONB, XML, > BYTEA etc. Ideally for user-defined types too; Any type with typlen < 0 should work, right? > Hopefully I didn't miss anything. > > While thinking about how a user interface could look like it occured > to me that what we are discussing could be merely a new STORAGE > strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. > Let's call a new strategy DICTIONARY, with typstorage = d. The use of dictionaries should be dependent on only the use of a compression method that supports pre-computed compression dictionaries. I think storage=MAIN + compression dictionaries should be supported, to make sure there is no expensive TOAST lookup for the attributes of the tuple; but that doesn't seem to be an option with that design. > When user wants a given attribute to be compressed, he/she says: > > ALTER TABLE foo ALTER COLUMN bar SET STORAGE DICTIONARY; > > And the compression algorithms is chosen as usual: > > ALTER TABLE foo ALTER COLUMN bar SET COMPRESSION lz4; > > When there are no dictionaries yet, DICTIONARY works the same as > EXTENDED. When a dictionary is trained the data is compressed using > the latest version of this dictionary. For visibility we are going to > need some sort of pg_stat_dictionaries view that shows the existing > dictionaries, how much space they consume, etc. I think "AT_AC SET COMPRESSION lz4 {[WITH | WITHOUT] DICTIONARY}", "AT_AC SET COMPRESSION lz4-dictionary", or "AT_AC SET compression_dictionary = on" would be better from a design perspective. > If we choose this approach there are a couple of questions/notes that > come to mind: > > * The default compression algorithm is PGLZ and unlike LZ4 it doesn't > support training dictionaries yet. 
This should be straightforward to > implement though, or alternatively shared dictionaries could work only > for LZ4; Didn't we get zstd support recently as well? > * Currently users have control of toast_tuple_target but not > TOAST_TUPLE_THRESHOLD. Which means for tuples smaller than 1/4 of the > page size shared dictionaries are not going to be triggered. Which is > not necessarily a bad thing. Alternatively we could give the users > toast_tuple_threshold setting. This shouldn't necessarily be part of > this CF entry discussion however, we can always discuss it separately; That makes a lot of sense, but as you said handling that separately would probably be better and easier to review. > * Should we allow setting DICTIONARY storage strategy for a given > type, i.e. CREATE TYPE baz STORAGE = DICTIONARY? I suggest we forbid > it in the first implementation, just for the sake of simplicity. Can we specify a default compression method for each postgresql type, just like how we specify the default storage? If not, then the setting could realistically be in conflict with a default_toast_compression setting, assuming that dictionary support is not a requirement for column compression methods. >
Re: constants for tar header offsets
Robert Haas writes: > We have a few different places in the code where we generate or modify > tar headers or just read data out of them. The code in question uses > one of my less-favorite programming things: magic numbers. The offsets > of the various fields within the tar header are just hard-coded in > each relevant place in our code. I think we should clean that up, as > in the attached patch. Generally +1, with a couple of additional thoughts: 1. Is it worth inventing macros for the values of the file type, rather than just writing the comment you did? 2. The header size is defined as 512 bytes, but this doesn't sum to 512: + TAR_OFFSET_PREFIX = 345 /* 155 byte string */ Either that field's length is really 167 bytes, or there's some other field you didn't document. (It looks like you may have copied "155" from an incorrect existing comment?) > I hasten to emphasize that, while I think this is an improvement, I > don't think the result is particularly awesome. Even with the patch, > src/port/tar.c and src/include/pgtar.h do a poor job insulating > callers from the details of the tar format. However, it's also not > very clear to me how to fix that. Yeah, this is adding greppability (a good thing!) but little more. However, I'm not convinced it's worth doing more. It's not like this data structure will change anytime soon. regards, tom lane
Re: allow_in_place_tablespaces vs. pg_basebackup
On Mon, Apr 17, 2023 at 1:30 AM Michael Paquier wrote: > FWIW, doing that now rather than the beginning of July is OK for me > for this stuff. OK, committed. -- Robert Haas EDB: http://www.enterprisedb.com
constants for tar header offsets
Hi, We have a few different places in the code where we generate or modify tar headers or just read data out of them. The code in question uses one of my less-favorite programming things: magic numbers. The offsets of the various fields within the tar header are just hard-coded in each relevant place in our code. I think we should clean that up, as in the attached patch. I hasten to emphasize that, while I think this is an improvement, I don't think the result is particularly awesome. Even with the patch, src/port/tar.c and src/include/pgtar.h do a poor job insulating callers from the details of the tar format. However, it's also not very clear to me how to fix that. For instance, I thought about writing a function that parses a tar header into a struct and then using it in all of these places, but that seems like it would lose too much efficiency relative to the current ad-hoc coding. So for now I don't have a better idea than this. -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-Add-and-use-symbolic-constants-for-tar-header-off.patch Description: Binary data
Re: Temporary tables versus wraparound... again
Hm, in an optimized build using kernel perf I see this. But I don't know how to find what the call sites are for LWLockAcquire/Release. If it's the locks on pgproc that would be kind of bad. I wonder if I should be gathering horizons once in the PrecommitActions and then just using those for every temp table somehow. Perhaps only actually doing an update if the relfrozenxid is actually at least vacuum_freeze_table_age old.

  3.98%  postgres  LWLockAcquire
  3.51%  postgres  LWLockRelease
  3.18%  postgres  hash_search_with_hash_value
  2.20%  postgres  DropRelationLocalBuffers
  1.80%  [kernel]  check_preemption_disabled
  1.52%  postgres  hash_bytes
  1.27%  postgres  LockAcquireExtended
  0.97%  postgres  _bt_compare
  0.95%  [kernel]  kmem_cache_alloc

I still think we should be applying the vacuum warning messages to stable and probably backpatching. I've actually heard from other users who have faced the same surprise wraparound shutdown.
Re: [PATCH] Compression dictionaries for JSONB
Hi Andres,

> As I said, I don't think we should extend dictionaries. For this to work we'll
> likely need a new / extended compressed toast datum header of some form, with
> a reference to the dictionary. That'd likely be needed even with updatable
> dictionaries, as we IIRC don't know which column a toasted datum is for, but
> we need to know, to identify the dictionary. As we need that field anyway, we
> can easily have multiple dictionaries.

So I summarized the requirements we agreed on so far and ended up with the following list:

* This is going to be a PostgreSQL feature, not an extension, not a bunch of hooks, etc;
* We are not going to support lazy/partial decompression since this is too complicated in the general case and Postgres is not a specialized document-oriented DBMS (there is normalization, after all);
* This should be a relation-level optimization option, not something visible to every user of the table (not a custom type, etc);
* This is going to be attribute-level compression;
* The dictionaries should be created automatically (maybe not in a PoC but in the final implementation) since people are not good at it;
* We are going to use existing compression algorithms like LZ4/ZSTD, not invent new ones;
* When created, a dictionary version is immutable, i.e. no new entries can be added. A new version of a dictionary can be created when the data evolves. The compressed data stores the dictionary version used for compression. A dictionary version can't be deleted while data exists that uses this version of the dictionary;
* Dictionaries are created automatically from sampled data during ANALYZE. We compare the efficiency of a new dictionary vs the efficiency of the old one (or the lack of one) on sampled data and, depending on the results, decide whether it's worth creating a new version of the dictionary;
* This is going to work for multiple types: TEXT, JSON, JSONB, XML, BYTEA, etc. Ideally for user-defined types too.

Hopefully I didn't miss anything.

While thinking about how a user interface could look, it occurred to me that what we are discussing could be merely a new STORAGE strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. Let's call the new strategy DICTIONARY, with typstorage = d.

When a user wants a given attribute to be compressed, he/she says:

ALTER TABLE foo ALTER COLUMN bar SET STORAGE DICTIONARY;

And the compression algorithm is chosen as usual:

ALTER TABLE foo ALTER COLUMN bar SET COMPRESSION lz4;

When there are no dictionaries yet, DICTIONARY works the same as EXTENDED. When a dictionary is trained, the data is compressed using the latest version of this dictionary. For visibility we are going to need some sort of pg_stat_dictionaries view that shows the existing dictionaries, how much space they consume, etc.

If we choose this approach, there are a couple of questions/notes that come to mind:

* The default compression algorithm is PGLZ, and unlike LZ4 it doesn't support training dictionaries yet. This should be straightforward to implement though, or alternatively shared dictionaries could work only for LZ4;
* Currently users have control of toast_tuple_target but not TOAST_TUPLE_THRESHOLD. This means that for tuples smaller than 1/4 of the page size shared dictionaries are not going to be triggered. Which is not necessarily a bad thing. Alternatively, we could give users a toast_tuple_threshold setting. This shouldn't necessarily be part of this CF entry discussion, however; we can always discuss it separately;
* Should we allow setting the DICTIONARY storage strategy for a given type, i.e. CREATE TYPE baz STORAGE = DICTIONARY? I suggest we forbid it in the first implementation, just for the sake of simplicity;
* It looks like we won't be able to share a dictionary between multiple columns. Which again is not necessarily a bad thing: data in these columns can be completely different (e.g. BYTEA and XML), columns can be dropped independently, etc. If a user is interested in sharing a dictionary between several columns, he/she can join these columns in a single JSONB column;
* TOAST currently doesn't support ZSTD. IMO this is not a big deal, and adding the corresponding support can be discussed separately;
* If memory serves, there were not many free bits left in TOAST pointers. The pointers don't store a storage strategy though, so hopefully this will not be a problem. We'll see.

Please let me know what you think about all this. I'm going to prepare an updated patch for the next CF so I could use early feedback.

--
Best regards,
Aleksander Alekseev
Re: Allowing parallel-safe initplans
Richard Guo writes: > On Mon, Apr 17, 2023 at 11:04 PM Tom Lane wrote: >> I wondered about that too, but how come neither of us saw non-cosmetic >> failures (ie, actual query output changes not just EXPLAIN changes) >> when we tried this? > Sorry I forgot to mention that I did see query output changes after > moving the initPlans to the Gather node. Hmm, my memory was just of seeing the EXPLAIN output changes, but maybe those got my attention to the extent of missing the others. > It seems that in this case the top_plan does not have any extParam, so > the Gather node that is added atop the top_plan does not have a chance > to get its initParam filled in set_param_references(). Oh, so maybe we'd need to copy up extParam as well? But it's largely moot, since I don't see a good way to avoid breaking the EXPLAIN output. regards, tom lane
pg_collation.collversion for C.UTF-8
Hi,

get_collation_actual_version() in pg_locale.c currently excludes C.UTF-8 (and more generally C.*) from versioning, which leaves pg_collation.collversion empty for these collations.

char *
get_collation_actual_version(char collprovider, const char *collcollate)
{
	if (collprovider == COLLPROVIDER_LIBC &&
		pg_strcasecmp("C", collcollate) != 0 &&
		pg_strncasecmp("C.", collcollate, 2) != 0 &&
		pg_strcasecmp("POSIX", collcollate) != 0)

This seems to be based on the idea that C.* collations provide an immutable sort like "C", but it appears that's not the case. For instance, consider how these C.UTF-8 comparisons differ between recent linux systems:

U+1D400 = Mathematical Bold Capital A

Debian 9.13 (glibc 2.24)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 t

Debian 10.13 (glibc 2.28)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 f

Debian 11.6 (glibc 2.31)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 f

Ubuntu 22.04 (glibc 2.35)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 t

So I suggest the attached patch to no longer exclude these collations from the generic versioning.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 092b620673..dd9c3a0c10 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1727,7 +1727,6 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #endif
 	if (collprovider == COLLPROVIDER_LIBC &&
 		pg_strcasecmp("C", collcollate) != 0 &&
-		pg_strncasecmp("C.", collcollate, 2) != 0 &&
 		pg_strcasecmp("POSIX", collcollate) != 0)
 	{
 #if defined(__GLIBC__)
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 17 Apr 2023, at 18:20, Jacob Champion wrote: > Note that it won't fix the > (completely different) misleading error message for OpenSSL 3.0, but > since that's an *actively* unhelpful error message coming back from > OpenSSL, I don't think we want to override it. Agreed, the best we can do there is to memorize it in the test. -- Daniel Gustafsson
Re: Adding argument names to aggregate functions
On 18.04.23 12:27, Dagfinn Ilmari Mannsåker wrote: Link to the actual job: https://cirrus-ci.com/task/5881376021413888 The failure was: [09:54:38.727] 216/262 postgresql:recovery / recovery/031_recovery_conflict ERROR 198.73s exit status 60 Looking at its log: https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/testrun/build/testrun/recovery/031_recovery_conflict/log/regress_log_031_recovery_conflict we see: timed out waiting for match: (?^:User was holding a relation lock for too long) at /Users/admin/pgsql/src/test/recovery/t/031_recovery_conflict.pl line 311. That looks indeed completely unrelated to this patch. Yes, that's what I suspected. The patch passes all tests now :) I've marked the CF entry as "Ready for Committer". Jim
Re: Adding argument names to aggregate functions
Jim Jones writes:

> On 18.04.23 10:58, I wrote:
>> On 14.04.23 12:03, Dagfinn Ilmari Mannsåker wrote:
>>> Thanks for the heads-up, here's a rebased patch. I've also formatted
>>> the lines to match what reformat_dat_file.pl wants. It also wanted to
>>> reformat a bunch of other entries, but I left those alone.
>>>
>>> - ilmari
>>
>> The patch applies cleanly now and \df shows the argument names:
>>
>> postgres=# \df string_agg
>>                                        List of functions
>>    Schema   |    Name    | Result data type |     Argument data types      | Type
>> ------------+------------+------------------+------------------------------+------
>>  pg_catalog | string_agg | bytea            | value bytea, delimiter bytea | agg
>>  pg_catalog | string_agg | text             | value text, delimiter text   | agg
>> (2 rows)
>>
>> postgres=# \df json_object_agg
>>                                   List of functions
>>    Schema   |      Name       | Result data type |  Argument data types   | Type
>> ------------+-----------------+------------------+------------------------+------
>>  pg_catalog | json_object_agg | json             | key "any", value "any" | agg
>> (1 row)
>>
>> I'm wondering if there are some sort of guidelines that dictate when
>> to name an argument or not. It would be nice to have one for future
>> reference.

I seem to recall a patch to add argument names to a bunch of functions
in the past, thinking that might have some guidance, but I can't for
the life of me find it now.

>> I will mark the CF entry as "Ready for Committer" and let the
>> committers decide if it's best to first create a guideline for that
>> or not.
>>
>> Best, Jim
>
> I just saw that the patch is failing[1] on "macOS - Ventura - Meson".
> Not sure if it is related to this patch though ..
>
> [1] https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/meson_log/build/meson-logs/meson-log.txt

Link to the actual job: https://cirrus-ci.com/task/5881376021413888

The failure was:

[09:54:38.727] 216/262 postgresql:recovery / recovery/031_recovery_conflict ERROR 198.73s exit status 60

Looking at its log:
https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/testrun/build/testrun/recovery/031_recovery_conflict/log/regress_log_031_recovery_conflict
we see:

timed out waiting for match: (?^:User was holding a relation lock for too long) at /Users/admin/pgsql/src/test/recovery/t/031_recovery_conflict.pl line 311.

That looks indeed completely unrelated to this patch.

- ilmari
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
On Tue, 18 Apr 2023 at 19:29, Miroslav Bendik wrote:
> here is an updated patch with proposed changes.

Here's a quick review:

1. I don't think this is required. match_pathkeys_to_index() sets these
to NIL and they're set accordingly by the other code paths.

-	List	   *orderbyclauses;
-	List	   *orderbyclausecols;
+	List	   *orderbyclauses = NIL;
+	List	   *orderbyclausecols = NIL;

2. You can use:

list_copy_head(root->query_pathkeys, list_length(orderbyclauses));

instead of:

+		useful_pathkeys = list_truncate(list_copy(root->query_pathkeys),
+										list_length(orderbyclauses));

3. The following 2 changes don't seem to be needed:

@@ -3104,11 +3100,11 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 		/* Pathkey must request default sort order for the target opfamily */
 		if (pathkey->pk_strategy != BTLessStrategyNumber ||
 			pathkey->pk_nulls_first)
-			return;
+			break;

 		/* If eclass is volatile, no hope of using an indexscan */
 		if (pathkey->pk_eclass->ec_has_volatile)
-			return;
+			break;

There's no code after the loop you're breaking out of, so it seems to
me that return is the same as break and there's no reason to change it.

David
Re: Adding argument names to aggregate functions
On 18.04.23 10:58, I wrote:
> On 14.04.23 12:03, Dagfinn Ilmari Mannsåker wrote:
>> Thanks for the heads-up, here's a rebased patch. I've also formatted
>> the lines to match what reformat_dat_file.pl wants. It also wanted to
>> reformat a bunch of other entries, but I left those alone.
>>
>> - ilmari
>
> The patch applies cleanly now and \df shows the argument names:
>
> postgres=# \df string_agg
>                                        List of functions
>    Schema   |    Name    | Result data type |     Argument data types      | Type
> ------------+------------+------------------+------------------------------+------
>  pg_catalog | string_agg | bytea            | value bytea, delimiter bytea | agg
>  pg_catalog | string_agg | text             | value text, delimiter text   | agg
> (2 rows)
>
> postgres=# \df json_object_agg
>                                   List of functions
>    Schema   |      Name       | Result data type |  Argument data types   | Type
> ------------+-----------------+------------------+------------------------+------
>  pg_catalog | json_object_agg | json             | key "any", value "any" | agg
> (1 row)
>
> I'm wondering if there are some sort of guidelines that dictate when
> to name an argument or not. It would be nice to have one for future
> reference.
>
> I will mark the CF entry as "Ready for Committer" and let the
> committers decide if it's best to first create a guideline for that
> or not.
>
> Best, Jim

I just saw that the patch is failing[1] on "macOS - Ventura - Meson".
Not sure if it is related to this patch though ..

[1] https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/meson_log/build/meson-logs/meson-log.txt
Tab completion for GRANT MAINTAIN
Hi hackers,

I found that GRANT MAINTAIN is not tab-completed with ON, so here is a
patch.

Best wishes,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5825b2a195..bd04244969 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3902,7 +3902,7 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("GRANT|REVOKE", MatchAny) ||
 			 TailMatches("REVOKE", "GRANT", "OPTION", "FOR", MatchAny))
 	{
-		if (TailMatches("SELECT|INSERT|UPDATE|DELETE|TRUNCATE|REFERENCES|TRIGGER|CREATE|CONNECT|TEMPORARY|TEMP|EXECUTE|USAGE|ALL"))
+		if (TailMatches("SELECT|INSERT|UPDATE|DELETE|TRUNCATE|REFERENCES|TRIGGER|CREATE|CONNECT|TEMPORARY|TEMP|EXECUTE|USAGE|MAINTAIN|ALL"))
 			COMPLETE_WITH("ON");
 		else if (TailMatches("GRANT", MatchAny))
 			COMPLETE_WITH("TO");
Support "Right Semi Join" plan shapes
In thread [1], which discussed 'Right Anti Join', Tom once mentioned
'Right Semi Join'. After a preliminary investigation I think it is
beneficial and can be implemented with a very short change.

With 'Right Semi Join', what we want is to have just the first match
for each inner tuple. For HashJoin, after scanning the hash bucket for
matches to the current outer tuple, we just need to check whether the
inner tuple has already been marked as matched, and skip it if so. For
MergeJoin, we can do it by avoiding restoring the inner scan to the
marked tuple in EXEC_MJ_TESTOUTER, in the case when the new outer tuple
== the marked tuple.

As that thread is already too long, I'm forking a new thread and
attaching a patch for discussion. The patch implements 'Right Semi
Join' for HashJoin.

[1] https://www.postgresql.org/message-id/CAMbWs4_eChX1bN%3Dvj0Uzg_7iz9Uivan%2BWjjor-X87L-V27A%2Brw%40mail.gmail.com

Thanks
Richard

v1-0001-Support-Right-Semi-Join-plan-shapes.patch
Description: Binary data
Re: Adding argument names to aggregate functions
On 14.04.23 12:03, Dagfinn Ilmari Mannsåker wrote:
> Thanks for the heads-up, here's a rebased patch. I've also formatted
> the lines to match what reformat_dat_file.pl wants. It also wanted to
> reformat a bunch of other entries, but I left those alone.
>
> - ilmari

The patch applies cleanly now and \df shows the argument names:

postgres=# \df string_agg
                                       List of functions
   Schema   |    Name    | Result data type |     Argument data types      | Type
------------+------------+------------------+------------------------------+------
 pg_catalog | string_agg | bytea            | value bytea, delimiter bytea | agg
 pg_catalog | string_agg | text             | value text, delimiter text   | agg
(2 rows)

postgres=# \df json_object_agg
                                  List of functions
   Schema   |      Name       | Result data type |  Argument data types   | Type
------------+-----------------+------------------+------------------------+------
 pg_catalog | json_object_agg | json             | key "any", value "any" | agg
(1 row)

I'm wondering if there are some sort of guidelines that dictate when to
name an argument or not. It would be nice to have one for future
reference.

I will mark the CF entry as "Ready for Committer" and let the
committers decide if it's best to first create a guideline for that or
not.

Best, Jim
Re: Fix documentation for max_wal_size and min_wal_size
Hi,

On Mon, Apr 17, 2023 at 9:38 PM Michael Paquier wrote:
> On Mon, Apr 17, 2023 at 07:57:58PM -0700, sirisha chamarthi wrote:
>> On Fri, Apr 14, 2023 at 1:01 AM Kyotaro Horiguchi
>> <horikyota@gmail.com> wrote:
>>> So, I personally think it should be written like this: "The default
>>> size is 80MB. However, if you have changed the WAL segment size from
>>> the default of 16MB, it will be five times the segment size.", but
>>> I'm not sure what the others think about this..
>
> Yes, I was under the impression that this should mention 16MB, but
> I'd also add a note about initdb when a non-default value is specified
> for the segment size.

How about the text below?

"The default size is 80MB. However, if you have changed the WAL segment
size from the default of 16MB with the initdb option --wal-segsize, it
will be five times the segment size."
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
po 17. 4. 2023 o 15:26 Richard Guo napísal(a):
>
> On Sun, Apr 16, 2023 at 1:20 AM Miroslav Bendik wrote:
>>
>> Postgres allows incremental sort only for ordered indexes. Function
>> build_index_paths doesn't build partial order paths for access
>> methods with order support. My patch adds support for incremental
>> ordering with access methods.
>
> I think this is a meaningful optimization. I reviewed the patch and
> here are the comments from me.
>
> * I understand the new param 'match_pathkeys_length_p' is used to tell
> how many presorted keys are useful. I think list_length(orderbyclauses)
> will do the same. So there is no need to add the new param, thus we can
> reduce the code diffs.
>
> * Now that match_pathkeys_to_index() returns a prefix of the pathkeys
> rather than returns NIL immediately when there is a failure to match,
> it seems the two local variables 'orderby_clauses' and 'clause_columns'
> are not necessary any more. I think we can instead lappend the matched
> 'expr' and 'indexcol' to '*orderby_clauses_p' and '*clause_columns_p'
> directly. In this way we can still call 'return' when we come to a
> failure to match.
>
> * In build_index_paths(), I think the diff can be reduced to
>
> -            if (orderbyclauses)
> -                useful_pathkeys = root->query_pathkeys;
> -            else
> -                useful_pathkeys = NIL;
> +            useful_pathkeys = list_truncate(list_copy(root->query_pathkeys),
> +                                            list_length(orderbyclauses));
>
> * Several comments in match_pathkeys_to_index() are out of date. We
> need to revise them to cope with the change.
>
> * I think it's better to provide a test case.
>
> Thanks
> Richard

Thank you for the advice; here is an updated patch with the proposed
changes.

--
Best regards
Miroslav

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index b8000da56d..202779fb10 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -864,8 +864,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *index_clauses;
 	Relids		outer_relids;
 	double		loop_count;
-	List	   *orderbyclauses;
-	List	   *orderbyclausecols;
+	List	   *orderbyclauses = NIL;
+	List	   *orderbyclausecols = NIL;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
 	bool		found_lower_saop_clause;
@@ -1001,10 +1001,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		match_pathkeys_to_index(index, root->query_pathkeys,
 								&orderbyclauses,
 								&orderbyclausecols);
-		if (orderbyclauses)
-			useful_pathkeys = root->query_pathkeys;
-		else
-			useful_pathkeys = NIL;
+		useful_pathkeys = list_truncate(list_copy(root->query_pathkeys),
+										list_length(orderbyclauses));
 	}
 	else
 	{
@@ -3071,16 +3069,14 @@ expand_indexqual_rowcompare(PlannerInfo *root,
  * index column numbers (zero based) that each clause would be used with.
  * NIL lists are returned if the ordering is not achievable this way.
  *
- * On success, the result list is ordered by pathkeys, and in fact is
- * one-to-one with the requested pathkeys.
+ * On success, the result list is ordered by pathkeys. Result can be shorter
+ * than pathkeys on partial prefix match.
  */
 static void
 match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 						List **orderby_clauses_p,
 						List **clause_columns_p)
 {
-	List	   *orderby_clauses = NIL;
-	List	   *clause_columns = NIL;
 	ListCell   *lc1;
 
 	*orderby_clauses_p = NIL;	/* set default results */
@@ -3104,11 +3100,11 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 		/* Pathkey must request default sort order for the target opfamily */
 		if (pathkey->pk_strategy != BTLessStrategyNumber ||
 			pathkey->pk_nulls_first)
-			return;
+			break;
 
 		/* If eclass is volatile, no hope of using an indexscan */
 		if (pathkey->pk_eclass->ec_has_volatile)
-			return;
+			break;
 
 		/*
 		 * Try to match eclass member expression(s) to index.  Note that child
@@ -3145,8 +3141,8 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 											   pathkey->pk_opfamily);
 			if (expr)
 			{
-				orderby_clauses = lappend(orderby_clauses, expr);
-				clause_columns = lappend_int(clause_columns, indexcol);
+				*orderby_clauses_p = lappend(*orderby_clauses_p, expr);
+				*clause_columns_p = lappend_int(*clause_columns_p, indexcol);
 				found = true;
 				break;
 			}
@@ -3156,12 +3152,9 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 			break;
 		}
 
-		if (!found)				/* fail if no match for this pathkey */
+		if (!found)				/* end after first not matching expression */
 			return;
 	}
-
-	*orderby_clauses_p = orderby_clauses;	/* success! */
-	*clause_columns_p = clause_columns;
 }
 
 /*
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 0a631124c2..dcf2c0762d 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@
Re: Allowing parallel-safe initplans
On Mon, Apr 17, 2023 at 11:04 PM Tom Lane wrote:
> Richard Guo writes:
>> So now it seems that the breakage of regression tests is more severe
>> than being cosmetic. I wonder if we need to update the comments to
>> indicate the potential wrong results issue if we move the initPlans
>> to the Gather node.
>
> I wondered about that too, but how come neither of us saw non-cosmetic
> failures (ie, actual query output changes not just EXPLAIN changes)
> when we tried this? Maybe the case is somehow not exercised, but if
> so I'm more worried about adding regression tests than comments.

Sorry, I forgot to mention that I did see query output changes after
moving the initPlans to the Gather node. First of all let me make sure
I was doing it in the right way. On the base of the patch, I was using
the diff as below:

     if (debug_parallel_query != DEBUG_PARALLEL_OFF &&
-        top_plan->parallel_safe && top_plan->initPlan == NIL)
+        top_plan->parallel_safe)
     {
         Gather     *gather = makeNode(Gather);

+        gather->plan.initPlan = top_plan->initPlan;
+        top_plan->initPlan = NIL;
+
         gather->plan.targetlist = top_plan->targetlist;

And then I changed the default value of debug_parallel_query to
DEBUG_PARALLEL_REGRESS. And then I just ran 'make installcheck' and saw
the query output changes.

> I think actually that it does work beyond the EXPLAIN weirdness,
> because since e89a71fb4 the Gather machinery knows how to transmit
> the values of Params listed in Gather.initParam to workers, and that
> is filled in setrefs.c in a way that looks like it'd work regardless
> of whether the Gather appeared organically or was stuck on by the
> debug_parallel_query hackery. I've not tried to verify that
> directly though.

It seems that in this case the top_plan does not have any extParam, so
the Gather node that is added atop the top_plan does not have a chance
to get its initParam filled in set_param_references().

Thanks
Richard
Re: Should we put command options in alphabetical order in the doc?
On Mon, Apr 17, 2023 at 10:45 PM David Rowley wrote:
> For the case of reindex.sgml, I do see that the existing parameter
> order lists INDEX | TABLE | SCHEMA | DATABASE | SYSTEM first which is
> the target of the reindex. I wondered if that was worth keeping. I'm
> just thinking that since all of these are under the "Parameters"
> heading that we should class them all as equals and just make the
> order alphabetical. I feel that if we don't do that then the order to
> add any new parameters is just not going to be obvious and we'll end
> up with things getting out of order again quite quickly.

I don't think that alphabetical order makes much sense. Surely some
parameters are more important than others. Surely there is some kind
of natural grouping that makes somewhat more sense than alphabetical
order.

Take the VACUUM command. Right now FULL, FREEZE, and VERBOSE all come
first. Those options are approximately the most important options --
especially VERBOSE. But your patch places VERBOSE dead last.

--
Peter Geoghegan