Re: [PATCH] Allow Postgres to pick an unused port to listen
Stephen,

> > You could just drop another file into the data directory that just contains
> > the port number ($PGDATA/port). However, if we ever do multiple ports, that
> > would still require a change in the format of that file, so I don't know if
> > that's actually better than a).

I find it difficult to get anything done under the restriction of "what if we ever need to change X?" as it is difficult to address something that doesn't exist or hasn't been planned. Finding the delicate balance between anticipating what may happen theoretically and what's more likely to happen is an art.

It's also important to consider the impact of a breaking change. It's one thing if we have to break, say, an SQL function signature or SQL syntax itself, and another if it is a relatively small feature related to the administration of a server (in this case, more like scripting a test bench).

> If we did a port per line then it wouldn't be changing the format of the
> first line, so that might not be all that bad.

If we consider this path, then (if we assume the format of the file is still to be private), we can make the port line accept multiple ports using a delimiter like `:` so that the next line still remains the same. That being said, if the format is private to Postgres, these are all minor considerations.

> > I don't think involving pg_ctl is necessary or desirable, since it would
> > make any future changes like that even more complicated.
>
> I'm a bit confused by this- if pg_ctl is invoked then we have
> more-or-less full control over parsing and reporting out the answer, so
> while it might be a bit more complicated for us, it seems surely simpler
> for the end user. Or maybe you're referring to something here that I'm
> not thinking of?

I would love to learn about this as well.

> Independent of the above though ... this hand-wringing about what we
> might do in the relative near-term when we haven't done much in the past
> many-many years regarding listen_addresses or port strikes me as
> unlikely to be necessary. Let's pick something and get it done and
> accept that we may have to change it at some point in the future, but
> that's kinda what major releases are for, imv anyway.

That's how I see it, too. I tried to make this change as small as possible in recognition of the fact that all of this may change one day if or when that portion of Postgres is due for a major redesign. I'd be happy to contribute to that process, but in the meantime, I am looking for the simplest reasonable way to achieve a relatively specific use case.

Personally, I am fine with reading the `.pid` file and accepting that it _may_ change in the future; I am also fine with amending the patch to add functionality to pg_ctl or adding a new file. To keep everybody's cognitive load low, I'd rather not flood the thread with multiple alternative implementations (unless that's desirable) and just go for something we can agree on.

(I consider this feature so small that it doesn't deserve such a lengthy discussion. However, I also get Tom's point about how we document this feature's use, which is very valid and valuable. If it were up to me entirely, I'd probably just document `postmaster.pid` and call it a day. If it ever breaks, that's major release territory. Otherwise, amending `pg_ctl` to access information like this in a uniform way is also a good approach if we want to keep the format of the pid file private.)

-- 
Y.
Re: [PATCH] Allow Postgres to pick an unused port to listen
Greetings,

* Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> On 13.04.23 04:45, Yurii Rashkovskii wrote:
> > But getting your agreement is important to get this in; I am willing to
> > play along and resolve both (1) and (2) in one go. As for the
> > implementation approach for (2), which of the following options would
> > you prefer?
> >
> > a) Document postmaster.pid as it stands today
> > b) Expose the port number through pg_ctl (*my personal favorite)
> > c) Redesign its content below line 1 to make it extensible (make unnamed
> > lines named, for example)
> >
> > If none of the above options suit you, do you have a strategy you'd prefer?
>
> You could just drop another file into the data directory that just contains
> the port number ($PGDATA/port). However, if we ever do multiple ports, that
> would still require a change in the format of that file, so I don't know if
> that's actually better than a).

If we did a port per line then it wouldn't be changing the format of the first line, so that might not be all that bad.

> I don't think involving pg_ctl is necessary or desirable, since it would
> make any future changes like that even more complicated.

I'm a bit confused by this- if pg_ctl is invoked then we have more-or-less full control over parsing and reporting out the answer, so while it might be a bit more complicated for us, it seems surely simpler for the end user. Or maybe you're referring to something here that I'm not thinking of?

Independent of the above though ... this hand-wringing about what we might do in the relative near-term when we haven't done much in the past many-many years regarding listen_addresses or port strikes me as unlikely to be necessary. Let's pick something and get it done and accept that we may have to change it at some point in the future, but that's kinda what major releases are for, imv anyway.

Thanks!

Stephen
Re: [PATCH] Allow Postgres to pick an unused port to listen
On 13.04.23 04:45, Yurii Rashkovskii wrote:
> But getting your agreement is important to get this in; I am willing to
> play along and resolve both (1) and (2) in one go. As for the
> implementation approach for (2), which of the following options would
> you prefer?
>
> a) Document postmaster.pid as it stands today
> b) Expose the port number through pg_ctl (*my personal favorite)
> c) Redesign its content below line 1 to make it extensible (make unnamed
> lines named, for example)
>
> If none of the above options suit you, do you have a strategy you'd prefer?

You could just drop another file into the data directory that just contains the port number ($PGDATA/port). However, if we ever do multiple ports, that would still require a change in the format of that file, so I don't know if that's actually better than a).

I don't think involving pg_ctl is necessary or desirable, since it would make any future changes like that even more complicated.
Re: Use INT_MAX for wal size related gucs's max value
These gucs are always used with ConvertToXSegs, to calculate the count of wal segments (see the following code snippet), and wal_segment_size can be configured by initdb as a value of a power of 2 between 1 and 1024 (megabytes), so I think INT_MAX should be safe here.

/*
 * Convert values of GUCs measured in megabytes to equiv. segment count.
 * Rounds down.
 */
#define ConvertToXSegs(x, segsize) XLogMBVarToSegs((x), (segsize))

/*
 * Convert values of GUCs measured in megabytes to equiv. segment count.
 * Rounds down.
 */
#define XLogMBVarToSegs(mbvar, wal_segsz_bytes) \
	((mbvar) / ((wal_segsz_bytes) / (1024 * 1024)))

On Wed, Apr 19, 2023 at 11:33 AM Tom Lane wrote:
> Junwang Zhao writes:
> > The wal size related gucs use the MB unit, so we should just use
> > INT_MAX instead of MAX_KILOBYTES as the max value.
>
> The point of MAX_KILOBYTES is to avoid overflow when the value
> is multiplied by 1kB. It does seem like that might not be
> appropriate for these values, but that doesn't mean that we can
> blithely go to INT_MAX. Have you chased down how they are used?
>
> regards, tom lane

-- 
Regards
Junwang Zhao
Re: Use INT_MAX for wal size related gucs's max value
Junwang Zhao writes: > The wal size related gucs use the MB unit, so we should just use > INT_MAX instead of MAX_KILOBYTES as the max value. The point of MAX_KILOBYTES is to avoid overflow when the value is multiplied by 1kB. It does seem like that might not be appropriate for these values, but that doesn't mean that we can blithely go to INT_MAX. Have you chased down how they are used? regards, tom lane
Use INT_MAX for wal size related gucs's max value
The wal size related gucs use the MB unit, so we should just use INT_MAX instead of MAX_KILOBYTES as the max value.

-- 
Regards
Junwang Zhao

0001-use-INT_MAX-for-wal-size-related-max-value.patch
Re: Allowing parallel-safe initplans
On Tue, Apr 18, 2023 at 9:33 PM Tom Lane wrote: > Richard Guo writes: > > It seems that in this case the top_plan does not have any extParam, so > > the Gather node that is added atop the top_plan does not have a chance > > to get its initParam filled in set_param_references(). > > Oh, so maybe we'd need to copy up extParam as well? But it's largely > moot, since I don't see a good way to avoid breaking the EXPLAIN > output. Yeah, seems breaking the EXPLAIN output is inevitable if we move the initPlans to the Gather node. So maybe we need to keep the logic as in v1 patch, i.e. avoid adding a Gather node when top_plan has initPlans. If we do so, I wonder if we need to explain the potential wrong results issue in the comments. Thanks Richard
Re: check_strxfrm_bug()
On 4/18/23 9:19 PM, Thomas Munro wrote:
> On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier wrote:
> > On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
> > > +1 for getting rid of TRUST_STRXFRM.
> >
> > +1
>
> The situation is not improving fast, and requires hard work to follow on
> each OS. Clearly, mainstreaming ICU is the way to go. libc support will
> always have niche uses, to be compatible with other software on the box,
> but trusting strxfrm doesn't seem to be on the cards any time soon.

[RMT hat, personal opinion on RMT]

To be clear, is the proposal to remove both "check_strxfrm_bug" and "TRUST_STRXFRM"?

Given a bunch of folks who have expertise in this area of code all agree with removing the above as part of the collation cleanups targeted for v16, I'm inclined to agree. I don't really see the need for an explicit RMT action, but based on the consensus this seems OK to add as an open item.

Thanks,

Jonathan
Re: pg_collation.collversion for C.UTF-8
On Wed, Apr 19, 2023 at 1:30 PM Jeff Davis wrote: > On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote: > > Many OSes have a locale with this name. I don't know this history, > > who did it first etc, but now I am wondering if they all took the > > "obvious" interpretation, that it should be code-point based, > > extrapolating from "C" (really memcmp order): > > memcmp() is not the same as code-point order in all encodings, right? Right. I wasn't trying to suggest that *we* should assume that, I was just thinking out loud about how a libc implementor would surely think that a "C.encoding" should work, in the spirit of "C", given that the standard doesn't tell us IIUC. It looks like for technical reasons inside glibc, that couldn't be done before 2.35: https://sourceware.org/bugzilla/show_bug.cgi?id=17318 That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied by the glibc project) isn't supposed to be versioned, but it's extremely unfortunate that a bunch of OSes (Debian and maybe more) have been sorting text in some other order under that name for years. > I've been thinking that we should have a "provider=none" for the > special cases that use memcmp(). It's not using libc as a collation > provider; it's really postgres in control of the semantics. Yeah, interesting idea.
Re: pg_collation.collversion for C.UTF-8
On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote: > Many OSes have a locale with this name. I don't know this history, > who did it first etc, but now I am wondering if they all took the > "obvious" interpretation, that it should be code-point based, > extrapolating from "C" (really memcmp order): memcmp() is not the same as code-point order in all encodings, right? I've been thinking that we should have a "provider=none" for the special cases that use memcmp(). It's not using libc as a collation provider; it's really postgres in control of the semantics. That would clean up the documentation and the code a bit, and make it more clear which locales are being passed to the provider and which ones aren't. If we are passing it to a provider (e.g. "C.UTF-8"), we shouldn't make unnecessary assumptions about what the provider will do with it. For what it's worth, in my recent ICU language tag work, I special- cased ICU locales with language "C" or "POSIX" to map to "en-US-u-va- posix", disregarding everything else (collation attributes, etc.). I believe that's the right thing based on the behavior I observed: for the POSIX variant of en-US, ICU seems to disregard other things such as case insensitivity. But it still ultimately goes to the provider and ICU has particular rules for that locale -- I don't assume memcpy-like semantics or code point order. Regards, Jeff Davis
Re: check_strxfrm_bug()
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier wrote: > On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote: > > +1 for getting rid of TRUST_STRXFRM. +1 The situation is not improving fast, and requires hard work to follow on each OS. Clearly, mainstreaming ICU is the way to go. libc support will always have niche uses, to be compatible with other software on the box, but trusting strxfrm doesn't seem to be on the cards any time soon.
Re: Should we put command options in alphabetical order in the doc?
On Tue, Apr 18, 2023 at 4:30 PM Peter Geoghegan wrote: > > I'd be interested to know if you could tell me if SKIP_LOCKED has more > > importance than INDEX_CLEANUP, for example. If you can, it would seem > > like trying to say apples are more important than oranges, or > > vice-versa. > > I don't accept your premise that the only thing that matters (or the > most important thing) is adherence to some unambiguous and consistent > order. In the case of VACUUM, the current devel order is: FULL, FREEZE, VERBOSE, ANALYZE, DISABLE_PAGE_SKIPPING, SKIP_LOCKED, INDEX_CLEANUP, PROCESS_MAIN, PROCESS_TOAST, TRUNCATE, PARALLEL, SKIP_DATABASE_STATS, ONLY_DATABASE_STATS, BUFFER_USAGE_LIMIT I think that this order is far superior to alphabetical order, which is tantamount to random order. The first 4 items are indeed the really important ones to users, in my experience. I do have some minor quibbles beyond that, though. These are: * PARALLEL deserves to be at the start, maybe 4th or 5th overall. * DISABLE_PAGE_SKIPPING should be later, since it's really only a testing option that probably never proved useful in production. In particular, it has little business being before SKIP_LOCKED, which is much more important and relevant. * TRUNCATE and INDEX_CLEANUP are similar options, and ought to be side by side. I would put PROCESS_MAIN and PROCESS_TOAST after those two for the same reason. While I'm certain that nobody will agree with me on every little detail, I have to imagine that most would find my preferred ordering quite understandable and unsurprising, at a high level -- this is not a hopelessly idiosyncratic ranking, that could just as easily have been generated by a PRNG. People may not easily agree that "apples are more important than oranges, or vice-versa", but what does it matter? I've really only put each option into buckets of items with *roughly* the same importance. All of the details beyond that don't matter to me, at all. -- Peter Geoghegan
Re: allow_in_place_tablespaces vs. pg_basebackup
On Tue, Apr 18, 2023 at 11:35:41AM -0400, Robert Haas wrote:
> On Mon, Apr 17, 2023 at 1:30 AM Michael Paquier wrote:
>> FWIW, doing that now rather than the beginning of July is OK for me
>> for this stuff.
>
> OK, committed.

Thanks!
-- 
Michael
Re: Should we put command options in alphabetical order in the doc?
On Tue, Apr 18, 2023 at 4:18 PM David Rowley wrote: > "Importance order" just seems horribly subjective to me. Alphabetical order seems objectively bad. At least to me. > I'd be interested to know if you could tell me if SKIP_LOCKED has more > importance than INDEX_CLEANUP, for example. If you can, it would seem > like trying to say apples are more important than oranges, or > vice-versa. I don't accept your premise that the only thing that matters (or the most important thing) is adherence to some unambiguous and consistent order. -- Peter Geoghegan
Re: Should we put command options in alphabetical order in the doc?
On Tue, 18 Apr 2023 at 18:53, Peter Geoghegan wrote: > Take the VACUUM command. Right now FULL, FREEZE, and VERBOSE all come > first. Those options are approximately the most important options -- > especially VERBOSE. But your patch places VERBOSE dead last. hmm, how can we verify that the options are kept in order of importance? What guidance can we provide to developers adding options about where they should slot in the new option to the docs? "Importance order" just seems horribly subjective to me. I'd be interested to know if you could tell me if SKIP_LOCKED has more importance than INDEX_CLEANUP, for example. If you can, it would seem like trying to say apples are more important than oranges, or vice-versa. David
Enhanced rmgr desc routines test !has_image, not has_data
Recent commits that enhanced rmgr desc routines (commits 7d8219a4 and 1c453cfd) dealt with records that lack relevant block data (and so lack anything to give a more detailed summary of) by testing !DecodedBkpBlock.has_image -- that is the gating condition that determines if we want to (say) output a textual array representation of the page offset number from a given nbtree VACUUM WAL record.

Strictly speaking, this isn't the correct gating condition to test. We should be testing the *presence* of the relevant block data instead. Why test an inexact proxy for the condition that we care about, when we can just as easily test the precise condition we care about instead?

This isn't just a theoretical issue. Currently, we won't display detailed descriptions of block data whenever wal_consistency_checking happens to be in use. At least for those records with relevant block data available to summarize that also happen to have an FPI that the REDO routine isn't supposed to apply (i.e. an FPI that is included in the record purely so that verifyBackupPageConsistency can verify that the REDO routine produces a matching image).

Attached patch fixes this bug.

-- 
Peter Geoghegan

v1-0001-Test-has_data-in-rmgr-desc-routines.patch
Re: Request for comment on setting binary format output per session
On Mon, 17 Apr 2023 at 16:22, Tom Lane wrote: > > I tend to agree with the proposition that we aren't going to add new > message types very often, as long as we're careful to make them general > purpose. Don't forget that adding a new message type isn't just a matter > of writing some spec text --- there has to be code backing it up. We > will never introduce thousands of new message types, or if we do, > somebody factored it wrong and put data into the type code. Well the way I understood Robert's proposal would be that you would set a protocol option which could be some name like SuperDuperExtension and then later send an extended message like X SuperDuper Extension ... The point being not so much that it saves on message types but that it becomes possible for the wire protocol code to recognize the message type and know which extension's code to call back to. Presumably a callback was registered when the option was negotiated. > The fact that we've gotten away without adding *any* new message types > for about twenty years suggests to me that the growth rate isn't such > that we need sub-message-types yet. I'd keep the structure the same > until such time as we can't choose a plausible code value for a new > message, and then maybe add the "x-and-subtype" convention Jeff suggests. Fwiw I've had at least two miniprojects that would eventually have led to protocol extensions. Like most of my projects they're not finished but one day... Progress reporting on queries in progress -- I had things hacked to send the progress report in an elog but eventually it would have made sense to put it in a dedicated message type that the client would know the structure of the content of. Distributed tracing -- to pass the trace span id for each query and any other baggage. Currently people either stuff it in application_id or in SQL comments but they're both pretty awful. -- greg
Re: pg_collation.collversion for C.UTF-8
On Wed, Apr 19, 2023 at 12:36 AM Daniel Verite wrote:
> This seems to be based on the idea that C.* collations provide an
> immutable sort like "C", but it appears that it's not the case.

Hmm. It seems I added that exemption initially for FreeBSD only in ca051d8b101, and then merged the cases for several OSes in beb4480c853. It's extremely surprising to me that the sort order changed. I expected the sort order to be code point order: https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

One interesting thing is that it seems that it might have been independently invented by Debian (?) and then harmonised with glibc 2.35: https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1871363.html

Was the earlier Debian version buggy, or did it simply have a different idea of what the sort order should be, intentionally? Ugh.

From your examples, we can see that the older Debian system did not have A < [some 4 digit code point], while the later version did (as expected). If so then it might be tempting to *not* do what you're suggesting, since the stated goal of the thing is to be stable from now on. But it changed once in the early years of its existence! Annoying.

Many OSes have a locale with this name. I don't know this history, who did it first etc, but now I am wondering if they all took the "obvious" interpretation, that it should be code-point based, extrapolating from "C" (really memcmp order): https://unix.stackexchange.com/questions/597962/how-widespread-is-the-c-utf-8-locale
Re: Direct I/O
On Mon, 17 Apr 2023 at 17:45, Thomas Munro wrote: > > Reasons: (1) There will always be a > few file systems that refuse O_DIRECT (Linux tmpfs is one such, as we > learned in this thread; if fails with EINVAL at open() time), and So why wouldn't we just automatically turn it off (globally or for that tablespace) and keep operating without it afterward? > (2) without a page cache, you really need to size your shared_buffers > adequately and we can't do that automatically. Well I'm more optimistic... That may not always be impossible. We've already added the ability to add more shared memory after startup. We could implement the ability to add or remove shared buffer segments after startup. And it wouldn't be crazy to imagine a kernel interface that lets us judge whether the kernel memory pressure makes it reasonable for us to take more shared buffers or makes it necessary to release shared memory to the kernel. You could hack something together using /proc/meminfo today but I imagine an interface intended for this kind of thing would be better. > It's something you'd > opt into for a dedicated database server along with other carefully > considered settings. It seems acceptable to me that if you set > io_direct to a non-default setting on an unusual-for-a-database-server > filesystem you might get errors screaming about inability to open > files -- you'll just have to turn it back off again if it doesn't work > for you. If the only solution is to turn it off perhaps the server should just turn it off? I guess the problem is that the shared_buffers might be set assuming it would be on? -- greg
Re: Fix typos and inconsistencies for v16
Justin Pryzby writes: > On Tue, Apr 18, 2023 at 02:06:43PM +1200, David Rowley wrote: >> On Tue, 18 Apr 2023 at 10:10, Justin Pryzby wrote: >>> and s/evade/avoid/ >> I didn't touch this. You'll need to provide more justification for why >> you think it's more correct than what's there. > I'd noticed that it's a substitution/mistake that's been made in the > past. "Evade" doesn't seem like le mot juste there; it's got negative connotations. But the code around it is just horrible. Some offenses: * No documentation in the function header comment of what the usersetArray parameter is or does. Which is bad enough in itself, but what the parameter actually does is falsify the header comment's principal claim that the passed context is what is used. So I don't find that omission acceptable. * Non-obvious, and quite unnecessary, dependency on the isnull variable having been left in a particular state by previous code. * For me, at least, it'd read better if the if/else arms were swapped, allowing removal of the negation in the if-condition and bringing the code this comment comments on closer to said comment. As for the comment text, maybe say * If the value was USER SET, then apply it at PGC_USERSET context * rather than the caller-supplied context, to prevent any more-restricted * GUCs being set. Also pass InvalidOid for the role, to ensure any * special privileges of the current user aren't applied. I hesitate to go look at the rest of this commit, but I guess somebody had better. regards, tom lane
Re: Fix typos and inconsistencies for v16
Hi Justin and David,

18.04.2023 01:10, Justin Pryzby wrote:
> Well done.

Thank you for reviewing!

> On Mon, Apr 17, 2023 at 09:00:00PM +0300, Alexander Lakhin wrote:
>> Hello hackers,
>> Please consider fixing the following unique words/identifiers introduced in v16:
> Note that your patches are overlapping: ... It'd make sense if the changes
> to each file were isolated to a single patch (especially 004_load and acl.c).

I'd hoped that most of the proposed fixes would be accepted, so conflicts due to skipping of some changes seemed unlikely to me. So if you don't strongly disagree, I would continue presenting my findings the same way.

> ... You missed "boostrap" :)

Yes, that's because "boostrap" was not unique, but my semi-automatic approach is based on `uniq -u`, so I'm sure that there are typos that can't be found this way. But I hadn't yet convinced myself to start the process of defending each one of the fixes.

> Attached some others that I found.

Yeah, those are good catches too, but e.g. "privilges" is not new in v16, so it's fallen out of my "hot errors" category. If we're going to fix not-so-hot ones too now, please look at the similar list for v15+ (596b5af1d..HEAD):

1. abbrevated -> abbreviated
2. ArchiveModeRequested -> ArchiveRecoveryRequested
3. BufFileOpenShared -> BufFileOpenFileSet // see dcac5e7ac
4. check_publication_columns -> pub_collist_contains_invalid_column // see note 1
5. configuation -> configuration
6. copyAclUserName -> dequoteAclUserName // see 0c9d84427
7. EndWalRecovery -> FinishWalRecovery
8. HaveRegisteredOrActiveSnapshots -> HaveRegisteredOrActiveSnapshot
9. idiosyncracies -> idiosyncrasies
10. iif -> iff
11. initpriv -> initprivs
12. inserted_destrel -> insert_destrel
13. Intialize -> Initialize
14. invtrans -> invtransfn
15. isolation-level -> isolation level
16. lefthasheqoperator -> left_hasheqoperator + righthasheqoperator -> right_hasheqoperator
17. LRQ_NO_IO -> LRQ_NEXT_NO_IO
18. minRecovery point -> minRecoveryPoint
19. multidimensional-aware -> multidimension-aware // sync with gistbuild.c
20. ParalleVacuumState -> ParallelVacuumState
21. PgStatShm_Stat*Entry -> PgStatShared_* // see note 2
22. plpython_call_handler -> plpython3_call_handler // see 9b7e24a2c
23. pulications -> publications
24. ReadCheckPointRecord -> ReadCheckpointRecord
25. relkkinds -> relkinds
26. separare -> separate // though perhaps it's not the most suitable word here
27. setup_formatted_log_time -> get_formatted_log_time // see ac7c80758
28. SPI_abort -> SPI_rollback
29. ssup_datum_int32_compare -> ssup_datum_int32_cmp
30. ssup_datum_signed_compare -> ssup_datum_signed_cmp
31. ssup_datum_unsigned_compare -> ssup_datum_unsigned_cmp
32. SUBSCRITPION -> SUBSCRIPTION
33. tabelspaces -> tablespaces
34. table_state_not_ready -> table_states_not_ready
35. underling -> underlying
36. WalRecoveryResult -> EndOfWalRecoveryInfo

Also, I'd like to note that the following entities/references are orphaned now, so maybe some of them could be removed too:

1. gen-rtab (in pgcrypto/Makefile) // orphaned since db7d1a7b0
2. pgstat_temp_directory // was left by b3abca681 for v15, but maybe it's time to remove it for v16
3. pgstat_write_statsfiles (in valgrind.supp)
4. quote_system_arg (in vcregress.pl) // unused since d2a2ce418
5. standard_initdb (in vcregress.pl) // unused since 322becb60 /* though maybe vcregress.pl will be removed completely soon */
6. int pstat; /* mcrypt uses it */ (in contrib/pgcrypto/px.h) /* "mcrypt" became unique after abe81ee08, support for libmcrypt was removed at 2005 (3cc866123) */

Note 1. A check that was located in check_publication_columns() in v13-0003-Add-column-filtering-to-logical-replication.patch [1], can be found in pub_collist_contains_invalid_column() now (see also [2]).

Note 2. The inconsistency appeared in [3], v67-0007-pgstat-store-statistics-in-shared-memory.patch was correct in this aspect.
18.04.2023 04:35, David Rowley wrote:
>> Please consider fixing the following unique words/identifiers introduced in v16:
> Thanks, I've pushed all of these apart from the following 2.

Thank you!

[1] https://www.postgresql.org/message-id/202112302021.ca7ihogysgh3%40alvherre.pgsql
[2] https://www.postgresql.org/message-id/CAA4eK1K5pkrPT9z5TByUPptExian5c18g6GnfNf9Cr97QdPbjw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20220404041516.cctrvpadhuriawlq%40alap3.anarazel.de

Best regards,
Alexander
Re: Fix typos and inconsistencies for v16
On Tue, Apr 18, 2023 at 02:06:43PM +1200, David Rowley wrote: > On Tue, 18 Apr 2023 at 10:10, Justin Pryzby wrote: > > > - * USER SET values are appliciable only for PGC_USERSET > > > parameters. We > > > + * USER SET values are applicable only for PGC_USERSET > > > parameters. We > > >* use InvalidOid as role in order to evade possible > > > privileges of the > > > > and s/evade/avoid/ > > I didn't touch this. You'll need to provide more justification for why > you think it's more correct than what's there. I'd noticed that it's a substitution/mistake that's been made in the past. I dug up: 9436041ed848debb3d64fb5fbff6cdb35bc46d04 8e12f4a250d250a89153da2eb9b91c31bb80c483 cd9479af2af25d7fa9bfd24dd4dcf976b360f077 6df7a9698bb036610c1e8c6d375e1be38cb26d5f 911e70207703799605f5a0e8aad9f06cff067c63 > It might not be worth too much discussion, however. +many. I may resend the patch at some later date. > > Attached some others that I found. > > Pushed the rest. Thanks Thanks! -- Justin
Re: [PATCH] Compression dictionaries for JSONB
Matthias, Nikita, Many thanks for the feedback! > Any type with typlen < 0 should work, right? Right. > The use of dictionaries should be dependent on only the use of a > compression method that supports pre-computed compression > dictionaries. I think storage=MAIN + compression dictionaries should > be supported, to make sure there is no expensive TOAST lookup for the > attributes of the tuple; but that doesn't seem to be an option with > that design. > I don't think it's a good idea to interfere with the storage strategies. > Dictionary > should be a kind of storage option, like a compression, but not the strategy > declining all others. My reasoning behind this proposal was as follows. Let's not forget that MAIN attributes *can* be stored in a TOAST table as a final resort, and also that EXTENDED attributes are compressed in-place first, and are stored in a TOAST table *only* if this is needed to fit a tuple in toast_tuple_target bytes (which additionally user can change). So whether in practice it's going to be advantageous to distinguish MAIN+dict.compressed and EXTENDED+dict.compressed attributes seems to be debatable. Basically the only difference between MAIN and EXTENDED is the priority the four-stage TOASTing algorithm gives to the corresponding attributes. I would assume if the user wants dictionary compression, the attribute should be highly compressible and thus always EXTENDED. (We seem to use MAIN for types that are not that well compressible.) This being said, if the majority believes we should introduce a new entity and keep storage strategies as is, I'm fine with that. This perhaps is not going to be the most convenient interface for the user. On the flip side it's going to be flexible. It's all about compromise. > I think "AT_AC SET COMPRESSION lz4 {[WITH | WITHOUT] DICTIONARY}", > "AT_AC SET COMPRESSION lz4-dictionary", or "AT_AC SET > compression_dictionary = on" would be better from a design > perspective. > Agree with Matthias on above. 
OK, unless someone objects, we have a consensus here.

> Didn't we get zstd support recently as well?

Unfortunately, it is not used for TOAST. In fact, I vaguely recall that ZSTD support for TOAST may have been explicitly rejected. Don't quote me on that, however...

I think it's going to be awkward to support PGLZ/LZ4 for COMPRESSION and LZ4/ZSTD for dictionary compression. As a user, I would personally prefer having one set of compression algorithms that can be used with TOAST. Perhaps for a PoC we could focus on LZ4, and maybe PGLZ, if we choose to use PGLZ for compression dictionaries too. We can always discuss ZSTD separately.

> Can we specify a default compression method for each postgresql type,
> just like how we specify the default storage? If not, then the setting
> could realistically be in conflict with a default_toast_compression
> setting, assuming that dictionary support is not a requirement for
> column compression methods.

No, only STORAGE can be specified [1].

> The toast pointer must store enough info about the compression used to
> decompress the datum, which implies it needs to store the compression
> algorithm used, and a reference to the compression dictionary (if
> any). I think the idea about introducing a new toast pointer type (in
> the custom toast patch) wasn't bad per se, and that change would allow
> us to carry more or different info in the header.

> The Pluggable TOAST was rejected, but we have a lot of improvements
> based on changing the TOAST pointer structure.

Interestingly, it looks like we ended up working on TOAST improvements after all. I'm almost certain that we will have to modify TOAST pointers to a certain degree in order to make it work. Hopefully it's not going to be too invasive.

[1]: https://www.postgresql.org/docs/current/sql-createtype.html

--
Best regards,
Aleksander Alekseev
Re: constants for tar header offsets
Robert Haas writes:

> On Tue, Apr 18, 2023 at 12:06 PM Tom Lane wrote:
>> Hmm, you're right: I checked the POSIX.1-2018 spec as well, and
>> it agrees that the prefix field is 155 bytes long. Perhaps just
>> add another comment line indicating that 12 bytes remain unassigned?
>
> OK. Here's v2, with that change and a few others.

It still has magic numbers for the sizes of the fields; should those also be named constants?

- ilmari
Re: constants for tar header offsets
Robert Haas writes: > OK. Here's v2, with that change and a few others. LGTM. regards, tom lane
Re: Request for comment on setting binary format output per session
On Tue, 18 Apr 2023 at 12:24, Robert Haas wrote:

> On Tue, Apr 18, 2023 at 11:51 AM Tom Lane wrote:
>> Robert Haas writes:
>>> One thing I think we should do in this area is introduce #defines for
>>> all the message type codes and use those instead of having hard-coded
>>> constants everywhere.
>>
>> +1, but I wonder where we should put those exactly. My first thought
>> was postgres_ext.h, but the charter for that is
>>
>> * This file contains declarations of things that are visible everywhere
>> * in PostgreSQL *and* are visible to clients of frontend interface libraries.
>> * For example, the Oid type is part of the API of libpq and other libraries.
>>
>> so picayune details of the wire protocol probably don't belong there.
>> Maybe we need a new header concerned with the wire protocol?
>
> Yeah. I sort of thought maybe one of the files in src/include/libpq
> would be the right place, but it doesn't look like it.

If we at least created the defines and replaced occurrences with them, we can litigate where to put them later. I think I'd prefer this in a different patch, but I'd be willing to take a run at it.

Dave
Re: constants for tar header offsets
On Tue, Apr 18, 2023 at 12:06 PM Tom Lane wrote: > Hmm, you're right: I checked the POSIX.1-2018 spec as well, and > it agrees that the prefix field is 155 bytes long. Perhaps just > add another comment line indicating that 12 bytes remain unassigned? OK. Here's v2, with that change and a few others. -- Robert Haas EDB: http://www.enterprisedb.com v2-0001-Add-and-use-symbolic-constants-for-tar-header-off.patch Description: Binary data
Re: Request for comment on setting binary format output per session
On Tue, Apr 18, 2023 at 11:51 AM Tom Lane wrote: > Robert Haas writes: > > One thing I think we should do in this area is introduce #defines for > > all the message type codes and use those instead of having hard-coded > > constants everywhere. > > +1, but I wonder where we should put those exactly. My first thought > was postgres_ext.h, but the charter for that is > > * This file contains declarations of things that are visible everywhere > * in PostgreSQL *and* are visible to clients of frontend interface > libraries. > * For example, the Oid type is part of the API of libpq and other libraries. > > so picayune details of the wire protocol probably don't belong there. > Maybe we need a new header concerned with the wire protocol? Yeah. I sort of thought maybe one of the files in src/include/libpq would be the right place, but it doesn't look like it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Compression dictionaries for JSONB
Hi, I don't think it's a good idea to interfere with the storage strategies. Dictionary should be a kind of storage option, like a compression, but not the strategy declining all others. >> While thinking about how a user interface could look like it occured >> to me that what we are discussing could be merely a new STORAGE >> strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. >> Let's call a new strategy DICTIONARY, with typstorage = d. >I think "AT_AC SET COMPRESSION lz4 {[WITH | WITHOUT] DICTIONARY}", >"AT_AC SET COMPRESSION lz4-dictionary", or "AT_AC SET >compression_dictionary = on" would be better from a design >perspective. Agree with Matthias on above. About the TOAST pointer: >The toast pointer must store enough info about the compression used to >decompress the datum, which implies it needs to store the compression >algorithm used, and a reference to the compression dictionary (if >any). I think the idea about introducing a new toast pointer type (in >the custom toast patch) wasn't bad per se, and that change would allow >us to carry more or different info in the header. The External TOAST pointer is very limited to the amount of service data it could keep, that's why we introduced the Custom TOAST pointers in the Pluggable TOAST. But keep in mind that changing the TOAST pointer structure requires a lot of quite heavy modifications in the core - along with some obvious places like insert/update/delete datum there is very serious issue with logical replication. The Pluggable TOAST was rejected, but we have a lot of improvements based on changing the TOAST pointer structure. On Tue, Apr 18, 2023 at 6:40 PM Matthias van de Meent < boekewurm+postg...@gmail.com> wrote: > On Tue, 18 Apr 2023 at 17:28, Aleksander Alekseev > wrote: > > > > Hi Andres, > > > > > As I said, I don't think we should extend dictionaries. 
For this to > work we'll > > > likely need a new / extended compressed toast datum header of some > form, with > > > a reference to the dictionary. That'd likely be needed even with > updatable > > > dictionaries, as we IIRC don't know which column a toasted datum is > for, but > > > we need to know, to identify the dictionary. As we need that field > anyway, we > > > can easily have multiple dictionaries. > > > > So I summarized the requirements we agreed on so far and ended up with > > the following list: > > > > * This is going to be a PostgreSQL feature, not an extension, not a > > bunch of hooks, etc; > > * We are not going to support lazy/partial decompression since this is > > too complicated in a general case and Postgres is not a specialized > > document-oriented DBMS (there is a normalization after all); > > * This should be a relation-level optimization option, not something > > visible to every user of the table (not a custom type, etc); > > * This is going to be an attribute-level compression; > > * The dictionaries should be created automatically (maybe not in a PoC > > but in the final implementation) since people are not good at it; > > * We are going to be using the existing compression algorithms like > > LZ4/ZSTD, not to invent new ones; > > * When created, a dictionary version is immutable, i.e. no new entries > > can be added. New version of a dictionary can be created when the data > > evolves. The compressed data stores the dictionary version used for > > compression. A dictionary version can't be deleted while data exists > > that uses this version of a dictionary; > > * Dictionaries are created automatically from sampled data during > > ANALIZE. 
We compare the efficiency of a new dictionary vs the > > efficiency of the old one (or the lack of such) on sampled data and > > depending on the results decide whether it's worth creating a new > > version of a dictionary; > > * This is going to work for multiple types: TEXT, JSON, JSONB, XML, > > BYTEA etc. Ideally for user-defined types too; > > Any type with typlen < 0 should work, right? > > > Hopefully I didn't miss anything. > > > > While thinking about how a user interface could look like it occured > > to me that what we are discussing could be merely a new STORAGE > > strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. > > Let's call a new strategy DICTIONARY, with typstorage = d. > > The use of dictionaries should be dependent on only the use of a > compression method that supports pre-computed compression > dictionaries. I think storage=MAIN + compression dictionaries should > be supported, to make sure there is no expensive TOAST lookup for the > attributes of the tuple; but that doesn't seem to be an option with > that design. > > > When user wants a given attribute to be compressed, he/she says: > > > > ALTER TABLE foo ALTER COLUMN bar SET STORAGE DICTIONARY; > > > > And the compression algorithms is chosen as usual: > > > > ALTER TABLE foo ALTER COLUMN bar SET COMPRESSION lz4; > > > > When there are no dictionaries yet, DICTIONARY works the same as > > EXTENDED. When a
Re: constants for tar header offsets
Robert Haas writes: > On Tue, Apr 18, 2023 at 11:38 AM Tom Lane wrote: >> 2. The header size is defined as 512 bytes, but this doesn't sum to 512: >> + TAR_OFFSET_PREFIX = 345 /* 155 byte string */ > I think that what happened is that whoever designed the original tar > format decided on 512 byte blocks. And the header did not take up the > whole block. The USTAR format is an extension of the original format > which uses more of the block, but still not all of it. Hmm, you're right: I checked the POSIX.1-2018 spec as well, and it agrees that the prefix field is 155 bytes long. Perhaps just add another comment line indicating that 12 bytes remain unassigned? regards, tom lane
Re: constants for tar header offsets
On Tue, Apr 18, 2023 at 11:38 AM Tom Lane wrote: > Robert Haas writes: > > We have a few different places in the code where we generate or modify > > tar headers or just read data out of them. The code in question uses > > one of my less-favorite programming things: magic numbers. The offsets > > of the various fields within the tar header are just hard-coded in > > each relevant place in our code. I think we should clean that up, as > > in the attached patch. > > Generally +1, with a couple of additional thoughts: > > 1. Is it worth inventing macros for the values of the file type, > rather than just writing the comment you did? Might be. > 2. The header size is defined as 512 bytes, but this doesn't sum to 512: > > + TAR_OFFSET_PREFIX = 345 /* 155 byte string */ > > Either that field's length is really 167 bytes, or there's some other > field you didn't document. (It looks like you may have copied "155" > from an incorrect existing comment?) According to my research, it is neither of those, e.g. see https://www.subspacefield.org/~vax/tar_format.html https://www.ibm.com/docs/en/zos/2.4.0?topic=formats-tar-format-tar-archives https://wiki.osdev.org/USTAR I think that what happened is that whoever designed the original tar format decided on 512 byte blocks. And the header did not take up the whole block. The USTAR format is an extension of the original format which uses more of the block, but still not all of it. > Yeah, this is adding greppability (a good thing!) but little more. > However, I'm not convinced it's worth doing more. It's not like > this data structure will change anytime soon. Right. greppability is a major concern for me here, and also bug surface. Right now, to use the functions in pgtar.h, you need to know all the header offsets as well as the format and length of each header field. This centralizes constants for the header offsets, and at least provides some centralized documentation of the rest. 
It's not great, though, because it seems like there's some risk of someone writing new code and getting confused about whether the length of a certain field is 8 or 12, for example. A thicker abstraction layer might be able to avoid or minimize such risks better than what we have, but I don't really know how to design it, whereas this seems like an obvious improvement. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Request for comment on setting binary format output per session
Robert Haas writes: > One thing I think we should do in this area is introduce #defines for > all the message type codes and use those instead of having hard-coded > constants everywhere. +1, but I wonder where we should put those exactly. My first thought was postgres_ext.h, but the charter for that is * This file contains declarations of things that are visible everywhere * in PostgreSQL *and* are visible to clients of frontend interface libraries. * For example, the Oid type is part of the API of libpq and other libraries. so picayune details of the wire protocol probably don't belong there. Maybe we need a new header concerned with the wire protocol? regards, tom lane
Re: Request for comment on setting binary format output per session
On Mon, Apr 17, 2023 at 4:22 PM Tom Lane wrote: > The fact that we've gotten away without adding *any* new message types > for about twenty years suggests to me that the growth rate isn't such > that we need sub-message-types yet. I'd keep the structure the same > until such time as we can't choose a plausible code value for a new > message, and then maybe add the "x-and-subtype" convention Jeff suggests. One thing I think we should do in this area is introduce #defines for all the message type codes and use those instead of having hard-coded constants everywhere. I'm not brave enough to tackle that day, but the only reason the current situation isn't a disaster is because every place we use e.g. 'Z' we generally also have a comment that mentions ReadyForQuery. If it weren't for that, this would be pretty un-greppable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Compression dictionaries for JSONB
On Tue, 18 Apr 2023 at 17:28, Aleksander Alekseev wrote: > > Hi Andres, > > > As I said, I don't think we should extend dictionaries. For this to work > > we'll > > likely need a new / extended compressed toast datum header of some form, > > with > > a reference to the dictionary. That'd likely be needed even with updatable > > dictionaries, as we IIRC don't know which column a toasted datum is for, but > > we need to know, to identify the dictionary. As we need that field anyway, > > we > > can easily have multiple dictionaries. > > So I summarized the requirements we agreed on so far and ended up with > the following list: > > * This is going to be a PostgreSQL feature, not an extension, not a > bunch of hooks, etc; > * We are not going to support lazy/partial decompression since this is > too complicated in a general case and Postgres is not a specialized > document-oriented DBMS (there is a normalization after all); > * This should be a relation-level optimization option, not something > visible to every user of the table (not a custom type, etc); > * This is going to be an attribute-level compression; > * The dictionaries should be created automatically (maybe not in a PoC > but in the final implementation) since people are not good at it; > * We are going to be using the existing compression algorithms like > LZ4/ZSTD, not to invent new ones; > * When created, a dictionary version is immutable, i.e. no new entries > can be added. New version of a dictionary can be created when the data > evolves. The compressed data stores the dictionary version used for > compression. A dictionary version can't be deleted while data exists > that uses this version of a dictionary; > * Dictionaries are created automatically from sampled data during > ANALIZE. 
We compare the efficiency of a new dictionary vs the > efficiency of the old one (or the lack of such) on sampled data and > depending on the results decide whether it's worth creating a new > version of a dictionary; > * This is going to work for multiple types: TEXT, JSON, JSONB, XML, > BYTEA etc. Ideally for user-defined types too; Any type with typlen < 0 should work, right? > Hopefully I didn't miss anything. > > While thinking about how a user interface could look like it occured > to me that what we are discussing could be merely a new STORAGE > strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. > Let's call a new strategy DICTIONARY, with typstorage = d. The use of dictionaries should be dependent on only the use of a compression method that supports pre-computed compression dictionaries. I think storage=MAIN + compression dictionaries should be supported, to make sure there is no expensive TOAST lookup for the attributes of the tuple; but that doesn't seem to be an option with that design. > When user wants a given attribute to be compressed, he/she says: > > ALTER TABLE foo ALTER COLUMN bar SET STORAGE DICTIONARY; > > And the compression algorithms is chosen as usual: > > ALTER TABLE foo ALTER COLUMN bar SET COMPRESSION lz4; > > When there are no dictionaries yet, DICTIONARY works the same as > EXTENDED. When a dictionary is trained the data is compressed using > the latest version of this dictionary. For visibility we are going to > need some sort of pg_stat_dictionaries view that shows the existing > dictionaries, how much space they consume, etc. I think "AT_AC SET COMPRESSION lz4 {[WITH | WITHOUT] DICTIONARY}", "AT_AC SET COMPRESSION lz4-dictionary", or "AT_AC SET compression_dictionary = on" would be better from a design perspective. > If we choose this approach there are a couple of questions/notes that > come to mind: > > * The default compression algorithm is PGLZ and unlike LZ4 it doesn't > support training dictionaries yet. 
This should be straightforward to > implement though, or alternatively shared dictionaries could work only > for LZ4; Didn't we get zstd support recently as well? > * Currently users have control of toast_tuple_target but not > TOAST_TUPLE_THRESHOLD. Which means for tuples smaller than 1/4 of the > page size shared dictionaries are not going to be triggered. Which is > not necessarily a bad thing. Alternatively we could give the users > toast_tuple_threshold setting. This shouldn't necessarily be part of > this CF entry discussion however, we can always discuss it separately; That makes a lot of sense, but as you said handling that separately would probably be better and easier to review. > * Should we allow setting DICTIONARY storage strategy for a given > type, i.e. CREATE TYPE baz STORAGE = DICTIONARY? I suggest we forbid > it in the first implementation, just for the sake of simplicity. Can we specify a default compression method for each postgresql type, just like how we specify the default storage? If not, then the setting could realistically be in conflict with a default_toast_compression setting, assuming that dictionary support is not a requirement for column compression methods. >
Re: constants for tar header offsets
Robert Haas writes: > We have a few different places in the code where we generate or modify > tar headers or just read data out of them. The code in question uses > one of my less-favorite programming things: magic numbers. The offsets > of the various fields within the tar header are just hard-coded in > each relevant place in our code. I think we should clean that up, as > in the attached patch. Generally +1, with a couple of additional thoughts: 1. Is it worth inventing macros for the values of the file type, rather than just writing the comment you did? 2. The header size is defined as 512 bytes, but this doesn't sum to 512: + TAR_OFFSET_PREFIX = 345 /* 155 byte string */ Either that field's length is really 167 bytes, or there's some other field you didn't document. (It looks like you may have copied "155" from an incorrect existing comment?) > I hasten to emphasize that, while I think this is an improvement, I > don't think the result is particularly awesome. Even with the patch, > src/port/tar.c and src/include/pgtar.h do a poor job insulating > callers from the details of the tar format. However, it's also not > very clear to me how to fix that. Yeah, this is adding greppability (a good thing!) but little more. However, I'm not convinced it's worth doing more. It's not like this data structure will change anytime soon. regards, tom lane
Re: allow_in_place_tablespaces vs. pg_basebackup
On Mon, Apr 17, 2023 at 1:30 AM Michael Paquier wrote: > FWIW, doing that now rather than the beginning of July is OK for me > for this stuff. OK, committed. -- Robert Haas EDB: http://www.enterprisedb.com
constants for tar header offsets
Hi, We have a few different places in the code where we generate or modify tar headers or just read data out of them. The code in question uses one of my less-favorite programming things: magic numbers. The offsets of the various fields within the tar header are just hard-coded in each relevant place in our code. I think we should clean that up, as in the attached patch. I hasten to emphasize that, while I think this is an improvement, I don't think the result is particularly awesome. Even with the patch, src/port/tar.c and src/include/pgtar.h do a poor job insulating callers from the details of the tar format. However, it's also not very clear to me how to fix that. For instance, I thought about writing a function that parses a tar header into a struct and then using it in all of these places, but that seems like it would lose too much efficiency relative to the current ad-hoc coding. So for now I don't have a better idea than this. -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-Add-and-use-symbolic-constants-for-tar-header-off.patch Description: Binary data
Re: Temporary tables versus wraparound... again
Hm, in an optimized build using kernel perf I see this. But I don't know how to find what the call sites are for LWLockAcquire/Release. If it's the locks on pgproc that would be kind of bad. I wonder if I should be gathering horizons once in the PrecommitActions and then just using those for every temp table somehow. Perhaps only actually doing an update if the relfrozenxid is actually at least vacuum_freeze_table_age old.

  3.98%  postgres  LWLockAcquire
  3.51%  postgres  LWLockRelease
  3.18%  postgres  hash_search_with_hash_value
  2.20%  postgres  DropRelationLocalBuffers
  1.80%  [kernel]  check_preemption_disabled
  1.52%  postgres  hash_bytes
  1.27%  postgres  LockAcquireExtended
  0.97%  postgres  _bt_compare
  0.95%  [kernel]  kmem_cache_alloc

I still think we should be applying the vacuum warning messages to stable and probably backpatching. I've actually heard from other users who have faced the same surprise wraparound shutdown.
Re: [PATCH] Compression dictionaries for JSONB
Hi Andres,

> As I said, I don't think we should extend dictionaries. For this to work we'll
> likely need a new / extended compressed toast datum header of some form, with
> a reference to the dictionary. That'd likely be needed even with updatable
> dictionaries, as we IIRC don't know which column a toasted datum is for, but
> we need to know, to identify the dictionary. As we need that field anyway, we
> can easily have multiple dictionaries.

So I summarized the requirements we agreed on so far and ended up with the following list:

* This is going to be a PostgreSQL feature, not an extension, not a bunch of hooks, etc;
* We are not going to support lazy/partial decompression since this is too complicated in the general case and Postgres is not a specialized document-oriented DBMS (there is normalization, after all);
* This should be a relation-level optimization option, not something visible to every user of the table (not a custom type, etc);
* This is going to be attribute-level compression;
* The dictionaries should be created automatically (maybe not in a PoC but in the final implementation) since people are not good at it;
* We are going to use existing compression algorithms like LZ4/ZSTD, not invent new ones;
* When created, a dictionary version is immutable, i.e. no new entries can be added. A new version of a dictionary can be created when the data evolves. The compressed data stores the dictionary version used for compression. A dictionary version can't be deleted while data exists that uses this version of the dictionary;
* Dictionaries are created automatically from sampled data during ANALYZE. We compare the efficiency of a new dictionary vs the efficiency of the old one (or the lack of one) on sampled data and, depending on the results, decide whether it's worth creating a new version of the dictionary;
* This is going to work for multiple types: TEXT, JSON, JSONB, XML, BYTEA, etc. Ideally for user-defined types too.

Hopefully I didn't miss anything.

While thinking about how a user interface could look, it occurred to me that what we are discussing could be merely a new STORAGE strategy. Currently there are PLAIN, MAIN, EXTERNAL and EXTENDED. Let's call the new strategy DICTIONARY, with typstorage = d.

When a user wants a given attribute to be compressed, he/she says:

ALTER TABLE foo ALTER COLUMN bar SET STORAGE DICTIONARY;

And the compression algorithm is chosen as usual:

ALTER TABLE foo ALTER COLUMN bar SET COMPRESSION lz4;

When there are no dictionaries yet, DICTIONARY works the same as EXTENDED. When a dictionary is trained, the data is compressed using the latest version of this dictionary. For visibility we are going to need some sort of pg_stat_dictionaries view that shows the existing dictionaries, how much space they consume, etc.

If we choose this approach, there are a couple of questions/notes that come to mind:

* The default compression algorithm is PGLZ, and unlike LZ4 it doesn't support training dictionaries yet. This should be straightforward to implement though, or alternatively shared dictionaries could work only for LZ4;
* Currently users have control of toast_tuple_target but not TOAST_TUPLE_THRESHOLD. This means that for tuples smaller than 1/4 of the page size shared dictionaries are not going to be triggered. Which is not necessarily a bad thing. Alternatively, we could give users a toast_tuple_threshold setting. This shouldn't necessarily be part of this CF entry discussion, however; we can always discuss it separately;
* Should we allow setting the DICTIONARY storage strategy for a given type, i.e. CREATE TYPE baz STORAGE = DICTIONARY? I suggest we forbid it in the first implementation, just for the sake of simplicity;
* It looks like we won't be able to share a dictionary between multiple columns. Which again is not necessarily a bad thing: data in these columns can be completely different (e.g. BYTEA and XML), columns can be dropped independently, etc. If a user is interested in sharing a dictionary between several columns, he/she can join these columns in a single JSONB column;
* TOAST currently doesn't support ZSTD. IMO this is not a big deal, and adding the corresponding support can be discussed separately;
* If memory serves, there were not many free bits left in TOAST pointers. The pointers don't store a storage strategy though, so hopefully this will not be a problem. We'll see.

Please let me know what you think about all this. I'm going to prepare an updated patch for the next CF so I could use early feedback.

--
Best regards,
Aleksander Alekseev
Re: Allowing parallel-safe initplans
Richard Guo writes: > On Mon, Apr 17, 2023 at 11:04 PM Tom Lane wrote: >> I wondered about that too, but how come neither of us saw non-cosmetic >> failures (ie, actual query output changes not just EXPLAIN changes) >> when we tried this? > Sorry I forgot to mention that I did see query output changes after > moving the initPlans to the Gather node. Hmm, my memory was just of seeing the EXPLAIN output changes, but maybe those got my attention to the extent of missing the others. > It seems that in this case the top_plan does not have any extParam, so > the Gather node that is added atop the top_plan does not have a chance > to get its initParam filled in set_param_references(). Oh, so maybe we'd need to copy up extParam as well? But it's largely moot, since I don't see a good way to avoid breaking the EXPLAIN output. regards, tom lane
pg_collation.collversion for C.UTF-8
Hi,

get_collation_actual_version() in pg_locale.c currently excludes C.UTF-8 (and more generally C.*) from versioning, which leaves pg_collation.collversion empty for these collations.

char *
get_collation_actual_version(char collprovider, const char *collcollate)
{
	if (collprovider == COLLPROVIDER_LIBC &&
		pg_strcasecmp("C", collcollate) != 0 &&
		pg_strncasecmp("C.", collcollate, 2) != 0 &&
		pg_strcasecmp("POSIX", collcollate) != 0)

This seems to be based on the idea that C.* collations provide an immutable sort like "C", but it appears that's not the case. For instance, consider how these C.UTF-8 comparisons differ between recent linux systems:

U+1D400 = Mathematical Bold Capital A

Debian 9.13 (glibc 2.24)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 t

Debian 10.13 (glibc 2.28)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 f

Debian 11.6 (glibc 2.31)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 f

Ubuntu 22.04 (glibc 2.35)
=> select 'A' < E'\U0001D400' collate "C.UTF-8";
 ?column?
----------
 t

So I suggest the attached patch to no longer exclude these collations from the generic versioning.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 092b620673..dd9c3a0c10 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1727,7 +1727,6 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #endif
 	if (collprovider == COLLPROVIDER_LIBC &&
 		pg_strcasecmp("C", collcollate) != 0 &&
-		pg_strncasecmp("C.", collcollate, 2) != 0 &&
 		pg_strcasecmp("POSIX", collcollate) != 0)
 	{
 #if defined(__GLIBC__)
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 17 Apr 2023, at 18:20, Jacob Champion wrote: > Note that it won't fix the > (completely different) misleading error message for OpenSSL 3.0, but > since that's an *actively* unhelpful error message coming back from > OpenSSL, I don't think we want to override it. Agreed, the best we can do there is to memorize it in the test. -- Daniel Gustafsson
Re: Adding argument names to aggregate functions
On 18.04.23 12:27, Dagfinn Ilmari Mannsåker wrote: Link to the actual job: https://cirrus-ci.com/task/5881376021413888 The failure was: [09:54:38.727] 216/262 postgresql:recovery / recovery/031_recovery_conflict ERROR 198.73s exit status 60 Looking at its log: https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/testrun/build/testrun/recovery/031_recovery_conflict/log/regress_log_031_recovery_conflict we see: timed out waiting for match: (?^:User was holding a relation lock for too long) at /Users/admin/pgsql/src/test/recovery/t/031_recovery_conflict.pl line 311. That looks indeed completely unrelated to this patch. Yes, that's what I suspected. The patch passes all tests now :) I've marked the CF entry as "Ready for Committer". Jim
Re: Adding argument names to aggregate functions
Jim Jones writes:

> On 18.04.23 10:58, I wrote:
>> On 14.04.23 12:03, Dagfinn Ilmari Mannsåker wrote:
>>> Thanks for the heads-up, here's a rebased patch. I've also formatted
>>> the lines to match what reformat_dat_file.pl wants. It also wanted to
>>> reformat a bunch of other entries, but I left those alone.
>>>
>>> - ilmari
>>
>> The patch applies cleanly now and \df shows the argument names:
>>
>> postgres=# \df string_agg
>>                                        List of functions
>>    Schema   |    Name    | Result data type |     Argument data types      | Type
>> ------------+------------+------------------+------------------------------+------
>>  pg_catalog | string_agg | bytea            | value bytea, delimiter bytea | agg
>>  pg_catalog | string_agg | text             | value text, delimiter text   | agg
>> (2 rows)
>>
>> postgres=# \df json_object_agg
>>                                   List of functions
>>    Schema   |      Name       | Result data type |  Argument data types   | Type
>> ------------+-----------------+------------------+------------------------+------
>>  pg_catalog | json_object_agg | json             | key "any", value "any" | agg
>> (1 row)
>>
>> I'm wondering if there are some sort of guidelines that dictate when
>> to name an argument or not. It would be nice to have one for future
>> reference.

I seem to recall a patch to add argument names to a bunch of functions
in the past, thinking that might have some guidance, but I can't for
the life of me find it now.

>> I will mark the CF entry as "Ready for Committer" and let the
>> committers decide if it's best to first create a guideline for that
>> or not.
>>
>> Best, Jim
>
> I just saw that the patch is failing[1] on "macOS - Ventura - Meson".
> Not sure if it is related to this patch though ..
>
> [1] https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/meson_log/build/meson-logs/meson-log.txt

Link to the actual job: https://cirrus-ci.com/task/5881376021413888

The failure was:

[09:54:38.727] 216/262 postgresql:recovery / recovery/031_recovery_conflict ERROR 198.73s exit status 60

Looking at its log:
https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/testrun/build/testrun/recovery/031_recovery_conflict/log/regress_log_031_recovery_conflict
we see:

timed out waiting for match: (?^:User was holding a relation lock for too long) at /Users/admin/pgsql/src/test/recovery/t/031_recovery_conflict.pl line 311.

That looks indeed completely unrelated to this patch.

- ilmari
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
On Tue, 18 Apr 2023 at 19:29, Miroslav Bendik wrote:
> here is an updated patch with proposed changes.

Here's a quick review:

1. I don't think this is required. match_pathkeys_to_index() sets these
to NIL and they're set accordingly by the other code paths.

-	List	   *orderbyclauses;
-	List	   *orderbyclausecols;
+	List	   *orderbyclauses = NIL;
+	List	   *orderbyclausecols = NIL;

2. You can use:

list_copy_head(root->query_pathkeys, list_length(orderbyclauses));

instead of:

+		useful_pathkeys = list_truncate(list_copy(root->query_pathkeys),
+										list_length(orderbyclauses));

3. The following 2 changes don't seem to be needed:

@@ -3104,11 +3100,11 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 		/* Pathkey must request default sort order for the target opfamily */
 		if (pathkey->pk_strategy != BTLessStrategyNumber ||
 			pathkey->pk_nulls_first)
-			return;
+			break;

 		/* If eclass is volatile, no hope of using an indexscan */
 		if (pathkey->pk_eclass->ec_has_volatile)
-			return;
+			break;

There's no code after the loop you're breaking out of, so it seems to
me that return is the same as break and there's no reason to change it.

David
Re: Adding argument names to aggregate functions
On 18.04.23 10:58, I wrote:
> On 14.04.23 12:03, Dagfinn Ilmari Mannsåker wrote:
>> Thanks for the heads-up, here's a rebased patch. I've also formatted
>> the lines to match what reformat_dat_file.pl wants. It also wanted to
>> reformat a bunch of other entries, but I left those alone.
>>
>> - ilmari
>
> The patch applies cleanly now and \df shows the argument names:
>
> postgres=# \df string_agg
>                                        List of functions
>    Schema   |    Name    | Result data type |     Argument data types      | Type
> ------------+------------+------------------+------------------------------+------
>  pg_catalog | string_agg | bytea            | value bytea, delimiter bytea | agg
>  pg_catalog | string_agg | text             | value text, delimiter text   | agg
> (2 rows)
>
> postgres=# \df json_object_agg
>                                   List of functions
>    Schema   |      Name       | Result data type |  Argument data types   | Type
> ------------+-----------------+------------------+------------------------+------
>  pg_catalog | json_object_agg | json             | key "any", value "any" | agg
> (1 row)
>
> I'm wondering if there are some sort of guidelines that dictate when
> to name an argument or not. It would be nice to have one for future
> reference.
>
> I will mark the CF entry as "Ready for Committer" and let the
> committers decide if it's best to first create a guideline for that
> or not.
>
> Best, Jim

I just saw that the patch is failing[1] on "macOS - Ventura - Meson".
Not sure if it is related to this patch though ..

[1] https://api.cirrus-ci.com/v1/artifact/task/5881376021413888/meson_log/build/meson-logs/meson-log.txt
Tab completion for GRANT MAINTAIN
Hi hackers,

I found that GRANT MAINTAIN is not tab-completed with ON, so here is a
patch.

Best wishes,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5825b2a195..bd04244969 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3902,7 +3902,7 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("GRANT|REVOKE", MatchAny) ||
 			 TailMatches("REVOKE", "GRANT", "OPTION", "FOR", MatchAny))
 	{
-		if (TailMatches("SELECT|INSERT|UPDATE|DELETE|TRUNCATE|REFERENCES|TRIGGER|CREATE|CONNECT|TEMPORARY|TEMP|EXECUTE|USAGE|ALL"))
+		if (TailMatches("SELECT|INSERT|UPDATE|DELETE|TRUNCATE|REFERENCES|TRIGGER|CREATE|CONNECT|TEMPORARY|TEMP|EXECUTE|USAGE|MAINTAIN|ALL"))
 			COMPLETE_WITH("ON");
 		else if (TailMatches("GRANT", MatchAny))
 			COMPLETE_WITH("TO");
Support "Right Semi Join" plan shapes
In thread [1], which discussed 'Right Anti Join', Tom once mentioned
'Right Semi Join'. After a preliminary investigation I think it is
beneficial and can be implemented with a very short change.

With 'Right Semi Join', what we want is to have just the first match
for each inner tuple. For HashJoin, after scanning the hash bucket for
matches to the current outer tuple, we just need to check whether the
inner tuple has already been marked as matched, and skip it if so. For
MergeJoin, we can do it by avoiding restoring the inner scan to the
marked tuple in EXEC_MJ_TESTOUTER, in the case when the new outer tuple
== the marked tuple.

As that thread is already too long, I'm forking a new thread and
attaching a patch for discussion. The patch implements 'Right Semi
Join' for HashJoin.

[1] https://www.postgresql.org/message-id/CAMbWs4_eChX1bN%3Dvj0Uzg_7iz9Uivan%2BWjjor-X87L-V27A%2Brw%40mail.gmail.com

Thanks
Richard

v1-0001-Support-Right-Semi-Join-plan-shapes.patch
Description: Binary data
Re: Adding argument names to aggregate functions
On 14.04.23 12:03, Dagfinn Ilmari Mannsåker wrote:
> Thanks for the heads-up, here's a rebased patch. I've also formatted
> the lines to match what reformat_dat_file.pl wants. It also wanted to
> reformat a bunch of other entries, but I left those alone.
>
> - ilmari

The patch applies cleanly now and \df shows the argument names:

postgres=# \df string_agg
                                       List of functions
   Schema   |    Name    | Result data type |     Argument data types      | Type
------------+------------+------------------+------------------------------+------
 pg_catalog | string_agg | bytea            | value bytea, delimiter bytea | agg
 pg_catalog | string_agg | text             | value text, delimiter text   | agg
(2 rows)

postgres=# \df json_object_agg
                                  List of functions
   Schema   |      Name       | Result data type |  Argument data types   | Type
------------+-----------------+------------------+------------------------+------
 pg_catalog | json_object_agg | json             | key "any", value "any" | agg
(1 row)

I'm wondering if there are some sort of guidelines that dictate when to
name an argument or not. It would be nice to have one for future
reference.

I will mark the CF entry as "Ready for Committer" and let the
committers decide if it's best to first create a guideline for that or
not.

Best, Jim
Re: Fix documentation for max_wal_size and min_wal_size
Hi,

On Mon, Apr 17, 2023 at 9:38 PM Michael Paquier wrote:
> On Mon, Apr 17, 2023 at 07:57:58PM -0700, sirisha chamarthi wrote:
>> On Fri, Apr 14, 2023 at 1:01 AM Kyotaro Horiguchi
>> <horikyota@gmail.com> wrote:
>>> So, I personally think it should be written like this: "The default
>>> size is 80MB. However, if you have changed the WAL segment size from
>>> the default of 16MB, it will be five times the segment size.", but
>>> I'm not sure what the others think about this..
>
> Yes, I was under the impression that this should mention 16MB, but
> I'd also add a note about initdb when a non-default value is specified
> for the segment size.

How about the text below?

"The default size is 80MB. However, if you have changed the WAL segment
size from the default of 16MB with the initdb option --wal-segsize, it
will be five times the segment size."
Re: Incremental sort for access method with ordered scan support (amcanorderbyop)
po 17. 4. 2023 o 15:26 Richard Guo napísal(a):
>
> On Sun, Apr 16, 2023 at 1:20 AM Miroslav Bendik wrote:
>>
>> Postgres allows incremental sort only for ordered indexes. Function
>> build_index_paths doesn't build partial order paths for access
>> methods with order support. My patch adds support for incremental
>> ordering with access methods.
>
> I think this is a meaningful optimization. I reviewed the patch and
> here are the comments from me.
>
> * I understand the new param 'match_pathkeys_length_p' is used to tell
> how many presorted keys are useful. I think list_length(orderbyclauses)
> will do the same. So there is no need to add the new param, thus we can
> reduce the code diffs.
>
> * Now that match_pathkeys_to_index() returns a prefix of the pathkeys
> rather than returns NIL immediately when there is a failure to match,
> it seems the two local variables 'orderby_clauses' and 'clause_columns'
> are not necessary any more. I think we can instead lappend the matched
> 'expr' and 'indexcol' to '*orderby_clauses_p' and '*clause_columns_p'
> directly. In this way we can still call 'return' when we come to a
> failure to match.
>
> * In build_index_paths(), I think the diff can be reduced to
>
> -            if (orderbyclauses)
> -                useful_pathkeys = root->query_pathkeys;
> -            else
> -                useful_pathkeys = NIL;
> +            useful_pathkeys = list_truncate(list_copy(root->query_pathkeys),
> +                                            list_length(orderbyclauses));
>
> * Several comments in match_pathkeys_to_index() are out of date. We
> need to revise them to cope with the change.
>
> * I think it's better to provide a test case.
>
> Thanks
> Richard

Thank you for the advice; here is an updated patch with the proposed
changes.

--
Best regards
Miroslav

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index b8000da56d..202779fb10 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -864,8 +864,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	List	   *index_clauses;
 	Relids		outer_relids;
 	double		loop_count;
-	List	   *orderbyclauses;
-	List	   *orderbyclausecols;
+	List	   *orderbyclauses = NIL;
+	List	   *orderbyclausecols = NIL;
 	List	   *index_pathkeys;
 	List	   *useful_pathkeys;
 	bool		found_lower_saop_clause;
@@ -1001,10 +1001,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		match_pathkeys_to_index(index, root->query_pathkeys,
 								&orderbyclauses,
 								&orderbyclausecols);
-		if (orderbyclauses)
-			useful_pathkeys = root->query_pathkeys;
-		else
-			useful_pathkeys = NIL;
+		useful_pathkeys = list_truncate(list_copy(root->query_pathkeys),
+										list_length(orderbyclauses));
 	}
 	else
 	{
@@ -3071,16 +3069,14 @@ expand_indexqual_rowcompare(PlannerInfo *root,
  * index column numbers (zero based) that each clause would be used with.
  * NIL lists are returned if the ordering is not achievable this way.
  *
- * On success, the result list is ordered by pathkeys, and in fact is
- * one-to-one with the requested pathkeys.
+ * On success, the result list is ordered by pathkeys. Result can be shorter
+ * than pathkeys on partial prefix match.
  */
 static void
 match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 						List **orderby_clauses_p,
 						List **clause_columns_p)
 {
-	List	   *orderby_clauses = NIL;
-	List	   *clause_columns = NIL;
 	ListCell   *lc1;
 
 	*orderby_clauses_p = NIL;	/* set default results */
@@ -3104,11 +3100,11 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 		/* Pathkey must request default sort order for the target opfamily */
 		if (pathkey->pk_strategy != BTLessStrategyNumber ||
 			pathkey->pk_nulls_first)
-			return;
+			break;
 
 		/* If eclass is volatile, no hope of using an indexscan */
 		if (pathkey->pk_eclass->ec_has_volatile)
-			return;
+			break;
 
 		/*
 		 * Try to match eclass member expression(s) to index.  Note that child
@@ -3145,8 +3141,8 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 											   pathkey->pk_opfamily);
 			if (expr)
 			{
-				orderby_clauses = lappend(orderby_clauses, expr);
-				clause_columns = lappend_int(clause_columns, indexcol);
+				*orderby_clauses_p = lappend(*orderby_clauses_p, expr);
+				*clause_columns_p = lappend_int(*clause_columns_p, indexcol);
 				found = true;
 				break;
 			}
@@ -3156,12 +3152,9 @@ match_pathkeys_to_index(IndexOptInfo *index, List *pathkeys,
 			break;
 		}
 
-		if (!found)				/* fail if no match for this pathkey */
+		if (!found)				/* end after first not matching expression */
 			return;
 	}
-
-	*orderby_clauses_p = orderby_clauses;	/* success! */
-	*clause_columns_p = clause_columns;
 }
 
 /*
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 0a631124c2..dcf2c0762d 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@
Re: Allowing parallel-safe initplans
On Mon, Apr 17, 2023 at 11:04 PM Tom Lane wrote:
> Richard Guo writes:
>> So now it seems that the breakage of regression tests is more severe
>> than being cosmetic. I wonder if we need to update the comments to
>> indicate the potential wrong results issue if we move the initPlans
>> to the Gather node.
>
> I wondered about that too, but how come neither of us saw non-cosmetic
> failures (ie, actual query output changes not just EXPLAIN changes)
> when we tried this? Maybe the case is somehow not exercised, but if
> so I'm more worried about adding regression tests than comments.

Sorry, I forgot to mention that I did see query output changes after
moving the initPlans to the Gather node. First of all let me make sure
I was doing it in the right way. On the base of the patch, I was using
the diff as below:

     if (debug_parallel_query != DEBUG_PARALLEL_OFF &&
-        top_plan->parallel_safe && top_plan->initPlan == NIL)
+        top_plan->parallel_safe)
     {
         Gather     *gather = makeNode(Gather);

+        gather->plan.initPlan = top_plan->initPlan;
+        top_plan->initPlan = NIL;
+
         gather->plan.targetlist = top_plan->targetlist;

And then I changed the default value of debug_parallel_query to
DEBUG_PARALLEL_REGRESS. And then I just ran 'make installcheck' and saw
the query output changes.

> I think actually that it does work beyond the EXPLAIN weirdness,
> because since e89a71fb4 the Gather machinery knows how to transmit
> the values of Params listed in Gather.initParam to workers, and that
> is filled in setrefs.c in a way that looks like it'd work regardless
> of whether the Gather appeared organically or was stuck on by the
> debug_parallel_query hackery. I've not tried to verify that
> directly though.

It seems that in this case the top_plan does not have any extParam, so
the Gather node that is added atop the top_plan does not have a chance
to get its initParam filled in set_param_references().

Thanks
Richard
Re: Should we put command options in alphabetical order in the doc?
On Mon, Apr 17, 2023 at 10:45 PM David Rowley wrote:
> For the case of reindex.sgml, I do see that the existing parameter
> order lists INDEX | TABLE | SCHEMA | DATABASE | SYSTEM first which is
> the target of the reindex. I wondered if that was worth keeping. I'm
> just thinking that since all of these are under the "Parameters"
> heading that we should class them all as equals and just make the
> order alphabetical. I feel that if we don't do that then the order to
> add any new parameters is just not going to be obvious and we'll end
> up with things getting out of order again quite quickly.

I don't think that alphabetical order makes much sense. Surely some
parameters are more important than others. Surely there is some kind
of natural grouping that makes somewhat more sense than alphabetical
order.

Take the VACUUM command. Right now FULL, FREEZE, and VERBOSE all come
first. Those options are approximately the most important options --
especially VERBOSE. But your patch places VERBOSE dead last.

--
Peter Geoghegan