Re: speed up unicode decomposition and recomposition
On Fri, Oct 23, 2020 at 08:24:06PM -0400, Tom Lane wrote: > I'd advise not putting conv_compare() between get_code_entry() and > the header comment for get_code_entry(). Looks good otherwise. Indeed. I have adjusted the position of the comment, and applied the fix. Thanks for the report. -- Michael signature.asc Description: PGP signature
Re: Tab complete for alter table rls
Thanks Michael! -- Best regards Japin Li > On Oct 24, 2020, at 9:49 AM, Michael Paquier wrote: > > On Fri, Oct 23, 2020 at 04:37:18PM +0900, Michael Paquier wrote: >> No worries. Good catch. I'll try to test that and apply it later, >> but by reading the code it looks like you got that right. > > Checked and applied on HEAD, thanks! > -- > Michael
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Heikki Linnakangas writes: > On 23/10/2020 17:51, Tom Lane wrote: >> Seems like we could have gotten rid of that by now, but when exactly >> does it become fair game? And can we have a non-ad-hoc process for >> getting rid of such cruft? > I did some grepping for strings like "version 7", "pre-8" and so on. I > couldn't come up with a clear rule on what could be removed. Context > matters. Yeah, that's unsurprising. But thanks for all the effort you put into this review! > Findings in detail follow. And attached is a patch about the stuff that > I think can be removed pretty straightforwardly. I agree with the patch, and with your other thoughts, except as noted below. > config.sgml (on synchronized_scans): > have no ORDER BY clause. Setting this parameter > to > off ensures the pre-8.3 behavior in which a > sequential > scan always starts from the beginning of the table. The default > is on. > We could remove the reference to 8.3 version. I'm inclined to keep it > though. Maybe s/pre-8.3/simple/, or some similar adjective? > func.sgml: > > > Before PostgreSQL 8.2, the containment > operators @> and <@ > were respectively > called ~ and @. These names > are still > available, but are deprecated and will eventually be removed. > > > The old names are still available, so should keep this. Perhaps it's time to actually remove those operators as threatened here? That's material for a separate discussion, though. > If the contents of two arrays are equal but the dimensionality is > different, the first difference in the dimensionality information > determines the sort order. (This is a change from versions of > PostgreSQL prior to 8.2: older versions > would claim > that two arrays with the same contents were equal, even if the > number of dimensions or subscript ranges were different.) > > Could remove it. Yeah, I'm OK with removing the parenthetical comment. > There are two differences in the behavior of > string_to_array > from pre-9.1 versions of PostgreSQL. > Feels too early to remove. +1. 9.1 was in support till ~4 years ago; 8.2 EOL'd 9 years ago. I'm not sure where to put the cutoff, but 4 years seems too little. > > > Prior to PostgreSQL 8.2, the > <, <=, > > and >= > cases were not handled per SQL specification. A comparison like > ROW(a,b) < ROW(c,d) > was implemented as > a < c AND b < d > whereas the correct behavior is equivalent to > a < c OR (a = c AND b < d). > > > Important incompatibility. Although very old. I'm inclined to keep it. > If we remove it, it'd still be useful to explain the new behavior. Yeah, even if we don't care about 8.2, some of this text is useful to clarify the behavior of row comparisons. I haven't looked at the surrounding material, but I'd not want to just delete this unless it's clearly duplicative. > As of PostgreSQL 8.4, this advice is less > necessary since delayed indexing is used (seelinkend="gin-fast-update"/> for details). But for very large updates > it may still be best to drop and recreate the index. > I think that's old enough, but the paragraph would need some > copy-editing, not just removal. Right, same deal, needs a bit of wordsmithing not just deletion. > mode is unused and > ignored as of PostgreSQL 8.1; however, for > backward compatibility with earlier releases it is best to > set it to INV_READ, INV_WRITE, > or INV_READ | > INV_WRITE. > We need to say something about 'mode'. Keep. Maybe s/as of/since/, but otherwise fine. > Data of a particular data type might be transmitted in any of several > different formats. 
As of > PostgreSQL 7.4 > the only supported formats are text and > binary, > but the protocol makes provision for future extensions. The desired > Could replace the "as of PostgreSQL 7.4" with "Currently", but it's not > much shorter. While it's not shorter, I think it's clearer in this context. 7.4 is far enough back that a reader might expect the next sentence to offer updated info. > > > Another example — the PostgreSQL > mailing > list archives contained 910,989 unique words with 57,491,343 lexemes in > 461,020 messages. > > Refresh the numbers. I agree with the comment: if we keep this, there should be an "as of" date associated with the numbers. Thanks again for slogging through that! regards, tom lane
Re: Tab complete for alter table rls
On Fri, Oct 23, 2020 at 04:37:18PM +0900, Michael Paquier wrote: > No worries. Good catch. I'll try to test that and apply it later, > but by reading the code it looks like you got that right. Checked and applied on HEAD, thanks! -- Michael signature.asc Description: PGP signature
Re: speed up unicode decomposition and recomposition
Michael Paquier writes: > On Fri, Oct 23, 2020 at 04:18:13PM -0700, Mark Dilger wrote: >> On Oct 23, 2020, at 9:07 AM, Tom Lane wrote: >>> genhtml: WARNING: function data mismatch at >>> /home/postgres/pgsql/src/common/unicode_norm.c:102 > I can see the problem on Debian GID with lcov 1.14-2. This points to > the second declaration of get_code_entry(). I think that genhtml, > because it considers the code of unicode_norm.c as a whole without its > CFLAGS, gets confused because it finds the same function to index as > defined twice. It expects only one definition, hence the warning. So > I think that this can lead to some incorrect data in the HTML report, > and the attached patch takes care of fixing that. Tom, does it take > care of the issue on your side? Good catch! Yeah, that fixes it for me. I'd advise not putting conv_compare() between get_code_entry() and the header comment for get_code_entry(). Looks good otherwise. regards, tom lane
Re: minor problem in boolean cast
Cary Huang writes: > I noticed that when casting a string to boolean value with input 'of' it > still cast it to 'f'. I think with 'of', it should give an error because > 'off' is the expected candidate. This may not be intended so I made a simple > patch to address this. It's absolutely intended, and documented: https://www.postgresql.org/docs/devel/datatype-boolean.html Note the bit about "Unique prefixes of these strings are also accepted". The code comment just above parse_bool() says the same. regards, tom lane
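As a quick illustration of the prefix rule described above (output sketched from the documented behavior; not part of the original exchange):

```
postgres=# select 'of'::boolean as a, 'tru'::boolean as b, 'y'::boolean as c;
 a | b | c
---+---+---
 f | t | t
(1 row)
```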
Re: speed up unicode decomposition and recomposition
On Fri, Oct 23, 2020 at 04:18:13PM -0700, Mark Dilger wrote: > On Oct 23, 2020, at 9:07 AM, Tom Lane wrote: >> genhtml: WARNING: function data mismatch at >> /home/postgres/pgsql/src/common/unicode_norm.c:102 >> >> I've never seen anything like that before. I suppose it means that >> something about 783f0cc64 is a bit fishy, but I don't know what. >> >> The emitted coverage report looks fairly normal anyway. It says >> unicode_norm.c has zero test coverage, which is very possibly correct >> since I wasn't running in UTF8 encoding, but I'm not entirely sure of >> that either. > > I don't see it on mac nor on ubuntu64. I get 70.6% coverage of > lines and 90.9% of functions on ubuntu. I can see the problem on Debian GID with lcov 1.14-2. This points to the second declaration of get_code_entry(). I think that genhtml, because it considers the code of unicode_norm.c as a whole without its CFLAGS, gets confused because it finds the same function to index as defined twice. It expects only one definition, hence the warning. So I think that this can lead to some incorrect data in the HTML report, and the attached patch takes care of fixing that. Tom, does it take care of the issue on your side? -- Michael diff --git a/src/common/unicode_norm.c b/src/common/unicode_norm.c index 4ffce0e619..7cc8faa63a 100644 --- a/src/common/unicode_norm.c +++ b/src/common/unicode_norm.c @@ -53,11 +53,26 @@ * The backend version of this code uses a perfect hash function for the * lookup, while the frontend version uses a binary search. */ -#ifndef FRONTEND +#ifdef FRONTEND +/* comparison routine for bsearch() of decomposition lookup table. */ +static int +conv_compare(const void *p1, const void *p2) +{ + uint32 v1, +v2; + + v1 = *(const uint32 *) p1; + v2 = ((const pg_unicode_decomposition *) p2)->codepoint; + return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1); +} + +#endif static const pg_unicode_decomposition * get_code_entry(pg_wchar code) { +#ifndef FRONTEND + int h; uint32 hashkey; pg_unicode_decompinfo decompinfo = UnicodeDecompInfo; @@ -82,33 +97,17 @@ get_code_entry(pg_wchar code) /* Success! */ return &decompinfo.decomps[h]; -} #else -/* comparison routine for bsearch() of decomposition lookup table. */ -static int -conv_compare(const void *p1, const void *p2) -{ - uint32 v1, -v2; - - v1 = *(const uint32 *) p1; - v2 = ((const pg_unicode_decomposition *) p2)->codepoint; - return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1); -} - -static const pg_unicode_decomposition * -get_code_entry(pg_wchar code) -{ return bsearch(&(code), UnicodeDecompMain, lengthof(UnicodeDecompMain), sizeof(pg_unicode_decomposition), conv_compare); +#endif } -#endif /* !FRONTEND */ /* * Given a decomposition entry looked up earlier, get the decomposed signature.asc Description: PGP signature
minor problem in boolean cast
Hi

I noticed that when casting a string to boolean value with input 'of' it still casts it to 'f'. I think with 'of', it should give an error because 'off' is the expected candidate. This may not be intended so I made a simple patch to address this.

```
postgres=# select cast('of' as boolean);
 bool
------
 f
(1 row)
```

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

0001-boolean-type-cast-fix.patch Description: Binary data
Re: new heapcheck contrib module
> On Oct 23, 2020, at 4:12 PM, Tom Lane wrote: > > Mark Dilger writes: >> You certainly appear to be right about that. I've added the extra checks, >> and extended the regression test to include them. Patch attached. > > Pushed with some more fooling about. The "bit reversal" idea is not > a sufficient guide to picking values that will hit all the code checks. > For instance, I was seeing out-of-range warnings on one endianness and > not the other because on the other one the maxalign check rejected the > value first. I ended up manually tweaking the corruption patterns > until they hit all the tests on both endiannesses. Pretty much the > opposite of black-box testing, but it's not like our notions of line > pointer layout are going to change anytime soon. > > I made some logic rearrangements in the C code, too. Thanks Tom! And Peter, your comment earlier saved me some time. Thanks to you, also! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: speed up unicode decomposition and recomposition
> On Oct 23, 2020, at 9:07 AM, Tom Lane wrote: > > I chanced to do an --enable-coverage test run today, and I got this > weird message during "make coverage-html": > > genhtml: WARNING: function data mismatch at > /home/postgres/pgsql/src/common/unicode_norm.c:102 > > I've never seen anything like that before. I suppose it means that > something about 783f0cc64 is a bit fishy, but I don't know what. > > The emitted coverage report looks fairly normal anyway. It says > unicode_norm.c has zero test coverage, which is very possibly correct > since I wasn't running in UTF8 encoding, but I'm not entirely sure of > that either. > > This is with RHEL8's lcov-1.13-4.el8 package. I suppose the first > question is does anybody else see that? I don't see it on mac nor on ubuntu64. I get 70.6% coverage of lines and 90.9% of functions on ubuntu. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Mark Dilger writes: > You certainly appear to be right about that. I've added the extra checks, > and extended the regression test to include them. Patch attached. Pushed with some more fooling about. The "bit reversal" idea is not a sufficient guide to picking values that will hit all the code checks. For instance, I was seeing out-of-range warnings on one endianness and not the other because on the other one the maxalign check rejected the value first. I ended up manually tweaking the corruption patterns until they hit all the tests on both endiannesses. Pretty much the opposite of black-box testing, but it's not like our notions of line pointer layout are going to change anytime soon. I made some logic rearrangements in the C code, too. regards, tom lane
Re: new heapcheck contrib module
Mark Dilger writes: >> On Oct 23, 2020, at 11:51 AM, Tom Lane wrote: >> I do not have 64-bit big-endian hardware to play with unfortunately. >> But what I suspect is happening here is less about endianness and >> more about alignment pickiness; or maybe we were unlucky enough to >> index off the end of the shmem segment. > You certainly appear to be right about that. I've added the extra checks, > and extended the regression test to include them. Patch attached. Meanwhile, I've replicated the SIGBUS problem on gaur's host, so that's definitely what's happening. (Although PPC is also alignment-picky on the hardware level, I believe that both macOS and Linux try to mask that by having kernel trap handlers execute unaligned accesses, leaving only a nasty performance loss behind. That's why I failed to see this effect when checking your previous patch on an old Apple box. We likely won't see it in the buildfarm either, unless maybe on Noah's AIX menagerie.) I'll check this patch on gaur and push it if it's clean. regards, tom lane
Re: Mop-up around psql's \connect behavior
Kyotaro Horiguchi writes: > At Thu, 22 Oct 2020 15:23:04 -0400, Tom Lane wrote in >> ... The only real objection I can see is that it could >> hold a server connection open when the user thinks there is none; >> but that could only happen in a non-interactive script, and it does >> not seem like a big problem in that case. We could alternatively >> not stash the "dead" connection after a non-interactive \connect >> failure, but I doubt that's better. > Agreed. Thanks! After further thought I decided we *must* do it as per my "alternative" idea. Consider a script containing \c db1 user1 live_server \c db2 user2 dead_server \c db3 The script would be expecting to connect to db3 at dead_server, but if we re-use parameters from the first connection then it might successfully connect to db3 at live_server. This'd defeat the goal of not letting a script accidentally execute commands against the wrong database. So we have to not save the connection after a failed script \connect. However, it seems OK to save after a connection loss whether we're in a script or not; that is, \c db1 user1 server1 ... (connection dies here) ... --- these commands will fail \c db2 The script will be expecting the second \c to re-use parameters from the first one, and that will still work as expected. I went ahead and pushed it after adjusting that. regards, tom lane
Re: new heapcheck contrib module
> On Oct 23, 2020, at 11:51 AM, Tom Lane wrote: > > Hmm, we're not out of the woods yet: thorntail is even less happy > than before. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11 > > I do not have 64-bit big-endian hardware to play with unfortunately. > But what I suspect is happening here is less about endianness and > more about alignment pickiness; or maybe we were unlucky enough to > index off the end of the shmem segment. I see that verify_heapam > does this for non-redirect tuples: > >/* Set up context information about this next tuple */ >ctx.lp_len = ItemIdGetLength(ctx.itemid); >ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); >ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); > > with absolutely no thought for the possibility that lp_off is out of > range or not maxaligned. The checks for a sane lp_len seem to have > gone missing as well. You certainly appear to be right about that. I've added the extra checks, and extended the regression test to include them. Patch attached. v23-0001-Sanity-checking-line-pointers.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
On 23/10/2020 17:51, Tom Lane wrote: But anyway, this was about documentation not code. What I'm wondering about is when to drop things like, say, this bit in the regex docs: Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL: (etc etc) Seems like we could have gotten rid of that by now, but when exactly does it become fair game? And can we have a non-ad-hoc process for getting rid of such cruft? Let's try to zoom in on a rule: Anything that talks about 9.4 or above (min supported version - 1) should definitely be left in place. Something around 9.0 is possibly still useful to someone upgrading or updating an application. Or someone might still bump into old blog posts from that era. Before that, I don't see much value. Although you could argue that I jumped the gun on the notice about pre-8.2 pg_dump -t behavior. pg_dump still supports servers down to 8.0, so someone might also have an 8.0 pg_dump binary lying around, and might be confused that -t behaves differently. On the whole though, I think removing it was fair game. I did some grepping for strings like "version 7", "pre-8" and so on. I couldn't come up with a clear rule on what could be removed. Context matters. In text that talks about protocol versions or libpq functions like PQlibVersion() it seems sensible to go back as far as possible, for the completeness. And subtle user-visible differences in behavior are more important to document than changes in internal C APIs that cause a compiler failure, for example. Other notices are about old syntax that's kept for backwards compatibility, but still works. It makes sense to mention the old version in those cases, even if it's very old, because the alternative would be to just say something like "very old version", which is not any shorter, just less precise. Findings in detail follow. And attached is a patch about the stuff that I think can be removed pretty straightforwardly. array.sgml: If the value written for an element is NULL (in any case variant), the element is taken to be NULL. The presence of any quotes or backslashes disables this and allows the literal string value NULL to be entered. Also, for backward compatibility with pre-8.2 versions of PostgreSQL, the configuration parameter can be turned off to suppress recognition of NULL as a NULL. The GUC still exists, so we should keep this. catalogs.sgml: The view pg_group exists for backwards compatibility: it emulates a catalog that existed in PostgreSQL before version 8.1. It shows the names and members of all roles that are marked as not rolcanlogin, which is an approximation to the set of roles that are being used as groups. pg_group still exists, and that paragraph explains why. We should keep it. (There's a similar paragraph for pg_shadow) config.sgml (on synchronized_scans): This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then wrap around the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on. We could remove the reference to 8.3 version. 
I'm inclined to keep it though. func.sgml (String Functions and Operators): Before PostgreSQL 8.3, these functions would silently accept values of several non-string data types as well, due to the presence of implicit coercions from those data types to text. Those coercions have been removed because they frequently caused surprising behaviors. However, the string concatenation operator (||) still accepts non-string input, so long as at least one input is of a string type, as shown in . For other cases, insert an explicit coercion to text if you need to duplicate the previous behavior. Could remove the reference to 8.3, but the information about || still makes sense. I'm inclined to just keep it. func.sgml: Before PostgreSQL 8.2, the containment operators @> and <@ were respectively called ~ and @. These names are still available, but are deprecated and will eventually be removed. The old names are still available, so should keep this. func.sgml: Before PostgreSQL 8.1, the arguments of the sequence functions were of type text, not regclass, and the above-described conve
Re: new heapcheck contrib module
On Fri, Oct 23, 2020 at 11:51 AM Tom Lane wrote: > /* Set up context information about this next tuple */ > ctx.lp_len = ItemIdGetLength(ctx.itemid); > ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); > ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); > > with absolutely no thought for the possibility that lp_off is out of > range or not maxaligned. The checks for a sane lp_len seem to have > gone missing as well. That is surprising. verify_nbtree.c has PageGetItemIdCareful() for this exact reason. -- Peter Geoghegan
Re: new heapcheck contrib module
Hmm, we're not out of the woods yet: thorntail is even less happy than before. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11 I do not have 64-bit big-endian hardware to play with unfortunately. But what I suspect is happening here is less about endianness and more about alignment pickiness; or maybe we were unlucky enough to index off the end of the shmem segment. I see that verify_heapam does this for non-redirect tuples:

    /* Set up context information about this next tuple */
    ctx.lp_len = ItemIdGetLength(ctx.itemid);
    ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
    ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);

with absolutely no thought for the possibility that lp_off is out of range or not maxaligned. The checks for a sane lp_len seem to have gone missing as well. regards, tom lane
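To make the missing checks concrete, here is a rough sketch of the kind of line-pointer validation being asked for (hypothetical code, not the fix that was actually committed; the helper name is an invention for illustration):

```
#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/itemid.h"

/*
 * Hypothetical helper sketching the sanity checks discussed above: a line
 * pointer must be maxaligned, must not point into the page header, and the
 * item it describes must fit within the block, before PageGetItem() is safe.
 */
static bool
line_pointer_is_sane(ItemId itemid)
{
	uint16		lp_off = ItemIdGetOffset(itemid);
	uint16		lp_len = ItemIdGetLength(itemid);

	if (lp_off != MAXALIGN(lp_off))			/* must be maxaligned */
		return false;
	if (lp_off < SizeOfPageHeaderData)		/* must not overlap the page header */
		return false;
	if ((uint32) lp_off + lp_len > BLCKSZ)	/* must stay within the page */
		return false;
	return true;
}
```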
Re: [var]char versus character [varying]
James Coleman writes: > I've been wondering recently why the external canonical form of types > like char and varchar doesn't match the typname in pg_type. Mostly because the SQL standard wants certain spellings, some of which aren't even single words (e.g. DOUBLE PRECISION). There are cases where we could have changed internal names to match up with the spec name, but that won't work for all cases, and people have some attachment to the existing names anyway. > But I'm not following what would actually break if it weren't done > this way. Is the issue that a user defined type (in a different > schema, perhaps?) could overshadow the system type? That's one thing, and the rules about typmods are another. For instance the spec says that BIT without any other decoration means BIT(1), so that we have this: regression=# select '111'::bit; bit - 1 (1 row) versus regression=# select '111'::"bit"; bit - 111 (1 row) The latter means "bit without any length constraint", which is something the spec doesn't actually support. So when we have bit with typmod -1, we must spell it "bit" with quotes. > And would it make more sense (though I'm not volunteering right now to > write such a patch :D) to have these names be an additional column on > pg_type so that they can be queried by the user? Not particularly, because some of these types actually have several different spec-approved spellings, eg VARCHAR, CHAR VARYING, CHARACTER VARYING are all in the standard. regards, tom lane
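To see the mapping concretely, a quick catalog query along these lines shows the internal name versus the SQL-spec spelling (hypothetical illustration, not from the original mail):

```
regression=# SELECT typname, format_type(oid, NULL) AS sql_spelling
regression-# FROM pg_type
regression-# WHERE typname IN ('bool', 'bpchar', 'float8', 'int4', 'varchar')
regression-# ORDER BY typname;
 typname |    sql_spelling
---------+--------------------
 bool    | boolean
 bpchar  | character
 float8  | double precision
 int4    | integer
 varchar | character varying
(5 rows)
```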
Re: new heapcheck contrib module
> On Oct 23, 2020, at 11:04 AM, Tom Lane wrote: > > I wrote: >> Mark Dilger writes: >>> The patch I *should* have attached last night this time: > >> Thanks, I'll do some big-endian testing with this. > > Seems to work, so I pushed it (after some compulsive fooling > about with whitespace and perltidy-ing). Thanks for all the help! > It appears to me that > the code coverage for verify_heapam.c is not very good though, > only circa 50%. Do we care to expend more effort on that? Part of the issue here is that I developed the heapcheck code as a sequence of patches, and there is much greater coverage in the tests in the 0002 patch, which hasn't been committed yet. (Nor do I know that it ever will be.) Over time, the patch set became:

0001 -- adds verify_heapam.c to contrib/amcheck, with basic test coverage
0002 -- adds pg_amcheck command line interface to contrib/pg_amcheck, with more extensive test coverage
0003 -- creates a non-throwing interface to clog
0004 -- uses the non-throwing clog interface from within verify_heapam
0005 -- adds some controversial acl checks to verify_heapam

Your question doesn't have much to do with 3,4,5 above, but it definitely matters whether we're going to commit 0002. The test in that patch, in contrib/pg_amcheck/t/004_verify_heapam.pl, does quite a bit of bit twiddling of heap tuples and toast records and checks that the right corruption messages come back. Part of the reason I was trying to keep 0001's t/001_verify_heapam.pl test ignorant of the exact page layout information is that I already had this other test that covers that. So, should I port that test from (currently non-existent) contrib/pg_amcheck into contrib/amcheck, or should we wait to see if the 0002 patch is going to get committed? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: vacuum -vs reltuples on insert only index
On Fri, Oct 23, 2020 at 8:51 AM Jehan-Guillaume de Rorthais wrote: > Before 0d861bbb70, btvacuumpage was adding to relation stats the number of > leaving lines in the block using: > > stats->num_index_tuples += maxoff - minoff + 1; > > After 0d861bbb70, it is set using new variable nhtidslive: > > stats->num_index_tuples += nhtidslive > > However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback) > is set, which seems not to be the case on select-only workload. I agree that that's a bug. > A naive fix might be to use "maxoff - minoff + 1" when callback==NULL. The problem with that is that we really should use nhtidslive (or something like it), and we're not really willing to do the work to get that information when callback==NULL. We could use "maxoff - minoff + 1" in the way you suggest, but that will be only ~30% of what nhtidslive would be in pages where deduplication is maximally effective (which is not at all uncommon -- you only need about 10 TIDs per distinct value for the space savings to saturate like this). GIN does this for cleanup (but not for delete, which has a real count available): /* * XXX we always report the heap tuple count as the number of index * entries. This is bogus if the index is partial, but it's real hard to * tell how many distinct heap entries are referenced by a GIN index. */ stats->num_index_tuples = Max(info->num_heap_tuples, 0); stats->estimated_count = info->estimated_count; I suspect that we need to move in this direction within nbtree. I'm a bit concerned about the partial index problem, though. I suppose maybe we could do it the old way (which won't account for posting list tuples) during cleanup as you suggest, but only use the final figure when it turns out to have been a partial indexes. For other indexes we could do what GIN does here. Anybody else have thoughts on this? -- Peter Geoghegan
[var]char versus character [varying]
I've been wondering recently why the external canonical form of types like char and varchar doesn't match the typname in pg_type. Additionally, the alternative/extended names are hardcoded in format_type.c rather than being an additional column in that catalog table. I would have assumed there were largely historical reasons for this, but I see the following relevant comments in that file: /* * See if we want to special-case the output for certain built-in types. * Note that these special cases should all correspond to special * productions in gram.y, to ensure that the type name will be taken as a * system type, not a user type of the same name. * * If we do not provide a special-case output here, the type name will be * handled the same way as a user type name --- in particular, it will be * double-quoted if it matches any lexer keyword. This behavior is * essential for some cases, such as types "bit" and "char". */ But I'm not following what would actually break if it weren't done this way. Is the issue that a user defined type (in a different schema, perhaps?) could overshadow the system type? And would it make more sense (though I'm not volunteering right now to write such a patch :D) to have these names be an additional column on pg_type so that they can be queried by the user? Thanks, James
Re: new heapcheck contrib module
I wrote: > Mark Dilger writes: >> The patch I *should* have attached last night this time: > Thanks, I'll do some big-endian testing with this. Seems to work, so I pushed it (after some compulsive fooling about with whitespace and perltidy-ing). It appears to me that the code coverage for verify_heapam.c is not very good though, only circa 50%. Do we care to expend more effort on that? regards, tom lane
Re: Deleting older versions in unique indexes to avoid page splits
On Fri, Oct 23, 2020 at 9:03 AM Simon Riggs wrote: > Please publish details of how long a pre-split cleaning operation > takes and what that does to transaction latency. It *might* be true > that the cost of avoiding splits is worth it in balance against the > cost of splitting, but it might not. I don't think that you understand how the patch works. I cannot very well isolate that cost because the patch is designed to only pay it when there is a strong chance of getting a much bigger reward, and when the only alternative is to split the page. When it fails the question naturally doesn't come up again for the same two pages that follow from the page split. As far as I know the only cases that are regressed all involve small indexes with lots of contention, which is not surprising. And not necessarily due to the heap page accesses - making indexes smaller sometimes has that effect, even when it happens due to things like better page split heuristics. If anybody finds a serious problem with my patch then it'll be a weakness or hole in the argument I just made -- it won't have much to do with how expensive any of these operations are in isolation. It usually isn't sensible to talk about page splits as isolated things. Most of my work on B-Trees in the past couple of years built on the observation that sometimes successive page splits are related to each other in one way or another. It is a fallacy of composition to think of the patch as a thing that prevents some page splits. The patch is valuable because it more or less eliminates *unnecessary* page splits (and makes it so that there cannot be very many TIDs for each logical row in affected indexes). The overall effect is not linear. If you added code to artificially make the mechanism fail randomly 10% of the time (at the point where it becomes clear that the current operation would otherwise be successful) that wouldn't make the resulting code 90% as useful as the original. It would actually make it approximately 0% as useful. On human timescales the behavioral difference between this hobbled version of my patch and the master branch would be almost imperceptible. It's obvious that a page split is more expensive than the delete operation (when it works out). It doesn't need a microbenchmark (and I really can't think of one that would make any sense). Page splits typically have WAL records that are ~4KB in size, whereas the opportunistic delete records are almost always less than 100 bytes, and typically close to 64 bytes -- which is the same size as most individual leaf page insert WAL records. Plus you have almost double the FPIs going forward with the page split. > You've shown us a very nice paper analysing the page split waves, but > we need a similar detailed analysis so we can understand if what you > propose is better or not (and in what situations). That paper was just referenced in passing. It isn't essential to the main argument. > The leaf page locks are held for longer, so we need to perform > sensible tests that show if this has a catastrophic effect on related > workloads, or not. > > The SELECT tests proposed need to be aimed at the same table, at the same > time. But that's exactly what I did the first time! I had two SELECT statements against the same table. 
They use almost the same distribution as the UPDATE, so that they'd hit the same part of the key space without it being exactly the same as the UPDATE from the same xact in each case (I thought that if it was exactly the same part of the table then that might unfairly favor my patch). > > The strength of the design is in how clever it isn't. > > What it doesn't do could be good or bad so we need to review more > details on behavior. Since the whole idea of the patch is to change > behavior, that seems a reasonable ask. I don't have any doubt about > the validity of the approach or coding. I agree, but the patch isn't the sum of its parts. You need to talk about a workload or a set of conditions, and how things develop over time. -- Peter Geoghegan
Re: The ultimate extension hook.
On Thu, 24 Sep 2020 17:08:44 +1200 David Rowley wrote: [...] > I wondered if there was much in the way of use-cases like a traffic > filter, or statement replication. I wasn't sure if it was a solution > looking for a problem or not, but it seems like it could be productive > to talk about possibilities here and make a judgement call based on if > any alternatives exist today that will allow that problem to be solved > sufficiently in another way. If I understand the proposal correctly, this might enable traffic capture using a loadable extension. This kind of usage would allow replaying and validating any kind of traffic from one major version to another, e.g. to look for regressions from the application point of view before a major upgrade. I did such regression tests in the past: we captured production traffic using libpcap and replayed it using pgshark on an upgraded test environment. Very handy. However:

* libpcap can drop network packets during high load, which makes the capture painful to recover past the hole.
* it is useless with encrypted traffic.

So, +1 for such hooks. Regards,
Re: [PATCH] Add section headings to index types doc
On Fri, Oct 23, 2020 at 3:18 AM Jürgen Purtz wrote: > and add a link to the "CREATE INDEX" command from the chapter preamble. > > is the link necessary? > I suppose it would make more sense to add it to the previous section - the introduction page. I do think having a link (or more than one) to CREATE INDEX from the Indexes chapter is reader friendly. Having links to SQL commands is never truly necessary - the reader knows a SQL command reference exists and the name of the command allows them to find the correct page. David J.
Re: speed up unicode decomposition and recomposition
I chanced to do an --enable-coverage test run today, and I got this weird message during "make coverage-html": genhtml: WARNING: function data mismatch at /home/postgres/pgsql/src/common/unicode_norm.c:102 I've never seen anything like that before. I suppose it means that something about 783f0cc64 is a bit fishy, but I don't know what. The emitted coverage report looks fairly normal anyway. It says unicode_norm.c has zero test coverage, which is very possibly correct since I wasn't running in UTF8 encoding, but I'm not entirely sure of that either. This is with RHEL8's lcov-1.13-4.el8 package. I suppose the first question is does anybody else see that? regards, tom lane
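For anyone who wants to reproduce this, the usual workflow is roughly as follows (a sketch based on the documented coverage targets; adjust configure options as needed):

```
./configure --enable-coverage ... other options ...
make
make check            # or any other test run that exercises the code of interest
make coverage-html    # runs lcov/genhtml and produces the HTML report
```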
Re: Deleting older versions in unique indexes to avoid page splits
On Thu, 22 Oct 2020 at 18:42, Peter Geoghegan wrote: > > The average latency is x2. What is the distribution of latencies? > > Occasional very long or all uniformly x2? > > The latency is generally very even with the patch. There is a constant > hum of cleanup by the new mechanism in the case of the benchmark > workload. As opposed to a cascade of page splits, which occur in > clearly distinct correlated waves. Please publish details of how long a pre-split cleaning operation takes and what that does to transaction latency. It *might* be true that the cost of avoiding splits is worth it in balance against the cost of splitting, but it might not. You've shown us a very nice paper analysing the page split waves, but we need a similar detailed analysis so we can understand if what you propose is better or not (and in what situations). > > I would guess that holding the page locks will also slow down SELECT > > workload, so I think you should also report that workload as well. > > > > Hopefully that will be better in the latest version. > > But the same benchmark that you're asking about here has two SELECT > statements and only one UPDATE. It already is read-heavy in that > sense. And we see that the latency is also significantly improved for > the SELECT queries. > > Even if there was often a performance hit rather than a benefit (which > is definitely not what we see), it would still probably be worth it. > Users create indexes for a reason. I believe that we are obligated to > maintain the indexes to a reasonable degree, and not just when it > happens to be convenient to do so in passing. The leaf page locks are held for longer, so we need to perform sensible tests that show if this has a catastrophic effect on related workloads, or not. The SELECT tests proposed need to be aimed at the same table, at the same time. > The strength of the design is in how clever it isn't. What it doesn't do could be good or bad so we need to review more details on behavior. Since the whole idea of the patch is to change behavior, that seems a reasonable ask. I don't have any doubt about the validity of the approach or coding. What you've done so far is very good and I am very positive about this, well done. -- Simon Riggshttp://www.EnterpriseDB.com/
vacuum -vs reltuples on insert only index
Hello, I've found a behavior change with pg_class.reltuples on btree indexes. With only insert activity on a table, when an index is processed, its related reltuples is set to 0. Here is a demo script:

-- force index cleanup
set vacuum_cleanup_index_scale_factor to 0;

drop table if exists t;
create table t as select i from generate_series(1, 100) i;
create index t_i on t(i);

-- after index creation its reltuples is correct
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- vacuum set index reltuples to 0
vacuum t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

-- analyze set it back to correct value
analyze t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- insert + vacuum reset it again to 0
insert into t values(101);
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

-- delete + vacuum set it back to correct value
delete from t where i=10;
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- and back to 0 again with insert+vacuum
insert into t values(102);
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

Before 0d861bbb70, btvacuumpage was adding to the relation stats the number of live lines in the block using:

    stats->num_index_tuples += maxoff - minoff + 1;

After 0d861bbb70, it is set using the new variable nhtidslive:

    stats->num_index_tuples += nhtidslive

However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback) is set, which seems not to be the case on a select-only workload. A naive fix might be to use "maxoff - minoff + 1" when callback==NULL. Thoughts? Regards,
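To make the proposed fix concrete, it would look roughly like this inside btvacuumpage() (a sketch only, not a tested patch):

```
	/* Hypothetical sketch of the naive fix described above. */
	if (callback == NULL)
	{
		/*
		 * Cleanup-only scan: nhtidslive was never incremented, so fall back
		 * to counting the remaining line pointers, as before 0d861bbb70.
		 */
		stats->num_index_tuples += maxoff - minoff + 1;
	}
	else
		stats->num_index_tuples += nhtidslive;
```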
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > We do need to decide at what point we're going to move forward pg_dump's > > oldest server version support. > > I'm not really in a big hurry to move it forward at all. There were > good solid reasons to drop support for pre-schema and pre-pg_depend > servers, because of the messy kluges pg_dump had to implement > to provide only-partial workarounds for those lacks. But I don't > see comparable reasons or code savings that we'll get from dropping > later versions. > > There is an argument for dropping support for server versions that > fail to build anymore with modern toolchains, since once that happens > it becomes difficult to test, unless you have old executables already > laying around. But I don't think we're at that point yet for 8.0 or > later. (I rebuilt 7.4 and later when I updated my workstation to > RHEL8 a few months ago, and they seem fine, though I did use -O0 out of > fear of -faggressive-loop-optimizations bugs for anything before 8.2.) Along those same lines though- keeping all of the versions working with pg_dump requires everyone who is working with pg_dump to have those old versions not just able to compile but to also take the time to test against those older versions when making changes. > But anyway, this was about documentation not code. Perhaps it didn't come across very well, but I was making an argument that we should consider them both under a general "every 5 years, go through and clean out anything that's older than 10 years" type of policy. I don't know that we need to spend time doing it every year, but I wouldn't be against it either. > What I'm wondering > about is when to drop things like, say, this bit in the regex docs: > > Two significant incompatibilities exist between AREs and the ERE syntax > recognized by pre-7.4 releases of PostgreSQL: > (etc etc) > > Seems like we could have gotten rid of that by now, but when exactly > does it become fair game? And can we have a non-ad-hoc process for > getting rid of such cruft? I agree we should get rid of it and I'm suggesting our policy be that we only go back about 10 years. As for the process part, I suggested that we make it a every-5-year thing, but we could make it be part of the annual process instead. We have a number of general tasks that go into each major release and some of that process is documented, though it seems like a lot isn't as explicitly spelled out as perhaps it should be. Here I'm thinking about things like: - Get a CFM for each commitfest - Form an RMT for each major release - Figure out who will run each major/minor release - Get translations done - Review contributors to see who might become a committer - other things, I'm sure "Clean up documentation and remove things older than 10 years" could be another item to get checked off each year. We might consider looking at Debian- https://wiki.debian.org/Teams/ReleaseTeam and https://wiki.debian.org/Teams/ReleaseTeam/ReleaseCheckList Perhaps the past RMTs have thought about this also. Having these things written down and available would be good though, and then we should make sure that they're assigned out and get addressed (maybe that becomes part of what the RMT does, maybe not). Thanks, Stephen signature.asc Description: PGP signature
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Stephen Frost writes: > We do need to decide at what point we're going to move forward pg_dump's > oldest server version support. I'm not really in a big hurry to move it forward at all. There were good solid reasons to drop support for pre-schema and pre-pg_depend servers, because of the messy kluges pg_dump had to implement to provide only-partial workarounds for those lacks. But I don't see comparable reasons or code savings that we'll get from dropping later versions. There is an argument for dropping support for server versions that fail to build anymore with modern toolchains, since once that happens it becomes difficult to test, unless you have old executables already laying around. But I don't think we're at that point yet for 8.0 or later. (I rebuilt 7.4 and later when I updated my workstation to RHEL8 a few months ago, and they seem fine, though I did use -O0 out of fear of -faggressive-loop-optimizations bugs for anything before 8.2.) But anyway, this was about documentation not code. What I'm wondering about is when to drop things like, say, this bit in the regex docs: Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases of PostgreSQL: (etc etc) Seems like we could have gotten rid of that by now, but when exactly does it become fair game? And can we have a non-ad-hoc process for getting rid of such cruft? regards, tom lane
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
2020年10月23日(金) 23:12 Stephen Frost : > > Greetings, > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Stephen Frost writes: > > > Isn't this a bit pre-mature as we still support running pg_dump against > > > 8.0 clusters..? > > > > The removed para was discussing the behavior of pg_dump itself. What > > server version you run it against isn't relevant. > > Ah, alright, that makes a bit more sense then.. Yes, it's removing a note regarding a behavioural change between pg_dump introduced in 8.2. This will severely inconvenience anyone who has emerged from a coma they fell into before December 2006 and who is just getting to grips with the brave new world of post-8.1 pg_dump, but anyone running pg_dump against an 8.x server has hopefully caught up with the change sometime during the last 14 years. > > Having said that, there are a *lot* of past-their-sell-by-date bits > > of info throughout our documentation, because we don't have any sort > > of policy or mechanism for getting rid of this kind of backwards > > compatibility note. Maybe we should first try to agree on a policy > > for when it's okay to remove such info. > > I would have thought the general policy would be "match what the tool > works with", so if we've got references to things about how pg_dump > works against older-than-8.0 then we should clearly remove those as > pg_dump no londer will run against versions that old. > > Extending that to more general notes would probably make sense though. > That is- we'll keep anything relevant to the oldest version that pg_dump > runs against (since I'm pretty sure pg_dump's compatibility goes the > farthest back of anything we've got in core and probably always will). Obviously any references to supporting functionality which is no longer actually supported should be updated/removed. Any notes about behavioural differences between two versions no longer under community support (such as the bit removed by this patch) seems like fair game (though I'm sure there are exceptions). However I'm not sure what else there is out there which needs consideration. > We do need to decide at what point we're going to move forward pg_dump's > oldest server version support. (...) I suggest starting a new thread for that. Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com
Re: new heapcheck contrib module
Mark Dilger writes: > The patch I *should* have attached last night this time: Thanks, I'll do some big-endian testing with this. regards, tom lane
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > Isn't this a bit pre-mature as we still support running pg_dump against > > 8.0 clusters..? > > The removed para was discussing the behavior of pg_dump itself. What > server version you run it against isn't relevant. Ah, alright, that makes a bit more sense then.. > Having said that, there are a *lot* of past-their-sell-by-date bits > of info throughout our documentation, because we don't have any sort > of policy or mechanism for getting rid of this kind of backwards > compatibility note. Maybe we should first try to agree on a policy > for when it's okay to remove such info. I would have thought the general policy would be "match what the tool works with", so if we've got references to things about how pg_dump works against older-than-8.0 then we should clearly remove those as pg_dump no londer will run against versions that old. Extending that to more general notes would probably make sense though. That is- we'll keep anything relevant to the oldest version that pg_dump runs against (since I'm pretty sure pg_dump's compatibility goes the farthest back of anything we've got in core and probably always will). We do need to decide at what point we're going to move forward pg_dump's oldest server version support. I had thought we would do that with each top-level major version change (eg: support 8.0+ until we reach 11.0 or someting), but that doesn't work since we've moved to a single integer for major versions. Looking at the timeline though: 2016-10-12: 64f3524e2c8deebc02808aa5ebdfa17859473add Removed pre-8.0 2005-01-19: 8.0 released So, that's about 10 years. 2010-09-20: 9.0 released Or about 10 years from today, which seems to me to imply we should probably be considering moving pg_dump forward already. I'm not really inclined to do this every year as I don't really think it's helpful, but once every 5 years or so probably makes sense. To be a bit more specific about my thoughts: - Move pg_dump up to 9.0 as the required minimum, starting with v14. - In about 5 years or so, move pg_dump up to minimum of v10. (clean up all documentation with older references and such too) If we wanted to be particularly cute about it, we could wait until v15 to drop support for older-than-9.0, and then v20 would remove support for older-than-10, and then v25 would remove support for older-than-v15, etc. Thanks, Stephen signature.asc Description: PGP signature
Re: new heapcheck contrib module
> On Oct 22, 2020, at 9:21 PM, Mark Dilger wrote: > > > >> On Oct 22, 2020, at 7:01 PM, Tom Lane wrote: >> >> Mark Dilger writes: >>> Ahh, crud. It's because >>> syswrite($fh, '\x77\x77\x77\x77', 500) >>> is wrong twice. The 500 was wrong, but the string there isn't the bit >>> pattern we want -- it's just a string literal with backslashes and such. >>> It should have been double-quoted. >> >> Argh. So we really have, using same test except >> >> memcpy(&lp, "\\x77", sizeof(lp)); >> >> little endian: off = 785c, flags = 2, len = 1b9b >> big endian: off = 2e3c, flags = 0, len = 3737 >> >> which explains the apparent LP_DEAD result. >> >> I'm not particularly on board with your suggestion of "well, if it works >> sometimes then it's okay". Then we have no idea of what we really tested. >> >> regards, tom lane > > Ok, I've pruned it down to something you may like better. Instead of just > checking that *some* corruption occurs, it checks the returned corruption > against an expected regex, and if it fails to match, you should see in the > logs what you got vs. what you expected. > > It only corrupts the first two line pointers, the first one with 0x > and the second one with 0x, which are consciously chosen to be > bitwise reverses of each other and just strings of alternating bits rather > than anything that could have a more complicated interpretation. > > On my little-endian mac, the 0x value creates a line pointer which > redirects to an invalid offset 0x, which gets reported as decimal 30583 > in the corruption report, "line pointer redirection to item at offset 30583 > exceeds maximum offset 38". The test is indifferent to whether the > corruption it is looking for is reported relative to the first line pointer > or the second one, so if endian-ness matters, it may be the 0x that > results in that corruption message. I don't have a machine handy to test > that. It would be nice to determine the minimum amount of paranoia necessary > to make this portable and not commit the rest. Obviously, that should have said 0x and 0x. After writing the patch that way, I checked that the old value 0x also works on my mac, which it does, and checked that writing the line pointers starting at offset 24 rather than 32 works on my mac, which it does, and then went on to write this rather confused email and attached the patch with those changes, which all work (at least on my mac) but are potentially less portable than what I had before testing those changes. I apologize for any confusion my email from last night may have caused. The patch I *should* have attached last night this time: regress.patch.WIP.2 Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
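Since the quoting pitfall bit twice in this thread, here is a tiny standalone illustration of single- versus double-quoted \x escapes in Perl (not part of any attached patch):

```
use strict;
use warnings;

# Single quotes do not interpolate \x escapes, so '\x77' is the four
# characters backslash, x, 7, 7 -- not the byte 0x77.
my $literal = '\x77\x77\x77\x77';
my $bytes   = "\x77\x77\x77\x77";

printf "single-quoted: %2d chars, hex %s\n", length($literal), unpack('H*', $literal);
printf "double-quoted: %2d chars, hex %s\n", length($bytes), unpack('H*', $bytes);
# single-quoted: 16 chars, hex 5c7837375c7837375c7837375c783737
# double-quoted:  4 chars, hex 77777777
```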
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Stephen Frost writes: > Isn't this a bit pre-mature as we still support running pg_dump against > 8.0 clusters..? The removed para was discussing the behavior of pg_dump itself. What server version you run it against isn't relevant. Having said that, there are a *lot* of past-their-sell-by-date bits of info throughout our documentation, because we don't have any sort of policy or mechanism for getting rid of this kind of backwards compatibility note. Maybe we should first try to agree on a policy for when it's okay to remove such info. regards, tom lane
Re: git clone failed in windows
Thanks All, it worked with git://git.postgresql.org/git/postgresql.git Thanks Sridhar On Fri, Oct 23, 2020 at 7:02 PM Amit Kapila wrote: > On Fri, Oct 23, 2020 at 6:21 PM Dave Page wrote: > > > > On Fri, Oct 23, 2020 at 1:39 PM Amit Kapila > wrote: > >> > >> On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally > >> wrote: > >> > > >> > Am trying to clone postgresql git, getting error > >> > > >> > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > >> > Cloning into 'postgresql'... > >> > remote: Enumerating objects: 806507, done. > >> > remote: Counting objects: 100% (806507/806507), done. > >> > remote: Compressing objects: 100% (122861/122861), done. > >> > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to read > >> > fatal: the remote end hung up unexpectedly > >> > fatal: early EOF > >> > fatal: index-pack failed > >> > > >> > >> I have also just tried this and it failed with same error. However, it > >> worked when I tried 'git clone > >> git://git.postgresql.org/git/postgresql.git'. I don't know what is the > >> issue. > > > > > > It worked for me with https. Can you try again? > > > > This time it worked but on the third try. > > -- > With Regards, > Amit Kapila. >
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Greetings, * Heikki Linnakangas (hlinn...@iki.fi) wrote: > On 06/10/2020 15:15, Ian Lawrence Barwick wrote: > >2020年10月6日(火) 21:13 Ian Lawrence Barwick : > >>The pg_dump doc page [1], under the -t/--table option, contains a Note > >>documenting the behavioural differences introduced in PostgreSQL 8.2. > >> > >>As it's been almost exactly 14 years since that note was added [2], I > >>suggest > >>it can be removed entirely. > >> > >>[1] https://www.postgresql.org/docs/current/app-pgdump.html > >>[2] > >>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b > > > > > >Oh yes, I was planning to attach an ultra-trivial patch for that too. > > Applied, thanks. Isn't this a bit pre-mature as we still support running pg_dump against 8.0 clusters..? Removing support for older clusters is certainly something we can discuss but I don't know that it makes sense to just piecemeal pull things out. I get that this was just a documentation note, but, still, we do support pg_dump run against 8.0 and 8.1 clusters, at least today. Thanks, Stephen signature.asc Description: PGP signature
Re: git clone failed in windows
On Fri, Oct 23, 2020 at 6:21 PM Dave Page wrote: > > On Fri, Oct 23, 2020 at 1:39 PM Amit Kapila wrote: >> >> On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally >> wrote: >> > >> > Am trying to clone postgresql git, getting error >> > >> > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git >> > Cloning into 'postgresql'... >> > remote: Enumerating objects: 806507, done. >> > remote: Counting objects: 100% (806507/806507), done. >> > remote: Compressing objects: 100% (122861/122861), done. >> > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to >> > read >> > fatal: the remote end hung up unexpectedly >> > fatal: early EOF >> > fatal: index-pack failed >> > >> >> I have also just tried this and it failed with same error. However, it >> worked when I tried 'git clone >> git://git.postgresql.org/git/postgresql.git'. I don't know what is the >> issue. > > > It worked for me with https. Can you try again? > This time it worked but on the third try. -- With Regards, Amit Kapila.
Re: Parallel copy
On Fri, Oct 23, 2020 at 5:42 PM Ashutosh Sharma wrote: > > Hi Vignesh, > > Thanks for the updated patches. Here are some more comments that I can > find after reviewing your latest patches: > > +/* > + * This structure helps in storing the common data from CopyStateData that > are > + * required by the workers. This information will then be allocated and > stored > + * into the DSM for the worker to retrieve and copy it to CopyStateData. > + */ > +typedef struct SerializedParallelCopyState > +{ > + /* low-level state data */ > + CopyDestcopy_dest; /* type of copy source/destination */ > + int file_encoding; /* file or remote side's character encoding */ > + boolneed_transcoding; /* file encoding diff from server? */ > + boolencoding_embeds_ascii; /* ASCII can be non-first byte? */ > + > ... > ... > + > + /* Working state for COPY FROM */ > + AttrNumber num_defaults; > + Oid relid; > +} SerializedParallelCopyState; > > Can the above structure not be part of the CopyStateData structure? I > am just asking this question because all the fields present in the > above structure are also present in the CopyStateData structure. So, > including it in the CopyStateData structure will reduce the code > duplication and will also make CopyStateData a bit shorter. > > -- > > + pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist, > +relid); > > Do we need to pass cstate->nworkers and relid to BeginParallelCopy() > function when we are already passing cstate structure, using which > both of these information can be retrieved ? > > -- > > +/* DSM keys for parallel copy. */ > +#define PARALLEL_COPY_KEY_SHARED_INFO 1 > +#define PARALLEL_COPY_KEY_CSTATE 2 > +#define PARALLEL_COPY_WAL_USAGE3 > +#define PARALLEL_COPY_BUFFER_USAGE 4 > > DSM key names do not appear to be consistent. For shared info and > cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY", > but for WalUsage and BufferUsage structures, it is prefixed with > "PARALLEL_COPY". I think it would be better to make them consistent. > > -- > > if (resultRelInfo->ri_TrigDesc != NULL && > (resultRelInfo->ri_TrigDesc->trig_insert_before_row || > resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) > { > /* > * Can't support multi-inserts when there are any BEFORE/INSTEAD OF > * triggers on the table. Such triggers might query the table we're > * inserting into and act differently if the tuples that have already > * been processed and prepared for insertion are not there. > */ > insertMethod = CIM_SINGLE; > } > else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && > resultRelInfo->ri_TrigDesc->trig_insert_new_table) > { > /* > * For partitioned tables we can't support multi-inserts when there > * are any statement level insert triggers. It might be possible to > * allow partitioned tables with such triggers in the future, but for > * now, CopyMultiInsertInfoFlush expects that any before row insert > * and statement level insert triggers are on the same relation. > */ > insertMethod = CIM_SINGLE; > } > else if (resultRelInfo->ri_FdwRoutine != NULL || > cstate->volatile_defexprs) > { > ... > ... > > I think, if possible, all these if-else checks in CopyFrom() can be > moved to a single function which can probably be named as > IdentifyCopyInsertMethod() and this function can be called in > IsParallelCopyAllowed(). This will ensure that in case of Parallel > Copy when the leader has performed all these checks, the worker won't > do it again. I also feel that it will make the code look a bit > cleaner. 
> Just rewriting above comment to make it a bit more clear: I think, if possible, all these if-else checks in CopyFrom() should be moved to a separate function which can probably be named as IdentifyCopyInsertMethod() and this function called from IsParallelCopyAllowed() and CopyFrom() functions. It will only be called from CopyFrom() when IsParallelCopy() returns false. This will ensure that in case of Parallel Copy if the leader has performed all these checks, the worker won't do it again. I also feel that having a separate function containing all these checks will make the code look a bit cleaner. > -- > > +void > +ParallelCopyMain(dsm_segment *seg, shm_toc *toc) > +{ > ... > ... > + InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], > + &walusage[ParallelWorkerNumber]); > + > + MemoryContextSwitchTo(oldcontext); > + pfree(cstate); > + return; > +} > > It seems like you also need to delete the memory context > (cstate->copycontext) here. > > -- > > +void > +ExecBeforeStmtTrigger(CopyState cstate) > +{ > + EState
Re: Enumize logical replication message actions
On Fri, 23 Oct 2020 at 18:23, Amit Kapila wrote: > On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi > wrote: > > > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera < > alvhe...@alvh.no-ip.org> wrote in > > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi < > horikyota@gmail.com> > > > > wrote: > > > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > > something like that we shouldn't do this refactoring, I think. > > > > > > > > Enum is an integer, and we want to send byte. The function asserts > that the > > > > enum fits a byte. If there's a way to declare byte long enums I > would use > > > > that. But I didn't find a way to do that. > > > > > > I didn't look at the code, but maybe it's sufficient to add a > > > StaticAssert? > > > > That check needs to visit all symbols in a enum and confirm that each > > of them is in a certain range. > > > > Can we define something like LOGICAL_REP_MSG_LAST (also add a comment > indicating this is a fake message and must be the last one) as the > last and just check that? > > I don't think that's required once I applied suggestions from Kyotaro and Peter. Please check the latest patch. Usually LAST is added to an enum when we need to cap the number of symbols or want to find the number of symbols. None of that is necessary here. Do you see any other use? -- Best Wishes, Ashutosh
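For illustration, the usual trailing-sentinel idiom being referred to looks like this (names here are made up):

typedef enum WorkerState
{
    WORKER_STATE_IDLE,
    WORKER_STATE_RUNNING,
    WORKER_STATE_STOPPED,
    NUM_WORKER_STATES           /* sentinel: symbol count / array-size bound */
} WorkerState;

/* e.g. a lookup table kept in sync with the enum */
static const char *worker_state_names[NUM_WORKER_STATES];

That idiom relies on the members taking the default consecutive values, so it does not map directly onto LogicalRepMsgType, whose members are tied to the single-byte codes sent on the wire.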
Re: Enumize logical replication message actions
On Fri, 23 Oct 2020 at 17:02, Kyotaro Horiguchi wrote: > At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith > wrote in > > On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi > > wrote: > > > > > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera < > alvhe...@alvh.no-ip.org> wrote in > > > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi < > horikyota@gmail.com> > > > > > wrote: > > > > > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we > need > > > > > > something like that we shouldn't do this refactoring, I think. > > > > > > > > > > Enum is an integer, and we want to send byte. The function asserts > that the > > > > > enum fits a byte. If there's a way to declare byte long enums I > would use > > > > > that. But I didn't find a way to do that. > > > > The pq_send_logicalrep_msg_type() function seemed a bit overkill to me. > > Ah, yes, it is what I meant. I didn't come up with the word "overkill". > > > The comment in the LogicalRepMsgType enum will sufficiently ensure > > nobody is going to accidentally add any bad replication message codes. > > And it's not like these are going to be changed often. > > Agreed. > > > Why not simply downcast your enums when calling pq_sendbyte? > > There are only a few of them. > > > > e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT); > > If you are worried about compiler warning, that explicit cast is not > required. Even if the symbol is larger than 0xff, the upper bytes are > silently truncated off. > > I agree with Peter that the prologue of LogicalRepMsgType is enough. I also agree with Kyotaro, that explicit cast is unnecessary. All this together makes the second patch useless. Removed it. Instead used Kyotaro's idea in previous mail. PFA updated patch. -- Best Wishes, Ashutosh From 76f578416503478bf5e39993eec4bbd0f17d5a17 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Fri, 16 Oct 2020 12:31:35 +0530 Subject: [PATCH] Enumize top level logical replication actions Logical replication protocol uses single byte character to identify different chunks of logical repliation messages. The code uses string literals for the same. Enumize those so that 1. All the string literals used can be found at a single place. This makes it easy to add more actions without the risk of conflicts. 2. It's easy to locate the code handling a given action. 
Ashutosh Bapat --- src/backend/replication/logical/proto.c | 26 +++ src/backend/replication/logical/worker.c | 87 src/include/replication/logicalproto.h | 25 +++ 3 files changed, 81 insertions(+), 57 deletions(-) diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c index eb19142b48..fdb31182d7 100644 --- a/src/backend/replication/logical/proto.c +++ b/src/backend/replication/logical/proto.c @@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in); void logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn) { - pq_sendbyte(out, 'B'); /* BEGIN */ + pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN); /* fixed fields */ pq_sendint64(out, txn->final_lsn); @@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn, { uint8 flags = 0; - pq_sendbyte(out, 'C'); /* sending COMMIT */ + pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT); /* send the flags field (unused for now) */ pq_sendbyte(out, flags); @@ -112,7 +112,7 @@ void logicalrep_write_origin(StringInfo out, const char *origin, XLogRecPtr origin_lsn) { - pq_sendbyte(out, 'O'); /* ORIGIN */ + pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN); /* fixed fields */ pq_sendint64(out, origin_lsn); @@ -141,7 +141,7 @@ void logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel, HeapTuple newtuple, bool binary) { - pq_sendbyte(out, 'I'); /* action INSERT */ + pq_sendbyte(out, LOGICAL_REP_MSG_INSERT); /* transaction ID (if not valid, we're not streaming) */ if (TransactionIdIsValid(xid)) @@ -185,7 +185,7 @@ void logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel, HeapTuple oldtuple, HeapTuple newtuple, bool binary) { - pq_sendbyte(out, 'U'); /* action UPDATE */ + pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE); Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || @@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel, rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX); - pq_sendbyte(out, 'D'); /* action DELETE */ + pq_sendbyte(out, LOGICAL_REP_MSG_DELETE); /* transaction ID (if not valid, we're not streaming) */ if (TransactionIdIsValid(xid)) @@ -317,7 +317,7 @@ logicalrep_write_truncate(StringInfo out, int i; uint8 flags = 0; - pq_sendbyte(out, 'T'); /* action TRUNCATE */ + pq_sendbyte(out, LOGICAL_RE
Re: Enumize logical replication message actions
On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi wrote: > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera > wrote in > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi > > > wrote: > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > something like that we shouldn't do this refactoring, I think. > > > > > > Enum is an integer, and we want to send byte. The function asserts that > > > the > > > enum fits a byte. If there's a way to declare byte long enums I would use > > > that. But I didn't find a way to do that. > > > > I didn't look at the code, but maybe it's sufficient to add a > > StaticAssert? > > That check needs to visit all symbols in a enum and confirm that each > of them is in a certain range. > Can we define something like LOGICAL_REP_MSG_LAST (also add a comment indicating this is a fake message and must be the last one) as the last and just check that? -- With Regards, Amit Kapila.
Re: git clone failed in windows
On Fri, Oct 23, 2020 at 1:39 PM Amit Kapila wrote: > On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally > wrote: > > > > Am trying to clone postgresql git, getting error > > > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > > Cloning into 'postgresql'... > > remote: Enumerating objects: 806507, done. > > remote: Counting objects: 100% (806507/806507), done. > > remote: Compressing objects: 100% (122861/122861), done. > > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining > to read > > fatal: the remote end hung up unexpectedly > > fatal: early EOF > > fatal: index-pack failed > > > > I have also just tried this and it failed with same error. However, it > worked when I tried 'git clone > git://git.postgresql.org/git/postgresql.git'. I don't know what is the > issue. > It worked for me with https. Can you try again? It may be that the Varnish cache was doing it's meditation thing for some reason. I can't see anything obvious on the system though - nothing in the logs, and the services have all been up for days. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EDB: http://www.enterprisedb.com
Re: Enumize logical replication message actions
On Fri, 23 Oct 2020 at 06:50, Kyotaro Horiguchi wrote: > > Those two switch()es are apparently redundant. That code is exactly > equivalent to: > > apply_dispatch(s) > { > LogicalRepMsgType msgtype = pq_getmsgtype(s); > > switch (msgtype) > { > case LOGICAL_REP_MSG_BEGIN: > apply_handle_begin(s); > ! return; > ... > case LOGICAL_REP_MSG_STREAM_COMMIT: > apply_handle_stream_commit(s); > ! return; > } > > ereport(ERROR, (errmsg("invalid logical replication message type".. > } > > which is smaller and faster. > Good idea. Implemented in the latest patch posted with the next mail. -- Best Wishes, Ashutosh
Re: git clone failed in windows
On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally wrote: > > Am trying to clone postgresql git, getting error > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > Cloning into 'postgresql'... > remote: Enumerating objects: 806507, done. > remote: Counting objects: 100% (806507/806507), done. > remote: Compressing objects: 100% (122861/122861), done. > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to > read > fatal: the remote end hung up unexpectedly > fatal: early EOF > fatal: index-pack failed > I have also just tried this and it failed with same error. However, it worked when I tried 'git clone git://git.postgresql.org/git/postgresql.git'. I don't know what is the issue. -- With Regards, Amit Kapila.
Re: git clone failed in windows
Hi Sridhar! > On Oct 23, 2020, at 16:09, Sridhar N Bamandlapally > wrote: > > Am trying to clone postgresql git, getting error > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > Cloning into 'postgresql'... > remote: Enumerating objects: 806507, done. > remote: Counting objects: 100% (806507/806507), done. > remote: Compressing objects: 100% (122861/122861), done. > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to > read > fatal: the remote end hung up unexpectedly > fatal: early EOF > fatal: index-pack failed It seems like your internet connection is not stable enough. As an alternative, you can try to clone https://github.com/postgres/postgres It's synced with the official repository you mentioned and allows you to have your own fork for personal branches. Thanks! Best regards, Andrey Borodin.
Logical replication from HA cluster
Hi! I'm working on providing smooth failover for a CDC system in an HA cluster. Currently, logical slots are not replicated, so when we promote a replica, continuation of change data capture (CDC) from the new primary after failover becomes impossible. We cannot start logical replication from an LSN different from the slot's LSN, and we cannot create a slot at an LSN in the past, particularly before or right after promotion. This leads to a massive waste of network bandwidth in our installations, due to the necessity of an initial table sync. We are considering using the extension that creates a replication slot with an LSN in the past [0]. I understand that there might be some caveats with logical replication, but I do not see the scale of possible implications of this approach. The user gets an error if the WAL has been rotated, or waits if the LSN has not been reached yet; this seems perfectly fine for us. In most of our cases, when the CDC agent detects failover and goes to the new primary, there are plenty of old WALs to restart CDC from. Are there strong reasons why we do not allow creation of slots at given LSNs, possibly within a narrow LSN range (but wider than just GetXLogInsertRecPtr())? Thanks! Best regards, Andrey Borodin. [0] https://github.com/x4m/pg_tm_aux/blob/master/pg_tm_aux.c#L74-L77
Re: git clone failed in windows
The git clone of the repository is also showing as failed from Visual Studio. [image: git-clone-error.PNG] Please let me know if there is any issue. Thanks Sridhar BN On Fri, Oct 23, 2020 at 4:39 PM Sridhar N Bamandlapally < sridhar@gmail.com> wrote: > Am trying to clone postgresql git, getting error > > D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git > Cloning into 'postgresql'... > remote: Enumerating objects: 806507, done. > remote: Counting objects: 100% (806507/806507), done. > remote: Compressing objects: 100% (122861/122861), done. > error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to > read > fatal: the remote end hung up unexpectedly > fatal: early EOF > fatal: index-pack failed > > Please let me know, as I am doing this for the first time > > Thanks > Sridhar BN > > >
Re: Error in pg_restore (could not close data file: Success)
Maybe it would be better to commit these patches to mainstream, but I don't really know. On Wed, Oct 21, 2020 at 09:20, Kyotaro Horiguchi wrote: > At Wed, 21 Oct 2020 13:45:15 +0900 (JST), Kyotaro Horiguchi < horikyota@gmail.com> wrote in > > https://www.postgresql.org/message-id/flat/20200416.181945.759179589924840062.horikyota.ntt%40gmail.com#ed85c5fda64873c45811be4e3027a2ea > > > > Me> Hmm. Sounds reasonable. I'm going to do that. Thanks! > > > > But somehow that hasn't happened, I'll come up with a new version. > > pg_restore shows the following error instead of "Success" for a broken > compressed file. > > pg_restore: error: could not close data file "d/3149.dat": zlib error: > error reading or writing compressed file > > > 0001: > > cfclose() calls fatal() instead of returning the result to the callers > on error, which is obviously okay for all existing callers that are > handling errors from the function. Other callers ignored the returned > value, but we should fatal() on error of the function. > > At least for me, gzerror doesn't return a message (specifically, > returns NULL) after gzclose failure so currently cfclose shows its own > messages for errors of gzclose(). Am I missing something? > > 0002: > > cfread has the same code as get_cfp_error() and cfgetc uses > strerror() after gzgetc(). It would be suitable for a separate patch, > but 0002 fixes those bugs. I changed _EndBlob() to show the cause of > an error. > > Did not do in this patch: > > We could do further small refactoring to remove temporary variables in > pg_backup_directory.c for _StartData(), InitArchiveFmt_Directory, > _LoadBlobs(), _StartBlobs() and _CloseArchive(), but I left them as is > for the ease of back-patching. > > Now that we have the file name in the context variable, we could > show the file name in all error messages, but that change was large > and there's a part where that change is a bit more complex so I didn't > do that. > > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center >
Re: Parallel copy
Hi Vignesh, Thanks for the updated patches. Here are some more comments that I can find after reviewing your latest patches: +/* + * This structure helps in storing the common data from CopyStateData that are + * required by the workers. This information will then be allocated and stored + * into the DSM for the worker to retrieve and copy it to CopyStateData. + */ +typedef struct SerializedParallelCopyState +{ + /* low-level state data */ + CopyDestcopy_dest; /* type of copy source/destination */ + int file_encoding; /* file or remote side's character encoding */ + boolneed_transcoding; /* file encoding diff from server? */ + boolencoding_embeds_ascii; /* ASCII can be non-first byte? */ + ... ... + + /* Working state for COPY FROM */ + AttrNumber num_defaults; + Oid relid; +} SerializedParallelCopyState; Can the above structure not be part of the CopyStateData structure? I am just asking this question because all the fields present in the above structure are also present in the CopyStateData structure. So, including it in the CopyStateData structure will reduce the code duplication and will also make CopyStateData a bit shorter. -- + pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist, +relid); Do we need to pass cstate->nworkers and relid to BeginParallelCopy() function when we are already passing cstate structure, using which both of these information can be retrieved ? -- +/* DSM keys for parallel copy. */ +#define PARALLEL_COPY_KEY_SHARED_INFO 1 +#define PARALLEL_COPY_KEY_CSTATE 2 +#define PARALLEL_COPY_WAL_USAGE3 +#define PARALLEL_COPY_BUFFER_USAGE 4 DSM key names do not appear to be consistent. For shared info and cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY", but for WalUsage and BufferUsage structures, it is prefixed with "PARALLEL_COPY". I think it would be better to make them consistent. -- if (resultRelInfo->ri_TrigDesc != NULL && (resultRelInfo->ri_TrigDesc->trig_insert_before_row || resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) { /* * Can't support multi-inserts when there are any BEFORE/INSTEAD OF * triggers on the table. Such triggers might query the table we're * inserting into and act differently if the tuples that have already * been processed and prepared for insertion are not there. */ insertMethod = CIM_SINGLE; } else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && resultRelInfo->ri_TrigDesc->trig_insert_new_table) { /* * For partitioned tables we can't support multi-inserts when there * are any statement level insert triggers. It might be possible to * allow partitioned tables with such triggers in the future, but for * now, CopyMultiInsertInfoFlush expects that any before row insert * and statement level insert triggers are on the same relation. */ insertMethod = CIM_SINGLE; } else if (resultRelInfo->ri_FdwRoutine != NULL || cstate->volatile_defexprs) { ... ... I think, if possible, all these if-else checks in CopyFrom() can be moved to a single function which can probably be named as IdentifyCopyInsertMethod() and this function can be called in IsParallelCopyAllowed(). This will ensure that in case of Parallel Copy when the leader has performed all these checks, the worker won't do it again. I also feel that it will make the code look a bit cleaner. -- +void +ParallelCopyMain(dsm_segment *seg, shm_toc *toc) +{ ... ... 
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], + &walusage[ParallelWorkerNumber]); + + MemoryContextSwitchTo(oldcontext); + pfree(cstate); + return; +} It seems like you also need to delete the memory context (cstate->copycontext) here. -- +void +ExecBeforeStmtTrigger(CopyState cstate) +{ + EState *estate = CreateExecutorState(); + ResultRelInfo *resultRelInfo; This function has a lot of comments which have been copied as-is from the CopyFrom function. I think it would be good to remove those comments from here and mention that the code in this function has been taken from the CopyFrom function; anyone with questions can then refer to CopyFrom. This will again avoid unnecessary code in the patch. -- As Heikki rightly pointed out in his previous email, we need some high-level description of how Parallel Copy works somewhere in the copyparallel.c file. For reference, please see how a brief description of parallel vacuum has been added in the vacuumlazy.c file. * Lazy vacuum supports parallel execution with parallel worker processes. In * a parallel vacuum, we perform both index vacuum and index cleanup with * parallel worker processes.
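To make the IdentifyCopyInsertMethod() suggestion above more concrete, here is a rough sketch of how the factored-out function could look, built only from the branches quoted earlier; the parameter list, the trailing CIM_MULTI case, and the exact call sites are assumptions, not the patch's actual code:

static CopyInsertMethod
IdentifyCopyInsertMethod(CopyState cstate, ResultRelInfo *resultRelInfo,
                         PartitionTupleRouting *proute)
{
    /* BEFORE/INSTEAD OF row triggers rule out multi-inserts */
    if (resultRelInfo->ri_TrigDesc != NULL &&
        (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
         resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
        return CIM_SINGLE;

    /* partitioned target with statement-level insert triggers */
    if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
        resultRelInfo->ri_TrigDesc->trig_insert_new_table)
        return CIM_SINGLE;

    /* foreign tables and volatile default expressions */
    if (resultRelInfo->ri_FdwRoutine != NULL || cstate->volatile_defexprs)
        return CIM_SINGLE;

    /* remaining cases; partitioned targets may want CIM_MULTI_CONDITIONAL */
    return CIM_MULTI;
}

The leader could call this once (for instance from IsParallelCopyAllowed()), and CopyFrom() could reuse the result instead of repeating the checks in each worker.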
Re: Enumize logical replication message actions
At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith wrote in > On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi > wrote: > > > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera > > wrote in > > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi > > > > > > > > wrote: > > > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > > something like that we shouldn't do this refactoring, I think. > > > > > > > > Enum is an integer, and we want to send byte. The function asserts that > > > > the > > > > enum fits a byte. If there's a way to declare byte long enums I would > > > > use > > > > that. But I didn't find a way to do that. > > The pq_send_logicalrep_msg_type() function seemed a bit overkill to me. Ah, yes, it is what I meant. I didn't come up with the word "overkill". > The comment in the LogicalRepMsgType enum will sufficiently ensure > nobody is going to accidentally add any bad replication message codes. > And it's not like these are going to be changed often. Agreed. > Why not simply downcast your enums when calling pq_sendbyte? > There are only a few of them. > > e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT); If you are worried about compiler warning, that explicit cast is not required. Even if the symbol is larger than 0xff, the upper bytes are silently truncated off. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Autovacuum on partitioned table (autoanalyze)
Thank you for the new version. At Fri, 23 Oct 2020 15:12:51 +0900, yuzuko wrote in > Hello, > > I reconsidered a way based on the v5 patch in line with > Horiguchi-san's comment. > > This approach is as follows: > - A partitioned table is checked whether it needs analyze like a plain > table in relation_needs_vacanalyze(). To do this, we should store > partitioned table's stats (changes_since_analyze). > - Partitioned table's changes_since_analyze is updated when > analyze a leaf partition by propagating its changes_since_analyze. > In the next scheduled analyze time, it is used in the above process. > That is, the partitioned table is analyzed behind leaf partitions. > - The propagation process differs between autoanalyze or plain analyze. > In autoanalyze, a leaf partition's changes_since_analyze is propagated > to *all* ancestors. Whereas, in plain analyze on an inheritance tree, > propagates to ancestors not included the tree to avoid needless counting. > > Attach the latest patch to this email. > Could you check it again please? + /* +* Get its all ancestors to propagate changes_since_analyze count. +* However, when ANALYZE inheritance tree, we get ancestors of +* toprel_oid to avoid needless counting. +*/ + if (!OidIsValid(toprel_oid)) + ancestors = get_partition_ancestors(RelationGetRelid(rel)); + else + ancestors = get_partition_ancestors(toprel_oid); This comment doesn't explain what the code intends, only what the code does. The reason for the difference is that if we have a valid toprel_oid, we analyze all descendants of the relation this time, and if we propagate the number to the descendants of the top relation, the next analyze on the relations could happen sooner than expected. - msg.m_live_tuples = livetuples; - msg.m_dead_tuples = deadtuples; + msg.m_live_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + ? 0 /* if this is a partitioned table, skip modifying */ + : livetuples; + msg.m_dead_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + ? 0 /* if this is a partitioned table, skip modifying */ + : deadtuples; Two successive branches with the same condition look odd. And we need an explanation of *why* we don't set the values for partitioned tables. + foreach(lc, ancestors) + { + Oid parentOid = lfirst_oid(lc); + Relation parentrel; + + parentrel = table_open(parentOid, AccessShareLock); I'm not sure, but not all of the ancestors are necessarily parents (in other words, a parent of a parent of mine is not a parent of mine). Isn't just rel sufficient? -* Report ANALYZE to the stats collector, too. However, if doing -* inherited stats we shouldn't report, because the stats collector only -* tracks per-table stats. Reset the changes_since_analyze counter only -* if we analyzed all columns; otherwise, there is still work for -* auto-analyze to do. +* Report ANALYZE to the stats collector, too. Reset the +* changes_since_analyze counter only if we analyzed all columns; +* otherwise, there is still work for auto-analyze to do. */ - if (!inh) + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) pgstat_report_analyze(onerel, totalrows, totaldeadrows, This still rejects traditional inheritance nonleaf relations. But if we remove the description about that completely in the comment above, we should support traditional inheritance parents here. I think we can do that as far as we need to propagate only per-tuple stats (that means not per-attribute) like changes_since_analyze.
Whichever way we take, do we need the description about the behavior in the documentation? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
git clone failed in windows
Am trying to clone postgresql git, getting error D:\sridhar>git clone https://git.postgresql.org/git/postgresql.git Cloning into 'postgresql'... remote: Enumerating objects: 806507, done. remote: Counting objects: 100% (806507/806507), done. remote: Compressing objects: 100% (122861/122861), done. error: RPC failed; curl 18 transfer closed with 3265264 bytes remaining to read fatal: the remote end hung up unexpectedly fatal: early EOF fatal: index-pack failed Please let me know, as I am doing this for the first time Thanks Sridhar BN
Re: [PATCH] Add extra statistics to explain for Nested Loop
wrote: You should update the explain_parallel_append() plpgsql function created in that test file to make sure that both "rows" and the two new counters are changed to "N". There might be other similar changes needed. Thank you for watching this issue. I made the necessary changes in tests following your advice. Now, for the changes themselves. For the min/max time, you're aggregating "totaltime - instr->firsttuple". Why removing the startup time from the loop execution time? I think this should be kept. I think it's right remark. I fixed it. Also, in explain.c you display the counters in the "Nested loop" node, but this should be done in the inner plan node instead, as this is the one being looped on. So the condition should probably be "nloops > 1" rather than testing if it's a NestLoop. Condition "nloops > 1" is not the same as checking if it's NestLoop. This condition will lead to printing extra statistics for nodes with different types of join, not only Nested Loop Join. If this statistic is useful for other plan nodes, it makes sense to change the condition. wrote: I'm a bit worried that further increasing the size of struct Instrumentation will increase the overhead of EXPLAIN ANALYZE further - in some workloads we spend a fair bit of time in code handling that. It would be good to try to find a few bad cases, and see what the overhead is. Thank you for this comment, I will try to figure it out. Do you have some examples of large overhead dependent on this struct? I think I need some sample to know which way to think. Thank you all for the feedback. I hope the new version of my patch will be more correct and useful. Please don't hesitate to share any thoughts on this topic! -- Ekaterina Sokolova Postgres Professional: http://www.postgrespro.com The Russian Postgres CompanyFrom: "Ekaterina Sokolova" diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 41317f18374..2132d82fe79 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1568,29 +1568,64 @@ ExplainNode(PlanState *planstate, List *ancestors, double startup_ms = 1000.0 * planstate->instrument->startup / nloops; double total_ms = 1000.0 * planstate->instrument->total / nloops; double rows = planstate->instrument->ntuples / nloops; + double min_r = planstate->instrument->min_tuples; + double max_r = planstate->instrument->max_tuples; + double min_t_ms = 1000.0 * planstate->instrument->min_t; + double max_t_ms = 1000.0 * planstate->instrument->max_t; if (es->format == EXPLAIN_FORMAT_TEXT) { - if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); + if (nodeTag(plan) == T_NestLoop) { +if (es->timing) + appendStringInfo(es->str, + " (actual time=%.3f..%.3f min_time=%.3f max_time=%.3f min_rows=%.0f rows=%.0f max_rows=%.0f loops=%.0f)", + startup_ms, total_ms, min_t_ms, max_t_ms, min_r, rows, max_r, nloops); +else + appendStringInfo(es->str, + " (actual min_rows=%.0f rows=%.0f max_rows=%.0f loops=%.0f)", + min_r, rows, max_r, nloops); + } else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); + { +if (es->timing) + appendStringInfo(es->str, + " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", + startup_ms, total_ms, rows, nloops); +else + appendStringInfo(es->str, + " (actual rows=%.0f loops=%.0f)", + rows, nloops); + } } else { - if (es->timing) + if (nodeTag(plan) == T_NestLoop) { +if (es->timing) { + ExplainPropertyFloat("Actual Startup Time", "s", startup_ms, + 3, 
es); + ExplainPropertyFloat("Actual Total Time", "s", total_ms, + 3, es); + ExplainPropertyFloat("Min Time", "s", min_t_ms, + 3, es); + ExplainPropertyFloat("Max Time", "s", max_t_ms, + 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Min Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Max Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } + else { -ExplainPropertyFloat("Actual Startup Time", "s", startup_ms, - 3, es); -ExplainPropertyFloat("Actual Total Time", "s", total_ms, - 3, es); +if (es->timing) { + ExplainPropertyFloat("Actual Startup Time", "s", startup_ms, + 3, es); + ExplainPropertyFloat("Actual Total Time", "s", total_ms, + 3, es); +} +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); } } else if (es->analyze) @@ -1599,6 +1634,7 @@ ExplainNode
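For readers following the min/max discussion without the instrument.c hunk (not shown above), one way the per-loop extrema could be maintained is sketched below. This only illustrates the idea, reusing the field names visible in the explain.c hunk; it is not necessarily what the attached patch does:

void
InstrEndLoop(Instrumentation *instr)
{
    double      totaltime;

    /* Skip if nothing has happened, or already shut down */
    if (!instr->running)
        return;

    totaltime = INSTR_TIME_GET_DOUBLE(instr->counter);

    /* track the fastest/slowest iteration, including startup time */
    if (instr->nloops == 0 || totaltime < instr->min_t)
        instr->min_t = totaltime;
    if (totaltime > instr->max_t)
        instr->max_t = totaltime;

    /* and the smallest/largest row count produced by one iteration */
    if (instr->nloops == 0 || instr->tuplecount < instr->min_tuples)
        instr->min_tuples = instr->tuplecount;
    if (instr->tuplecount > instr->max_tuples)
        instr->max_tuples = instr->tuplecount;

    /* existing aggregation, unchanged */
    instr->startup += instr->firsttuple;
    instr->total += totaltime;
    instr->ntuples += instr->tuplecount;
    instr->nloops += 1;

    /* Reset for next cycle (if any) */
    instr->running = false;
    INSTR_TIME_SET_ZERO(instr->starttime);
    INSTR_TIME_SET_ZERO(instr->counter);
    instr->firsttuple = 0;
    instr->tuplecount = 0;
}

Keeping the extrema based on totaltime (rather than totaltime - instr->firsttuple) matches the review comment above about not subtracting the startup time.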
Re: [HACKERS] logical decoding of two-phase transactions
On Fri, Oct 23, 2020 at 3:41 PM Ajin Cherian wrote: > > > Amit, > I have also modified the stream callback APIs to not include > stream_commit_prepared and stream_rollback_prepared, instead use the > non-stream APIs for the same functionality. > I have also updated the test_decoding and pgoutput plugins accordingly. > Thanks, I think you forgot to address one of my comments in the previous email[1] (See "One minor comment .."). You have not even responded to it. [1] - https://www.postgresql.org/message-id/CAA4eK1JzRvUX2XLEKo2f74Vjecnt6wq-kkk1OiyMJ5XjJN%2BGvQ%40mail.gmail.com -- With Regards, Amit Kapila.
Re: [PATCH] Add section headings to index types doc
On 21.10.20 23:12, David G. Johnston wrote: On Wed, Sep 30, 2020 at 4:25 AM Dagfinn Ilmari Mannsåker <ilm...@ilmari.org> wrote: Michael Paquier <mich...@paquier.xyz> writes: > On Mon, Aug 10, 2020 at 12:52:17PM +, Jürgen Purtz wrote: >> The new status of this patch is: Waiting on Author > > This has not been answered yet, so I have marked the patch as returned > with feedback. Updated patch attached, which reformats the operator lists as requested by Jürgen, A couple of things: One, I would place the equality operator for hash inside a standalone synopsis just like all of the others. ok Two, why is hash special in having its create index command provided here? (I notice this isn't the fault of this patch but it stands out when reviewing it) yes, this looks odd. I would suggest rewording hash to look more like the others ok and add a link to the "CREATE INDEX" command from the chapter preamble. is the link necessary? and skips the reindentation as suggested by Tom. Agreed David J. -- J. Purtz
Re: speed up unicode decomposition and recomposition
On Thu, Oct 22, 2020 at 10:11 PM Michael Paquier wrote: > On Thu, Oct 22, 2020 at 05:50:52AM -0400, John Naylor wrote: > > Looks good to me. > > Thanks. Committed, then. Great work! > Thank you! -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: partition routing layering in nodeModifyTable.c
On Fri, Oct 23, 2020 at 4:04 PM Heikki Linnakangas wrote: > On 22/10/2020 16:49, Amit Langote wrote: > > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote > > wrote: > >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas wrote: > >>> It's probably true that there's no performance gain from initializing > >>> them more lazily. But the reasoning and logic around the initialization > >>> is complicated. After tracing through various path through the code, I'm > >>> convinced enough that it's correct, or at least these patches didn't > >>> break it, but I still think some sort of lazy initialization on first > >>> use would make it more readable. Or perhaps there's some other > >>> refactoring we could do. > >> > >> So the other patch I have mentioned is about lazy initialization of > >> the ResultRelInfo itself, not the individual fields, but maybe with > >> enough refactoring we can get the latter too. > > > > So, I tried implementing a lazy-initialization-on-first-access > > approach for both the ResultRelInfos themselves and some of the > > individual fields of ResultRelInfo that don't need to be set right > > away. You can see the end result in the attached 0003 patch. This > > slims down ExecInitModifyTable() significantly, both in terms of code > > footprint and the amount of work that it does. > > Have you done any performance testing? I'd like to know how much of a > difference this makes in practice. I have shown some numbers here: https://www.postgresql.org/message-id/ca+hiwqg7zrubmmih3wpsbz4s0h2ehywrnxeducky5hr3fwz...@mail.gmail.com To reiterate, if you apply the following patch: > Does this patch become moot if we do the "Overhaul UPDATE/DELETE > processing"? > (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com)? ...and run this benchmark with plan_cache_mode = force_generic_plan pgbench -i -s 10 --partitions={0, 10, 100, 1000} pgbench -T120 -f test.sql -M prepared test.sql: \set aid random(1, 100) update pgbench_accounts set abalance = abalance + 1 where aid = :aid; you may see roughly the following results: HEAD: 0 tps = 13045.485121 (excluding connections establishing) 10 tps = 9358.157433 (excluding connections establishing) 100 tps = 1878.274500 (excluding connections establishing) 1000 tps = 84.684695 (excluding connections establishing) Patched (overhaul update/delete processing): 0 tps = 12743.487196 (excluding connections establishing) 10 tps = 12644.240748 (excluding connections establishing) 100 tps = 4158.123345 (excluding connections establishing) 1000 tps = 391.248067 (excluding connections establishing) And if you apply the patch being discussed here, TPS shoots up a bit, especially for higher partition counts: Patched (lazy-ResultRelInfo-initialization) 0 tps = 13419.283168 (excluding connections establishing) 10 tps = 12588.016095 (excluding connections establishing) 100 tps = 8560.824225 (excluding connections establishing) 1000 tps = 1926.553901 (excluding connections establishing) To explain these numbers a bit, "overheaul update/delete processing" patch improves the performance of that benchmark by allowing the updates to use run-time pruning when executing generic plans, which they can't today. However without "lazy-ResultRelInfo-initialization" patch, ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can be seen to be spending time initializing all of those result relations, whereas only one of those will actually be used. 
As mentioned further in that email, it's really the locking of all relations by AcquireExecutorLocks() that occurs even before we enter the executor that's a much thornier bottleneck for this benchmark. But the ResultRelInfo initialization bottleneck sounded like it could get alleviated in a relatively straightforward manner. The patches that were developed for attacking the locking bottleneck would require further reflection on whether they are correct. (Note: I've just copy pasted the numbers I reported in that email. To reproduce, I'll have to rebase the "overhaul update/delete processing" patch on this one, which I haven't yet done.) > Another alternative is to continue to create the ResultRelInfos in > ExecInitModify(), but initialize the individual fields in them lazily. If you consider the above, maybe you can see how that will not really eliminate the bottleneck I'm aiming to fix here. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Online checksums verification in the backend
On Fri, Oct 23, 2020 at 04:31:56PM +0800, Julien Rouhaud wrote: > I agree. However I'm assuming that this refactor is relying on the > not yet committed patch (in the nearby thread dealing with base backup > checksums check) to also refactor PageIsVerified? As all the logic > you removed was done to avoid spamming a lot of warnings when calling > the function. Yeah, it should use a refactored version, but I was as well in the mood of looking at a version based on what we have now on HEAD. Even if I am not completely clear where the patch for page verification and base backups will go, I was thinking as well of doing the refactoring introducing PageIsVerifiedExtended() first, before considering the next steps for this thread. It seems to me that the path where we generate no WARNINGs at all makes the whole experience more consistent for the user with this function. > Mmm, is it really an improvement to report warnings during this > function execution? Note also that PageIsVerified as-is won't report > a warning if a page is found as PageIsNew() but isn't actually all > zero, while still being reported as corrupted by the SRF. Yep, that joins with the point above about just having no WARNINGs at all. > Have you also considered that it's possible to execute > pg_relation_check_pages with ignore_checksum_failure = on? That's > evidently a bad idea, but doing so would report some of the data > corruption as warnings while still not reporting anything in the SRF. Yeah, I thought about that as well, but I did not see a strong argument against preventing this behavior either, even if it sounds a bit strange. We could always tune that later depending on the feedback. -- Michael signature.asc Description: PGP signature
Re: dynamic result sets support in extended query protocol
On 2020-10-20 12:24, Dave Cramer wrote: Finally, we could do it on a best-effort basis. We use binary format for registered types, until there is some invalidation event for the type, at which point we revert to default/text format until the end of a session (or until another protocol message arrives re-registering the type). Does the driver tell the server what registered types it wants in binary? Yes, the driver tells the server, "whenever you send these types, send them in binary" (all other types keep sending in text). This should work, because the result row descriptor contains the actual format type, and there is no guarantee that it's the same one that was requested. So how about that last option? I imagine a new protocol message, say, TypeFormats, that contains a number of type/format pairs. The message would typically be sent right after the first ReadyForQuery, gets no response. This seems a bit hard to control. How long do you wait for no response? In this design, you don't need a response. It could also be sent at any other time, but I expect that to be less used in practice. Binary format is used for registered types if they have binary format support functions, otherwise text continues to be used. There is no error response for types without binary support. (There should probably be an error response for registering a type that does not exist.) I'm not sure we (pgjdbc) want all types with binary support functions sent automatically. Turns out that decoding binary is sometimes slower than decoding the text and the on-wire overhead isn't significant. Timestamps/dates with timezone are also interesting as the binary output does not include the timezone. In this design, you pick the types you want. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
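To make the shape of that idea concrete: no such message exists today, and the layout, the handler name, and remember_session_type_format() below are all invented for illustration, but a server-side handler for a TypeFormats message carrying (type OID, format code) pairs might look roughly like this:

static void
HandleTypeFormats(StringInfo msg)
{
    int         npairs = pq_getmsgint(msg, 2);

    for (int i = 0; i < npairs; i++)
    {
        Oid         typid = pq_getmsgint(msg, 4);
        int16       format = pq_getmsgint(msg, 2);

        /* error only for a type that does not exist, as discussed above */
        if (!SearchSysCacheExists1(TYPEOID, ObjectIdGetDatum(typid)))
            ereport(ERROR,
                    (errcode(ERRCODE_UNDEFINED_OBJECT),
                     errmsg("type with OID %u does not exist", typid)));

        /*
         * Remember the preference; binary would only be used if the type
         * has binary output support, otherwise text is silently kept.
         */
        remember_session_type_format(typid, format);
    }

    pq_getmsgend(msg);
}

Since the result-row descriptor always carries the actual format code, a driver that sends such a message does not need any acknowledgement to stay correct.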
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
2020年10月23日(金) 17:52 Heikki Linnakangas : > > On 06/10/2020 15:15, Ian Lawrence Barwick wrote: > > 2020年10月6日(火) 21:13 Ian Lawrence Barwick : > >> > >> Hi > >> > >> The pg_dump doc page [1], under the -t/--table option, contains a Note > >> documenting the behavioural differences introduced in PostgreSQL 8.2. > >> > >> As it's been almost exactly 14 years since that note was added [2], I > >> suggest > >> it can be removed entirely. > >> > >> [1] https://www.postgresql.org/docs/current/app-pgdump.html > >> [2] > >> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b > > > > > > Oh yes, I was planning to attach an ultra-trivial patch for that too. > > Applied, thanks. > > - Heikki Thanks! Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
On 06/10/2020 15:15, Ian Lawrence Barwick wrote: 2020年10月6日(火) 21:13 Ian Lawrence Barwick : Hi The pg_dump doc page [1], under the -t/--table option, contains a Note documenting the behavioural differences introduced in PostgreSQL 8.2. As it's been almost exactly 14 years since that note was added [2], I suggest it can be removed entirely. [1] https://www.postgresql.org/docs/current/app-pgdump.html [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b Oh yes, I was planning to attach an ultra-trivial patch for that too. Applied, thanks. - Heikki
Re: Enumize logical replication message actions
On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi wrote: > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera > wrote in > > On 2020-Oct-22, Ashutosh Bapat wrote: > > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi > > > wrote: > > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much. If we need > > > > something like that we shouldn't do this refactoring, I think. > > > > > > Enum is an integer, and we want to send byte. The function asserts that > > > the > > > enum fits a byte. If there's a way to declare byte long enums I would use > > > that. But I didn't find a way to do that. The pq_send_logicalrep_msg_type() function seemed a bit overkill to me. The comment in the LogicalRepMsgType enum will sufficiently ensure nobody is going to accidentally add any bad replication message codes. And it's not like these are going to be changed often. Why not simply downcast your enums when calling pq_sendbyte? There are only a few of them. e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT); Kind Regards. Peter Smith Fujitsu Australia.
Re: Parallel copy
I had a brief look at this patch. Important work! A couple of first impressions: 1. The split between patches 0002-Framework-for-leader-worker-in-parallel-copy.patch and 0003-Allow-copy-from-command-to-process-data-from-file.patch is quite artificial. All the stuff introduced in the first is unused until the second patch is applied. The first patch introduces a forward declaration for ParallelCopyData(), but the function only comes in the second patch. The comments in the first patch talk about LINE_LEADER_POPULATING and LINE_LEADER_POPULATED, but the enum only comes in the second patch. I think these have to be merged into one. If you want to split it somehow, I'd suggest having a separate patch just to move CopyStateData from copy.c to copy.h. The subsequent patch would then be easier to read as you could see more easily what's being added to CopyStateData. Actually I think it would be better to have a new header file, copy_internal.h, to hold CopyStateData and the other structs, and keep copy.h as it is. 2. This desperately needs some kind of a high-level overview of how it works. What is a leader, what is a worker? Which process does each step of COPY processing, like reading from the file/socket, splitting the input into lines, handling escapes, calling input functions, and updating the heap and indexes? What data structures are used for the communication? How is the work synchronized between the processes? There are comments on those individual aspects scattered in the patch, but if you're not already familiar with it, you don't know where to start. There's some of that in the commit message, but it needs to be somewhere in the source code, maybe in a long comment at the top of copyparallel.c. 3. I'm surprised there's a separate ParallelCopyLineBoundary struct for every input line. Doesn't that incur a lot of synchronization overhead? I haven't done any testing, this is just my gut feeling, but I assumed you'd work in batches of, say, 100 or 1000 lines each. - Heikki
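To illustrate the batching idea in point 3 (purely a sketch with invented names and sizes, not a concrete proposal), the shared state could group many input lines behind a single synchronization point:

#define LINES_PER_BATCH     1000

typedef enum ParallelCopyBatchState
{
    BATCH_STATE_EMPTY,          /* free for the leader to fill */
    BATCH_STATE_POPULATED,      /* filled, waiting for a worker to claim it */
    BATCH_STATE_CONSUMED        /* processed, can be reused */
} ParallelCopyBatchState;

typedef struct ParallelCopyLineBatch
{
    pg_atomic_uint32 state;     /* one atomic per batch instead of per line */
    uint32      nlines;         /* number of lines stored in this batch */

    /* offset and length of each line within the shared data buffer */
    uint32      line_offset[LINES_PER_BATCH];
    uint32      line_size[LINES_PER_BATCH];
} ParallelCopyLineBatch;

A worker would then claim a whole batch at a time, amortizing the synchronization cost over LINES_PER_BATCH lines instead of paying it per line.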
Re: Online checksums verification in the backend
On Fri, Oct 23, 2020 at 3:28 PM Michael Paquier wrote: > > On Mon, Oct 19, 2020 at 04:52:48PM +0900, Michael Paquier wrote: > > No issues with reporting the block number and the fork type in the SRF > > at least of course as this is helpful to detect the position of the > > broken blocks. For the checksum found in the header and the one > > calculated (named expected in the patch), I am less sure how to put a > > clear definition on top of that but we could always consider that > > later and extend the SRF as needed. Once the user knows that both do > > not match, he/she knows that there is a problem. > > So, I have reviewed your patch set, and heavily reworked the logic to > be more consistent on many thinks, resulting in a largely simplified > patch without sacrificing its usefulness: Thanks! > - Removal of the dependency with checksums for this feature. While > simplifying the code, I have noticed that this feature can also be > beneficial for clusters that do not have have data checksums, as > PageIsVerified() is perfectly able to run some page header checks and > the zero-page case. That's of course less useful than having the > checksums, but there is no need to add a dependency here. The file > for the SQL functions is renamed from checksumfuncs.c to pagefuncs.c. I agree. However I'm assuming that this refactor is relying on the not yet committed patch (in the nearby thread dealing with base backup checksums check) to also refactor PageIsVerified? As all the logic you removed was done to avoid spamming a lot of warnings when calling the function. > - The function is changed to return no tuples if the relkind is not > supported, and the same applies for temporary relations. That's more > consistent with other system functions like the ones in charge of > partition information, and this makes full scans of pg_class much > easier to work with. Temporary tables were not handled correctly > anyway as these are in local buffers, but the use case of this > function in this case is not really obvious to me. Agreed > - Having the forknum in the SRF is kind of confusing, as the user > would need to map a number with the physical on-disk name. Instead, I > have made the choice to return the *path* of the corrupted file with a > block number. This way, an operator can know immediately where a > problem comes from. The patch does not append the segment number, and > I am not sure if we should actually do that, but adding it is > straight-forward as we have the block number. There is a dependency > with table AMs here as well, as this goes down to fd.c, explaining why > I have not added it and just. That's a clear improvement, thanks! > - I really don't know what kind of default ACL should apply for such > functions, but I am sure that SCAN_TABLES is not what we are looking > for here, and there is nothing preventing us from having a safe > default from the start of times, so I moved the function to be > superuser-only by default, and GRANT can be used to allow its > execution to other roles. We could relax that in the future, of > course, this can be discussed separately. I don't have a strong opinion here, SCAN_TABLES was maybe not ideal. No objections. > - The WARNING report for each block found as corrupted gains an error > context, as a result of a switch to PageIsVerified(), giving a user > all the information needed in the logs on top of the result in the > SRF. 
That's useful as well if combined with CHECK_FOR_INTERRUPTS(), > and I got to wonder if we should have some progress report for this > stuff, though that's a separate discussion. Mmm, is it really an improvement to report warnings during this function execution? Note also that PageIsVerified as-is won't report a warning if a page is found as PageIsNew() but isn't actually all zero, while still being reported as corrupted by the SRF. Have you also considered that it's possible to execute pg_relation_check_pages with ignore_checksum_failure = on? That's evidently a bad idea, but doing so would report some of the data corruption as warnings while still not reporting anything in the SRF. Having some progress report would be nice to have, but +1 to have a separate discussion for that. > - The function is renamed to something less generic, > pg_relation_check_pages(), and I have reduced the number of functions > from two to one, where the user can specify the fork name with a new > option. The default of NULL means that all the forks of a relation > are checked. Ok. > - The TAP tests are rather bulky. I have moved all the non-corruption > test cases into a new SQL test file. That's useful for people willing > to do some basic sanity checks with a non-default table AM. At least > it provides a minimum coverage. Agreed
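For readers wanting to see the interface at a glance, usage would look something like the following; the regclass argument, the text fork-name argument and the output column names are guesses based on the description above, not the patch's actual definitions:

-- check every fork of a relation (superuser by default, or a role GRANTed EXECUTE)
SELECT * FROM pg_relation_check_pages('my_table'::regclass);

-- check only the main fork
SELECT * FROM pg_relation_check_pages('my_table'::regclass, 'main');

--          path          | failed_block_num
-- ------------------------+------------------
--  base/16384/16409       |               12
-- (1 row)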
Re: warn_unused_results
On 2020-10-17 17:58, Tom Lane wrote: Peter Eisentraut writes: Forgetting to assign the return value of list APIs such as lappend() is a perennial favorite. The compiler can help point out such mistakes. GCC has an attribute warn_unused_results. Also C++ has standardized this under the name "nodiscard", and C has a proposal to do the same [0]. In my patch I call the symbol pg_nodiscard, so that perhaps in a distant future one only has to do s/pg_nodiscard/nodiscard/ or something similar. Also, the name is short enough that it doesn't mess up the formatting of function declarations too much. +1 in principle (I've not read the patch in detail); but I wonder what pgindent does with these added keywords. pgindent doesn't seem to want to change anything about the patched files as I had sent them. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
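For illustration, a minimal sketch of how such a marker could be defined and applied (the actual patch may differ in both spelling and placement; note that the underlying GCC attribute is warn_unused_result, singular):

/* warn when a caller ignores the function's return value */
#if defined(__GNUC__) || defined(__clang__)
#define pg_nodiscard __attribute__((warn_unused_result))
#else
#define pg_nodiscard
#endif

/* applied to a list API whose result must not be dropped on the floor */
extern pg_nodiscard List *lappend(List *list, void *datum);

With that in place, a call like lappend(list, datum); whose result is thrown away draws a compiler warning, catching the classic mistake mentioned above.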
Re: Bizarre coding in recovery target GUC management
On 2020-10-12 18:00, Tom Lane wrote: > Would someone explain to me why assign_recovery_target_lsn and related GUC assign hooks throw errors, rather than doing so in the associated check hooks? > An assign hook is not supposed to throw an error. Full stop, no exceptions. We wouldn't bother to separate those hooks otherwise. That code is checking whether more than one recovery target GUC has been set. I don't think the check hook sees the right state to be able to check that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
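As background on the division of labor being discussed, a hedged sketch of the conventional pattern for a string GUC follows; all names below are invented for illustration and are not the actual recovery-target code. Validation and error reporting live in the check hook, while the assign hook only records state and must never error out.

/*
 * Illustrative sketch only; check_/assign_my_target and value_is_parsable()
 * are invented names.  The signatures match the string-GUC hook types
 * declared in utils/guc.h.
 */
static bool my_target_is_set = false;

static bool
check_my_target(char **newval, void **extra, GucSource source)
{
    /* all validation and error reporting happens here */
    if (*newval != NULL && !value_is_parsable(*newval)) /* hypothetical check */
    {
        GUC_check_errdetail("\"%s\" cannot be parsed.", *newval);
        return false;           /* guc.c turns this into the user-visible error */
    }
    return true;
}

static void
assign_my_target(const char *newval, void *extra)
{
    /* side-effect-free bookkeeping only; never ereport(ERROR) here */
    my_target_is_set = (newval != NULL && newval[0] != '\0');
}

Cross-GUC consistency checks such as "only one recovery target may be set" are the awkward part: a check hook only sees its own tentative value, which is the difficulty pointed out above.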
Re: "unix_socket_directories" should be GUC_LIST_INPUT?
On Fri, Oct 23, 2020 at 12:49:57AM -0400, Tom Lane wrote: > Although ... just to argue against myself for a moment, how likely > is it that pg_dump is going to be faced with the need to dump a > value for unix_socket_directories? I am trying to think about some scenarios here, but honestly I cannot.. > So maybe we could get away with just changing it. It'd be good to > verify though that this doesn't break existing string values for > the variable, assuming they contain no unlikely characters that'd > need quoting. Yeah, that's the kind of thing I wanted to check anyway before considering doing the switch. -- Michael signature.asc Description: PGP signature
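The mechanical side of such a switch would be small: roughly, the flag is added to the variable's entry in guc.c. The following is an abbreviated sketch of what that entry could look like, not a tested diff, and the surrounding field values are approximate.

/*
 * Abbreviated sketch of a string-GUC table entry in
 * src/backend/utils/misc/guc.c with GUC_LIST_INPUT added; layout and
 * neighboring flags are approximate.
 */
{
    {"unix_socket_directories", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
        gettext_noop("Sets the directories where Unix-domain sockets will be created."),
        NULL,
        GUC_LIST_INPUT | GUC_SUPERUSER_ONLY /* GUC_LIST_INPUT is the new part */
    },
    &Unix_socket_directories,
    DEFAULT_PGSOCKET_DIR,
    NULL, NULL, NULL
},

The open question in the thread is not the flag itself but whether existing single-path values keep round-tripping unchanged through pg_dump and ALTER SYSTEM once the value is treated as a list.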
Re: Implementing Incremental View Maintenance
Hi Adam, On Thu, 22 Oct 2020 10:07:29 -0400 Adam Brusselback wrote: > Hey there Yugo, > I've asked a coworker to prepare a self-contained example that encapsulates > our multiple use cases. Thank you very much! > The immediate/eager approach is exactly what we need, as within the same > transaction we have statements that can cause one of those "materialized > tables" to be updated, and then sometimes have the need to query that > "materialized table" in a subsequent statement and need to see the changes > reflected. The proposed patch provides exactly this feature, and I think it will meet your needs. > As soon as my coworker gets that example built up I'll send a followup with > it attached. Great! We are looking forward to it. Regards, Yugo Nagata -- Yugo NAGATA
Re: Hash support for row types
On 2020-10-20 17:10, Peter Eisentraut wrote: On 2020-10-20 01:32, Andres Freund wrote: How does this deal with row types with a field that doesn't have a hash function? Erroring out at runtime could cause queries that used to succeed, e.g. because all fields have btree ops, to fail, if we just have a generic unconditionally present hash opclass? Is that an OK "regression"? Good point. There is actually code in the type cache that is supposed to handle that, so I'll need to adjust that. Here is an updated patch with the type cache integration added. To your point, this now checks each fields hashability before considering a row type as hashable. It can still have run-time errors for untyped record datums, but that's not something we can do anything about. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From eda6f7182cfe8ac7317d6874317ace24c7c7d5f6 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 23 Oct 2020 09:33:41 +0200 Subject: [PATCH v2] Hash support for row types Add hash functions for the record type as well as a hash operator family and operator class for the record type. This enables all the hash functionality for the record type such as UNION DISTINCT, hash joins, and hash partitioning. Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com --- doc/src/sgml/queries.sgml | 9 - src/backend/utils/adt/rowtypes.c| 249 src/backend/utils/cache/lsyscache.c | 7 +- src/backend/utils/cache/typcache.c | 78 +--- src/include/catalog/pg_amop.dat | 5 + src/include/catalog/pg_amproc.dat | 4 + src/include/catalog/pg_opclass.dat | 2 + src/include/catalog/pg_operator.dat | 2 +- src/include/catalog/pg_opfamily.dat | 2 + src/include/catalog/pg_proc.dat | 7 + src/test/regress/expected/hash_func.out | 12 ++ src/test/regress/expected/join.out | 1 + src/test/regress/expected/with.out | 38 src/test/regress/sql/hash_func.sql | 9 + src/test/regress/sql/join.sql | 1 + src/test/regress/sql/with.sql | 10 + 16 files changed, 402 insertions(+), 34 deletions(-) diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index f06afe2c3f..8e70e57b72 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -2182,15 +2182,6 @@ Search Order - - - The queries shown in this and the following section involving - ROW constructors in the target list only support - UNION ALL (not plain UNION) in the - current implementation. 
- - - Omit the ROW() syntax in the common case where only one diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c index 674cf0a55d..28bbce128f 100644 --- a/src/backend/utils/adt/rowtypes.c +++ b/src/backend/utils/adt/rowtypes.c @@ -19,6 +19,7 @@ #include "access/detoast.h" #include "access/htup_details.h" #include "catalog/pg_type.h" +#include "common/hashfn.h" #include "funcapi.h" #include "libpq/pqformat.h" #include "miscadmin.h" @@ -1766,3 +1767,251 @@ btrecordimagecmp(PG_FUNCTION_ARGS) { PG_RETURN_INT32(record_image_cmp(fcinfo)); } + + +/* + * Row type hash functions + */ + +Datum +hash_record(PG_FUNCTION_ARGS) +{ + HeapTupleHeader record = PG_GETARG_HEAPTUPLEHEADER(0); + uint32 result = 0; + Oid tupType; + int32 tupTypmod; + TupleDesc tupdesc; + HeapTupleData tuple; + int ncolumns; + RecordCompareData *my_extra; + Datum *values; + bool *nulls; + + check_stack_depth();/* recurses for record-type columns */ + + /* Extract type info from tuple */ + tupType = HeapTupleHeaderGetTypeId(record); + tupTypmod = HeapTupleHeaderGetTypMod(record); + tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod); + ncolumns = tupdesc->natts; + + /* Build temporary HeapTuple control structure */ + tuple.t_len = HeapTupleHeaderGetDatumLength(record); + ItemPointerSetInvalid(&(tuple.t_self)); + tuple.t_tableOid = InvalidOid; + tuple.t_data = record; + + /* +* We arrange to look up the needed hashing info just once per series +* of calls, assuming the record type doesn't change underneath us. +*/ + my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra; + if (my_extra == NULL || + my_extra->ncolumns < ncolumns) + { + fcinfo->flinfo->fn_extra = + MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, + offsetof(RecordCompareData, columns) + + ncolumns * sizeof(ColumnCo
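As a rough illustration of the general technique in the patch above (not its exact code), hashing a row type amounts to hashing each field with its type's own hash function and folding the per-field hashes together; field_hash() below is a hypothetical stand-in for that per-type lookup.

/*
 * Illustrative only.  values[]/nulls[] hold the deconstructed row;
 * field_hash() is a hypothetical helper that applies the i-th
 * attribute's type-specific hash function.
 */
static uint32
row_hash_sketch(Datum *values, bool *nulls, int ncolumns)
{
    uint32      result = 0;

    for (int i = 0; i < ncolumns; i++)
    {
        uint32      element_hash;

        /* a NULL field contributes a fixed value */
        element_hash = nulls[i] ? 0 : field_hash(values[i], i);

        /* multiply-and-add style mixing of the running hash */
        result = (result << 5) - result + element_hash;
    }
    return result;
}

The interesting part of the patch is less this mixing loop than the typcache work mentioned above: a row type is only advertised as hashable if every one of its fields has a hash opclass, so that the hash-based plans this enables (hash joins, UNION DISTINCT, hash partitioning) do not fail at execution time.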
Re: Tab complete for alter table rls
On Fri, Oct 23, 2020 at 05:22:57AM +, Li Japin wrote: > Sorry, I forgot to add the subject. No worries. Good catch. I'll try to test that and apply it later, but by reading the code it looks like you got that right. -- Michael signature.asc Description: PGP signature
Re: Online checksums verification in the backend
On Mon, Oct 19, 2020 at 04:52:48PM +0900, Michael Paquier wrote: > No issues with reporting the block number and the fork type in the SRF > at least of course as this is helpful to detect the position of the > broken blocks. For the checksum found in the header and the one > calculated (named expected in the patch), I am less sure how to put a > clear definition on top of that but we could always consider that > later and extend the SRF as needed. Once the user knows that both do > not match, he/she knows that there is a problem. So, I have reviewed your patch set, and heavily reworked the logic to be more consistent on many things, resulting in a largely simplified patch without sacrificing its usefulness: - The logic of the core routine of bufmgr.c is unchanged. I have simplified it a bit though by merging the subroutines that were part of the patch. SMgrRelation is used as argument instead of a Relation. That's more consistent with the surroundings. The initial read of a page without locks is still on the table as an extra optimization, though I am not completely sure if this should be part of CheckBuffer() or not. I also thought about the previous benchmarks and I think that not using the most-optimized approach improved performance, because it reduced the successive runs of the SQL functions, reducing the back-pressure on the partition locks (we held one of them at a time for a longer period, meaning that the other 127 ran free for a longer time). Please note that this part still referred to a "module", which was incorrect. - Removal of the expected and found checksums from the SRF of the function. Based on my recent business with the page checks for base backups, I have arrived at the conclusion that the SRF should return data that we can absolutely trust, and the minimum I think we have to trust here is whether a given page is thought to be safe or not, considering all the sanity checks done by PageIsVerified() as the main entry point for everything. This has led to a bit of confusion with the addition of NoComputedChecksum for a page that was empty as of the initial version of the patch, so it happens that we don't need it anymore. - Removal of the dependency with checksums for this feature. While simplifying the code, I have noticed that this feature can also be beneficial for clusters that do not have data checksums, as PageIsVerified() is perfectly able to run some page header checks and handle the zero-page case. That's of course less useful than having the checksums, but there is no need to add a dependency here. The file for the SQL functions is renamed from checksumfuncs.c to pagefuncs.c. - The function is changed to return no tuples if the relkind is not supported, and the same applies for temporary relations. That's more consistent with other system functions like the ones in charge of partition information, and this makes full scans of pg_class much easier to work with. Temporary tables were not handled correctly anyway as these are in local buffers, but the use case of this function in this case is not really obvious to me. - Having the forknum in the SRF is kind of confusing, as the user would need to map a number with the physical on-disk name. Instead, I have made the choice to return the *path* of the corrupted file with a block number. This way, an operator can know immediately where a problem comes from. The patch does not append the segment number, and I am not sure if we should actually do that, but adding it is straightforward as we have the block number.
There is a dependency with table AMs here as well, as this goes down to fd.c, explaining why I have not added it. - I really don't know what kind of default ACL should apply for such functions, but I am sure that SCAN_TABLES is not what we are looking for here, and there is nothing preventing us from having a safe default from the start of times, so I moved the function to be superuser-only by default, and GRANT can be used to allow its execution to other roles. We could relax that in the future, of course, this can be discussed separately. - The WARNING report for each block found as corrupted gains an error context, as a result of a switch to PageIsVerified(), giving a user all the information needed in the logs on top of the result in the SRF. That's useful as well if combined with CHECK_FOR_INTERRUPTS(), and I got to wonder if we should have some progress report for this stuff, though that's a separate discussion. - The function is renamed to something less generic, pg_relation_check_pages(), and I have reduced the number of functions from two to one, where the user can specify the fork name with a new option. The default of NULL means that all the forks of a relation are checked. - The TAP tests are rather bulky. I have moved all the non-corruption test cases into a new SQL test file. That's useful for people willing to do some basic sanity checks with a non-default table AM. At least it provides a minimum coverage.
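To make the overall shape of the feature easier to picture, here is a greatly simplified sketch of the per-block loop described above. It is not the patch itself: the buffer-pool interaction, locking, error context, and SRF plumbing are all omitted, and the exact PageIsVerified() interface is being reworked in a nearby thread.

/*
 * Simplified sketch only: walk every block of one fork through smgr
 * and run the standard page sanity checks on it, reporting broken
 * blocks with a WARNING.
 */
static void
check_fork_sketch(SMgrRelation smgr, ForkNumber forknum)
{
    BlockNumber nblocks = smgrnblocks(smgr, forknum);
    PGAlignedBlock buf;

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        CHECK_FOR_INTERRUPTS();

        smgrread(smgr, forknum, blkno, buf.data);
        if (!PageIsVerified((Page) buf.data, blkno))
            ereport(WARNING,
                    (errmsg("invalid page in block %u of relation %s",
                            blkno,
                            relpathbackend(smgr->smgr_rnode.node,
                                           smgr->smgr_rnode.backend,
                                           forknum))));
    }
}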
Re: partition routing layering in nodeModifyTable.c
On 22/10/2020 16:49, Amit Langote wrote: > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote wrote: >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas wrote: >>> It's probably true that there's no performance gain from initializing them more lazily. But the reasoning and logic around the initialization is complicated. After tracing through various paths through the code, I'm convinced enough that it's correct, or at least these patches didn't break it, but I still think some sort of lazy initialization on first use would make it more readable. Or perhaps there's some other refactoring we could do. >> So the other patch I have mentioned is about lazy initialization of the ResultRelInfo itself, not the individual fields, but maybe with enough refactoring we can get the latter too. > So, I tried implementing a lazy-initialization-on-first-access approach for both the ResultRelInfos themselves and some of the individual fields of ResultRelInfo that don't need to be set right away. You can see the end result in the attached 0003 patch. This slims down ExecInitModifyTable() significantly, both in terms of code footprint and the amount of work that it does. Have you done any performance testing? I'd like to know how much of a difference this makes in practice. Another alternative is to continue to create the ResultRelInfos in ExecInitModifyTable(), but initialize the individual fields in them lazily. Does this patch become moot if we do the "Overhaul UPDATE/DELETE processing"? (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com) - Heikki
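For readers skimming the thread, the "lazy initialization on first access" idea being weighed here is essentially the following pattern. This is a hypothetical sketch with invented names, not code from the 0003 patch, and the "already initialized" test on ri_RelationDesc is illustrative only.

/*
 * Hypothetical sketch of initialize-on-first-use: callers go through a
 * getter that builds the ResultRelInfo the first time it is needed.
 */
static ResultRelInfo *
GetNthResultRelInfo(ModifyTableState *mtstate, int whichrel)
{
    ResultRelInfo *rri = &mtstate->resultRelInfo[whichrel];

    if (rri->ri_RelationDesc == NULL)   /* illustrative "not yet set up" test */
        InitResultRelLazily(mtstate, rri, whichrel);    /* hypothetical helper */

    return rri;
}

The readability gain is that ExecInitModifyTable() no longer has to anticipate every field a later executor step might need; the trade-off, as the review notes above suggest, is proving that every access path really does go through the getter.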