Re: [HACKERS] Setting pd_lower in GIN metapage
On Wed, Sep 20, 2017 at 9:19 AM, Amit Kapila wrote:
> On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier wrote:
>> On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane wrote:
>>> Amit Kapila writes:
>>>> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier wrote:
>>>>> I am not saying that no index AMs take advantage of FPW
>>>>> compressibility for their meta pages. There are cases like this
>>>>> one, as well as one code path in BRIN where this is useful, and it
>>>>> is useful as well when logging FPWs of the init forks for unlogged
>>>>> relations.
>>>> Hmm, why is it useful for logging FPWs of the init forks for
>>>> unlogged relations? We don't use REGBUF_STANDARD in those cases.
>>> But if we started to do so, that would be a concrete benefit of this
>>> patch ...
>> In the proposed set of patches, all the empty() routines of index AMs
>> which use log_newpage_buffer() (brin, gin, spgist) are doing the
>> right thing by updating log_newpage_buffer(). btree should also have
>> its call to log_newpage updated in btbuildempty(), and your patch is
>> missing that.
> We can add that for the btree patch.

Added and updated the comments for both the btree and hash index
patches.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

Attachments: change_metapage_usage_btree-v2.patch,
change_metapage_usage_hash-v2.patch

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SCRAM in the PG 10 release notes
On 9/21/17 14:15, Jeff Janes wrote:
>> Here is a patch that expands the SCRAM documentation a bit, adds more
>> explanation of how the different options are related, and sets some
>> better links. I think now you can get from the release notes to the
>> relevant documentation and have enough information on how to put the
>> new features into use.
>
> This looks good to me. Might suggest adding verifying the clients as a
> specific step:

committed

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Built-in plugin for logical decoding output
2017-09-23 14:01 GMT-03:00 Alvaro Hernandez:
> However, AFAIK, AWS's DMS uses it for production purposes (see
> http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

It seems a bad idea. AFAICS test_decoding was not designed to be a
ready-for-production plugin. It is just a proof of concept for logical
decoding.

> I would be happy to see another logical decoding plugin in core
> starting with 11. However, this also poses a bit of a challenge for
> middleware implementors: you need to support one for 9.4-9.5
> (test_decoding), another for 10 (pgoutput), and maybe another for 11
> onwards. The idea of asking users to install a binary plugin is very
> unsexy, so these are the options available.

wal2json works for 9.4+ (besides the WAL messages I committed a month
ago). Since this boat has already shipped, we can arrange some packages
for 9.4-10 (an external project) and ask vendors to support the
backward-compatible plugin. The middleware implementor will have to
support this new plugin format. JSON being a widespread format, it is
easier to refactor the code to parse JSON.

> However, having said that, and while JSON is a great output format for
> interoperability, if there's a discussion on which plugin to include
> next, I'd also favor one that has some more compact representation
> format (or one that supports several formats, not only JSON).

We could certainly extend pgoutput to support more than one format
(like pglogical did, AFAIR); however, we wouldn't reuse code (different
formats) and would end up with a fat plugin. (I don't foresee a plugin
using different formats in the same connection; it is difficult to
coordinate a change like that having only one-way communication.)

-- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
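[Editorial aside: the point about JSON being easy for middleware to consume can be made concrete with a few lines. The record below is illustrative only — the field names ("change", "kind", "columnnames", ...) follow wal2json's documented v1 output, but this is not output captured from the plugin itself.]

```python
import json

# A wal2json-style change record (illustrative sample, not real plugin
# output) and a minimal consumer that turns it into rows.
sample = """{
  "change": [
    {"kind": "insert", "schema": "public", "table": "accounts",
     "columnnames": ["id", "balance"], "columnvalues": [1, 100]}
  ]
}"""

def rows(payload):
    """Yield (schema-qualified table, {column: value}) for each change."""
    for ch in json.loads(payload)["change"]:
        yield ("%s.%s" % (ch["schema"], ch["table"]),
               dict(zip(ch["columnnames"], ch["columnvalues"])))

print(list(rows(sample)))
```

Any stock JSON parser suffices, which is the interoperability argument being made above.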
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
Andrew Dunstan writes:
> OK, I think I'm convinced. Here is the WIP code I put together for the
> blacklist. I was looking for a place to put the init call, but since
> it's possibly not going anywhere I stopped :-) . My initial thought
> about subtransactions was that we should ignore them for this purpose
> (that's why I used TopTransactionContext for the table).

For the blacklist, I agree we could just ignore subtransactions: all
subtransaction levels are equally uncommitted for this purpose, and
leaving entries from failed subtransactions in place seems like a
non-issue, since they'd never be referenced again. (Well, barring OID
wraparound and an enum-value-OID collision while the transaction runs,
but I think we can ignore that as having probability epsilon.)

But you need to actually put the table in TopTransactionContext, not
CurTransactionContext ;-). Also, I don't think you need an init call so
much as an end-of-transaction cleanup call. Maybe call it AtEOXactEnum(),
for consistency with other functions called in the same area.

> w.r.t. table size - how large? I confess I haven't seen any systems
> with more than a few hundred enum types. But even a million or two
> shouldn't consume a huge amount of memory, should it?

Dynahash tables are self-expanding, so I don't see a need to stress
about that too much. Anything in 10-100 seems reasonable for initial
size.

regards, tom lane
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On 09/23/2017 03:52 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 09/23/2017 02:00 PM, Tom Lane wrote:
>>> So I'm back to not being sure about the path forward. Maybe it would
>>> be all right to say "the value added by ADD VALUE can't be used in
>>> the same transaction, period". That's still a step forward compared
>>> to the pre-v10 prohibition on doing it at all. I don't remember if
>>> there were use-cases where we really needed the exception for
>>> new-in-transaction types.
>> Well, my idea was to have the test run like this:
>> * is the value an old one? Test txnid of tuple. If yes it's ok
>> * is the value one created by ALTER TYPE ADD VALUE? Test blacklist.
>>   If no, it's ok.
>> * is the enum a new one? Test whitelist. If yes, it's ok.
>> * anything else is not ok.
> My point is that if you do 1 and 3, you don't need 2. Or if you do
> 2 and 3, you don't need 1. But in most cases, testing the tuple
> hint bits is cheap, so you don't really want that option.
>
> In any case, what I'm worried about is the amount of bookkeeping
> overhead added by keeping a whitelist of
> enum-types-created-in-current-transaction. That's less than trivial,
> especially since you have to account correctly for subtransactions.
> And there are common use-cases where that table will become large.
>
>> If we just did the blacklist and stuck with our current heuristic
>> test for enum being created in the current transaction, we'd still
>> probably avoid 99% of the problems, including specifically the one
>> that gave rise to the bug report.
> True. But I'm not sure whether the heuristic test is adding anything
> meaningful if we use a blacklist first. The case where it could help
> is
>
> begin;
> create type t as enum();
> alter type t add value 'v';
> -- do something with 'v'
> commit;
>
> That perhaps is worth something, but if somebody is trying to build a
> new enum type in pieces like that, doesn't it seem fairly likely that
> they might throw in an ALTER OWNER or GRANT as well? My feeling is
> that the lesson we need to learn is that the heuristic test isn't
> good enough.

OK, I think I'm convinced. Here is the WIP code I put together for the
blacklist. I was looking for a place to put the init call, but since
it's possibly not going anywhere I stopped :-) . My initial thought
about subtransactions was that we should ignore them for this purpose
(that's why I used TopTransactionContext for the table).

I agree the heuristic test isn't good enough, and if we can get a 100%
accurate test for the newness of the enum type then the blacklist would
be redundant.

w.r.t. table size - how large? I confess I haven't seen any systems
with more than a few hundred enum types. But even a million or two
shouldn't consume a huge amount of memory, should it?

cheers

andrew

-- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index fe61d4d..52c1271 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -28,6 +28,8 @@
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
@@ -38,6 +40,9 @@ Oid binary_upgrade_next_pg_enum_oid = InvalidOid;
 static void RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems);
 static int	sort_order_cmp(const void *p1, const void *p2);
 
+/* hash table of values added in current transaction by AddEnumLabel */
+
+static HTAB *enum_blacklist = NULL;
 
 /*
  * EnumValuesCreate
@@ -460,8 +465,44 @@ restart:
 	heap_freetuple(enum_tup);
 
 	heap_close(pg_enum, RowExclusiveLock);
+
+	/* set up blacklist hash if required */
+	if (enum_blacklist == NULL)
+	{
+		HASHCTL		hash_ctl;
+
+		memset(&hash_ctl, 0, sizeof(hash_ctl));
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(Oid);
+		hash_ctl.hcxt = CurTransactionContext;
+		enum_blacklist = hash_create("Enum blacklist for current transaction",
+									 32,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+	}
+
+	/* and add the new value to the blacklist */
+
+	(void) hash_search(enum_blacklist, &newOid, HASH_ENTER, NULL);
 }
 
+bool
+EnumBlacklisted(Oid enum_id)
+{
+	bool		found;
+
+	if (enum_blacklist == NULL)
+		return false;
+
+	(void) hash_search(enum_blacklist, &enum_id, HASH_FIND, &found);
+	return found;
+}
+
+void
+InitEnumBlacklist(void)
+{
+	enum_blacklist = NULL;
+}
 
 /*
  * RenameEnumLabel
diff --git a/src/backend/utils/adt/enum.c b/src/backend/utils/adt/enum.c
index 973397c..a7ba3d0 100644
--- a/src/backend/utils/adt/enum.c
+++ b/src/backend/utils/adt/enum.c
@@ -76,6 +76,10 @@ check_safe_enum_use(HeapTuple enumval_tup)
 		TransactionIdDidCommit(xmin))
[HACKERS] Proposed fix for bug #14826 (the invalid empty array business)
I poked around among the callers of construct_[md_]array() and found at
least two more besides ts_lexize() that could build zero-element arrays;
one of very ancient standing is in the pg_prepared_statements view. So I
think we've got a clear hazard here that justifies changing the behavior
of the low-level array functions, rather than expecting every call site
to be on its guard about empty arrays.

Accordingly, I propose the attached patch, which makes
construct_md_array responsible for getting this right. In principle we
could now remove code from the places that do take care to call
construct_empty_array instead. But I found only one place where it
really seemed worth removing anything, in ExecEvalArrayExpr.

I'm a little bit scared about back-patching this, as it seems at least
possible that some external code could be broken by the change in what
construct_[md_]array could hand back. Given that some of these bugs have
been in place for ~ten years and nobody noticed till now, seems like it
might be good enough to fix them in HEAD only.

Comments?
regards, tom lane

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index bd8a15d..09abd46 100644
*** a/src/backend/executor/execExprInterp.c
--- b/src/backend/executor/execExprInterp.c
*** ExecEvalArrayExpr(ExprState *state, Expr
*** 2131,2144 ****
  	Datum	   *dvalues = op->d.arrayexpr.elemvalues;
  	bool	   *dnulls = op->d.arrayexpr.elemnulls;
  
- 	/* Shouldn't happen here, but if length is 0, return empty array */
- 	if (nelems == 0)
- 	{
- 		*op->resvalue =
- 			PointerGetDatum(construct_empty_array(element_type));
- 		return;
- 	}
- 
  	/* setup for 1-D array of the given length */
  	ndims = 1;
  	dims[0] = nelems;
--- 2131,2136 ----
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index e4101c9..d1f2fe7 100644
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
*** array_map(FunctionCallInfo fcinfo, Oid r
*** 3297,3302 ****
--- 3297,3303 ----
   *
   * A palloc'd 1-D array object is constructed and returned. Note that
   * elem values will be copied into the object even if pass-by-ref type.
+  * Also note the result will be 0-D not 1-D if nelems = 0.
   *
   * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype. However, the caller is
*** construct_array(Datum *elems, int nelems
*** 3331,3336 ****
--- 3332,3338 ----
   *
   * A palloc'd ndims-D array object is constructed and returned. Note that
   * elem values will be copied into the object even if pass-by-ref type.
+  * Also note the result will be 0-D not ndims-D if any dims[i] = 0.
   *
   * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype. However, the caller is
*** construct_md_array(Datum *elems,
*** 3362,3373 ****
  				 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
  						ndims, MAXDIM)));
  
- 	/* fast track for empty array */
- 	if (ndims == 0)
- 		return construct_empty_array(elmtype);
- 
  	nelems = ArrayGetNItems(ndims, dims);
  
  	/* compute required space */
  	nbytes = 0;
  	hasnulls = false;
--- 3364,3375 ----
  				 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
  						ndims, MAXDIM)));
  
  	nelems = ArrayGetNItems(ndims, dims);
  
+ 	/* if ndims <= 0 or any dims[i] == 0, return empty array */
+ 	if (nelems <= 0)
+ 		return construct_empty_array(elmtype);
+ 
  	/* compute required space */
  	nbytes = 0;
  	hasnulls = false;
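[Editorial aside: the behavioral change is small but easy to misread in diff form. A toy model (plain Python, not the C source) of the patched construct_md_array decision:]

```python
import math

# Toy model (not PostgreSQL code) of the patched construct_md_array():
# derive the element count from the dimensions first, and fall back to a
# dimensionless (0-D) empty array whenever it comes out as zero -- which
# covers both ndims == 0 and any dims[i] == 0.
def construct_md_array(elems, ndims, dims):
    nelems = math.prod(dims[:ndims]) if ndims > 0 else 0
    if nelems <= 0:
        return {"ndims": 0, "dims": [], "elems": []}  # empty array
    return {"ndims": ndims, "dims": list(dims[:ndims]),
            "elems": list(elems[:nelems])}

print(construct_md_array([1, 2, 3, 4, 5, 6], 2, [3, 2])["ndims"])  # 2
print(construct_md_array([], 2, [3, 0])["ndims"])                  # 0
```

The pre-patch hazard was the second case: a zero-length dimension slipped past the old `ndims == 0` fast path and produced an invalid non-empty array header.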
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
Andrew Dunstan writes:
> On 09/23/2017 02:00 PM, Tom Lane wrote:
>> So I'm back to not being sure about the path forward. Maybe it would
>> be all right to say "the value added by ADD VALUE can't be used in
>> the same transaction, period". That's still a step forward compared
>> to the pre-v10 prohibition on doing it at all. I don't remember if
>> there were use-cases where we really needed the exception for
>> new-in-transaction types.

> Well, my idea was to have the test run like this:
> * is the value an old one? Test txnid of tuple. If yes it's ok
> * is the value one created by ALTER TYPE ADD VALUE? Test blacklist.
>   If no, it's ok.
> * is the enum a new one? Test whitelist. If yes, it's ok.
> * anything else is not ok.

My point is that if you do 1 and 3, you don't need 2. Or if you do
2 and 3, you don't need 1. But in most cases, testing the tuple
hint bits is cheap, so you don't really want that option.

In any case, what I'm worried about is the amount of bookkeeping
overhead added by keeping a whitelist of
enum-types-created-in-current-transaction. That's less than trivial,
especially since you have to account correctly for subtransactions.
And there are common use-cases where that table will become large.

> If we just did the blacklist and stuck with our current heuristic test
> for enum being created in the current transaction, we'd still probably
> avoid 99% of the problems, including specifically the one that gave
> rise to the bug report.

True. But I'm not sure whether the heuristic test is adding anything
meaningful if we use a blacklist first. The case where it could help is

    begin;
    create type t as enum();
    alter type t add value 'v';
    -- do something with 'v'
    commit;

That perhaps is worth something, but if somebody is trying to build a
new enum type in pieces like that, doesn't it seem fairly likely that
they might throw in an ALTER OWNER or GRANT as well? My feeling is that
the lesson we need to learn is that the heuristic test isn't good
enough.
regards, tom lane
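[Editorial aside: the three-step test Andrew proposes reads compactly as a decision function. A sketch with made-up names — the boolean inputs stand in for the real checks (tuple xmin committed, blacklist lookup, whitelist lookup); none of these identifiers exist in the PostgreSQL tree:]

```python
# Sketch of the three-step safety test proposed upthread (illustrative
# names only, not PostgreSQL code).
def enum_value_is_safe(tuple_committed, value_blacklisted, type_is_new):
    if tuple_committed:        # old value: always ok
        return True
    if not value_blacklisted:  # not added by ALTER TYPE ADD VALUE: ok
        return True
    if type_is_new:            # enum type created in this transaction: ok
        return True
    return False               # anything else is not ok
```

Tom's objection is visible in this shape: checks 1 and 3 (or 2 and 3) already determine the outcome, so the blacklist only pays off because in common use it is empty or tiny.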
Re: [HACKERS] Variable substitution in psql backtick expansion
Hello Gerdan,

This is an internal address that should not be exposed:
postgresql@coelho.net. I'm unsure why it gets substituted by the
"commitfest application"...

> When I try to apply this patch, it fails with the following message:

It just worked for me on head with

  git checkout -b test
  git apply ~/psql-server-version-1.patch

My guess is that your mailer or browser mangles the file contents
because its mime type is "text/x-diff", and it considers that it is
okay to change NL to CR or CRNL... which is a BAD IDEA (tm).

I re-attached the file compressed so that it uses another mime type and
should not change its contents.

-- Fabien.

Attachment: psql-server-version-1.patch.gz (application/gzip)
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On 09/23/2017 02:00 PM, Tom Lane wrote:
> I wrote:
>> Andrew Dunstan writes:
>>> I see what you're saying, but my idea was slightly different. We
>>> would only add to the hashtable I had in mind at the bottom of
>>> AddEnumLabel(). Any other value, whether added in the current
>>> transaction or not, should be safe, AIUI.
>> Oh, I see: a list of enum values we need to blacklist, not whitelist.
>> Yes, that's a significantly better idea than mine, because in common
>> use-cases that would be empty or have a very small number of entries.
> Oh, wait a minute ... that's not where the problem is, really. We can
> already tell reliably whether an enum value was created in the current
> transaction: the is-it-committed check in check_safe_enum_use is
> perfectly safe for that AFAICS. The hard part of this is the part
> about "was the enum type created in the current transaction?". We
> could make that reliable with the table I proposed of enum types
> created in the current transaction, but the possible performance drag
> is a concern.
>
> What I understand you to be proposing is to blacklist individual
> enum values added by ALTER ADD VALUE, but *not* values created
> aboriginally by CREATE TYPE AS ENUM. (The latter are surely safe,
> because the type must disappear if they do.) However, that would
> require dropping the second part of the current documentation promise:
>
>     If ALTER TYPE ... ADD VALUE (the form that adds a new value to an
>     enum type) is executed inside a transaction block, the new value
>     cannot be used until after the transaction has been committed,
>     except in the case that the enum type itself was created earlier
>     in the same transaction.
>
> We'd have to just say "it can't be used till the transaction has been
> committed", full stop. Otherwise we're right back into the problem of
> needing to detect whether the enum type is new in transaction.
>
>>> Maybe we should also keep a cache of whitelisted enums created in
>>> the current transaction.
>> What for?
> I now realize that what you probably meant here was to track enum
> *types*, not values, created in the current transaction. But if we're
> doing that then we don't really need the per-enum-value-blacklist part
> of it.
>
> So I'm back to not being sure about the path forward. Maybe it would
> be all right to say "the value added by ADD VALUE can't be used in the
> same transaction, period". That's still a step forward compared to the
> pre-v10 prohibition on doing it at all. I don't remember if there were
> use-cases where we really needed the exception for new-in-transaction
> types.

Well, my idea was to have the test run like this:

* is the value an old one? Test txnid of tuple. If yes it's ok
* is the value one created by ALTER TYPE ADD VALUE? Test blacklist. If
  no, it's ok.
* is the enum a new one? Test whitelist. If yes, it's ok.
* anything else is not ok.

I think that would let us keep our promise. If we just did the
blacklist and stuck with our current heuristic test for enum being
created in the current transaction, we'd still probably avoid 99% of
the problems, including specifically the one that gave rise to the bug
report. Am I missing something?

cheers

andrew

-- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
I wrote:
> Andrew Dunstan writes:
>> I see what you're saying, but my idea was slightly different. We
>> would only add to the hashtable I had in mind at the bottom of
>> AddEnumLabel(). Any other value, whether added in the current
>> transaction or not, should be safe, AIUI.
> Oh, I see: a list of enum values we need to blacklist, not whitelist.
> Yes, that's a significantly better idea than mine, because in common
> use-cases that would be empty or have a very small number of entries.

Oh, wait a minute ... that's not where the problem is, really. We can
already tell reliably whether an enum value was created in the current
transaction: the is-it-committed check in check_safe_enum_use is
perfectly safe for that AFAICS. The hard part of this is the part about
"was the enum type created in the current transaction?". We could make
that reliable with the table I proposed of enum types created in the
current transaction, but the possible performance drag is a concern.

What I understand you to be proposing is to blacklist individual enum
values added by ALTER ADD VALUE, but *not* values created aboriginally
by CREATE TYPE AS ENUM. (The latter are surely safe, because the type
must disappear if they do.) However, that would require dropping the
second part of the current documentation promise:

    If ALTER TYPE ... ADD VALUE (the form that adds a new value to an
    enum type) is executed inside a transaction block, the new value
    cannot be used until after the transaction has been committed,
    except in the case that the enum type itself was created earlier in
    the same transaction.

We'd have to just say "it can't be used till the transaction has been
committed", full stop. Otherwise we're right back into the problem of
needing to detect whether the enum type is new in transaction.

>> Maybe we should also keep a cache of whitelisted enums created in the
>> current transaction.
> What for?

I now realize that what you probably meant here was to track enum
*types*, not values, created in the current transaction. But if we're
doing that then we don't really need the per-enum-value-blacklist part
of it.

So I'm back to not being sure about the path forward. Maybe it would be
all right to say "the value added by ADD VALUE can't be used in the same
transaction, period". That's still a step forward compared to the
pre-v10 prohibition on doing it at all. I don't remember if there were
use-cases where we really needed the exception for new-in-transaction
types.

regards, tom lane
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On 09/23/2017 11:16 AM, Tom Lane wrote:
> Andrew Dunstan writes:
>>> The immediate question is do we care to design/implement such a
>>> thing post-RC1. I'd have to vote "no". I think the most prudent
>>> thing to do is revert 15bc038f9 and then have another go at it
>>> during the v11 cycle.
>> Sadly I agree. We've made decisions like this in the past, and I have
>> generally been supportive of them. I think this is the first time I
>> have been on the receiving end of one so late in the process :-(
> Unless you want to try writing a patch for this in the next day or
> two, I think we have to do that. But now that I see the plan clearly,
> maybe we could get away with a post-RC1 fix.

OK, I'll give it a shot.

cheers

andrew

-- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Rethinking autovacuum.c memory handling
On 9/23/17, 12:36 PM, "Tom Lane" wrote:
> "Bossart, Nathan" writes:
>> This looks reasonable to me as well. I haven't noticed any issues
>> after a couple hours of pgbench with aggressive autovacuum settings,
>> either.
>
> Thanks for looking. As I'm sure you realize, what motivated that was
> not liking the switch into AutovacMemCxt that you'd added in
> autovacuum_do_vac_analyze ... with this patch, we can drop that.

Yup. I'll go ahead and update that patch now.

Nathan
Re: [HACKERS] Rethinking autovacuum.c memory handling
"Bossart, Nathan" writes:
> This looks reasonable to me as well. I haven't noticed any issues
> after a couple hours of pgbench with aggressive autovacuum settings,
> either.

Thanks for looking. As I'm sure you realize, what motivated that was
not liking the switch into AutovacMemCxt that you'd added in
autovacuum_do_vac_analyze ... with this patch, we can drop that.

regards, tom lane
Re: [HACKERS] Rethinking autovacuum.c memory handling
Michael Paquier writes:
> I have spent some time looking at your patch and testing it. This
> looks sane. A small comment that I have would be to add an assertion
> at the top of perform_work_item to be sure that it is called in the
> memory context of AutovacMemCxt.

Done like that, thanks for reviewing!

regards, tom lane
Re: [HACKERS] Rethinking autovacuum.c memory handling
On 9/23/17, 5:27 AM, "Michael Paquier" wrote:
> On Sat, Sep 23, 2017 at 6:09 AM, Tom Lane wrote:
>> I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
>> thereby vacuum(), in TopTransactionContext. This doesn't seem like a
>> terribly great idea, because it doesn't correspond to what happens
>> during a manually-invoked vacuum.
>
> Indeed, the inconsistency is not good here.
>
>> What I think we should do instead is invoke autovacuum_do_vac_analyze
>> in the PortalContext that do_autovacuum has created, which we already
>> have a mechanism to reset once per table processed in do_autovacuum.
>>
>> The attached patch does that, and also modifies perform_work_item()
>> to use the same approach. Right now perform_work_item() has a
>> copied-and-pasted
>> MemoryContextResetAndDeleteChildren(PortalContext) call in its error
>> recovery path, but that seems a bit out of place given that
>> perform_work_item() isn't using PortalContext otherwise.
>
> I have spent some time looking at your patch and testing it. This
> looks sane. A small comment that I have would be to add an assertion
> at the top of perform_work_item to be sure that it is called in the
> memory context of AutovacMemCxt.

This looks reasonable to me as well. I haven't noticed any issues after
a couple hours of pgbench with aggressive autovacuum settings, either.

Nathan
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> [...] After another week of buildfarm runs, we have a few more cases
>> of 3 rows of output, and none of more than 3 or less than 1. So I
>> went ahead and pushed your patch. I'm still suspicious of these
>> results, but we might as well try to make the buildfarm green pending
>> investigation of how this is happening.
> Yep. I keep the issue of pgbench tap test determinism in my todo list,
> among other things.
> I think that it should be at least clearer under which condition
> (load? luck? bug?) the result may be 1 or 3 when 2 are expected, which
> needs some thinking.

skink blew up real good just now:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-09-23%2010%3A50%3A01

the critical bit being

# Failed test 'pgbench progress stderr /(?^:progress: 1\b)/'
# at /home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 369.
# 'starting vacuum...end.
# progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
# progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
# progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped
# '
# doesn't match '(?^:progress: 1\b)'
# Failed test 'transaction count for 001_pgbench_log_1.15981 (5)'
# at t/001_pgbench_with_server.pl line 438.
# Failed test 'transaction count for 001_pgbench_log_1.15981.1 (4)'
# at t/001_pgbench_with_server.pl line 438.
# Looks like you failed 3 tests of 233.

That's exceeded my patience with this test case, so I've removed it for
the moment. We can put it back as soon as we figure some way to make it
more robust. (BTW, the "-nan" bits suggest an actual pgbench bug,
independently of anything else.) Possibly you can duplicate skink's
issue by running things under valgrind.

regards, tom lane
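[Editorial aside: the "-nan" output is consistent with a per-interval average computed by unguarded C division, since IEEE-754 evaluates 0.0/0.0 to NaN rather than raising an error. A guess at the shape of the computation — this is illustrative, not pgbench source:]

```python
import math

# Illustrative guess, not pgbench code: averaging latency over a
# progress interval as total/count. In C that division follows IEEE-754,
# so an interval with zero transactions silently yields 0.0/0.0 = NaN,
# which printf renders as "-nan" on some platforms -- matching the
# "0.0 tps, lat -nan ms" progress line quoted above.
def interval_avg_latency(total_latency_ms, ntx):
    if ntx == 0:
        return math.nan  # what the unguarded C division produces
    return total_latency_ms / ntx

print(interval_avg_latency(2682.730, 1))
print(interval_avg_latency(0.0, 0))
```

The obvious fix on the pgbench side would be to special-case the zero-transaction interval before dividing.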
Re: [HACKERS] Built-in plugin for logical decoding output
On 23/09/17 18:42, Euler Taveira wrote:
> 2017-09-22 19:28 GMT-03:00 Gregory Brail:
>> We have been working on a project that makes extensive use of logical
>> replication for use inside Apigee (which is a very small part of
>> Google):
>>
>> https://github.com/apigee-labs/transicator
>>
>> In order to do this, we had to write our own logical replication
>> plugin because the supplied "test_decoding" plugin from the "contrib"
>> directory was hard for us to work with. Primarily:
> test_decoding is a proof of concept to illustrate the logical decoding
> potential. It is not intended for production.

However, AFAIK, AWS's DMS uses it for production purposes (see
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

> I developed wal2json [1] for general use. It outputs changes in JSON.
> It was one of the first logical decoding plugins.
>> 1) It doesn't include all the fields that we need for Transicator
>> (most importantly we need the LSN and the 64-bit transaction ID).
> wal2json includes both.
>> 2) It outputs a text format that is hard to parse.
> There are a lot of JSON parsers.
>> I imagine that other users of logical decoding are facing similar
>> problems. Would the community support the development of another
>> plugin that is distributed as part of "contrib" that addresses these
>> issues? I'd be happy to submit a patch, or GitHub repo, or whatever
>> works best as an example. (Also, although Transicator uses protobuf,
>> I'm happy to have it output a simple binary format as well.)
> There was a prior discussion and it was suggested that we have a
> ready-for-production plugin in core (besides pgoutput). It was
> suggested [1] that I submit wal2json for 11. I'm in the process of
> cleaning up the code and hope to submit it to CF2.

I would be happy to see another logical decoding plugin in core
starting with 11. However, this also poses a bit of a challenge for
middleware implementors: you need to support one for 9.4-9.5
(test_decoding), another for 10 (pgoutput), and maybe another for 11
onwards. The idea of asking users to install a binary plugin is very
unsexy, so these are the options available.

However, having said that, and while JSON is a great output format for
interoperability, if there's a discussion on which plugin to include
next, I'd also favor one that has some more compact representation
format (or one that supports several formats, not only JSON).

Regards,

Álvaro

-- Alvaro Hernandez --- OnGres
Re: [HACKERS] Built-in plugin for logical decoding output
2017-09-22 19:28 GMT-03:00 Gregory Brail:
> We have been working on a project that makes extensive use of logical
> replication for use inside Apigee (which is a very small part of Google):
>
> https://github.com/apigee-labs/transicator
>
> In order to do this, we had to write our own logical replication plugin
> because the supplied "test_decoding" plugin from the "contrib" directory
> was hard for us to work with. Primarily:

test_decoding is a proof of concept to illustrate the logical decoding
potential. It is not intended for production. I developed wal2json [1]
for general use. It outputs changes in JSON. It was one of the first
logical decoding plugins.

> 1) It doesn't include all the fields that we need for Transicator (most
> importantly we need the LSN and the 64-bit transaction ID).

wal2json includes both.

> 2) It outputs a text format that is hard to parse.

There are a lot of JSON parsers.

> I imagine that other users of logical decoding are facing similar
> problems.
>
> Would the community support the development of another plugin that is
> distributed as part of "contrib" that addresses these issues? I'd be
> happy to submit a patch, or GitHub repo, or whatever works best as an
> example. (Also, although Transicator uses protobuf, I'm happy to have it
> output a simple binary format as well.)

There was a prior discussion and it was suggested that we have a
ready-for-production plugin in core (besides pgoutput). It was suggested
[2] that I submit wal2json for 11. I'm in the process of cleaning up the
code and hope to submit it to CF2.

[1] https://github.com/eulerto/wal2json
[2] https://www.postgresql.org/message-id/CAHE3wggh6ucSCB%2BhnSK04xEQx75f3DTz0wPSjRMJFjum6pRrPQ%40mail.gmail.com

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Tomas Vondra writes:
>>> [ scalarineqsel may fall over when used by extension operators ]

> What about using a two-pronged approach:
> 1) fall back to mid bucket in back branches (9.3 - 10)
> 2) do something smarter (along the lines you outlined) in PG11

Sure. We need to test the fallback case anyway.

>> [ sketch of a more extensible design ]

> Sounds reasonable to me, I guess - I can't really think of anything
> simpler giving us the same flexibility.

Actually, on further thought, that's still too simple. If you look at
convert_string_to_scalar() you'll see it's examining all three values
concurrently (the probe value, of one datatype, and two bin boundary
values of possibly a different type). The reason is that it's stripping
off whatever common prefix those strings have before trying to form a
numeric equivalent. While certainly convert_string_to_scalar is pretty
stupid in the face of non-ASCII sort orders, the prefix-stripping is
something I really don't want to give up.

So the design I sketched of considering each value totally independently
isn't good enough. We could, perhaps, provide an API that lets an
operator estimation function replace convert_to_scalar in toto, but that
seems like you'd still end up duplicating code in many cases. Not sure
how to find a happy medium.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
Andrew Dunstan writes:
> On 09/22/2017 11:19 PM, Tom Lane wrote:
>> Yeah, I was considering the same thing over dinner, though I'd phrase
>> it oppositely: keep a list of enum type OIDs created in the current
>> transaction, so that we could whitelist them. This could maybe become
>> a problem if someone created a zillion enums in one xact, though.

> I see what you're saying, but my idea was slightly different. We would
> only add to the hashtable I had in mind at the bottom of AddEnumLabel().
> Any other value, whether added in the current transaction or not, should
> be safe, AIUI.

Oh, I see: a list of enum values we need to blacklist, not whitelist.
Yes, that's a significantly better idea than mine, because in common
use-cases that list would be empty or have a very small number of
entries. In particular, that fixes the "pg_restore -1" scenario, because
no matter how many enums you're restoring, pg_dump doesn't use ALTER
TYPE ADD VALUE. (Well, it does in --binary-upgrade mode, but those
scripts are run in transaction-per-statement mode, so it's fine.)

> Maybe we should also keep a cache of whitelisted enums
> created in the current transaction.

What for? It wouldn't be any faster to search; in fact, likely slower,
because it could get large in common use-cases.

>> The immediate question is do we care to design/implement such a thing
>> post-RC1. I'd have to vote "no". I think the most prudent thing to
>> do is revert 15bc038f9 and then have another go at it during the v11
>> cycle.

> Sadly I agree. We've made decisions like this in the past, and I have
> generally been supportive of them. I think this is the first time I have
> been on the receiving end of one so late in the process :-(

Unless you want to try writing a patch for this in the next day or two,
I think we have to do that. But now that I see the plan clearly, maybe
we could get away with a post-RC1 fix.
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
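The blacklist scheme the thread converges on -- remember only the enum
values added in the current transaction, treat everything else as safe,
and discard the set at transaction end -- can be sketched as a small
self-contained toy. The open-addressing Oid set below stands in for
PostgreSQL's dynahash, and the function names are illustrative, not the
actual catalog code:

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;           /* stand-in for the real Oid type */

/* Toy open-addressing set standing in for the per-transaction hash.
 * Usually empty or nearly so, per the discussion above. */
#define BLACKLIST_SLOTS 256

static Oid blacklist[BLACKLIST_SLOTS];  /* 0 == empty slot */

static void
blacklist_add(Oid enum_value_oid)
{
    uint32_t    i = (enum_value_oid * 2654435761u) % BLACKLIST_SLOTS;

    /* linear probing; a real table would grow instead of filling up */
    while (blacklist[i] != 0 && blacklist[i] != enum_value_oid)
        i = (i + 1) % BLACKLIST_SLOTS;
    blacklist[i] = enum_value_oid;
}

static int
blacklist_contains(Oid enum_value_oid)
{
    uint32_t    i = (enum_value_oid * 2654435761u) % BLACKLIST_SLOTS;

    while (blacklist[i] != 0)
    {
        if (blacklist[i] == enum_value_oid)
            return 1;
        i = (i + 1) % BLACKLIST_SLOTS;
    }
    return 0;
}

/* The analogue of AddEnumLabel(): remember each value added here. */
static void
add_enum_label(Oid new_value_oid)
{
    blacklist_add(new_value_oid);
}

/* The safety check: only values added in this transaction are unsafe. */
static int
enum_value_is_safe(Oid value_oid)
{
    return !blacklist_contains(value_oid);
}

/* At transaction end the whole set is simply thrown away. */
static void
at_transaction_end(void)
{
    memset(blacklist, 0, sizeof(blacklist));
}
```

This also shows why the blacklist beats the whitelist variant: pg_restore
of a zillion enum types never calls add_enum_label() outside
--binary-upgrade mode, so the set stays empty and every lookup is cheap.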
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
On 09/21/2017 04:24 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> [ scalarineqsel may fall over when used by extension operators ]
>
> I concur with your thought that we could have
> ineq_histogram_selectivity fall back to a "mid bucket" default if
> it's working with a datatype it is unable to convert_to_scalar. But I
> think if we're going to touch this at all, we ought to have higher
> ambition than that, and try to provide a mechanism whereby an
> extension that's willing to work a bit harder could get that
> additional increment of estimation accuracy. I don't care for this
> way to do that:

The question is - do we need a solution that is back-patchable? This
means we can't really use FIXEDDECIMAL without effectively copying a lot
of the selfuncs.c stuff, only to make some minor fixes.

What about using a two-pronged approach:

1) fall back to mid bucket in back branches (9.3 - 10)
2) do something smarter (along the lines you outlined) in PG11

I'm willing to spend some time on both, but (2) alone is not
particularly attractive for us, as it only helps PG11 and we'll have to
do the copy-paste evil anyway to get the data type working on back
branches.

>> * Make convert_numeric_to_scalar smarter, so that it checks if
>> there is an implicit cast to numeric, and fail only if it does not
>> find one.
>
> because it's expensive, and it only works for numeric-category cases,
> and it will fail outright for numbers outside the range of "double".
> (Notice that convert_numeric_to_scalar is *not* calling the regular
> cast function for numeric-to-double.) Moreover, any operator ought to
> know what types it can accept; we shouldn't have to do more catalog
> lookups to figure out what to do.

Ah, I see. I hadn't realized it's not using the regular cast functions,
but now that you point that out, it's clear that relying on casts would
fail.

> Now that scalarltsel and friends are just trivial wrappers for a
> common function, we could imagine exposing scalarineqsel_wrapper as a
> non-static function, with more arguments (and perhaps a better-chosen
> name ;-)). The idea would be for extensions that want to go this
> extra mile to provide their own selectivity estimation functions,
> which again would just be trivial wrappers for the core function, but
> would provide additional knowledge through additional arguments.
>
> The additional arguments I'm envisioning are a couple of C function
> pointers, one function that knows how to convert the operator's
> left-hand input type to scalar, and one function that knows how to
> convert the right-hand type to scalar. (Identical APIs of course.)
> Passing a NULL would imply that the core code must fall back on its
> own devices for that input.
>
> Now the thing about convert_to_scalar is that there are several
> different conversion conventions already (numeric, string, timestamp,
> ...), and there probably could be more once extension types are
> coming to the party. So I'm imagining that the API for these
> conversion functions could be something like
>
> bool convert(Datum value, Oid valuetypeid,
>              int *conversion_convention, double *converted_value);
>
> The conversion_convention output would make use of some agreed-on
> constants, say 1 for numeric, 2 for string, etc etc. In the core
> code, if either converter fails (returns false) or if they don't
> return the same conversion_convention code, we give up and use the
> mid-bucket default. If they both produce results with the same
> conversion_convention, then we can treat the converted_values as
> commensurable.

OK, so instead of re-implementing the whole function, we'd essentially
do just something like this:

typedef bool (*convert_callback) (Datum value, Oid valuetypeid,
                                  int *conversion_convention,
                                  double *converted_value);

double
scalarineqsel(PlannerInfo *root, Oid operator,
              bool isgt, bool iseq,
              VariableStatData *vardata,
              Datum constval, Oid consttype,
              convert_callback convert);

and then, from the extension:

double
scalarineqsel_wrapper(PlannerInfo *root, Oid operator,
                      bool isgt, bool iseq,
                      VariableStatData *vardata,
                      Datum constval, Oid consttype)
{
    return scalarineqsel(root, operator, isgt, iseq,
                         vardata, constval, consttype,
                         my_convert_func);
}

Sounds reasonable to me, I guess - I can't really think of anything
simpler giving us the same flexibility.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] OpenFile() Permissions Refactor
On 9/13/17 10:26, David Steele wrote:
> Here's a new patch based on your review. Where I had a question I made
> a choice as described below:

I have committed the changes to the file APIs and a fix for the umask
save/restore issue.

The mkdir changes didn't really inspire me. Replacing mkdir() with
MakeDirectoryPerm() without any change in functionality doesn't really
improve clarity. Maybe we'll revisit that when your next patches arrive.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?
On 09/22/2017 11:19 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 09/22/2017 05:46 PM, Tom Lane wrote:
>>> I'm not sure if that qualifies as a stop-ship problem, but it ain't
>>> good, for sure. We need to look at whether we should revert 15bc038f9
>>> or somehow revise its rules.
>> I wonder if we wouldn't be better
>> doing this more directly, keeping a per-transaction hash of unsafe enum
>> values (which will almost always be empty). It might even speed up the
>> check.
> Yeah, I was considering the same thing over dinner, though I'd phrase
> it oppositely: keep a list of enum type OIDs created in the current
> transaction, so that we could whitelist them. This could maybe become
> a problem if someone created a zillion enums in one xact, though.

I see what you're saying, but my idea was slightly different. We would
only add to the hashtable I had in mind at the bottom of AddEnumLabel().
Any other value, whether added in the current transaction or not, should
be safe, AIUI. Maybe we should also keep a cache of whitelisted enums
created in the current transaction.

I'm not too worried about people creating a zillion enums (or enum
labels being added, for the solution I had in mind). Even a hash of a
million OIDs will only consume a few megabytes, won't it?

> The immediate question is do we care to design/implement such a thing
> post-RC1. I'd have to vote "no". I think the most prudent thing to
> do is revert 15bc038f9 and then have another go at it during the v11
> cycle.

Sadly I agree. We've made decisions like this in the past, and I have
generally been supportive of them. I think this is the first time I have
been on the receiving end of one so late in the process :-(

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin_summarize_new_values error checking
Fujii Masao wrote:
> On Wed, Jan 27, 2016 at 11:56 AM, Fujii Masao wrote:
>> On Mon, Jan 25, 2016 at 4:03 PM, Jeff Janes wrote:
>>> In reviewing one of my patches[1], Fujii-san has pointed out that I
>>> didn't include checks for being in recovery, or for working on another
>>> backend's temporary index.
>>>
>>> I think that brin_summarize_new_values in 9.5.0 commits those same
>>> sins. In its case, I don't think those are critical, as they just
>>> result in getting less specific error messages than one might hope
>>> for, rather than something worse like a panic.
>>>
>>> But still, we might want to address them.
>>
>> Agreed to add those checks.
>
> Attached patch does this.

Uh, I just realized you never applied this patch ... Will you do so now?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rethinking autovacuum.c memory handling
Alvaro Herrera wrote:
> One idea I just had is to somehow add it to src/test/modules/brin, and
> set up the postmaster in that test with autovacuum_naptime=1s. I'll go
> check how feasible that is. (By placing it there we could also verify
> that the index does indeed contain the index entries we expect, since it
> has pageinspect available.)

Yeah, this works, as per the attached patch.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 196bec86e957cc6413d1e4823cb891a0edc192de Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Sat, 23 Sep 2017 12:56:56 +0200
Subject: [PATCH] Add autovacuum work-items test for BRIN

---
 src/test/modules/brin/Makefile          |  7 +++++--
 src/test/modules/brin/t/01_workitems.pl | 39 ++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 src/test/modules/brin/t/01_workitems.pl

diff --git a/src/test/modules/brin/Makefile b/src/test/modules/brin/Makefile
index dda84c23c7..18c5cafd5e 100644
--- a/src/test/modules/brin/Makefile
+++ b/src/test/modules/brin/Makefile
@@ -16,7 +16,7 @@
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
 
-check: isolation-check
+check: isolation-check prove-check
 
 isolation-check: | submake-isolation
 	$(MKDIR_P) isolation_output
@@ -24,7 +24,10 @@ isolation-check: | submake-isolation
 	    --outputdir=./isolation_output \
 	    $(ISOLATIONCHECKS)
 
-.PHONY: check isolation-check
+prove-check:
+	$(prove_check)
+
+.PHONY: check isolation-check prove-check
 
 submake-isolation:
 	$(MAKE) -C $(top_builddir)/src/test/isolation all
diff --git a/src/test/modules/brin/t/01_workitems.pl b/src/test/modules/brin/t/01_workitems.pl
new file mode 100644
index 00..11c9981d40
--- /dev/null
+++ b/src/test/modules/brin/t/01_workitems.pl
@@ -0,0 +1,39 @@
+# Verify that work items work correctly
+
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 2;
+use PostgresNode;
+
+my $node = get_new_node('tango');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum_naptime=1s');
+$node->start;
+
+$node->safe_psql('postgres', 'create extension pageinspect');
+
+# Create a table with an autosummarizing BRIN index
+$node->safe_psql('postgres',
+	'create table brin_wi (a int) with (fillfactor = 10);
+	create index brin_wi_idx on brin_wi using brin (a) with (pages_per_range=1, autosummarize=on);
+	'
+);
+my $count = $node->safe_psql('postgres',
+	"select count(*) from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
+);
+is($count, '1', "initial index state is correct");
+
+$node->safe_psql('postgres',
+	'insert into brin_wi select * from generate_series(1, 100)');
+
+$node->poll_query_until('postgres',
+	"select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)",
+	't');
+
+$count = $node->safe_psql('postgres',
+	"select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
+);
+is($count, 't', "index got summarized");
+$node->stop;
-- 
2.11.0

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rethinking autovacuum.c memory handling
Tom Lane wrote:
> I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
> thereby vacuum(), in TopTransactionContext. This doesn't seem
> like a terribly great idea, because it doesn't correspond to what
> happens during a manually-invoked vacuum. TopTransactionContext
> will go away when vacuum() commits the outer transaction, whereas
> in non-autovac usage, we call vacuum() in a PortalHeapMemory
> context that is not a child of TopTransactionContext and is not
> at risk of being reset multiple times during the vacuum(). This'd
> be a hazard if autovacuum_do_vac_analyze or vacuum did any palloc's
> before getting to the main loop. More generally, I'm not aware of
> other cases where we invoke a function in a context that we know
> that function will destroy as it executes.

Oh, good catch. This must be a very old oversight -- I bet it goes all
the way back to the introduction of autovacuum. I think if there were
any actual bugs, we would have noticed by now.

> I don't see any live bug associated with this in HEAD, but this behavior
> requires a rather ugly (and memory-leaking) workaround in the proposed
> patch to allow multiple vacuum target rels.

Well, it's definitely broken, so I'd vote for fixing it even if we
weren't considering that patch.

> What I think we should do instead is invoke autovacuum_do_vac_analyze
> in the PortalContext that do_autovacuum has created, which we already
> have a mechanism to reset once per table processed in do_autovacuum.

Sounds sensible.

> The attached patch does that, and also modifies perform_work_item()
> to use the same approach. Right now perform_work_item() has a
> copied-and-pasted MemoryContextResetAndDeleteChildren(PortalContext)
> call in its error recovery path, but that seems a bit out of place
> given that perform_work_item() isn't using PortalContext otherwise.

Oops :-(

> PS: I was disappointed to find out that perform_work_item() isn't
> exercised at all in the standard regression tests.

Yeah ... It's fairly simple to create a test that will invoke it a few
times, but one problem is that it depends on autovacuum's timing. If I
add this in brin.sql:

-- Test BRIN work items
CREATE TABLE brin_work_items (LIKE brintest) WITH (fillfactor = 10);
CREATE INDEX brin_work_items_idx ON brin_work_items USING brin (textcol)
    WITH (autosummarize = on, pages_per_range = 1);
INSERT INTO brin_work_items SELECT * FROM brintest;

the work item is only performed about 15 seconds after the complete
regression tests are finished, which I fear would mean "never" in
practice.

One idea I just had is to somehow add it to src/test/modules/brin, and
set up the postmaster in that test with autovacuum_naptime=1s. I'll go
check how feasible that is. (By placing it there we could also verify
that the index does indeed contain the index entries we expect, since it
has pageinspect available.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
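The fix under discussion -- run each table's work in a memory context
that is reset once per table, instead of in a context the callee will
destroy -- follows a general pattern that can be illustrated with a toy
bump allocator. This is an illustration of the reset-per-iteration idea
only, not PostgreSQL's actual MemoryContext API:

```c
#include <stdlib.h>
#include <string.h>

/* A toy bump allocator standing in for a PostgreSQL MemoryContext. */
typedef struct
{
    char       *buf;
    size_t      used;
    size_t      size;
} ToyContext;

static void
ctx_init(ToyContext *ctx, size_t size)
{
    ctx->buf = malloc(size);
    ctx->used = 0;
    ctx->size = size;
}

static void *
ctx_alloc(ToyContext *ctx, size_t n)
{
    void       *p;

    if (ctx->used + n > ctx->size)
        return NULL;            /* out of arena space */
    p = ctx->buf + ctx->used;
    ctx->used += n;
    return p;
}

/* The analogue of resetting PortalContext between tables. */
static void
ctx_reset(ToyContext *ctx)
{
    ctx->used = 0;
}

/*
 * The shape of do_autovacuum(): per-table allocations happen in a
 * context that outlives each call but is reset after every table, so
 * scratch memory never accumulates and the context is never destroyed
 * out from under the worker.  Returns the peak usage observed.
 */
static size_t
process_tables(ToyContext *portal_ctx, int ntables)
{
    size_t      peak = 0;

    for (int i = 0; i < ntables; i++)
    {
        char       *scratch = ctx_alloc(portal_ctx, 1024);  /* per-table work */

        memset(scratch, 0, 1024);
        if (portal_ctx->used > peak)
            peak = portal_ctx->used;
        ctx_reset(portal_ctx);          /* once per table */
    }
    return peak;
}
```

Running the per-table work in TopTransactionContext instead would be the
equivalent of having the loop body free and recreate the arena it is
currently allocating from, which is the hazard Tom describes.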
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Fri, Sep 22, 2017 at 11:09 PM, Robert Haas wrote:
> During planning, *every* node has a list of pathkeys associated with
> it which represent the presumed output ordering. You can support
> ordered append paths without changing any data structures at all; it's
> just a matter of putting pathkeys into an AppendPath.

I totally agree.

> The reason why MergeAppend has extra stuff in the node (numCols,
> sortColIdx, etc.) is so that it can actually perform the sort at
> runtime. But this feature requires no runtime support -- that's kinda
> the point -- so there's no need for it to carry that information, or
> to add any new nodes.
>
> At least, IIUC.

That's true, but numCols, sortColIdx etc. are also used to display the
sort key in EXPLAIN. If an Append can return sorted data, it should also
display the sort information, so I think these fields are still required
in an Append node. If it's ok to only add these fields to an Append node
for the EXPLAIN part, I'll revert the Append/MergeAppend refactoring and
do some other cleanup as you advised earlier.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rethinking autovacuum.c memory handling
On Sat, Sep 23, 2017 at 6:09 AM, Tom Lane wrote:
> I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
> thereby vacuum(), in TopTransactionContext. This doesn't seem
> like a terribly great idea, because it doesn't correspond to what
> happens during a manually-invoked vacuum.

Indeed, the inconsistency is not good here.

> What I think we should do instead is invoke autovacuum_do_vac_analyze
> in the PortalContext that do_autovacuum has created, which we already
> have a mechanism to reset once per table processed in do_autovacuum.
>
> The attached patch does that, and also modifies perform_work_item()
> to use the same approach. Right now perform_work_item() has a
> copied-and-pasted MemoryContextResetAndDeleteChildren(PortalContext)
> call in its error recovery path, but that seems a bit out of place
> given that perform_work_item() isn't using PortalContext otherwise.

I have spent some time looking at your patch and testing it. This looks
sane. A small comment that I have would be to add an assertion at the
top of perform_work_item() to make sure that it is called in the
AutovacMemCxt memory context.

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to compactify_tuples
Hello, Claudio.

Thank you for the review and for confirming the improvement.

On 2017-09-23 01:12, Claudio Freire wrote:
> On Tue, Sep 12, 2017 at 12:49 PM, Sokolov Yura wrote:
>> On 2017-07-21 13:49, Sokolov Yura wrote:
>>> On 2017-05-17 17:46, Sokolov Yura wrote:
>>>> Alvaro Herrera писал 2017-05-15 20:13:
>>>>> As I understand, these patches are logically separate, so putting
>>>>> them together in a single file isn't such a great idea. If you don't
>>>>> edit the patches further, then you're all set because we already
>>>>> have the previously archived patches. Next commitfest starts in a
>>>>> few months yet, and if you feel the need to submit corrected
>>>>> versions in the meantime, please do submit in separate files. (Some
>>>>> would even argue that each should be its own thread, but I don't
>>>>> think that's necessary.)
>>>> Thank you for the explanation. I'm adding a new version of the first
>>>> patch with a minor improvement:
>>>> - I added detection of the case when all buckets are trivial
>>>>   (ie 0 or 1 element). In this case there is no need to sort buckets
>>>>   at all.
>>> I'm putting up a rebased version of the second patch.
>> Again, rebased versions of both patches. Now the second patch applies
>> cleanly independent of the first patch.
>
> Patch 1 applies cleanly, builds, and make check runs fine. The code
> looks similar in style to surrounding code too, so I'm not going to
> complain about the abundance of underscores in the macros :-p
>
> I can reproduce the results in the OP's benchmark, with slightly
> different numbers, but an overall improvement of ~6%, which matches
> the OP's relative improvement.
>
> Algorithmically, everything looks sound.
>
> A few minor comments about patch 1:
>
> +if (max == 1)
> +goto end;
>
> That goto is unnecessary, you could just as simply say
>
> if (max > 1)
> {
>     ...
> }

Done. (I don't like the indentation, though :-( )

> +#define pg_shell_sort_pass(elem_t, cmp, off) \
> +do { \
> +int _i, _j; \
> +elem_t _temp; \
> +for (_i = off; _i < _n; _i += off) \
> +{ \
>
> _n right there isn't declared in the macro, and it isn't an argument
> either. It should be an argument; having stuff inherited from the
> enclosing context like that is confusing. Same with _arr, btw.

pg_shell_sort_pass is not intended to be used outside pg_shell_sort and
pg_insertion_sort, so I think stealing from their context is ok.
Nonetheless, done.

> Patch 2 LGTM.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From e486014c0a7545f7bb2264d0d7ac96f0544eacdb Mon Sep 17 00:00:00 2001
From: Sokolov Yura
Date: Mon, 15 May 2017 14:23:39 +0300
Subject: [PATCH] Improve compactify_tuples

Items passed to compactify_tuples are almost sorted. So call to general
qsort makes unnecessary overhead on function calls (swapfunc, med,
itemoffcompare).

This patch implements bucket sort:
- one pass of stable prefix sort on high 8 bits of offset
- and insertion sort for buckets larger than 1 element

Also for smaller arrays shell sort is used. Insertion and Shell sort
are implemented using macroses.

This approach allows to save 3% of cpu in degenerate case (highly
intensive HOT random updates on unlogged table with
synchronized_commit=off), and speeds up WAL replaying (as were found by
Heikki Linnakangas). Same approach were implemented by Heikki
Linnakangas some time ago with several distinctions.
---
 src/backend/storage/page/bufpage.c | 87 ++++++++++++++++++++++++++++--
 src/include/c.h                    | 63 +++++++++++++++++++++
 2 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 41642eb59c..0f9f7d5dfc 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -434,6 +434,86 @@ itemoffcompare(const void *itemidp1, const void *itemidp2)
 }
 
 /*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses Shell sort. Given array is small and itemoffcompare is inlined,
+ * it is much faster than call to qsort.
+ */
+static void
+sort_itemIds_small(itemIdSort itemidbase, int nitems)
+{
+	pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare);
+}
+
+/*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses bucket sort:
+ * - single pass of stable prefix sort on high 8 bits
+ * - and insertion sort on buckets larger than 1 element
+ */
+static void
+sort_itemIds(itemIdSort itemidbase, int nitems)
+{
+	itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+#define NSPLIT 256
+#define PREFDIV (BLCKSZ / NSPLIT)
+	/* two extra elements to emulate offset on previous step */
+	uint16		count[NSPLIT + 2] = {0};
+	int			i,
+				max,
+				total,
+				pos,
+				highbits;
+
+	Assert(nitems <= MaxIndexTuplesPerPage);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		count[highbits]++;
+	}
+	/* sort in decreasing order */
+	max = total = count[NSPLIT - 1];
+	for (i =
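The patch above is truncated in this archive. As a rough standalone
sketch (not the patch's actual code) of the bucket-sort idea it
describes -- histogram the high 8 bits of each offset, lay the buckets
out in decreasing order, scatter stably, then finish with a cheap
insertion sort over the nearly-sorted result -- consider:

```c
#include <stdint.h>
#include <string.h>

#define BLCKSZ  8192
#define NSPLIT  256
#define PREFDIV (BLCKSZ / NSPLIT)   /* offsets sharing a high byte share a bucket */

/*
 * Sort page item offsets in descending order: one counting pass on the
 * high bits, a stable scatter into buckets, then insertion sort.
 */
static void
sort_offsets_desc(uint16_t *off, int n)
{
    uint16_t    copy[1024];     /* scratch; large enough for this sketch */
    int         count[NSPLIT + 1] = {0};
    int         i, j, pos;

    /* Pass 1: histogram of the high bits of each offset. */
    for (i = 0; i < n; i++)
        count[off[i] / PREFDIV]++;

    /* Prefix sums laid out so that larger offsets land first. */
    pos = 0;
    for (i = NSPLIT - 1; i >= 0; i--)
    {
        int         c = count[i];

        count[i] = pos;         /* becomes the bucket's write cursor */
        pos += c;
    }

    /* Pass 2: stable scatter into the (descending) buckets. */
    for (i = 0; i < n; i++)
        copy[count[off[i] / PREFDIV]++] = off[i];

    /*
     * Elements within a bucket are still in input order, so finish with
     * insertion sort; since the array is now nearly sorted, this is cheap
     * -- which is the whole point of the patch.
     */
    for (i = 1; i < n; i++)
    {
        uint16_t    v = copy[i];

        for (j = i; j > 0 && copy[j - 1] < v; j--)
            copy[j] = copy[j - 1];
        copy[j] = v;
    }
    memcpy(off, copy, n * sizeof(uint16_t));
}
```

This mirrors the structure of sort_itemIds() in the patch, minus the
itemIdSort bookkeeping and the all-buckets-trivial fast path mentioned
earlier in the thread.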