Is there a cache consistent interface to tables ?
Hi, This is an odd request for help. I'm looking to expose an interface so that an external app can insert into a table while maintaining cache consistency, with the inserts propagated via WAL. I need to support about 100k+ inserts/sec from a sensor data stream; it simply won't work using SQL queries. If the call overhead is too high for single calls, multiple records per call is better. The data must be available for SELECTs within 500 ms. I currently have only 24 GB of RAM for PG, but production will have 56 GB. I'm taking this approach because pgpool2 chokes, delaying past our requirements. I initially wanted to use WAL, but masters don't want WAL in their feeds, and slaves have unpredictable delays of seconds before provisioning occurs. Suggestions are appreciated.

Gary

Sent from my iPad
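A note for readers of the archives: before reaching for a lower-level interface, it is worth measuring multi-row INSERT and COPY, both of which amortize per-call overhead over many rows. A minimal sketch, assuming a hypothetical sensor_data table (all names are illustrative, not from the original post):

```sql
-- Hypothetical table for the sensor stream described above.
CREATE TABLE sensor_data (
    sensor_id integer NOT NULL,
    ts        timestamptz NOT NULL DEFAULT now(),
    reading   double precision
);

-- One round trip carries many rows instead of one:
INSERT INTO sensor_data (sensor_id, ts, reading) VALUES
    (1, now(), 20.1),
    (1, now(), 20.2),
    (2, now(), 19.8);

-- COPY is usually faster still for sustained bulk loading;
-- client libraries expose the COPY protocol for streaming data in.
COPY sensor_data (sensor_id, ts, reading) FROM STDIN (FORMAT csv);
```

Rows become visible to SELECT as soon as the inserting transaction commits, so committing batches every few hundred milliseconds would satisfy a 500 ms visibility requirement.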
Re: Using scalar function as set-returning: bug or feature?
Hello

> select into b from my_insert('from func atx');

You missed selecting something into b. For example:

select ret into b from my_insert('from func atx') as ret;

Using a scalar function in FROM is not a bug. However, silently assigning NULL to the variables in "INTO" does not match what the same "select" would do on its own... I think it would be better to raise a warning.

regards, Sergei
Re: Temporary tables prevent autovacuum, leading to XID wraparound
On Sat, Feb 3, 2018 at 1:48 AM, Robert Haas wrote:
> On Fri, Feb 2, 2018 at 1:27 AM, Masahiko Sawada wrote:
>> Thank you for the suggestion. It sounds smarter. So would it be better if we vacuumed databases for anti-wraparound in ascending order of relfrozenxid?
>
> Currently, we're doing it based on datfrozenxid. I was looking for a small, relatively safe change to improve things, which led to my proposal: continue using datfrozenxid when the database isn't being vacuumed, but when it is, substitute the datfrozenxid that we expect the database *will have* after the tables currently being vacuumed are finished for the actual datfrozenxid.
>
> On further reflection, though, I see problems. Suppose db1 has age(datfrozenxid) = 600m and two tables with age(relfrozenxid) of 600m and 400m. db2 has age(datfrozenxid) = 500m. Then suppose worker #1 starts vacuuming the table with age 600m. The AV launcher sees that, with that table out of the way, we'll be able to bring datfrozenxid forward to 400m, and so sends worker #2 to db2. But actually, when the worker in db1 finishes vacuuming the table with age 600m, it's going to have to vacuum the table with age 400m before performing any relfrozenxid update. So actually, the table with age 400m is holding the relfrozenxid at 600m just as much as if it too had a relfrozenxid of 600m. In fact, any tables in the same database that need routine vacuuming are a problem, too: we're going to vacuum all of those before updating relfrozenxid as well.
>
> Maybe we should start by making the scheduling algorithm used by the individual workers smarter:
>
> 1. Let's teach do_autovacuum() that, when there are any tables needing vacuum for wraparound, either for relfrozenxid or relminmxid, it should vacuum only those and forget about the rest.
> This seems like an obvious optimization to prevent us from delaying datfrozenxid/datminmxid updates for the sake of vacuuming tables that are "merely" bloated.
>
> 2. Let's have do_autovacuum() sort the tables to be vacuumed in ascending order by relfrozenxid, so older tables are vacuumed first. A zero-order idea is to put tables needing relfrozenxid vacuuming before tables needing relminmxid vacuuming, and sort the latter by ascending relminmxid, but maybe there's something smarter possible there. The idea here is to vacuum tables in order of priority rather than in whatever order they happen to be physically mentioned in pg_class.
>
> 3. Let's have do_autovacuum() make an extra call to vac_update_datfrozenxid() whenever the next table to be vacuumed is at least 10 million XIDs (or MXIDs) newer than the first one it vacuumed either since the last call to vac_update_datfrozenxid() or, if none, since it started. That way, we don't have to wait for every single table to get vacuumed before we can consider advancing relfrozenxid/relminmxid.
>
> 4. When it's performing vacuuming for wraparound, let's have AV workers advertise in shared memory the oldest relfrozenxid and relminmxid that might exist in the database. Given #1 and #2, this is pretty easy, since we start by moving through tables in increasing relfrozenxid order and then shift to moving through them in increasing relminmxid order. When we're working through the relfrozenxid tables, the oldest relminmxid doesn't move, and the oldest relfrozenxid is that of the next table in the list. When we're working through the relminmxid tables, it's the reverse. We need a little cleverness to figure out what value to advertise when we're on the last table in each list -- it should be the next-higher value, even though that will be above the relevant threshold, not a sentinel value.
So I think, in order to get the next-higher value, we should also include tables that are not at risk of wraparound (that is, the oldest of the not-at-risk tables) instead of forgetting them entirely, and then just skip actually vacuuming them.

> 5. With those steps in place, I think we can now adopt my previous idea to have the AV launcher use any advertised relfrozenxid (and, as I now realize, relminmxid) instead of the ones found in pg_database, because now we know that individual workers are definitely focused on getting relfrozenxid (and/or relminmxid) advanced as soon as possible, and vacuuming unrelated tables won't help them do it any faster.
>
> This gets us fairly close to vacuuming tables in decreasing order of wraparound danger across the entire cluster. It's not perfect. It prefers to keep vacuuming tables in the same database rather than having a worker exit and maybe launching a new one in a different database -- but the alternative is not very appealing. If we didn't do it that way, and if we had a million tables with closely spaced XIDs spread across different databases, we'd have to terminate and relaunch workers at a very high rate to get everything sorted out, which would be quite expensive.
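For anyone wanting to see the ordering discussed above on a live system, the per-table wraparound ages can be inspected with a catalog query like this (a sketch using only standard catalog columns):

```sql
-- List relations by wraparound danger, oldest relfrozenxid first --
-- roughly the order in which the proposal would have workers vacuum them.
SELECT c.oid::regclass        AS relation,
       age(c.relfrozenxid)    AS xid_age,
       mxid_age(c.relminmxid) AS mxid_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')   -- tables, matviews, TOAST tables
ORDER BY age(c.relfrozenxid) DESC;
```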
Using scalar function as set-returning: bug or feature?
Hi hackers, I wonder if the following behavior is considered to be a bug in plpgsql, or if it is the expected result:

create table my_data(id serial primary key, time timestamp, type text);

create or replace function my_insert(type text) RETURNS BOOLEAN AS $BODY$
BEGIN
    insert into my_data (time, type) values (now(), type);
    return true;
END
$BODY$ LANGUAGE plpgsql;

create or replace function my_call_insert() RETURNS BOOLEAN AS $BODY$
DECLARE
    b boolean;
BEGIN
    select into b from my_insert('from func atx');
    return b;
END
$BODY$ LANGUAGE plpgsql;

select my_call_insert();
 my_call_insert
----------------

(1 row)

So, if a function returning boolean is used in the FROM list, then no error or warning is produced, but a NULL value is assigned to the target variable. If this code is rewritten in a more natural way:

b := my_insert('from func atx');

or

select my_insert('from func atx') into b;

then the function returns the expected result (true). I spent some time investigating this code under a debugger, and it looks like the reason for the problem is that the tuple descriptor for the row returned by my_insert() contains no columns (the number of attributes is zero). Maybe there are some reasons for such behavior, but I find it quite confusing and unnatural. I would prefer to report an error in this case, or to return a tuple with a single column containing the return value of the function.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
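For comparison, each of the following forms (all drawn from this thread) does assign the function's return value, true, to b:

```sql
-- Plain assignment:
b := my_insert('from func atx');

-- SELECT ... INTO with the function as a select-list item:
select my_insert('from func atx') into b;

-- Function in FROM, with an explicit alias for the returned value:
select ret into b from my_insert('from func atx') as ret;
```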
Re: [HACKERS] path toward faster partition pruning
Hi Ashutosh. On 2018/02/09 14:09, Ashutosh Bapat wrote: > On Wed, Feb 7, 2018 at 7:17 PM, Robert Haas wrote: >> On Wed, Feb 7, 2018 at 8:37 AM, Ashutosh Bapat >> wrote: >>> While looking at the changes in partition.c I happened to look at the >>> changes in try_partition_wise_join(). They mark partitions deemed >>> dummy by pruning as dummy relations. If we accept those changes, we >>> could very well change the way we handle dummy partitioned tables, >>> which would mean that we also revert the recent commit >>> f069c91a5793ff6b7884120de748b2005ee7756f. But I guess, those changes >>> haven't been reviewed yet and so not final. >> >> Well, if you have an opinion on those proposed changes, I'd like to hear it. > > I am talking about changes after this comment > /* > + * If either child_rel1 or child_rel2 is not a live partition, they'd > + * not have been touched by set_append_rel_size. So, its RelOptInfo > + * would be missing some information that set_append_rel_size sets > for > + * live partitions, such as the target list, child EQ members, etc. > + * We need to make the RelOptInfo of even the dead partitions look > + * minimally valid and as having a valid dummy path attached to it. > + */ > > There are couple of problems with this change > 1. An N way join may call try_partition_wise_join() with the same base > relation on one side N times. The condition will be tried those many > times. > > 2. We will have to adjust or make similar changes in > try_partition_wise_aggregate() proposed in the partition-wise > aggregate patch. Right now it checks if the relation is dummy but it > will have to check whether the pathlist is also NULL. Any > partition-wise operation that we try in future will need this > adjustment. > > AFAIU, for pruned partitions, we don't set necessary properties of the > corresponding RelOptInfo when it is pruned. If we were sure that we > will not use that RelOptInfo anywhere in the rest of the planning, > this would work. 
> But that's not the case. AFAIU, the current planner assumes that a relation which has not been eliminated before planning (a DEAD relation), but is later proved not to contribute any rows to the result, is marked dummy. Partition pruning breaks that assumption and thus may have other side effects that we do not see right now. We have a similar problem with dummy partitioned tables, but we have code in place to avoid looking at the pathlists of their children by not considering such a partitioned table as partitioned. Maybe we want to change that too.
>
> Either we add refactoring patches to change the planner so that it doesn't assume something like that, OR we make sure that the pruned partition's RelOptInfo has the necessary properties and a dummy pathlist set. I will vote for the second. We spend CPU cycles marking pruned partitions as dummy even if the dummy pathlist is never used. Maybe we can avoid setting a dummy pathlist if we can detect that a pruned partition is guaranteed not to be used, e.g. when the corresponding partitioned relation does not participate in any join or other upper planning.

Thanks for the analysis. I agree with all the points of concern, so for now I have dropped all the changes from my patch that give rise to the concerns. With the new patch, the changes to the existing optimizer code, besides introducing partprune.c in the util directory, are pretty thin:

git diff master --stat src/backend/optimizer/
 src/backend/optimizer/path/allpaths.c  |   16 ++
 src/backend/optimizer/util/Makefile    |    2 +-
 src/backend/optimizer/util/clauses.c   |    4 +-
 src/backend/optimizer/util/partprune.c | 1421 +++
 src/backend/optimizer/util/plancat.c   |   83 ---
 src/backend/optimizer/util/relnode.c   |    8 +
 6 files changed, 1504 insertions(+), 30 deletions(-)

So, no refactoring of the existing optimizer code, just replacing the partition pruning mechanism with partprune.c functions.
> Apart from that another change that caught my eye is > > Instead of going through root->append_rel_list to pick up the child > appinfos, store them in an array called part_appinfos that stores > partition appinfos in the same order as RelOptInfos are stored in > part_rels, right when the latter are created. > > Further, instead of going through root->pcinfo_list to get the list > of partitioned child rels, which ends up including even the rels > that are pruned by set_append_rel_size(), build up a list of "live" > partitioned child rels and use the same to initialize partitioned_rels > field of AppendPath. > > That was voted down by Robert during partition-wise join > implementation. And I agree with him. Any changes around changing that > should change the way we handle AppendRelInfos for all relations, not > just (declarative) partitioned relations. I removed part_appinfos from the patch. Also, I have made the changes introducing live_partitioned_rels a separate patch,
Re: CALL stmt, ERROR: unrecognized node type: 113 bug
On Fri, Feb 02, 2018 at 04:07:28PM +0900, Michael Paquier wrote:
> You need to read that as "only a SubPlan can be executed after a SubLink has been processed by the planner", so please replace the last "latter" by "planner".

(I forgot to add Peter and Andrew in CC: previously, so done now.)

Commit e4128ee7 makes it so that SubLinks are authorized when transformed in transformSubLink(); however, I cannot think of a use case, and this is actually untested, so we should just forbid them. The patch attached does so. The second problem involves a cache lookup failure for a type when trying to use pg_get_functiondef on a procedure. Luckily, it is possible to tell the difference between a procedure and a function by checking whether prorettype is InvalidOid or not. There is room for a new patch which adds pg_get_proceduredef() to generate the definition of a procedure, with perhaps a dedicated psql shortcut, but that could always be done later on.
--
Michael

From a33e015fa06c118c7430506ca2c42c1146deb439 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Fri, 9 Feb 2018 15:49:37 +0900
Subject: [PATCH] Fix minor issues with procedure calls

Procedures invoked with subqueries as arguments fail to handle them correctly, as those are generated as SubLink nodes which need to be processed through the planner first to be correctly executed. The CALL infrastructure lacks that logic, and actually it may not make much sense to support such cases, as any application can use a proper SELECT query to do the same, so block them.

A second problem is related to the use of pg_get_functiondef, which complains about a cache lookup failure when called on a procedure. It is possible to tell the difference between a procedure and a function by looking at prorettype, so properly block the case where this function is called on a procedure.
There is room for support of a system function which generates definitions for procedures, and which could share much with pg_get_functiondef, but this is left as future work.

Bug report by Pavel Stehule, patch by Michael Paquier.
Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfwz3zmgkm5...@mail.gmail.com
---
 src/backend/parser/parse_expr.c                | 4 +++-
 src/backend/utils/adt/ruleutils.c              | 5 +
 src/test/regress/expected/create_procedure.out | 8
 src/test/regress/sql/create_procedure.sql      | 8
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index d45926f27f..031f1b72fb 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1818,7 +1818,6 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
 		case EXPR_KIND_RETURNING:
 		case EXPR_KIND_VALUES:
 		case EXPR_KIND_VALUES_SINGLE:
-		case EXPR_KIND_CALL:
 			/* okay */
 			break;
 		case EXPR_KIND_CHECK_CONSTRAINT:
@@ -1847,6 +1846,9 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
 		case EXPR_KIND_PARTITION_EXPRESSION:
 			err = _("cannot use subquery in partition key expression");
 			break;
+		case EXPR_KIND_CALL:
+			err = _("cannot use subquery in CALL arguments");
+			break;
 
 		/*
 		 * There is intentionally no default: case here, so that the
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 28767a129a..dc3d3c7752 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2472,6 +2472,11 @@ pg_get_functiondef(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 				 errmsg("\"%s\" is an aggregate function", name)));
 
+	if (!OidIsValid(proc->prorettype))
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is a procedure", name)));
+
 	/*
 	 * We always qualify the function name, to ensure the right function gets
 	 * replaced.
diff --git a/src/test/regress/expected/create_procedure.out b/src/test/regress/expected/create_procedure.out
index ccad5c87d5..41e0921b33 100644
--- a/src/test/regress/expected/create_procedure.out
+++ b/src/test/regress/expected/create_procedure.out
@@ -94,6 +94,14 @@ ALTER ROUTINE testfunc1a RENAME TO testfunc1;
 ALTER ROUTINE ptest1(text) RENAME TO ptest1a;
 ALTER ROUTINE ptest1a RENAME TO ptest1;
 DROP ROUTINE testfunc1(int);
+-- subqueries
+CALL ptest2((SELECT 5));
+ERROR:  cannot use subquery in CALL arguments
+LINE 1: CALL ptest2((SELECT 5));
+                    ^
+-- Function definition
+SELECT pg_get_functiondef('ptest2'::regproc);
+ERROR:  "ptest2" is a procedure
 -- cleanup
 DROP PROCEDURE ptest1;
 DROP PROCEDURE ptest2;
diff --git a/src/test/regress/sql/create_procedure.sql b/src/test/regress/sql/create_procedure.sql
index 8c47b7e9ef..a140fef928 100644
--- a/src/test/regress/sql/create_procedure.sql
+++ b/src/test/regress/sql/create_procedure.sql
@@ -72,6 +72,14 @@ ALTER ROUTINE ptest1a RENAME TO ptest1;
 
 DROP ROUTINE testfunc1(int);
 
+-- subqueries
+CALL ptest2((SELECT 5));
+
+
+-- Fu
Re: Creation of wiki page for open items of v11
On Fri, Feb 9, 2018 at 9:14 AM, Michael Paquier wrote:
> Hi all,
>
> I've begun seeing bugs related to new features of v11 popping up around, like this one for procedures:
> https://www.postgresql.org/message-id/CAFj8pRDxOwPPzpA8i%2BAQeDQFj7bhVw-dR2%3D%3DrfWZ3zMGkm568Q%40mail.gmail.com
>
> Are there any objections to creating a wiki page to track all those bugs and issues? We don't want to lose track of all that.

+1.
--
Best Wishes, Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Fri, Feb 9, 2018 at 11:26 AM, Amit Langote wrote: > On 2018/02/09 14:31, Ashutosh Bapat wrote: >>> I also noticed that a later patch adds partsupfunc to PartitionScheme, >>> which the pruning patch needs too. So, perhaps would be nice to take out >>> that portion of the patch. That is, the changes to PartitionScheme struct >>> definition and those to find_partition_scheme(). >> >> I am not sure whether a patch with just that change and without any >> changes to use that member will be acceptable. So leaving this aside. > > I asked, because with everything that I have now changed in the partition > pruning patch, one would need to pass these FmgrInfo pointers down to > partition bound searching functions from the optimizer. If the changes to > add partsupfunc to the optimizer were taken out from your main patch, the > pruning patch could just start using it. For now, I'm making those > changes part of the pruning patch. That's fine. Someone's patch will be committed first and the other will just take out those changes. But I am open to separate those changes into other patch if a committer feels so. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Proposal: partition pruning by secondary attributes
On Thu, Feb 8, 2018 at 4:51 PM, Ildar Musin wrote:
> The idea is to store min and max values of secondary attributes (like 'id' in the example above) for each partition somewhere in the catalog and use it for partition pruning along with the partitioning key.

Every insertion and update of a secondary attribute will touch the catalog and update it if required. That will increase contention on the catalog.
--
Best Wishes, Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On 2018/02/09 14:31, Ashutosh Bapat wrote: >> I also noticed that a later patch adds partsupfunc to PartitionScheme, >> which the pruning patch needs too. So, perhaps would be nice to take out >> that portion of the patch. That is, the changes to PartitionScheme struct >> definition and those to find_partition_scheme(). > > I am not sure whether a patch with just that change and without any > changes to use that member will be acceptable. So leaving this aside. I asked, because with everything that I have now changed in the partition pruning patch, one would need to pass these FmgrInfo pointers down to partition bound searching functions from the optimizer. If the changes to add partsupfunc to the optimizer were taken out from your main patch, the pruning patch could just start using it. For now, I'm making those changes part of the pruning patch. Thanks, Amit
Re: non-bulk inserts and tuple routing
Fujita-san,

Thanks a lot for the review. I had mistakenly tagged these patches v24, but they were actually supposed to be v5. So the attached updated patch is tagged v6.

On 2018/02/07 19:36, Etsuro Fujita wrote:
>> (2018/02/05 14:34), Amit Langote wrote:
>>> The code in tupconv_map_for_subplan() currently assumes that it can rely on all leaf partitions having been initialized.
>
> On reflection I noticed this analysis is not 100% correct; I think what that function actually assumes is that all subplans' partitions have already been initialized, not all leaf partitions.

Yes, you're right.

>>> Since we're breaking that assumption with this proposal, that needed to be changed. So the patch contained some refactoring to make it not rely on that assumption.
>
> I don't think we really need this refactoring, because, as in another patch you posted, we could initialize all subplans' partitions in ExecSetupPartitionTupleRouting; I think tupconv_map_for_subplan could then be called without any changes to that function, because of what I said above.

What my previous approach failed to consider is that in the update case, we'd already have ResultRelInfo's for some of the leaf partitions initialized, which could be saved into the proute->partitions array right away. The updated patch does things that way, so all the changes I had proposed to tupconv_map_for_subplan are rendered unnecessary.

> Here are comments for the other patch (patch v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):
>
> o On changes to ExecSetupPartitionTupleRouting:
>
> * The comment below wouldn't be correct; no ExecInitResultRelInfo in the patch.
>
> - proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
> -     sizeof(ResultRelInfo *));
> + /*
> +  * Actual ResultRelInfo's and TupleConversionMap's are allocated in
> +  * ExecInitResultRelInfo().
> +*/ > + proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions * > + sizeof(ResultRelInfo *)); I removed the comment altogether, as the comments elsewhere make the point clear. > * The patch removes this from the initialization step for a subplan's > partition, but I think it would be better to keep this here because I > think it's a good thing to put the initialization stuff together into one > place. > > - /* > -* This is required in order to we convert the partition's > -* tuple to be compatible with the root partitioned table's > -* tuple descriptor. When generating the per-subplan result > -* rels, this was not set. > -*/ > - leaf_part_rri->ri_PartitionRoot = rel; It wasn't needed here with the previous approach, because we didn't touch any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it back in the updated patch. > * I think it would be better to keep this comment here. > > - /* Remember the subplan offset for this ResultRelInfo */ Fixed. > * Why is this removed from that initialization? > > - proute->partitions[i] = leaf_part_rri; Because of the old approach. Now it's back in. > o On changes to ExecInitPartitionInfo: > > * I don't understand the step starting from this, but I'm wondering if > that step can be removed by keeping the above setup of proute->partitions > for the subplan's partition in ExecSetupPartitionTupleRouting. > > + /* > +* If we are doing tuple routing for update, try to reuse the > +* per-subplan resultrel for this partition that ExecInitModifyTable() > +* might already have created. > +*/ > + if (mtstate && mtstate->operation == CMD_UPDATE) Done, as mentioned above. 
On 2018/02/08 19:16, Etsuro Fujita wrote: > Here are some minor comments: > > o On changes to ExecInsert > > * This might be just my taste, but I think it would be better to (1) > change ExecInitPartitionInfo so that it creates and returns a > newly-initialized ResultRelInfo for an initialized partition, and (2) > rewrite this bit: > > + /* Initialize partition info, if not done already. */ > + ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate, > + leaf_part_index); > + > /* > * Save the old ResultRelInfo and switch to the one corresponding to > * the selected partition. > */ > saved_resultRelInfo = resultRelInfo; > resultRelInfo = proute->partitions[leaf_part_index]; > + Assert(resultRelInfo != NULL); > > to something like this (I would say the same thing to the copy.c changes): > > /* > * Save the old ResultRelInfo and switch to the one corresponding to > * the selected partition. > */ > saved_resultRelInfo = resultRelInfo; > resultRelInfo = proute->partitions[leaf_part_index]; > if (resultRelInfo == NU
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Thu, Feb 8, 2018 at 10:41 AM, Amit Langote wrote: > On 2018/02/08 11:55, Amit Langote wrote: >> Hi Ashutosh. >> >> On 2018/02/07 13:51, Ashutosh Bapat wrote: >>> Here's a new patchset with following changes >>> >>> 1. Rebased on the latest head taking care of partition bound >>> comparison function changes >> >> I was about to make these changes myself while revising the fast pruning >> patch. Instead, I decided to take a look at your patch and try to use it >> in my tree. > > I also noticed that a later patch adds partsupfunc to PartitionScheme, > which the pruning patch needs too. So, perhaps would be nice to take out > that portion of the patch. That is, the changes to PartitionScheme struct > definition and those to find_partition_scheme(). I am not sure whether a patch with just that change and without any changes to use that member will be acceptable. So leaving this aside. > > Regarding the latter, wouldn't be nice to have a comment before the code > that does the copying about why we don't compare the partsupfunc field to > decide if we have a match or not. I understand it's because the > partsupfunc array contains pointers, not OIDs. But maybe, that's too > obvious to warrant a comment. It's because partsupfuncs should point to the information of the same function when partopfamily matches and partopcintype matches. I would have added an assertion for that with a comment, but with the pointer that would be risky. Or we can just assert that the oids match. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: ALTER TABLE ADD COLUMN fast default
On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan wrote:
> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro wrote:
>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan wrote:
>>> Yeah, thanks. revised patch attached
>>
>> FYI the identity regression test started failing recently with this patch applied (maybe due to commit 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>
> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.

Here's a version that fixes the above issue and also the issue with VACUUM that Tomas Vondra reported. I'm still working on the issue with aggregates that Tomas also reported.

cheers

andrew
--
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

fast_default-v8.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On Wed, Feb 7, 2018 at 7:17 PM, Robert Haas wrote:
> On Wed, Feb 7, 2018 at 8:37 AM, Ashutosh Bapat wrote:
>> While looking at the changes in partition.c I happened to look at the changes in try_partition_wise_join(). They mark partitions deemed dummy by pruning as dummy relations. If we accept those changes, we could very well change the way we handle dummy partitioned tables, which would mean that we also revert the recent commit f069c91a5793ff6b7884120de748b2005ee7756f. But I guess those changes haven't been reviewed yet and so are not final.
>
> Well, if you have an opinion on those proposed changes, I'd like to hear it.

I am talking about changes after this comment:

/*
+ * If either child_rel1 or child_rel2 is not a live partition, they'd
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */

There are a couple of problems with this change:

1. An N-way join may call try_partition_wise_join() with the same base relation on one side N times. The condition will be tried that many times.

2. We will have to adjust or make similar changes in try_partition_wise_aggregate() proposed in the partition-wise aggregate patch. Right now it checks if the relation is dummy, but it will have to check whether the pathlist is also NULL. Any partition-wise operation that we try in the future will need this adjustment.

AFAIU, for pruned partitions, we don't set the necessary properties of the corresponding RelOptInfo when it is pruned. If we were sure that we would not use that RelOptInfo anywhere in the rest of planning, this would work. But that's not the case.
AFAIU, the current planner assumes that a relation which has not been eliminated before planning (a DEAD relation), but is later proved not to contribute any rows to the result, is marked dummy. Partition pruning breaks that assumption and thus may have other side effects that we do not see right now. We have a similar problem with dummy partitioned tables, but we have code in place to avoid looking at the pathlists of their children by not considering such a partitioned table as partitioned. Maybe we want to change that too.

Either we add refactoring patches to change the planner so that it doesn't assume something like that, OR we make sure that the pruned partition's RelOptInfo has the necessary properties and a dummy pathlist set. I will vote for the second. We spend CPU cycles marking pruned partitions as dummy even if the dummy pathlist is never used. Maybe we can avoid setting a dummy pathlist if we can detect that a pruned partition is guaranteed not to be used, e.g. when the corresponding partitioned relation does not participate in any join or other upper planning.

Apart from that, another change that caught my eye is:

    Instead of going through root->append_rel_list to pick up the child appinfos, store them in an array called part_appinfos that stores partition appinfos in the same order as RelOptInfos are stored in part_rels, right when the latter are created.

    Further, instead of going through root->pcinfo_list to get the list of partitioned child rels, which ends up including even the rels that are pruned by set_append_rel_size(), build up a list of "live" partitioned child rels and use the same to initialize the partitioned_rels field of AppendPath.

That was voted down by Robert during the partition-wise join implementation. And I agree with him. Any changes around that should change the way we handle AppendRelInfos for all relations, not just (declarative) partitioned relations.
--
Best Wishes, Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire wrote:
> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote:
>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote:
>>> I can look into doing 3, that *might* get rid of the need to do that initial FSM vacuum, but all other intermediate vacuums are still needed.
>>
>> Understood. So how about having this patch focus only on making FSM vacuuming more frequent, and leaving the initial FSM vacuum and the handling of cancellation cases for later? The rest can be a separate patch.
>
> Ok.
>
> Attached split patches. All but the initial FSM pass is in 1; the initial FSM pass as in the original patch is in 2.
>
> I will post an alternative patch for 2 when/if I have one. In the meanwhile, we can talk about 1.

Thank you for updating the patch!

+        /* Tree pruning for partial vacuums */
+        if (threshold)
+        {
+            child_avail = fsm_get_avail(page, slot);
+            if (child_avail >= threshold)
+                continue;
+        }

I think slots in a non-leaf page might not have a correct value, because fsm_set_avail doesn't propagate values up beyond pages. So do we need to check the root of the child page rather than a slot in the non-leaf page? I might be missing something, though.

Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
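As an aside for anyone testing this patch: the contrib module pg_freespacemap shows what the FSM currently advertises per block, which makes it easy to observe whether the intermediate FSM vacuums expose free space sooner (the table name below is a placeholder):

```sql
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- Free space recorded in the FSM for each block of a table; stale
-- (too low) values are what the proposed mid-vacuum FSM passes would
-- correct earlier than today.
SELECT blkno, avail
FROM pg_freespace('my_table'::regclass)
ORDER BY blkno;
```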
Re: ldapi support
On Fri, Feb 9, 2018 at 4:05 PM, Peter Eisentraut wrote: > After the LDAP code was switched to use ldap_initialize() as part of the > ldaps support, ldapi (LDAP over Unix-domain sockets) also works. I > noticed an old bug report (#13625) that asked for it. So I suggest this > patch to document this and add some tests. > > One flaw is that this only works when using the URL syntax. Providing a > separate option would require coding URL escaping, since ultimately an > URL must be composed and passed to ldap_initialize(). But since > OpenLDAP apparently now considers URLs to be the preferred form for > connection parameters, I'm comfortable just sticking to that format. Nice. The test doesn't actually succeed in reloading the pg_hba.conf file though: 2018-02-09 16:41:15.886 NZDT [24472] LOG: received SIGHUP, reloading configuration files 2018-02-09 16:41:15.893 NZDT [24472] LOG: unsupported LDAP URL scheme: ldapi 2018-02-09 16:41:15.893 NZDT [24472] LOG: pg_hba.conf was not reloaded I think hba.c needs to learn to consider "ldapi" to be acceptable (after it parses the URL). Then I think when InitializeLDAPConnection() reconstitutes the URL with psprintf, it'll probably need to avoid sticking :port on the end. The fact that we take the URL to pieces and then stick it back together again may seem a bit odd, but it is required by the documentation (ldap_initialize() wants a URL "containing only the schema, the host, and the port fields"). I see there is another scheme called "cldap" (which seems to be something like LDAP over UDP). I wonder if anyone cares about that. -- Thomas Munro http://www.enterprisedb.com
Creation of wiki page for open items of v11
Hi all,

I have begun seeing bugs related to new features of v11 popping up here and there, like this one for procedures: https://www.postgresql.org/message-id/CAFj8pRDxOwPPzpA8i%2BAQeDQFj7bhVw-dR2%3D%3DrfWZ3zMGkm568Q%40mail.gmail.com

Are there any objections to creating a wiki page to track all those bugs and issues? We don't want to lose track of all that.

Thanks,
--
Michael
ldapi support
After the LDAP code was switched to use ldap_initialize() as part of the ldaps support, ldapi (LDAP over Unix-domain sockets) also works. I noticed an old bug report (#13625) that asked for it. So I suggest this patch to document this and add some tests.

One flaw is that this only works when using the URL syntax. Providing a separate option would require coding URL escaping, since ultimately a URL must be composed and passed to ldap_initialize(). But since OpenLDAP apparently now considers URLs to be the preferred form for connection parameters, I'm comfortable just sticking to that format.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 4b24fa8aaf68327f773327269b23b6b129df858c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 8 Feb 2018 10:21:43 -0500
Subject: [PATCH] Document support for LDAP over Unix-domain sockets (ldapi)

After the LDAP code was switched to use ldap_initialize() as part of the
ldaps support, ldapi also works, so we might as well document it and add
some tests.

Bug: #13625
---
 doc/src/sgml/client-auth.sgml | 10 ++
 src/test/ldap/t/001_auth.pl   | 25 +++--
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 53832d08e2..8ed95bef4e 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -1618,6 +1618,7 @@ LDAP Authentication
    other LDAP options in a more compact and standard form. The format is
    ldap[s]://host[:port]/basedn[?[attribute][?[scope][?[filter
+   ldapi://path/basedn[?[attribute][?[scope][?[filter
    scope must be one of base, one, sub,
@@ -1640,6 +1641,15 @@ LDAP Authentication
    ldapurl.
+
+   The URL scheme ldapi makes the LDAP connection
+   over a Unix-domain socket. The path must
+   be URL-encoded, for example
+   ldapi://%2Fusr%2Flocal%2Fvar%2Fldapi. This
+   connection method can only be specified as a URL, not via separate
+   options.
+
+
    For non-anonymous binds, ldapbinddn and
    ldapbindpasswd must be specified as separate
diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index 5508da459f..71045a5866 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -2,7 +2,7 @@
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 19;
+use Test::More tests => 20;

 my ($slapd, $ldap_bin_dir, $ldap_schema_dir);
@@ -32,6 +32,16 @@
 $ENV{PATH} = "$ldap_bin_dir:$ENV{PATH}" if $ldap_bin_dir;

+sub url_encode
+{
+	my $string = shift;
+	$string =~ s/([^a-zA-Z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
+	$string =~ tr/ /+/;
+	return $string;
+}
+
+# for Unix-domain socket
+my $tempdir_short = TestLib::tempdir_short;
+
 my $ldap_datadir = "${TestLib::tmp_check}/openldap-data";
 my $slapd_certs = "${TestLib::tmp_check}/slapd-certs";
 my $slapd_conf = "${TestLib::tmp_check}/slapd.conf";
@@ -41,7 +51,9 @@
 my $ldap_server = 'localhost';
 my $ldap_port = int(rand() * 16384) + 49152;
 my $ldaps_port = $ldap_port + 1;
+my $ldapi_socket = "$tempdir_short/ldapi";
 my $ldap_url = "ldap://$ldap_server:$ldap_port";
+my $ldapi_url = "ldapi://" . url_encode($ldapi_socket);
 my $ldaps_url = "ldaps://$ldap_server:$ldaps_port";
 my $ldap_basedn = 'dc=example,dc=net';
 my $ldap_rootdn = 'cn=Manager,dc=example,dc=net';
@@ -86,7 +98,7 @@
 system_or_bail "openssl", "req", "-new", "-nodes", "-keyout", "$slapd_certs/server.key", "-out", "$slapd_certs/server.csr", "-subj", "/cn=server";
 system_or_bail "openssl", "x509", "-req", "-in", "$slapd_certs/server.csr", "-CA", "$slapd_certs/ca.crt", "-CAkey", "$slapd_certs/ca.key", "-CAcreateserial", "-out", "$slapd_certs/server.crt";
-system_or_bail $slapd, '-f', $slapd_conf, '-h', "$ldap_url $ldaps_url";
+system_or_bail $slapd, '-f', $slapd_conf, '-h', "$ldap_url $ldapi_url $ldaps_url";

 END
 {
@@ -162,6 +174,15 @@ sub test_access
 $ENV{"PGPASSWORD"} = 'secret1';
 test_access($node, 'test1', 0, 'search+bind with LDAP URL authentication succeeds');

+note "ldapi";
+
+unlink($node->data_dir . '/pg_hba.conf');
+$node->append_conf('pg_hba.conf', qq{local all all ldap ldapurl="$ldapi_url/$ldap_basedn?uid?sub"});
+$node->reload;
+
+$ENV{"PGPASSWORD"} = 'secret1';
+test_access($node, 'test1', 0, 'authentication using ldapi URL succeeds');
+
 note "search filters";

 unlink($node->data_dir . '/pg_hba.conf');

base-commit: b3a101eff0fd3747bebf547b1769e28f820f4515
--
2.16.1
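[Editorial illustration.] The percent-encoding done by the test's url_encode() helper can be reproduced for a quick sanity check with Python's standard library. This is a sketch for illustration only; `ldapi_url` is an invented helper name, and the patch itself uses the Perl sub shown above. Note the exact set of characters left unencoded differs slightly between Python's quote() and the Perl regex, which does not matter for typical socket paths.

```python
from urllib.parse import quote

def ldapi_url(socket_path: str) -> str:
    # Percent-encode every character that is not unreserved, including
    # the "/" path separators, then prepend the ldapi scheme.
    return "ldapi://" + quote(socket_path, safe="")

print(ldapi_url("/usr/local/var/ldapi"))
# -> ldapi://%2Fusr%2Flocal%2Fvar%2Fldapi
```

The output matches the example given in the documentation hunk above, which is the point: the socket path must be encoded so that its slashes are not mistaken for URL path separators by ldap_initialize().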
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly
(2018/02/09 4:32), Robert Haas wrote: On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane wrote: According to https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01 there's still an intermittent issue. I ran "make installcheck" in contrib/postgres_fdw in a loop, and got a similar failure on the 47th try --- my result duplicates the second plan change shown by rhinoceros, but not the first one. I speculate that the plan change is a result of autovacuum kicking in partway through the run. Will look into this. Hmm. Maybe inserting an ANALYZE command in the right place would fix it? Wouldn't running VACUUM on the right tables in the right place fix it better? Another idea would be to modify the test cases added by that commit so that they don't modify the existing tables, to avoid breaking the existing test cases. Best regards, Etsuro Fujita
Re: [HACKERS] MERGE SQL Statement for PG11
On Wed, Feb 7, 2018 at 7:51 PM, Pavan Deolasee wrote: > I understand getting EPQ semantics right is very important. Can you please > (once again) summarise your thoughts on what you think is the *most* > appropriate behaviour? I can then think how much efforts might be involved > in that. If the efforts are disproportionately high, we can discuss if > settling for some not-so-nice semantics, like we apparently did for > partition key updates. I personally believe that the existing EPQ semantics are already not-so-nice. They're what we know, though, and we haven't actually had any real world complaints, AFAIK. My concern is mostly just that MERGE manages to behave in a way that actually "provides a single SQL statement that can conditionally INSERT, UPDATE or DELETE rows, a task that would otherwise require multiple procedural language statements", as the docs put it. As long as MERGE manages to do something as close to that high level description as possible in READ COMMITTED mode (with our current semantics for multiple statements in RC taken as the baseline), then I'll probably be happy. Some novel new behavior -- "EPQ with a twist"-- is clearly necessary. I feel a bit uneasy about it because anything that anybody suggests is likely to be at least a bit arbitrary (EPQ itself is kind of arbitrary). We only get to make a decision on how "EPQ with a twist" will work once, and that should be a decision that is made following careful deliberation. Ambiguity is much more likely to kill a patch than a specific technical defect, at least in my experience. Somebody can usually just fix a technical defect. > I am sorry, I know you and Simon have probably done that a few times already > and I should rather study those proposals first. So it's okay if you don't > want to repeat those; I will work on them next week once I am back from > holidays. Unfortunately, I didn't get very far with Simon on this. 
I didn't really start talking about this until recently, though, so it's not like you missed much. The first time I asked Simon about this was January 23rd, and I first proposed something about 10 days ago. Something very tentative. (I did ask some questions about EPQ, and even WHEN ... AND quals much earlier, but that was in the specific context of a debate about MERGE's use of ON CONFLICT's speculative insertion mechanism. I consider this to be a totally different discussion, that ended before Simon even posted his V1 patch, and isn't worth spending your time on now.) > TBH I did not consider partitioning any less complex and it was indeed very > complex, requiring at least 3 reworks by me. And from what I understood, it > would have been a blocker too. So is subquery handling and RLS. That's why I > focused on addressing those items while you and Simon were still debating > EPQ semantics. Sorry if I came across as dismissive of that effort. That was certainly not my intention. I am pleasantly surprised that you've managed to move a number of things forward rather quickly. I'll rephrase: while it would probably have been a blocker in theory (I didn't actually weigh in on that), I doubted that it would actually end up doing so in practice (and it now looks like I was right to doubt that, since you got it done). It was a theoretical blocker, as opposed to an open item that could drag on indefinitely despite everyone's best efforts. Obviously details matter, and obviously there are a lot of details to get right outside of RC semantics, but it seems wise to focus on the big risk that is EPQ/RC conflict handling. The only other thing that comes close to that risk is the risk that we'll get stuck on RLS. Though even the RLS discussion may actually end up being blocked on this crucial question of EPQ/RC conflict handling. 
Did you know that the RLS docs [1] have a specific discussion of the implications of EPQ for users of RLS, and that it mentions doing things like using SELECT ... FOR SHARE to work around the problem? It has a whole example of a scenario that users actually kind of need to know about, at least in theory. RC conflict handling semantics could bleed into a number of other things. I'll need to think some more about RC conflict handling (deciding what "EPQ with a twist" actually means), since I haven't focused on MERGE recently. Bear with me. [1] https://www.postgresql.org/docs/current/static/ddl-rowsecurity.html -- Peter Geoghegan
Re: PostgreSQL 2018-02-08 Security Update Release
> Greetings Tatsuo,
>
> * Tatsuo Ishii (is...@sraoss.co.jp) wrote:
>> In
>> https://www.postgresql.org/about/news/1829/
>>
>> URL links to both CVE-2018-1052 and CVE-2018-1053 give me a 404 error.
>> Am I the only one who is getting the error?
>
> Unfortunately, we don't have any control over when RedHat updates their
> system for the CVEs.

That means the URLs on our page are correct, but their system has not yet prepared the contents. Thanks for the clarification!

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: PostgreSQL 2018-02-08 Security Update Release
Greetings Tatsuo,

* Tatsuo Ishii (is...@sraoss.co.jp) wrote:
> In
> https://www.postgresql.org/about/news/1829/
>
> URL links to both CVE-2018-1052 and CVE-2018-1053 give me a 404 error.
> Am I the only one who is getting the error?

Unfortunately, we don't have any control over when RedHat updates their system for the CVEs.

Thanks!

Stephen
Re: PostgreSQL 2018-02-08 Security Update Release
In
https://www.postgresql.org/about/news/1829/

URL links to both CVE-2018-1052 and CVE-2018-1053 give me a 404 error. Am I the only one who is getting the error?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From: Stephen Frost
Subject: PostgreSQL 2018-02-08 Security Update Release
Date: Thu, 8 Feb 2018 08:59:05 -0500
Message-ID: <20180208135905.gq2...@tamriel.snowman.net>

> 2018-02-08 Security Update Release
> ==
>
> The PostgreSQL Global Development Group has released an update to all supported
> versions of our database system, including 10.2, 9.6.7, 9.5.11, 9.4.16, 9.3.21.
> This release fixes two security issues. This release also fixes issues with
> VACUUM, GIN indexes, and hash indexes that could lead to data corruption, as
> well as fixes for using parallel queries and logical replication.
>
> All users using the affected versions of PostgreSQL should update as soon as
> possible. Please see the notes on "Updating" below for any post-update steps
> that may be required.
>
> Please note that PostgreSQL changed its versioning scheme with the release of
> version 10.0, so updating to version 10.2 from 10.0 or 10.1 is considered a
> minor update.
>
> Security Issues
> ---
>
> Two security vulnerabilities have been fixed by this release:
>
> * CVE-2018-1052: Fix the processing of partition keys containing multiple
>   expressions
> * CVE-2018-1053: Ensure that all temporary files made with "pg_upgrade" are
>   non-world-readable
>
> Bug Fixes and Improvements
> --
>
> This update fixes over 60 bugs reported in the last few months.
Some of these > issues affect only version 10, but many affect all supported versions: > > * Fix crash and potential disclosure of backend memory when processing > partition > keys containing multiple expressions > * Fix potential disclosure of temporary files containing database passwords > created by pg_upgrade by not allowing these files to be world-accessible > * Fix cases where VACUUM would not remove dead rows if they were updated while > "key-share" locked, leading to potential data corruption > * Fix for GIN indexes to prevent bloat by ensuring the pending-insertions list > is cleaned up by VACUUM > * Fix potential index corruption with hash indexes due to failure to mark > metapages as dirty > * Fix several potential crash scenarios for parallel queries, including when a > bitmap heap scan cannot allocate memory > * Fix several potential hang-ups in parallel queries, including when a > parallel > worker fails to start > * Fix collection of EXPLAIN statistics from parallel workers > * Prevent fake deadlock failures when multiple sessions are running > CREATE INDEX CONCURRENTLY > * Fix for trigger behavior when using logical replication > * Several fixes for "walsender" functionality to improve stability as well as > visibility into the replication process > * Fix logical decoding to correctly clean up disk files for crashed > transactions > * Several fixes for identity columns, including disallowing identity columns > on > tables derived from composite types and partitions > * Fix handling of list partitioning constraints for partition keys of boolean > and array types > * Fix incorrectly generated plans for UPDATE and DELETE queries when a table > has > a mix of inherited regular and foreign child tables > * Fix incorrect query results from cases involving GROUPING SETS when used > with > flattened subqueries > * Fix UNION/INTERSECT/EXCEPT over zero columns, e.g. 
"SELECT UNION SELECT;" > * Several fixes for subqueries within a LATERAL subquery > * Several improvements for query planning estimation > * Allow a client that supports SCRAM channel binding, such as a future version > of PostgreSQL or libpq, to connect to a PostgreSQL 10 server > * Fix sample INSTR() functions used to help transition from Oracle(r) PL/SQL > to > PostgreSQL PL/pgSQL to correctly match Oracle functional behavior > * Fix pg_dump to make permissions (ACL), security label, and comment entries > reliably identifiable in archive outputs > * Modify behavior for contrib/cube's "cube ~> int" operator to make it > compatible with KNN search. This is a backwards incompatible change and any > expression indexes or materialized views using this operator will need to be > reindexed and refreshed, respectively. > * Several fixes in contrib/postgres_fdw to prevent query planner errors > * Added modern examples of auto-start scripts for PostgreSQL on macOS in the > contrib/start-scripts/macos directory > * Several fixes for Windows, including postmaster startup and compatibility > with > libperl > * Spinlock fixes and support for Motorola 68K and 88K architectures > > This update also contains tzdata release 2018c, with updates for DST law > changes > in Brazil, Sao Tome and Principe, plus historical corrections for Bolivia, > Japan,
Re: Fix a typo in walsender.c
On 2018/02/09 4:40, Robert Haas wrote: On Thu, Feb 8, 2018 at 1:32 AM, atorikoshi wrote: Attached a minor patch for variable name in comment: Committed. Thank you!
Re: update tuple routing and triggers
On 2018/02/09 4:31, Robert Haas wrote: > On Wed, Feb 7, 2018 at 8:26 PM, Amit Langote > wrote: >> OK, done in the attached. > > Committed. Thanks. Thank you. Sorry, missed renaming leafrel that you did yourself. Regards, Amit
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Claudio Freire wrote:
> I don't like looping, though, seems overly cumbersome. What's worse?
> maintaining that fragile weird loop that might break (by causing
> unexpected output), or a slight slowdown of the test suite?
>
> I don't know how long it might take on slow machines, but in my
> machine, which isn't a great machine, while the vacuum test isn't fast
> indeed, it's just a tiny fraction of what a simple "make check" takes.
> So it's not a huge slowdown in any case.

Well, what about a machine running tests under valgrind, or the weird cache-clobbering infuriatingly slow code? Or buildfarm members running on really slow hardware? These days, a few people have spent a lot of time trying to reduce the total test time, and it'd be bad to lose back the improvements for no good reason. I grant you that the looping I proposed is more complicated, but I don't see any reason why it would break. Another argument against the LOCK pg_class idea is that it causes an unnecessary contention point across the whole parallel test group -- with possible weird side effects. How about a deadlock?

Other than the wait loop I proposed, I think we can make a couple of very simple improvements to this test case to avoid a slowdown:

1. The DELETE takes about 1/4th of the time and removes about the same number of rows as the one using the IN clause: delete from vactst where random() < 3.0 / 4;

2. Use a new temp table rather than vactst. Everything is then faster.

3. Figure out the minimum size for the table that triggers the behavior you want. Right now you use 400k tuples -- maybe 100k are sufficient? Don't know.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: JIT compiling with LLVM v10.0
On Fri, Feb 9, 2018 at 3:14 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote: > $ ./configure --prefix=/build/postgres-jit/ --with-llvm > --enable-debug --enable-depend --enable-cassert > /usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This > file requires compiler and library support for the ISO C++ 2011 > standard. This support must be enabled with the -std=c++11 or > -std=gnu++11 compiler options. Did you try passing CXXFLAGS="-std=c++11" to configure? -- Thomas Munro http://www.enterprisedb.com
Re: Fix a typo in walsender.c
On Thu, Feb 8, 2018 at 1:32 AM, atorikoshi wrote: > Attached a minor patch for variable name in comment: Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly
On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane wrote: > Etsuro Fujita writes: >> (2018/02/08 10:40), Robert Haas wrote: >>> Uggh, I missed the fact that they were doing that. It's probably >>> actually useful test coverage, but it's not surprising that it isn't >>> stable. > >> That was my purpose, but I agree with the instability. Thanks again, >> Robert! > > According to > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01 > > there's still an intermittent issue. I ran "make installcheck" in > contrib/postgres_fdw in a loop, and got a similar failure on the > 47th try --- my result duplicates the second plan change shown by > rhinoceros, but not the first one. I speculate that the plan change > is a result of autovacuum kicking in partway through the run. Hmm. Maybe inserting an ANALYZE command in the right place would fix it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: update tuple routing and triggers
On Wed, Feb 7, 2018 at 8:26 PM, Amit Langote wrote: > OK, done in the attached. Committed. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: it's a feature, but it feels like a bug
On 08.02.2018 at 05:51, David Fetter wrote:
> On Wed, Feb 07, 2018 at 10:26:50PM -0500, Tom Lane wrote:
>> Rafal Pietrak writes:
[-]
> CREATE TABLE foo(b BOOLEAN, i INTEGER NOT NULL, t TEXT NOT NULL) PARTITION BY LIST (b);
> CREATE TABLE foo_true PARTITION OF foo (PRIMARY KEY(i, t)) FOR VALUES IN ('true');
> CREATE TABLE bar(foo_i INTEGER NOT NULL, foo_t TEXT NOT NULL, FOREIGN KEY(foo_i, foo_t) REFERENCES foo_true);

This is exactly my current setup. It creates other problems with managing my dataset, so I'm looking for other ways to lay down the schema.

thenx,
-R
Re: it's a feature, but it feels like a bug
On 08.02.2018 at 04:26, Tom Lane wrote:
> Rafal Pietrak writes:
[]
>
>> And it is sort of "counterintuitive" - as you can see, there is a UNIQUE
>> index for test(a,b) target; admittedly partial, but why should that matter?
>
> Because the index fails to guarantee uniqueness of (a,b) in rows where d
> isn't true. There could be many duplicates in such rows, possibly even of
> (a,b) pairs that also appear --- though only once --- in rows where d is
> true.
>
> If there were a way to say that the FK is only allowed to reference rows
> where d is true, then this index could support an FK like that. But
> there's no way to express such a thing in SQL.

I sort of knew/expected that. But I'd like to approach the sources anyway.

> Personally I'd think about separating your rows-where-d-is-true into
> their own table, which could have a normal PK index. You could still
> create a union view over that table and the one with the other rows
> to satisfy whatever queries want to think the two kinds of rows
> are the same thing. But I'd offer that if one set of rows has (a,b)
> as a PK and the other does not, they are not really the same kind
> of thing.

Actually, they are. The explanation of my schema would be lengthy, but in short, let's say I'm talking of a mail hub, where: A=mbox-owner-id, B=message-UNIQUE-id, C=the-other-entity-id, D=flag-inbox-outbox; the table contains every message anyone sent or received. Only the sender assigns an ID to a message. So: all outgoing messages have unique (A,B), and D=true; all received messages have unique (B,C), and D=false. Those messages are parsed, digested, and they update columns of their respective rows. ... the tricky part is that some of them must form explicit lists. This is column (E). This is why I need to have an FK (E,A) --> (B,A). Currently, to use FK in this dataset I have the main table split into: inbox, and outbox.
Unfortunately this fires back, as the entire schema effectively has to have twice the number of relations, and the FK interlinking grows almost quadratically with the number of tables. At the point I am at, this is already unmanageable. So I'm quite desperate to "do it some other way", like patching postgresql.

I was thinking that an attempt to "alter table add constraint .. foreign key..." could:
a) identify whether the target table has ANY sort of UNIQUE index covering the provided list of columns (even if it is a partial index)
b) if that index is only partial, locate the condition and use it during insert/update/etc and retrieval of the target row.
c) if that index is a functional index, locate that function and use it during insert/update/etc.

So I'd appreciate some guidance on which part of the sources I should study first.

regards,
-R
Re: Proposal: partition pruning by secondary attributes
On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote: > Ildar Musin wrote: > > > The idea is to store min and max values of secondary attributes (like > > 'id' in the example above) for each partition somewhere in catalog and > > use it for partition pruning along with partitioning key. You can think > > of it as somewhat like BRIN index but for partitions. > > What is the problem with having a BRIN index? Building plans to scan the individual partitions, lock them, open the relevant files, etc is often going to be significantly more expensive than pruning at plan time. But there also seems to be a number of fairly nasty locking issues with this proposal, leaving the amount of required code aside. Greetings, Andres Freund
Re: [HACKERS] path toward faster partition pruning
Robert Haas wrote:
> On Wed, Feb 7, 2018 at 3:42 AM, Ashutosh Bapat wrote:
> > partition.c seems to have two kinds of functions: 1. those that build and
> > manage relcache, create quals from bounds etc., which are metadata
> > management kind; 2. partition bound comparison functions, and other
> > optimizer related functions. Maybe we should divide the file that
> > way. The first category code remains in catalog/ as it is today. The
> > second category functions move to optimizer/.
>
> It would be sensible to separate functions that build and manage data
> in the relcache from other functions. I think we should consider
> moving the existing functions of that type from partition.c to
> src/backend/utils/cache/partcache.c.

FWIW I've been thinking that perhaps we need some separation of the code better than the status quo. The current partition.c file includes stuff for several modules, and ISTM all these new patches are making more and more of a mess. So +1 to the general idea of splitting things up. Maybe partcache.c is not ambitious enough, but it seems a good first step.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Query running for very long time (server hanged) with parallel append
On Wed, Feb 7, 2018 at 2:11 AM, Amit Khandekar wrote: > Attached is the patch accordingly. OK, I see. That makes sense; committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Proposal: partition pruning by secondary attributes
Ildar Musin wrote:
> The idea is to store min and max values of secondary attributes (like
> 'id' in the example above) for each partition somewhere in catalog and
> use it for partition pruning along with partitioning key. You can think
> of it as somewhat like BRIN index but for partitions.

What is the problem with having a BRIN index?

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] A design for amcheck heapam verification
On Thu, Feb 8, 2018 at 6:05 AM, Andrey Borodin wrote:
> I do not see a reason behind hashing the seed.

It made some sense when I was XOR'ing it to mix. A uniform distribution of bits seemed desirable then, since random() won't use the most significant bit -- it generates random numbers in the range of 0 to 2^31-1. It does seem unnecessary now.

> Also, I'd like to reformulate this paragraph. I understand what you want to
> say, but the sentence is incorrect.
> + * The Bloom filter behaves non-deterministically when caller passes a random
> + * seed value. This ensures that the same false positives will not occur from
> + * one run to the next, which is useful to some callers.
> The Bloom filter behaves deterministically, just differently. This does not
> ensure anything, but will probably give that with high probability.

I agree that that's unclear. I should probably cut it down, and say something like "caller can pass a random seed to make it unlikely that the same false positives will occur from one run to the next".

--
Peter Geoghegan
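[Editorial illustration.] The distinction being made above — deterministic, but different per seed — can be shown with a toy Python Bloom filter. This is unrelated to the actual amcheck implementation; `ToyBloom` and its hashing scheme are invented for illustration. Once a seed is fixed the filter is fully deterministic, but different seeds select different hash functions, so the set of false positives differs from run to run when the seed is random.

```python
import hashlib

class ToyBloom:
    def __init__(self, nbits, nhashes, seed):
        self.nbits, self.nhashes, self.seed = nbits, nhashes, seed
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from a keyed hash; the seed picks the
        # hash family deterministically.
        for i in range(self.nhashes):
            digest = hashlib.sha256(f"{self.seed}/{i}/{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

a, b = ToyBloom(256, 3, seed=1), ToyBloom(256, 3, seed=2)
for item in ("foo", "bar"):
    a.add(item)
    b.add(item)

# No false negatives regardless of seed:
assert a.might_contain("foo") and b.might_contain("foo")
# Different seeds set different bits, so false positives differ too:
assert a.bits != b.bits
```

Both filters answer identically for inserted items; only the (rare) wrong "yes" answers move around with the seed, which is exactly the property the comment under discussion is trying to describe.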
Re: New gist vacuum.
Hi Andrey, On 2/7/18 10:46 AM, Andrey Borodin wrote: >> 7 февр. 2018 г., в 18:39, David Steele написал(а): >> >> Hi Andrey, >> >> On 1/21/18 5:34 AM, Andrey Borodin wrote: >>> Hello, Alexander! 16 янв. 2018 г., в 21:42, Andrey Borodin написал(а): Please find README patch attached. >>> >>> Here's v2 version. Same code, but x2 comments. Also fixed important typo in >>> readme BFS->DFS. Feel free to ping me any time with questions. >> >> This patch has not gotten review and does not seem like a good fit for >> the last PG11 CF so I am marking it Returned with Feedback. > > Why do you think this patch does not seem good fit for CF? Apologies for the brevity. I had about 40 patches to go through yesterday. The reason it does not seem a good fit is that it's a new, possibly invasive patch that has not gotten any review in the last three CFs since it was reintroduced. I'm not sure why that's the case and I have no opinion about the patch itself, but there it is. We try to avoid new patches in the last CF that could be destabilizing and this patch appears to be in that category. I know it has been around for a while, but the lack of review makes it "new" in the context of the last CF for PG11. > I've been talking with Alexander just yesterday at PgConf.Russia, and he was > going to provide review. Great! I'd suggest you submit this patch for the CF after 2018-03. However, that's completely up to you. Regards, -- -David da...@pgmasters.net
Re: SSL test names
On 2/7/18 23:18, Michael Paquier wrote: > You need to update the comment on top of test_connect_ok in > ServerSetup.pm. done and committed > Wouldn't it be better to use the expected result > as an argument and merge test_connect_ok and test_connect_fails? That doesn't seem to be the general style, and I think it's more readable the way it is now. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Proposal: generic WAL compression
On 07/02/18 19:42, Stephen Frost wrote:
>> really useful for my "OUZO" project (a fork of the RUM access method).
>
> Glad to hear that you're continuing to work on it.

Yes, it will be available on Github eventually.

>> One general comment I can already make is that enabling compression
>> should be made optional, which appears to be a small and easy addition
>> to the generic WAL API.
>
> The question would be- is there a way for us to determine when it should
> be enabled or not enabled, or do we have to ask the user to tell us?

My suggestion would be adding something like

#define GENERIC_XLOG_MOVE_DETECT_ON  0x0002 /* search for moved parts of page image */
#define GENERIC_XLOG_MOVE_DETECT_OFF 0x0004 /* do not search for moved parts of page image */

This would be used for the 'flags' argument of GenericXLogRegisterBuffer(), allowing code using generic WAL (such as custom access methods) to control this kind of compression on a per buffer basis, if need be. Then it would be up to the author of any extension relying on generic WAL to decide if and how the user will be given control of WAL compression. However,

> Asking the user is something we'd really not do unless we absolutely
> have to. Much better is if we can figure out when it's best to enable
> or disable (or use/not use) based on some criteria and do so
> automatically.

this is a point I did not think of before :-) Probably a logical choice would be having "GENERIC_XLOG_MOVE_DETECT" default to true for a 'wal_level' value of 'replica' (the default setting nowadays) or higher, and default to false for 'minimal'. In this way, generic WAL will be automatically compressed when it is potentially stored for some longer time, or consuming network bandwidth.

--
Markus Nullmeier  http://www.g-vo.org
German Astrophysical Virtual Observatory (GAVO)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly
Etsuro Fujita writes: > (2018/02/08 10:40), Robert Haas wrote: >> Uggh, I missed the fact that they were doing that. It's probably >> actually useful test coverage, but it's not surprising that it isn't >> stable. > That was my purpose, but I agree with the instability. Thanks again, > Robert! According to https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01 there's still an intermittent issue. I ran "make installcheck" in contrib/postgres_fdw in a loop, and got a similar failure on the 47th try --- my result duplicates the second plan change shown by rhinoceros, but not the first one. I speculate that the plan change is a result of autovacuum kicking in partway through the run. regards, tom lane
Re: In logical replication concurrent update of partition key creates a duplicate record on standby.
On Thu, Feb 8, 2018 at 5:55 PM, Amit Kapila wrote:
> On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila wrote:
>> On Wed, Feb 7, 2018 at 3:42 PM, amul sul wrote:
>>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar wrote:
>>>> On 7 February 2018 at 13:53, amul sul wrote:
>>>>> Hi,
>>>>>
>>>>> If an update of the partition key involves tuple movement from one
>>>>> partition to another, then there will be a separate delete on one
>>>>> partition and an insert on the other.
>>>>>
>>>>> In logical replication, if an update is performed on the master and
>>>>> the standby at the same moment, the replication worker tries to
>>>>> replicate the delete + insert operations on the standby. While
>>>>> replaying the master's changes on the standby, the worker will log a
>>>>> "concurrent update, retrying" message for the delete operation
>>>>> (because the update on the standby has already deleted the tuple)
>>>>> and move forward to replay the next insert operation. The standby's
>>>>> update also did the same delete+insert as part of the update of the
>>>>> partition key; as a result there will be two records inserted on the
>>>>> standby.
>>>>
>>>> A quick thinking on how to resolve this makes me wonder if we can
>>>> manage to pass some information through logical decoding that the
>>>> delete is part of a partition key update. This is analogous to how we
>>>> set some information locally in the tuple by setting
>>>> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>>>
>>> +1,
>>>
>>
>> I also mentioned the same thing in the other thread [1], but I think
>> that alone won't solve the dual record problem as you are seeing. I
>> think we need to do something for the next insert as you are suggesting.
>>
>
> Can you please once check what the behavior was before the update tuple
> routing patch (Commit-id: 2f178441) went in?
>

Before this commit, such an update failed with the following error:

postgres=# update foo set a=2, b='node1_update' where a=1;
ERROR:  new row for relation "foo1" violates partition constraint
DETAIL:  Failing row contains (2, node1_update).

Regards,
Amul
Re: In logical replication concurrent update of partition key creates a duplicate record on standby.
On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila wrote:
> On Wed, Feb 7, 2018 at 3:42 PM, amul sul wrote:
>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar wrote:
>>> On 7 February 2018 at 13:53, amul sul wrote:
>>>> Hi,
>>>>
>>>> If an update of the partition key involves tuple movement from one
>>>> partition to another, then there will be a separate delete on one
>>>> partition and an insert on the other.
>>>>
>>>> In logical replication, if an update is performed on the master and
>>>> the standby at the same moment, the replication worker tries to
>>>> replicate the delete + insert operations on the standby. While
>>>> replaying the master's changes on the standby, the worker will log a
>>>> "concurrent update, retrying" message for the delete operation
>>>> (because the update on the standby has already deleted the tuple)
>>>> and move forward to replay the next insert operation. The standby's
>>>> update also did the same delete+insert as part of the update of the
>>>> partition key; as a result there will be two records inserted on the
>>>> standby.
>>>
>>> A quick thinking on how to resolve this makes me wonder if we can
>>> manage to pass some information through logical decoding that the
>>> delete is part of a partition key update. This is analogous to how we
>>> set some information locally in the tuple by setting
>>> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>>>
>>
>> +1,
>>
>
> I also mentioned the same thing in the other thread [1], but I think
> that alone won't solve the dual record problem as you are seeing. I
> think we need to do something for the next insert as you are suggesting.
>
>> also if the worker failed to replay the delete operation on the standby
>> then we need to decide what the next step will be: should we skip the
>> following insert operation, or error out, or something else.
>>
>
> That would be tricky, do you see any simple way of doing either of those?

Not really. Like ExecUpdate, for an update of the partition key, if the delete fails then the following insert could be skipped; but you are correct, it might be more tricky than I think -- there is no guarantee that the next insert operation the replication worker is trying to replicate is part of the update-of-partition-key mechanism. How can one identify that an insert operation on one relation is related to a previous delete operation on some other relation?

Regards,
Amul
Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada wrote:
> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire wrote:
>> I can look into doing 3, that *might* get rid of the need to do that
>> initial FSM vacuum, but all other intermediate vacuums are still
>> needed.
>
> Understood. So how about this: the patch focuses only on making FSM
> vacuuming more frequent, and leaves the initial FSM vacuum and the
> handling of cancellation cases for later? The rest can be a separate
> patch.

Ok. Attached are the split patches. All but the initial FSM pass is in 1; the initial FSM pass, as in the original patch, is in 2.

I will post an alternative patch for 2 when/if I have one. In the meanwhile, we can talk about 1.

From 0a980794c90973383e5a63f64839616a79542260 Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Thu, 8 Feb 2018 12:39:15 -0300
Subject: [PATCH 2/2] Vacuum: do a partial FSM vacuum at the beginning

A partial FSM vacuum right at the start of the vacuum process
helps ameliorate issues with cancelled autovacuums never updating
the FSM.
---
 src/backend/commands/vacuumlazy.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 9ad0f34..2965204 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -103,6 +103,19 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/*
+ * If autovacuums get regularly cancelled, the FSM may never be
+ * vacuumed.  To work around that, we perform an initial partial
+ * FSM vacuum at the beginning of the vacuum process, to quickly
+ * make existing unmarked free space visible.  To avoid useless
+ * redundant work, however, we avoid recursing into branches
+ * that already have a set amount of free space, we only try
+ * to discover unmarked free space.  This value controls how
+ * much free space is enough free space in that context.  The
+ * value is in the terms of the FSM map.
+ */
+#define INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD	64
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -250,6 +263,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
+	/*
+	 * Vacuum the Free Space Map partially before we start.
+	 * If an earlier vacuum was canceled, and that's likely in
+	 * highly contended tables, we may need to finish up. If we do
+	 * it now, we make the space visible to other backends regardless
+	 * of whether we succeed in finishing this time around.
+	 * Don't bother checking branches that already have usable space,
+	 * though.
+	 */
+	FreeSpaceMapVacuum(onerel, INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD);
+
 	/* Open all indexes of the relation */
 	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
 	vacrelstats->hasindex = (nindexes > 0);
-- 
1.8.4.5

From fdb2e0ae486fc6160df48c6a668eafa50b32814a Mon Sep 17 00:00:00 2001
From: Claudio Freire
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/2] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs.  Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point at
which whole ranges of the heap have been thoroughly cleaned, and we
can expect no further updates to those ranges of the FSM save for
concurrent activity.  When there are no indexes, and thus no index
passes, run partial FSM vacuums every 8GB of dirtied pages or 1/8th
of the relation, whichever is highest.  This allows some partial
work to be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic.  Index
passes already incur quadratic cost, and the addition of the FSM
is unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at the
beginning of the VACUUM to finish up any work left over from prior
canceled vacuum runs, something that is common in highly contended
tables when running only autovacuum.
---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 +++
 src/backend/commands/vacuumlazy.c         | 50 +--
 src/backend/storage/freespace/freespace.c | 29 ++
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/brin/brin.c
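The pruning threshold used by these patches can be modeled in miniature. The toy tree below is not the real freespace.c code; it only shows the idea behind FreeSpaceMapVacuum(onerel, threshold): repair stale upper nodes bottom-up, but skip any branch that already advertises at least the threshold amount of free space. An overestimate left behind by pruning is tolerable, since it only costs an extra probe later:

```c
#include <assert.h>

/*
 * Toy FSM: a complete binary tree in an array, where each inner node
 * is supposed to hold the max free space of its two children.  Nodes
 * 0..2 are inner, 3..6 are leaves (leaf values are authoritative).
 */
#define NNODES 7

/*
 * Recompute inner nodes bottom-up, pruning any branch that already
 * shows at least 'threshold' free space.  Returns the (possibly
 * repaired) value of 'node'.
 */
static int
vacuum_node(int *fsm, int node, int threshold)
{
	int		left = 2 * node + 1;
	int		right = 2 * node + 2;
	int		r;

	if (left >= NNODES)
		return fsm[node];		/* leaf: nothing to repair */
	if (fsm[node] >= threshold)
		return fsm[node];		/* enough space already visible: prune */

	fsm[node] = vacuum_node(fsm, left, threshold);
	r = vacuum_node(fsm, right, threshold);
	if (r > fsm[node])
		fsm[node] = r;
	return fsm[node];
}
```

With threshold 64, a stale inner node showing 0 gets repaired from its leaves, while a branch already showing 70 is skipped entirely even if its leaves have changed underneath, which is exactly the redundant work the patch avoids.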
Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support
Michael Paquier writes: > In order to run tests consistently on the whole tree, I use a simple > alias which tests also things like src/test/ssl and src/test/ldap on the > way. > Lately, I am getting annoyed by $subject when working on OpenSSL stuff > as sometimes I need to test things with and without SSL support to make > sure that a patch is rightly shaped. However if configure is built > without SSL support then src/test/ssl just fails. The same applies to > src/test/ldap. > Could it be possible to disable them using something like the attached > if a build is done without SSL and/or LDAP? That seems like possibly not such a great idea. If somebody were using a run of src/test/ssl to check their build, they would now get no notification if they'd forgotten to build --with-openssl. Perhaps we could introduce some new targets, "check-if-enabled" or so, that act like you want. regards, tom lane
Re: JIT compiling with LLVM v10.0
On 2018-02-08 15:14:42 +0100, Dmitry Dolgov wrote:
>> On 8 February 2018 at 10:29, Andreas Karlsson wrote:
>>> On 02/07/2018 03:54 PM, Andres Freund wrote:
>>>
>>> I've pushed v10.0. The big (and pretty painful to make) change is that
>>> now all the LLVM specific code lives in src/backend/jit/llvm, which is
>>> built as a shared library which is loaded on demand.
>>
>> It does not seem to be possible to build without LLVM anymore.

Yea, wrong header included. Will fix.

> Maybe I'm doing something wrong, but I also see some issues during
> compilation even with llvm included (with gcc 5.4.0 and 7.1.0). Is it
> expected, do I need to use another version to check it out?
>
> $ git rev-parse HEAD
> e24cac5951575cf86f138080acec663a0a05983e
>
> $ ./configure --prefix=/build/postgres-jit/ --with-llvm
> --enable-debug --enable-depend --enable-cassert

Seems you need to provide a decent C++ compiler (via CXX=... to configure). Will test that it actually works with a recent clang.

Greetings,

Andres Freund
Re: non-bulk inserts and tuple routing
On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita wrote:
>     if (resultRelInfo == NULL);
>     {
>         /* Initialize partition info. */
>         resultRelInfo = ExecInitPartitionInfo(mtstate,
>                                               saved_resultRelInfo,
>                                               proute,
>                                               estate,
>                                               leaf_part_index);
>     }

I'm pretty sure that code has one semicolon too many.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Thu, Feb 8, 2018 at 9:04 AM, Amit Kapila wrote:
> I guess workers need to wait till the leader becomes active and
> processes the error message.

So if you kill a worker, it doesn't die but sits there waiting for someone to run a command in the leader? That sounds terrible.

>> Also, if you're OK with not being able to do anything except fetch
>> from the cursor until you reach the end, then couldn't you just issue
>> the query without the cursor and use PQsetSingleRowMode?
>
> I think there is a lot of cursor usage via plpgsql in which it could be
> useful.

I don't see how. If you can't do anything but fetch from the cursor while the cursor is open, then you can't go start doing things in plpgsql code either. If you allow plpgsql code to run during that time, then it can do arbitrary stuff which breaks the design assumptions around parallel mode, like calling parallel-unsafe functions or trying to do transaction control or running DML. The whole point here is that the parallel-mode stuff is only designed to cleanly error out in cases that we think we can hit during the course of executing a query. In other cases, it may elog(), Assert(), crash, give wrong answers, etc. As soon as you let parallel mode survive beyond the context of a query, you've opened a huge can of worms which you will not easily be able to shut. If we restrict what you can do outside of a query to a very narrow set of operations, then it might survive, but "execute arbitrary plpgsql code" is not a narrow set of operations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: JIT compiling with LLVM v10.0
> On 8 February 2018 at 10:29, Andreas Karlsson wrote:
>> On 02/07/2018 03:54 PM, Andres Freund wrote:
>>
>> I've pushed v10.0. The big (and pretty painful to make) change is that
>> now all the LLVM specific code lives in src/backend/jit/llvm, which is
>> built as a shared library which is loaded on demand.
>
> It does not seem to be possible to build without LLVM anymore.

Maybe I'm doing something wrong, but I also see some issues during compilation even with llvm included (with gcc 5.4.0 and 7.1.0). Is it expected, do I need to use another version to check it out?

$ git rev-parse HEAD
e24cac5951575cf86f138080acec663a0a05983e

$ ./configure --prefix=/build/postgres-jit/ --with-llvm --enable-debug --enable-depend --enable-cassert

In file included from llvmjit_error.cpp:22:0:
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:47:36: warning: identifier 'nullptr' is a keyword in C++11 [-Wc++0x-compat]
     void *user_data = nullptr);
                       ^
In file included from /usr/include/c++/5/cinttypes:35:0,
                 from /usr/lib/llvm-5.0/include/llvm/Support/DataTypes.h:39,
                 from /usr/lib/llvm-5.0/include/llvm-c/Types.h:17,
                 from ../../../../src/include/jit/llvmjit.h:13,
                 from llvmjit_error.cpp:24:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
In file included from llvmjit_error.cpp:22:0:
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:47:54: error: 'nullptr' was not declared in this scope
     void *user_data = nullptr);
                       ^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:57:56: error: 'nullptr' was not declared in this scope
     void *user_data = nullptr) {
                       ^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:98:56: error: 'nullptr' was not declared in this scope
     void *user_data = nullptr);
                       ^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:121:45: error: 'nullptr' was not declared in this scope
 llvm_unreachable_internal(const char *msg = nullptr, const char *file = nullptr,
                                             ^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:121:73: error: 'nullptr' was not declared in this scope
 llvm_unreachable_internal(const char *msg = nullptr, const char *file = nullptr,
                                                                         ^
../../../../src/Makefile.global:838: recipe for target 'llvmjit_error.o' failed
make[2]: *** [llvmjit_error.o] Error 1
make[2]: Leaving directory '/postgres/src/backend/jit/llvm'
Makefile:42: recipe for target 'all-backend/jit/llvm-recurse' failed
make[1]: *** [all-backend/jit/llvm-recurse] Error 2
make[1]: Leaving directory '/postgres/src'
GNUmakefile:11: recipe for target 'all-src-recurse' failed
make: *** [all-src-recurse] Error 2
Re: [HACKERS] A design for amcheck heapam verification
Hi, Peter!

> On 8 Feb 2018, at 4:56, Peter Geoghegan wrote:
>
> * Faster modulo operations.
>
> * Removed sdbmhash().

Thanks! I definitely like how the Bloom filter is implemented now.

I could not understand the meaning of this, but apparently it will do no harm:

+	/*
+	 * Caller will probably use signed 32-bit pseudo-random number, so hash
+	 * caller's value to get 64-bit seed value
+	 */
+	filter->seed = DatumGetUInt64(hash_uint32_extended(seed, 0));

I do not see a reason behind hashing the seed.

Also, I'd like to reformulate this paragraph. I understand what you want to say, but the sentence is incorrect:

+ * The Bloom filter behaves non-deterministically when caller passes a random
+ * seed value. This ensures that the same false positives will not occur from
+ * one run to the next, which is useful to some callers.

The Bloom filter behaves deterministically, just differently for different seeds. And this does not ensure anything; it only makes it highly probable that the false positives will differ from one run to the next.

Thanks!

Best regards, Andrey Borodin.
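For readers following the seed discussion, here is a minimal stand-alone Bloom filter sketch (all names invented here, not the amcheck implementation): a 64-bit seed is split into two 32-bit salts for double hashing, so filters built with different seeds map the same element to different bit positions and therefore tend to differ in which false positives they produce, while false negatives remain impossible by construction:

```c
#include <stdint.h>
#include <string.h>

#define FILTER_BITS 1024		/* m: bits in the filter */
#define NUM_HASHES 3			/* k: probes per element */

typedef struct BloomFilter
{
	uint64_t	seed;
	unsigned char bits[FILTER_BITS / 8];
} BloomFilter;

/* 32-bit avalanche mixer (the finalizer from MurmurHash3). */
static uint32_t
mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x85ebca6b;
	x ^= x >> 13;
	x *= 0xc2b2ae35;
	x ^= x >> 16;
	return x;
}

/* Derive k bit positions via double hashing, salted by the seed. */
static void
bloom_positions(const BloomFilter *f, uint32_t elem, uint32_t pos[NUM_HASHES])
{
	uint32_t	h1 = mix32(elem ^ (uint32_t) f->seed);
	uint32_t	h2 = mix32(elem ^ (uint32_t) (f->seed >> 32));
	int			i;

	for (i = 0; i < NUM_HASHES; i++)
		pos[i] = (h1 + (uint32_t) i * h2) % FILTER_BITS;
}

static void
bloom_init(BloomFilter *f, uint64_t seed)
{
	memset(f, 0, sizeof(*f));
	f->seed = seed;
}

static void
bloom_add(BloomFilter *f, uint32_t elem)
{
	uint32_t	pos[NUM_HASHES];
	int			i;

	bloom_positions(f, elem, pos);
	for (i = 0; i < NUM_HASHES; i++)
		f->bits[pos[i] / 8] |= 1 << (pos[i] % 8);
}

/* Returns 1 if elem is definitely absent, 0 if possibly present. */
static int
bloom_lacks(const BloomFilter *f, uint32_t elem)
{
	uint32_t	pos[NUM_HASHES];
	int			i;

	bloom_positions(f, elem, pos);
	for (i = 0; i < NUM_HASHES; i++)
		if (!((f->bits[pos[i] / 8] >> (pos[i] % 8)) & 1))
			return 1;
	return 0;
}
```

Whatever seed is chosen, every added element is always reported as possibly present; only the identity of the false positives shifts with the seed, which is the property being debated above.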
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Wed, Feb 7, 2018 at 9:47 PM, Robert Haas wrote:
> On Mon, Jan 22, 2018 at 7:05 AM, Amit Kapila wrote:
>> On error, workers should be terminated. What kind of problem do you
>> have in mind?
>
> Hmm. Yeah, I guess that makes sense. If the only thing you can do is
> fetch from the cursor -- and you have to make sure to lock down
> protocol messages as well as SQL commands -- and if any error kills
> the workers -- then I guess this might be workable. However, I think
> there are still problems. Say the worker hits an error while the
> leader is idle; how do we report the error?

I guess workers need to wait till the leader becomes active and processes the error message.

> It's a protocol violation
> for the leader to send an unsolicited ErrorResponse, which is why we
> have to use FATAL rather than ERROR in various places for recovery
> conflicts, query cancellation, etc.
>
> Also, if you're OK with not being able to do anything except fetch
> from the cursor until you reach the end, then couldn't you just issue
> the query without the cursor and use PQsetSingleRowMode?

I think there is a lot of cursor usage via plpgsql in which it could be useful.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
autovacuum: change priority of the vacuumed tables
Hi,

Attached patch adds 'autovacuum_table_priority' to the current list of automatic vacuuming settings. It is used to sort the tables to be vacuumed in the autovacuum worker before the actual vacuum. The idea is to give users the ability to prioritize their tables in the autovacuum process.

--
Regards,
Ildus Kurbangaliev

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c45979dee4..b7383a7136 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6250,6 +6250,21 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
+
+      autovacuum_table_priority (integer)
+
+       autovacuum_table_priority configuration parameter
+
+
+
+
+        Specifies the priority of the table in automatic
+        VACUUM operations. 0 by default, bigger value gives
+        more priority.
+
+
+
+
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 46276ceff1..2476dfe943 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -291,6 +291,15 @@ static relopt_int intRelOpts[] =
 	},
 	-1, -1, INT_MAX
 	},
+	{
+		{
+			"autovacuum_table_priority",
+			"Sets the priority of the table in autovacuum process",
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
+		},
+		0, INT_MIN, INT_MAX
+	},
 	{
 		{
 			"toast_tuple_target",
@@ -1353,6 +1362,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, multixact_freeze_table_age)},
 		{"log_autovacuum_min_duration", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, log_min_duration)},
+		{"autovacuum_table_priority", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, priority)},
 		{"toast_tuple_target", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, toast_tuple_target)},
 		{"autovacuum_vacuum_scale_factor", RELOPT_TYPE_REAL,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 702f8d8188..174de4a08c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -174,15 +174,22 @@ typedef struct avw_dbase
 	PgStat_StatDBEntry *adw_entry;
 } avw_dbase;
 
-/* struct to keep track of tables to vacuum and/or analyze, in 1st pass */
+/* struct for vacuumed or analyzed relation */
 typedef struct av_relation
+{
+	Oid			ar_relid;
+	int			ar_priority;	/* bigger - more important, used for sorting */
+} av_relation;
+
+/* struct to keep track of toast tables to vacuum and/or analyze, in 1st pass */
+typedef struct av_toastrelation
 {
 	Oid			ar_toastrelid;	/* hash key - must be first */
 	Oid			ar_relid;
 	bool		ar_hasrelopts;
 	AutoVacOpts ar_reloptions;	/* copy of AutoVacOpts from the main table's
 								 * reloptions, or NULL if none */
-} av_relation;
+} av_toastrelation;
 
 /* struct to keep track of tables to vacuum and/or analyze, after rechecking */
 typedef struct autovac_table
@@ -1919,6 +1926,16 @@ get_database_list(void)
 	return dblist;
 }
 
+/* qsort comparator for av_relation, using priority */
+static int
+autovac_comparator(const void *a, const void *b)
+{
+	av_relation *r1 = (av_relation *) lfirst(*(ListCell **) a);
+	av_relation *r2 = (av_relation *) lfirst(*(ListCell **) b);
+
+	return r2->ar_priority - r1->ar_priority;
+}
+
 /*
  * Process a database table-by-table
  *
@@ -1932,7 +1949,7 @@ do_autovacuum(void)
 	HeapTuple	tuple;
 	HeapScanDesc relScan;
 	Form_pg_database dbForm;
-	List	   *table_oids = NIL;
+	List	   *optables = NIL;
 	List	   *orphan_oids = NIL;
 	HASHCTL		ctl;
 	HTAB	   *table_toast_map;
@@ -2021,7 +2038,7 @@ do_autovacuum(void)
 	/* create hash table for toast <-> main relid mapping */
 	MemSet(&ctl, 0, sizeof(ctl));
 	ctl.keysize = sizeof(Oid);
-	ctl.entrysize = sizeof(av_relation);
+	ctl.entrysize = sizeof(av_toastrelation);
 	table_toast_map = hash_create("TOAST to main relid map",
 								  100,
@@ -2101,9 +2118,14 @@ do_autovacuum(void)
 											  effective_multixact_freeze_max_age,
 											  &dovacuum, &doanalyze, &wraparound);
 
-		/* Relations that need work are added to table_oids */
+		/* Relations that need work are added to optables */
 		if (dovacuum || doanalyze)
-			table_oids = lappend_oid(table_oids, relid);
+		{
+			av_relation *rel = (av_relation *) palloc(sizeof(av_relation));
+			rel->ar_relid = relid;
+			rel->ar_priority = relopts != NULL ? relopts->priority : 0;
+			optables = lappend(optables, rel);
+		}
 
 		/*
 		 * Remember TOAST associations for the second pass.  Note: we must do
@@ -2112,7 +2134,7 @@
 		 */
 		if (OidIsValid(classForm->reltoastrelid))
 		{
-			av_relation *hentry;
+			av_toastrelation *hentry;
 			bool		found;
 
 			hentry = hash_search(table_toast_map,
@@ -2168,7 +2190,7 @@ do_autovacuum(void)
 			relopts = extract_autovac_opts(tuple, pg_cl
Re: PDF Builds on borka (Debian/stretch) broken - 9.6
Peter, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 2/7/18 17:21, Stephen Frost wrote: > > Looks like Nov 15 (which I believe is when the stretch upgrade was done) > > it was upgraded: > > > > 2017-11-15 17:38:55 upgrade openjade:amd64 1.4devel1-21.1 1.4devel1-21.3+b1 > > > > That doesn't look like a terribly large version bump to me tho.. > > You probably had the openjade1.3 package installed, and maybe it got > deleted or the alternative uninstalled during the upgrade. Thanks for that, yes, that got uninstalled but I've now re-installed it and the PDF builds for 9.6-9.3 are working and will be included in today's release. Thanks again! Stephen signature.asc Description: PGP signature
Re: [HACKERS] Partition-wise aggregation/grouping
Hi,

In this attached version, I have rebased my changes over the new design of partially_grouped_rel. The preparatory changes adding partially_grouped_rel are in 0001.

Also, to minimize finalization code duplication, I have refactored it into two separate functions, finalize_sorted_partial_agg_path() and finalize_hashed_partial_agg_path(). I need these two functions because the current path-creation order is like:

  Sort Agg Path
  Sort Agg Path - Parallel Aware (finalization needed here)
  Hash Agg Path
  Hash Agg Path - Parallel Aware (finalization needed here)

If we club those finalizations together, then the path-creation order will change, which may result in changes to existing plans. Let me know if that's OK and I will merge them together, as they are distinct anyway. These changes are part of 0002.

0003 - 0006 are refactoring patches as before. 0007 is the main patch per the new design. I have removed create_partition_agg_paths() altogether, as the finalization code is reused. Also, renamed preferFullAgg to forcePartialAgg, as we forcefully need a partial path from the nested level if the parent is doing partial aggregation. add_single_path_to_append_rel() no longer exists, and there is also no need to pass OtherUpperPathExtraData to add_paths_to_append_rel().

0008 - 0009: testcase and postgres_fdw changes.

Please have a look at the new changes and let me know if I missed anything.

Thanks

On Fri, Feb 2, 2018 at 7:29 PM, Robert Haas wrote:
> On Fri, Feb 2, 2018 at 8:25 AM, Jeevan Chalke wrote:
>>> The problem is that create_partition_agg_paths() is doing *exactly*
>>> same thing that add_paths_to_grouping_rel() is already doing inside
>>> the blocks that say if (grouped_rel->partial_pathlist). We don't need
>>> two copies of that code. Both of those places expect to take a
>>> partial path that has been partially aggregated and produce a
>>> non-partial path that is fully aggregated. We do not need or want two
>>> copies of that code.
>>
>> OK. Got it.
>>
>> Will try to find a common place for them and will also check how it goes
>> with your suggested design change.
>>
>>> Here's another way to look at it. We have four kinds of things.
>>>
>>> 1. Partially aggregated partial paths
>>> 2. Partially aggregated non-partial paths
>>> 3. Fully aggregated partial paths
>>> 4. Fully aggregated non-partial paths
>
> So in the new scheme I'm proposing, you've got a partially_grouped_rel
> and a grouped_rel. So all paths of type #1 go into
> partially_grouped_rel->partial_pathlist, paths of type #2 go into
> partially_grouped_rel->pathlist, type #3 (if we have any) goes into
> grouped_rel->partial_pathlist, and type #4 goes into
> grouped_rel->pathlist.
>
>> add_paths_to_append_rel() -> generate_mergeappend_paths() does not
>> consider partial_pathlist. Thus we will never see MergeAppend over a
>> parallel scan given by partial_pathlist. And thus a plan like:
>>   -> Gather Merge
>>     -> MergeAppend
>> is not possible with current HEAD.
>>
>> Are you suggesting we should implement that here? I think that itself
>> is a separate task.
>
> Oh, I didn't realize that wasn't working already. I agree that it's a
> separate task from this patch, but it's really too bad that it doesn't
> already work.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

partition-wise-agg-v13.tar.gz
Description: GNU Zip compressed data
Re: SSL test names
> On 07 Feb 2018, at 17:54, Peter Eisentraut > wrote: > I have found the old way very confusing while working with several > SSL-related patches recently. Agreed. I had similar, but way uglier, hacks in my Secure Transport branch. +1 on something like this. cheers ./daniel
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Tue, Feb 6, 2018 at 3:53 PM, Robert Haas wrote: > On Tue, Feb 6, 2018 at 2:11 PM, Tom Lane wrote: >> Robert Haas writes: >>> Unfortunately valgrind does not work at all on my laptop -- the server >>> appears to start, but as soon as you try to connect, the whole thing >>> dies with an error claiming that the startup process has failed. So I >>> can't easily test this at the moment. I'll try to get it working, >>> here or elsewhere, but thought I'd send the above reply first. >> >> Do you want somebody who does have a working valgrind installation >> (ie me) to take responsibility for pushing this patch? > > I committed it before seeing this. It probably would've been better > if you had done it, but I assume Peter tested it, so let's see what > the BF thinks. skink and lousyjack seem happy now, so I think it worked. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical decoding of two-phase transactions
Hi!

Thanks for working on this patch.

Reading through the patch, I've noticed that you deleted the call to SnapBuildCommitTxn() in DecodePrepare(). As you correctly spotted upthread, there was unnecessary code that marked the transaction as running after decoding of the prepare. However, the call marking it as committed before decoding of the prepare is IMHO still needed, as SnapBuildCommitTxn() does some useful things, like setting the base snapshot for parent transactions which were skipped because of SnapBuildXactNeedsSkip().

E.g. the current code will crash in an assert for the following transaction:

BEGIN;
SAVEPOINT one;
CREATE TABLE test_prepared_savepoints (a int);
PREPARE TRANSACTION 'x';
COMMIT PREPARED 'x';
:get_with2pc_nofilter
:get_with2pc_nofilter  <- second call will crash the decoder

With the following backtrace:

frame #3: 0x00010dc47b40 postgres`ExceptionalCondition(conditionName="!(txn->ninvalidations == 0)", errorType="FailedAssertion", fileName="reorderbuffer.c", lineNumber=1944) at assert.c:54
frame #4: 0x00010d9ff4dc postgres`ReorderBufferForget(rb=0x7fe1ab832318, xid=816, lsn=35096144) at reorderbuffer.c:1944
frame #5: 0x00010d9f055c postgres`DecodePrepare(ctx=0x7fe1ab81b918, buf=0x7ffee2650408, parsed=0x7ffee2650088) at decode.c:703
frame #6: 0x00010d9ef718 postgres`DecodeXactOp(ctx=0x7fe1ab81b918, buf=0x7ffee2650408) at decode.c:310

That can be fixed by calling SnapBuildCommitTxn() in DecodePrepare(), which I believe is safe: during normal work a prepared transaction holds relation locks until commit/abort, and in between nobody can access the altered relations (or I just don't know of such situations -- that was the reason why I had marked those xids as running in previous versions).

> On 6 Feb 2018, at 15:20, Nikhil Sontakke wrote:
>
> Hi all,
>
>> PFA, patch which applies cleanly against latest git head. I also
>> removed unwanted newlines and took care of the cleanup TODO about
>> making ReorderBufferTXN structure using a txn_flags field instead of
>> separate booleans for various statuses like has_catalog_changes,
>> is_subxact, is_serialized etc. The patch uses this txn_flags field for
>> the newer prepare related info as well.
>>
>> "make check-world" passes ok, including the additional regular and tap
>> tests that we have added as part of this patch.
>
> PFA, latest version of this patch.
>
> This latest version takes care of the abort-while-decoding issue along
> with additional test cases and documentation changes.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: In logical replication concurrent update of partition key creates a duplicate record on standby.
On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila wrote:
> On Wed, Feb 7, 2018 at 3:42 PM, amul sul wrote:
>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar wrote:
>>> On 7 February 2018 at 13:53, amul sul wrote:
>>>> Hi,
>>>>
>>>> If an update of the partition key involves tuple movement from one
>>>> partition to another, it is carried out as a separate delete on the
>>>> source partition and an insert on the target partition. In logical
>>>> replication, if an update is performed on the master and the standby
>>>> at the same moment, the replication worker tries to replay the
>>>> delete + insert operations on the standby. While replaying the
>>>> master's delete on the standby, the worker logs a "concurrent
>>>> update, retrying" message (because the standby's own update has
>>>> already deleted the tuple) and moves on to replay the next insert
>>>> operation. Since the standby's update also performed the same
>>>> delete + insert as part of its partition-key update, the result is
>>>> two records inserted on the standby.
>>>
>>> A quick thought on how to resolve this makes me wonder if we can
>>> manage to pass some information through logical decoding indicating
>>> that the delete is part of a partition-key update. This is analogous
>>> to how we set some information locally in the tuple by setting
>>> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>>
>> +1,
>
> I also mentioned the same thing in the other thread [1], but I think
> that alone won't solve the duplicate-record problem you are seeing. I
> think we need to do something for the subsequent insert as well, as
> you are suggesting.

Can you please check what the behavior was before the Update Tuple Routing patch (commit 2f178441) went in?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
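A minimal sketch of the idea under discussion — tagging a decoded delete so the apply worker can tell it was half of a cross-partition update. All names here are hypothetical; PostgreSQL's real decoding path uses ReorderBufferChange, which carries no such flag today:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded-change record for illustration only. */
typedef enum DemoChangeKind { DEMO_INSERT, DEMO_DELETE } DemoChangeKind;

typedef struct DemoChange
{
    DemoChangeKind kind;
    bool        part_key_update; /* delete is half of a partition-key update */
} DemoChange;

/*
 * Apply-side decision: skip the follow-up insert when the preceding delete
 * was part of a partition-key update and the local (standby) update already
 * won the concurrent-update race, so applying the insert would duplicate
 * the row the local update already moved.
 */
static bool
demo_should_skip_insert(const DemoChange *del, bool local_update_won)
{
    return del->kind == DEMO_DELETE &&
           del->part_key_update &&
           local_update_won;
}
```

The point is only that the decode-time flag gives the apply worker enough context to suppress the second insert; how the flag would actually travel through the output plugin protocol is exactly what the thread is debating.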
Proposal: partition pruning by secondary attributes
Hello, hackers!

Sorry if this has already been discussed. I had this idea some time ago and then successfully forgot about it until pgconf.ru, where I had a conversation with one of our postgres users. His situation could be described like this: they have a table with id, timestamp and some other attributes, which is partitioned by (let's say) the timestamp column. In different contexts they may want to filter the table either by id or by timestamp (but not both). In this case pruning will only work for the timestamp column.

The idea is to store the min and max values of secondary attributes (like 'id' in the example above) for each partition somewhere in the catalog and use them for partition pruning along with the partitioning key. You can think of it as something like a BRIN index, but for partitions, and it would have similar limitations. For example, we may benefit if the secondary attribute values monotonically increase or decrease, but it would be unhelpful if they are scattered, or if the table wasn't partitioned by range.

I wanted to ask the community's opinion on whether this would be worth considering. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru
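The proposal can be sketched as a toy model (not PostgreSQL code): each partition keeps a min/max range for a secondary attribute, and a partition is pruned when the searched value falls outside that range.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy per-partition metadata: min/max of a secondary attribute (like 'id'),
 * analogous to what a BRIN index stores per block range.
 */
typedef struct PartRange
{
    int         min_id;
    int         max_id;
} PartRange;

/*
 * A partition can be skipped for "WHERE id = val" when val lies outside
 * the recorded [min_id, max_id] range.
 */
static bool
part_can_be_pruned(const PartRange *p, int val)
{
    return val < p->min_id || val > p->max_id;
}

/* Count how many of nparts partitions survive pruning for a given value. */
static int
parts_to_scan(const PartRange *parts, int nparts, int val)
{
    int         n = 0;

    for (int i = 0; i < nparts; i++)
        if (!part_can_be_pruned(&parts[i], val))
            n++;
    return n;
}
```

As with BRIN, this only helps when the secondary attribute correlates with the partitioning: monotonically loaded ids give tight, non-overlapping ranges per partition, while scattered ids make every range cover nearly everything and nothing gets pruned.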
Re: non-bulk inserts and tuple routing
(2018/02/07 19:36), Etsuro Fujita wrote:
> (2018/02/05 19:43), Etsuro Fujita wrote:
>> (2018/02/05 14:34), Amit Langote wrote:
>>> Here is the updated version that contains two patches as described above.

Here are some minor comments:

o On changes to ExecInsert

* This might be just my taste, but I think it would be better to (1) change ExecInitPartitionInfo so that it creates and returns a newly-initialized ResultRelInfo for an initialized partition, and (2) rewrite this bit:

+    /* Initialize partition info, if not done already. */
+    ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+                          leaf_part_index);
+
     /*
      * Save the old ResultRelInfo and switch to the one corresponding to
      * the selected partition.
      */
     saved_resultRelInfo = resultRelInfo;
     resultRelInfo = proute->partitions[leaf_part_index];
+    Assert(resultRelInfo != NULL);

to something like this (I would say the same thing about the copy.c changes):

     /*
      * Save the old ResultRelInfo and switch to the one corresponding to
      * the selected partition.
      */
     saved_resultRelInfo = resultRelInfo;
     resultRelInfo = proute->partitions[leaf_part_index];
     if (resultRelInfo == NULL)
     {
         /* Initialize partition info. */
         resultRelInfo = ExecInitPartitionInfo(mtstate, saved_resultRelInfo,
                                               proute, estate,
                                               leaf_part_index);
     }

This would make ExecInitPartitionInfo simpler because it can assume that the given partition has not been initialized yet.

o On changes to execPartition.h

* Please add a brief description of partition_oids to the comments for this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
     PartitionDispatch *partition_dispatch_info;
     int         num_dispatch;
+    Oid        *partition_oids;

That's it.

Best regards,
Etsuro Fujita
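The caller-side pattern suggested above is plain lazy initialization of an array slot; a generic standalone sketch (all names hypothetical, standing in for proute->partitions and ExecInitPartitionInfo):

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4

typedef struct DemoRelInfo { int relid; } DemoRelInfo;

static DemoRelInfo slot_storage[NSLOTS];
static DemoRelInfo *slots[NSLOTS];      /* NULL = not yet initialized */
static int  init_calls = 0;

/*
 * Analogue of the reworked ExecInitPartitionInfo: it can assume the slot
 * is uninitialized, build it, store it, and return it.
 */
static DemoRelInfo *
demo_init_partition(int i)
{
    assert(slots[i] == NULL);   /* caller guarantees first-time init */
    slot_storage[i].relid = 1000 + i;
    slots[i] = &slot_storage[i];
    init_calls++;
    return slots[i];
}

/* Caller-side pattern from the review: check NULL, initialize on demand. */
static DemoRelInfo *
demo_get_partition(int i)
{
    DemoRelInfo *r = slots[i];

    if (r == NULL)
        r = demo_init_partition(i);
    return r;
}
```

Moving the NULL check to the caller means the init function never has to ask "was I already called for this partition?", which is the simplification the review is after.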
Re: JIT compiling with LLVM v10.0
On 02/07/2018 03:54 PM, Andres Freund wrote:
> I've pushed v10.0. The big (and pretty painful to make) change is that
> now all the LLVM specific code lives in src/backend/jit/llvm, which is
> built as a shared library which is loaded on demand.

It does not seem to be possible to build without LLVM anymore. Error:

In file included from planner.c:32:0:
../../../../src/include/jit/llvmjit.h:13:10: fatal error: llvm-c/Types.h: No such file or directory
 #include <llvm-c/Types.h>
          ^~~~

Options:

./configure --prefix=/home/andreas/dev/postgresql-inst --enable-tap-tests --enable-cassert --enable-debug

I also noticed the following typo:

diff --git a/configure.in b/configure.in
index b035966c0a..b89c4a138a 100644
--- a/configure.in
+++ b/configure.in
@@ -499,7 +499,7 @@ fi
 if test "$enable_coverage" = yes; then
   if test "$GCC" = yes; then
     CFLAGS="$CFLAGS -fprofile-arcs -ftest-coverage"
-    CFLAGS="$CXXFLAGS -fprofile-arcs -ftest-coverage"
+    CXXFLAGS="$CXXFLAGS -fprofile-arcs -ftest-coverage"
   else
     AC_MSG_ERROR([--enable-coverage is supported only when using GCC])
   fi

Andreas
Re: [HACKERS] More stats about skipped vacuums
At Thu, 08 Feb 2018 18:04:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20180208.180415.112312013.horiguchi.kyot...@lab.ntt.co.jp>
> > > I suggest we remove support for dynamic_shared_memory_type = none first,
> > > and see if we get any complaints. If we don't, then future patches can
> > > rely on it being present.
> >
> > If we remove it in v11, it'd still be maybe a year from now before we'd
> > have much confidence from that alone that nobody cares. I think the lack
> > of complaints about it in 9.6 and 10 is a more useful data point.
>
> So does that mean we can assume we are able to rely on the existence of
> DSM at present, since for over a year we have had no complaints despite
> the fact that DSM is silently turned on? And, apart from that, are we
> ready to remove 'none' from the options of dynamic_shared_memory_type
> right now?

I found the following commit related to this.

| commit d41ab71712a4457ed39d5471b23949872ac91def
| Author: Robert Haas
| Date: Wed Oct 16 09:41:03 2013 -0400
|
| initdb: Suppress dynamic shared memory when probing for max_connections.
|
| This might not be the right long-term solution here, but it will
| hopefully turn the buildfarm green again.
|
| Oversight noted by Andres Freund

The discussion is found here:

https://www.postgresql.org/message-id/ca+tgmoyhiigrcvssjhmbsebmof2zx_9_9rwd75cwvu99yrd...@mail.gmail.com

I suppose that the problem has not been resolved yet.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] More stats about skipped vacuums
Hello,

At Wed, 07 Feb 2018 16:59:20 -0500, Tom Lane wrote in <3246.1518040...@sss.pgh.pa.us>
> Robert Haas writes:
> > It seems to me that there was a thread where Tom proposed removing
> > support for dynamic_shared_memory_type = none.
>
> I think you're recalling <32138.1502675...@sss.pgh.pa.us>, wherein
> I pointed out that
>
> >>> Whether that's worth the trouble is debatable. The current code
> >>> in initdb believes that every platform has some type of DSM support
> >>> (see choose_dsm_implementation). Nobody's complained about that,
> >>> and it certainly works on every buildfarm animal. So for all we know,
> >>> dynamic_shared_memory_type = none is broken already.
>
> (That was in fact in the same thread Kyotaro-san just linked to about
> reimplementing the stats collector.)
>
> It's still true that we've no reason to believe there are any supported
> platforms that haven't got some sort of DSM. Performance might be a
> different question, of course ... but it's hard to believe that
> transferring stats through DSM wouldn't be better than writing them
> out to files.

Good to hear. Thanks.

> > I suggest we remove support for dynamic_shared_memory_type = none first,
> > and see if we get any complaints. If we don't, then future patches can
> > rely on it being present.
>
> If we remove it in v11, it'd still be maybe a year from now before we'd
> have much confidence from that alone that nobody cares. I think the lack
> of complaints about it in 9.6 and 10 is a more useful data point.

So does that mean we can assume we are able to rely on the existence of DSM at present, since for over a year we have had no complaints despite the fact that DSM is silently turned on? And, apart from that, are we ready to remove 'none' from the options of dynamic_shared_memory_type right now?

If I can rely on DSM, the fallback code would not be required.

> regards, tom lane

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center