[HACKERS] Inlining functions with "expensive" parameters
All,

As we try to make PostGIS more "parallel sensitive" we have been adding costs to our functions, so that their relative CPU cost is more accurately reflected in parallel plans. This has had an odd side effect: some of our "wrapper" functions stop giving index scans in plans [1]. This is a problem!

An example of a "wrapper" is ST_Intersects(geom1, geom2). It combines an index operation (geom1 && geom2) with an exact spatial test (_ST_Intersects(geom1, geom2)). This is primarily for user convenience, and it has worked well for us for more than a decade. Having this construct stop working is definitely a problem.

As we add costs to our functions, the odds increase that one of the parameters to a wrapper might be a costed function. It's not uncommon to see:

    ST_Intersects(geom, ST_SetSRID('POLYGON(...)', 4326))

It's fair to say that we really do depend on our wrappers getting inlined basically all the time. They are simple functions that do nothing other than 'SELECT func1() AND func2() AND arg1 && arg2'. However, once costs are added to the parameters, the inlining can be turned off relatively quickly. Here's a PgSQL-native example:

-- Create data table and index. Analyze.
DROP TABLE IF EXISTS boxen;
CREATE TABLE boxen AS
SELECT row_number() OVER() AS gid,
       box(point(x, y), point(x+1, y+1)) AS b,
       x, y
FROM generate_series(-100,100) AS y,
     generate_series(-100,100) AS x;

CREATE INDEX idx_b_geom_gist ON boxen USING gist(b);
ANALYZE boxen;

-- An inlined function.
-- When declared 'STRICT' it breaks index access,
-- but 'IMMUTABLE' doesn't seem to bother it.
CREATE OR REPLACE FUNCTION good_box(box, box)
RETURNS boolean
AS 'SELECT $1 OPERATOR(&&) $2 AND length(lseg(point($1), point($2))) < 3'
LANGUAGE 'sql';

-- Start with a low-cost circle()
ALTER FUNCTION circle(point, double precision) COST 1;

-- [A] Query plan hits the index
EXPLAIN SELECT gid FROM boxen
WHERE good_box(boxen.b, box(circle(point(20.5, 20.5), 2)));

-- [B] Query plan hits the index
EXPLAIN SELECT gid
FROM boxen,
     (SELECT x, y FROM boxen WHERE x < 0 AND y < 0) AS c
WHERE good_box(boxen.b, box(circle(point(c.x, c.y), 2)));

-- Increase the cost of circle()
ALTER FUNCTION circle(point, double precision) COST 100;

-- [B] Query plan no longer hits the index!
EXPLAIN SELECT gid
FROM boxen,
     (SELECT x, y FROM boxen WHERE x < 0 AND y < 0) AS c
WHERE good_box(boxen.b, box(circle(point(c.x, c.y), 2)));

The inlining is getting tossed out on a test of how expensive the function parameters are [2]. As a result, we lose what is really the correct plan, and get a sequence scan instead of an index scan. The test of parameter cost seems quite old (15+ years) and perhaps didn't anticipate highly variable individual function costs (or maybe it did). As it stands, though, PostGIS is currently stuck choosing between having costs on our functions and having our inlined wrappers, because we cannot have both at the same time.

Any thoughts?

Thanks!
P.

[1] https://trac.osgeo.org/postgis/ticket/3675#comment:18
[2] https://github.com/postgres/postgres/blob/ae20b23a9e7029f31ee902da08a464d968319f56/src/backend/optimizer/util/clauses.c#L4581-L4584
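For context, the wrapper pattern described here has roughly this shape. This is a sketch of the idea, not the exact PostGIS source:

```sql
-- A sketch of the wrapper pattern: an index-assisted operator ANDed
-- with an exact test, in a trivially inlinable SQL function.
-- (Illustrative only; the real PostGIS definition differs in detail.)
CREATE OR REPLACE FUNCTION ST_Intersects(geom1 geometry, geom2 geometry)
RETURNS boolean
AS 'SELECT $1 OPERATOR(&&) $2 AND _ST_Intersects($1, $2)'
LANGUAGE 'sql' IMMUTABLE;
```

When the planner inlines this, the `&&` clause becomes visible to index path generation; when inlining is refused, the whole call is an opaque function and the index is lost.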
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
From my perspective, this is much much better. For sufficiently large tables, I get parallel behaviour without jimmying with the defaults on parallel_setup_cost and parallel_tuple_cost. *And* the parallel behaviour *is* sensitive to the costs of functions in target lists, so reasonably chosen costs will flip us into a parallel mode for expensive functions against smaller tables too.

Hopefully some variant of this finds its way into core! Is there any way I can productively help?

P.

On Sat, Nov 4, 2017 at 10:02 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Paul Ramsey <pram...@cleverelephant.ca> writes:
> >>> Whether I get a parallel aggregate seems entirely determined by the number
> >>> of rows, not the cost of preparing those rows.
> >
> >> This is true, as far as I can tell and unfortunate. Feeding tables with
> >> 100ks of rows, I get parallel plans, feeding 10ks of rows, never do, no
> >> matter how costly the work going on within. That's true of changing costs
> >> on the subquery select list, and on the aggregate transfn.
> >
> > This sounds like it might be the same issue being discussed in
> >
> > https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com
> >
> I have rebased the patch being discussed on that thread.
>
> Paul, you might want to once check with the recent patch [1] posted on
> the thread mentioned by Tom.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2B1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
On Sat, Nov 4, 2017 at 10:02 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Paul Ramsey <pram...@cleverelephant.ca> writes:
> >>> Whether I get a parallel aggregate seems entirely determined by the number
> >>> of rows, not the cost of preparing those rows.
> >
> >> This is true, as far as I can tell and unfortunate. Feeding tables with
> >> 100ks of rows, I get parallel plans, feeding 10ks of rows, never do, no
> >> matter how costly the work going on within. That's true of changing costs
> >> on the subquery select list, and on the aggregate transfn.
> >
> > This sounds like it might be the same issue being discussed in
> >
> > https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com

Thanks Tom, Amit; yes, this issue (expensive things in target lists not affecting plans) seems like what I'm talking about in this particular case, and something that shows up a lot in PostGIS use cases: a function on a target list like ST_Buffer() or ST_Intersection() will be a couple of orders of magnitude more expensive than anything in the filters.

> I have rebased the patch being discussed on that thread.
>
> Paul, you might want to once check with the recent patch [1] posted on
> the thread mentioned by Tom.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2B1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com

Awesome! I will compare and report back. Thanks much!
P

> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
Just clarifying myself a little, since I made a dumb error partway through.

On Thu, Nov 2, 2017 at 10:09 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> I'm working on a custom aggregate that generates a serialized data
> format. The preparation of the geometry before being formatted is pretty
> intense, so it is probably a good thing for that work to be done in
> parallel, in partial aggregates. Here's an example SQL call:
>
> EXPLAIN ANALYZE
> SELECT length(ST_AsMVT(a)) FROM (
>   SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
>   FROM pts_10 p
>   WHERE p.geom && ::geometry_literal
>   AND p.geom IS NOT NULL
> ) a;
>
> The ST_AsMVTGeom() function can be comically expensive, so it's really good
> when it's in partial aggregates. But the cost of the function seems to be
> ignored.
>
> (First note that, in order to consistently get parallel plans, I have to
> brutally suppress parallel_tuple_cost, as described here:
> http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html)
>
> Whether I get a parallel aggregate seems entirely determined by the number
> of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with 100ks of rows, I get parallel plans; feeding 10ks of rows, I never do, no matter how costly the work going on within. That's true of changing costs on the subquery select list, and on the aggregate transfn.

> When changing the number of rows in the subquery, with a LIMIT, I can
> change from a seq scan to a parallel seq scan and finally to a parallel
> aggregate, as the number of rows goes up.

I see now that as soon as I brought the LIMIT in, the plans had to go sequential, just due to the nature of a LIMIT in a subquery. Ignore the below, sorry.

Thanks!
P

> An odd effect: when I have enough rows to get a parallel seq scan, I can
> flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That
> seems odd and backwards.
> Is there anywhere a guide or rough description of how costs are used in
> determining parallel plans? The empirical approach starts to wear one down
> after a while :)
>
> P.
[HACKERS] Parallel Plans and Cost of non-filter functions
I'm working on a custom aggregate that generates a serialized data format. The preparation of the geometry before being formatted is pretty intense, so it is probably a good thing for that work to be done in parallel, in partial aggregates. Here's an example SQL call:

EXPLAIN ANALYZE
SELECT length(ST_AsMVT(a)) FROM (
  SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
  FROM pts_10 p
  WHERE p.geom && ::geometry_literal
  AND p.geom IS NOT NULL
) a;

The ST_AsMVTGeom() function can be comically expensive, so it's really good when it's in partial aggregates. But the cost of the function seems to be ignored.

(First note that, in order to consistently get parallel plans, I have to brutally suppress parallel_tuple_cost, as described here: http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html)

Whether I get a parallel aggregate seems entirely determined by the number of rows, not the cost of preparing those rows. When changing the number of rows in the subquery, with a LIMIT, I can change from a seq scan to a parallel seq scan and finally to a parallel aggregate, as the number of rows goes up.

An odd effect: when I have enough rows to get a parallel seq scan, I can flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That seems odd and backwards.

Is there anywhere a guide or rough description of how costs are used in determining parallel plans? The empirical approach starts to wear one down after a while :)

P.
Re: [HACKERS] generate_series regression 9.6->10
Thanks Tom. This showed up in a regression test of ours that built the test data using generate_series, so it's not a critical production issue or anything, just a surprising change in behaviour.

P.

On Wed, May 24, 2017 at 10:28 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Paul Ramsey <pram...@cleverelephant.ca> writes:
> > The behaviour of generate_series seems to have changed a little, at least
> > in conjunction w/ CTEs.
>
> What's changed is the behavior of multiple SRFs in a SELECT's targetlist, cf
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=69f4b9c85f168ae006929eec44fc44d569e846b9
>
> specifically this comment:
>
>     While moving SRF evaluation to ProjectSet would allow to retain the old
>     "least common multiple" behavior when multiple SRFs are present in one
>     targetlist (i.e. continue returning rows until all SRFs are at the end of
>     their input at the same time), we decided to instead only return rows till
>     all SRFs are exhausted, returning NULL for already exhausted ones. We
>     deemed the previous behavior to be too confusing, unexpected and actually
>     not particularly useful.
>
> I see the current v10 release notes have failed miserably at documenting
> this :-(. Will try to improve that.
>
> regards, tom lane
[HACKERS] generate_series regression 9.6->10
The behaviour of generate_series seems to have changed a little, at least in conjunction w/ CTEs. Under 9.6 (and prior) this query returns 2127 rows, with no nulls:

with ij as (
  select i, j
  from generate_series(1, 10) i,
       generate_series(1, 10) j
),
iijj as (
  select generate_series(1, i) as a,
         generate_series(1, j) b
  from ij
)
select a, b from iijj;

Under 10, it returns only 715 rows, with many nulls.
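The change can be seen with a minimal pair of SRFs in one target list (a small sketch, following the behavior change described in Tom's reply above):

```sql
SELECT generate_series(1, 2) AS a, generate_series(1, 3) AS b;

-- 9.6 and earlier: SRFs cycle to their least common multiple,
-- so this returns 6 rows.
-- 10: SRFs advance in lockstep and exhausted ones return NULL,
-- so this returns 3 rows: (1,1), (2,2), (NULL,3).
```

The nulls in the larger query above come from the same lockstep rule, applied per-row of the CTE with unequal series lengths.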
Re: [HACKERS] Retiring from the Core Team
On Wed, Jan 11, 2017 at 4:29 PM, Josh Berkus wrote:
>
> For that reason, as of today, I am stepping down from the PostgreSQL
> Core Team.

Massive thanks, Josh; you've been a great advocate and a wonderful bridge into the PgSQL community for those of us finding our way towards the warm centre of PgSQL.

ATB,
P
Re: [HACKERS] User-defined Operator Pushdown and Collations
On Sun, Nov 27, 2016 at 11:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Paul Ramsey <pram...@cleverelephant.ca> writes:
> > On Sun, Nov 27, 2016 at 9:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Why doesn't hs_fdw.h have a collation?
>
> > I think I'm missing something, I cannot find a file like that anywhere.
>
> I was referring to the variable shown in your EXPLAIN.

Ah, right. Why would hs_fdw.h have a collation? It's an hstore. It's not declared with a collation (the CREATE TYPE call doesn't set the COLLATABLE attribute to true). Again my ignorance is running ahead of me: does every object in the database necessarily have a collation?

CREATE FOREIGN TABLE hs_fdw (
  id integer,
  h hstore collate "en_CA.UTF-8"
) server foreign_server OPTIONS (table_name 'hs');
ERROR:  collations are not supported by type hstore

> > With respect to this particular example, is this a case of a very large
> > collation hammer getting in the way? Both '->' and '=' are operators that
> > would be unaffected by collation, right? They are both just equality-based
> > tests. But all operators are getting tested for coherent collation
> > behaviour, so they get caught up in the net?
>
> Yeah, we don't know whether the operator actually cares about its input
> collation. It'd be possible to be much more liberal if we knew it did
> not, but such labeling was not included in the design for the collation
> facility. That might've been a mistake ...

In this case the hammer seems very large, since only one side of the '->' operator is even collatable. Mind you, if it *did* work it would still bubble up to a case of 'text = text' at the top node, so the problem would still remain. Although it seems unfair: I can definitely declare a table with a text column and run a query with Const = Var and it'll ship that OpExpr over, which seems no more fishy than what I'm asking the hstore to do.
hstore=# explain (verbose) select * from txt_fdw where txt = 'this';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Foreign Scan on public.txt_fdw  (cost=100.00..127.20 rows=7 width=36)
   Output: id, txt
   Remote SQL: SELECT id, txt FROM public.txt WHERE ((txt = 'this'::text))
(3 rows)

The fdw table can't know much about what the remote collation is; it only knows what I've told it locally, which is that it's the default, so everything matches up. Sounds like the problem is hstore lacking a collation? Or that lacking a collation is not considered equivalent to the default in the testing code.

P

> regards, tom lane
Re: [HACKERS] User-defined Operator Pushdown and Collations
On Sun, Nov 27, 2016 at 9:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Paul Ramsey <pram...@cleverelephant.ca> writes:
> > On Fri, Nov 25, 2016 at 11:30 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> >> I've been trying to figure out an issue with operators not being pushed
> >> down for user defined types, in this case "hstore". TL;DR:
> >>
> >> hstore=# explain (verbose) select * from hs_fdw where h -> 'a' = '1';
> >>                              QUERY PLAN
> >> ----------------------------------------------------------------------
> >>  Foreign Scan on public.hs_fdw  (cost=100.00..157.78 rows=7 width=36)
> >>    Output: id, h
> >>    Filter: ((hs_fdw.h -> 'a'::text) = '1'::text)
> >>    Remote SQL: SELECT id, h FROM public.hs
> >> (4 rows)
> >>
> >> In terms of "shippability" the "->" operator passes fine. It ends up not
> >> being shipped because its collation bubbles up as FDW_COLLATE_NONE, and
> >> gets kicked back as not deparseable around here:
> >>
> >> https://github.com/postgres/postgres/blob/4e026b32d4024b03856b4981b26c747b7fef7afb/contrib/postgres_fdw/deparse.c#L499
>
> > I'm finding this piece of code a little suspect, but that may just be my
> > not fully understanding why/what determines when a collation is shippable.
>
> > In the case of my example above, the OpExpr '->' has an input collation of
> > 100 (DEFAULT_COLLATION_ID). The Var below has a collation of 0 (InvalidOid)
> > and state of FDW_COLLATE_NONE, and the Const has a collation of 100
> > (DEFAULT_COLLATION_ID) and state of FDW_COLLATE_NONE.
>
> Why doesn't hs_fdw.h have a collation?

I think I'm missing something, I cannot find a file like that anywhere.

> The intuition behind the rules in this area is that we'll only push down
> expressions whose collation is traceable to a foreign Var. Assuming that
> you've correctly declared your foreign table with column collations that
> match the column collations of the real table on the remote server, this
> should ensure that you get the same collation-dependent behavior as you
> would have gotten locally.
> In this example, the expression's collation behavior would be per
> DEFAULT_COLLATION_ID on both servers ... but they might have different
> default collations.

OK, so there's a potential workaround w/ explicitly declared collations, I am hearing?

With respect to this particular example, is this a case of a very large collation hammer getting in the way? Both '->' and '=' are operators that would be unaffected by collation, right? They are both just equality-based tests. But all operators are getting tested for coherent collation behaviour, so they get caught up in the net?

Thanks!
P

> So without this rule the expression would be pushed down and could then
> give different results.
>
> regards, tom lane
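For a collatable column type like text, the explicit-collation workaround being discussed would look roughly like this. This is a sketch (the collation name and server setup are assumptions, and, as noted elsewhere in the thread, attempting this on a non-collatable type like hstore raises an error):

```sql
-- Declare the foreign column with an explicit collation matching the
-- remote table, so expressions over it carry a collation traceable to
-- a foreign Var and remain candidates for pushdown.
-- (Illustrative sketch; collation and server names are assumptions.)
CREATE FOREIGN TABLE txt_fdw (
    id integer,
    txt text COLLATE "en_CA.UTF-8"
) SERVER foreign_server OPTIONS (table_name 'txt');
```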
Re: [HACKERS] User-defined Operator Pushdown and Collations
On Fri, Nov 25, 2016 at 11:30 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> I've been trying to figure out an issue with operators not being pushed
> down for user defined types, in this case "hstore". TL;DR:
>
> hstore=# explain (verbose) select * from hs_fdw where h -> 'a' = '1';
>                              QUERY PLAN
> ----------------------------------------------------------------------
>  Foreign Scan on public.hs_fdw  (cost=100.00..157.78 rows=7 width=36)
>    Output: id, h
>    Filter: ((hs_fdw.h -> 'a'::text) = '1'::text)
>    Remote SQL: SELECT id, h FROM public.hs
> (4 rows)
>
> In terms of "shippability" the "->" operator passes fine. It ends up not
> being shipped because its collation bubbles up as FDW_COLLATE_NONE, and
> gets kicked back as not deparseable around here:
>
> https://github.com/postgres/postgres/blob/4e026b32d4024b03856b4981b26c747b7fef7afb/contrib/postgres_fdw/deparse.c#L499

I'm finding this piece of code a little suspect, but that may just be my not fully understanding why/what determines when a collation is shippable.

In the case of my example above, the OpExpr '->' has an input collation of 100 (DEFAULT_COLLATION_ID). The Var below has a collation of 0 (InvalidOid) and state of FDW_COLLATE_NONE, and the Const has a collation of 100 (DEFAULT_COLLATION_ID) and state of FDW_COLLATE_NONE. In other parts of the code, the default collation and collation 0 are both treated as leading to a state of FDW_COLLATE_NONE. But the OpExpr, instead of bubbling FDW_COLLATE_NONE up the chain, returns false and ends the shippability of the Node.

What are the issues around shipping nodes of different collations to the remote? All the nodes in my example are foreign Vars or local Consts, and all are either collation 0 or DEFAULT_COLLATION_ID. Surely it should be shippable?

P

> I'm still working at wrapping my head around why this is good or not, but
> if there's an obvious explanation and/or workaround, I'd love to know.
>
> Thanks!
>
> P
[HACKERS] User-defined Operator Pushdown and Collations
I've been trying to figure out an issue with operators not being pushed down for user defined types, in this case "hstore". TL;DR:

hstore=# explain (verbose) select * from hs_fdw where h -> 'a' = '1';
                             QUERY PLAN
----------------------------------------------------------------------
 Foreign Scan on public.hs_fdw  (cost=100.00..157.78 rows=7 width=36)
   Output: id, h
   Filter: ((hs_fdw.h -> 'a'::text) = '1'::text)
   Remote SQL: SELECT id, h FROM public.hs
(4 rows)

In terms of "shippability" the "->" operator passes fine. It ends up not being shipped because its collation bubbles up as FDW_COLLATE_NONE, and gets kicked back as not deparseable around here:

https://github.com/postgres/postgres/blob/4e026b32d4024b03856b4981b26c747b7fef7afb/contrib/postgres_fdw/deparse.c#L499

I'm still working at wrapping my head around why this is good or not, but if there's an obvious explanation and/or workaround, I'd love to know.

Thanks!

P
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 8:59 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 6/1/16 10:03 AM, Paul Ramsey wrote:
>> We don't depend on these, we have our own :/
>> The real answer for a GIS system is to have an explicit tolerance
>> parameter for calculations like distance/touching/containment, but
>> unfortunately we didn't do that, so now we have our own
>> compatibility/boil-the-ocean problem if we ever wanted/were funded to
>> add one.
>
> Well it sounds like what's currently happening in Postgres is probably going
> to change, so how might we structure that to help PostGIS? Would simply
> lopping off the last few bits of the significand/mantissa work, or is that
> not enough when different GRSes are involved?

PostGIS doesn't look at all at what the PgSQL geotypes do, so go forward w/o fear.

Tolerance in the geo world is more than vertex rounding, though; it's things like saying that when distance(pt, line) < epsilon then distance(pt, line) == 0, and similarly for shape touching, etc. One of the things people find annoying about PostGIS is that ST_Intersects(ST_Intersection(a, b), a) can come out as false (a derived point at a crossing of lines may not exactly intersect either of the input lines), which is a direct result of our use of exact math for the boolean intersects test.

Anyways, go forth and do whatever makes sense for PgSQL.

P

> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 7:52 AM, Jim Nasby wrote:
>
> I suspect another wrinkle here is that in the GIS world a single point can
> be represented in multiple reference/coordinate systems, and it would have
> different values in each of them. AIUI the transforms between those systems
> can be rather complicated if different projection methods are involved. I
> don't know if PostGIS depends on what these macros are doing or not. If it
> doesn't, perhaps it would be sufficient to lop off the last few bits of the
> significand. ISTM that'd be much better than what the macros currently do.

We don't depend on these, we have our own :/

The real answer for a GIS system is to have an explicit tolerance parameter for calculations like distance/touching/containment, but unfortunately we didn't do that, so now we have our own compatibility/boil-the-ocean problem if we ever wanted/were funded to add one.

P.

> BTW, I suspect the macro values were chosen specifically for dealing with
> LAT/LONG.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Parallel Queries and PostGIS
On Fri, Apr 22, 2016 at 11:44 AM, Stephen Frost <sfr...@snowman.net> wrote:
> Paul,
>
> * Paul Ramsey (pram...@cleverelephant.ca) wrote:
>> On Mon, Mar 28, 2016 at 9:45 AM, Stephen Frost <sfr...@snowman.net> wrote:
>> > Would you agree that it'd be helpful to have for making the st_union()
>> > work better in parallel?
>>
>> For our particular situation w/ ST_Union, yes, it would be ideal to be
>> able to run a worker-side combine function as well as the master-side
>> one. Although the cascaded union would be less effective spread out
>> over N nodes, doing it only once per worker, rather than every N
>> records, would minimize the loss of effectiveness.
>
> I chatted with Robert a bit about this and he had an interesting
> suggestion. I'm not sure that it would work for you, but the
> serialize/deserialize functions are used to transfer the results from
> the worker process to the main process. You could possibly do the
> per-worker finalize work in the serialize function to get the benefit of
> running that in parallel.
>
> You'll need to mark the aggtranstype as 'internal' to have the
> serialize/deserialize code called. Hopefully that's not too much of an
> issue.

Thanks Stephen. We were actually thinking that it might make more sense to just do the parallel processing in our own threads in the finalfunc. Not as elegant and magical as bolting into the PgSQL infra, but if we're doing something hacky anyways, might as well be our own hacky.

ATB,
P

> Thanks!
>
> Stephen
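For reference, the shape of a parallel aggregate declaration with the serialize/deserialize hooks Stephen mentions looks roughly like this. All the function names here are hypothetical placeholders, not PostGIS code:

```sql
-- Sketch only: every my_union_* name below is a hypothetical placeholder.
-- With STYPE = internal, SERIALFUNC/DESERIALFUNC are used to move
-- partial transition states from workers back to the leader process.
CREATE AGGREGATE my_union(geometry) (
    SFUNC        = my_union_transfn,
    STYPE        = internal,
    COMBINEFUNC  = my_union_combinefn,
    SERIALFUNC   = my_union_serialfn,
    DESERIALFUNC = my_union_deserialfn,
    FINALFUNC    = my_union_finalfn,
    PARALLEL     = SAFE
);
```

Robert's suggestion amounts to folding the per-worker finalize step into `my_union_serialfn`, since that is the last code that runs inside each worker.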
Re: [HACKERS] Protocol buffer support for Postgres
On Tue, Apr 26, 2016 at 6:40 AM, Tom Lane wrote:
> Craig Ringer writes:
>> On 26 April 2016 at 14:06, 陈天舟 wrote:
>>> (1) Since each protocol buffer column requires a schema, I am not sure
>>> where the best place is to store that schema info. Should it be in a
>>> CONSTRAINT (but I am not able to find the doc referring to any custom
>>> constraint), or should it be in the COMMENT or somewhere else?
>
>> I can't really imagine how you'd do that without adding a new catalog like
>> we have for enum members. A typmod isn't sufficient since you need a whole
>> lot more than an integer, and typmods aren't tracked throughout the server
>> that well.
>
>> That'll make it hard to do it with an extension.
>
> PostGIS manages to reference quite a lot of schema-like information via
> a geometry column's typmod. Maybe there's a reason why their approach
> wouldn't be a good fit for this, but it'd be worth investigating.

We pack a short type number, two flags and 24 bits of SRID number into an integer. The SRID number is in turn a foreign key into the spatial_ref_sys table, where the fully spelled-out spatial reference definition lives. It's not very nice, and it's quite breakable, since there's no foreign key integrity between the typmod and the spatial_ref_sys pk.

P.
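The packing described can be sketched with plain integer bit operations. The exact field layout below (SRID in the high 24 bits, type number and flags in the low byte) is an illustrative assumption, not PostGIS's actual encoding:

```sql
-- Pack SRID 4326, type number 3, and one flag bit into one integer.
-- Bit layout is illustrative only, not the real PostGIS typmod format.
SELECT (4326 << 8) | (3 << 2) | 1 AS packed;   -- 1107469

-- Unpack the fields again.
SELECT 1107469 >> 8        AS srid,        -- 4326
       (1107469 >> 2) & 63 AS type_num,    -- 3
       1107469 & 3         AS flags;       -- 1
```

The point of the trick is that all of this fits in the single integer a typmod gives you, with the wide SRID field deferring the real detail to a lookup table.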
Re: [HACKERS] Choosing parallel_degree
On Fri, Apr 8, 2016 at 9:06 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 8 April 2016 at 17:00, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>> On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> > On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> >> Other than that, patch looks good and I have marked it as Ready For
>> >> Committer. Hope, we get this for 9.6.
>> >
>> > Committed. I think this is likely to make parallel query
>> > significantly more usable in 9.6.
>>
>> I'm kind of worried that it will make it yet less usable for PostGIS,
>> since approaches that ignore costs in favour of relpages will
>> dramatically under-resource our queries. I can spin a query for
>> multiple seconds on a table with less than 100K records, not even
>> trying very hard.
>
> Doesn't sound good.

I admit it's not a "usual" database thing, but it's right in the meaty middle of use cases that parallelism can crushingly, awesomely defeat. It's also probably not too unusual for extension use cases, where complex data are held in user-defined types, whether they be image fragments, music samples, genetic data, raster data or LIDAR point clouds. PostGIS is just one voice of many in the Symphony of Crazy Shit in the Database.

>> Functions have very unequal CPU costs, and we're talking here about
>> using CPUs more effectively, why are costs being given the see-no-evil
>> treatment? This is as true in core as it is in PostGIS, even if our
>> case is a couple orders of magnitude more extreme: a filter based on a
>> complex combination of regex queries will use an order of magnitude
>> more CPU than one that does a little math, why plan and execute them
>> like they are the same?
>
> Functions have user assignable costs.

We have done a relatively bad job of globally costing our functions thus far, because it mostly didn't make any difference.
In my testing [1], I found that costing could push better plans for parallel sequence scans and parallel aggregates, though only at very extreme cost values (1000 for sequence scans and 1 for aggregates). Obviously, if costs can make a difference for 9.6 and parallelism, we'll rigorously ensure we have good, useful costs. I've already costed many functions in my parallel PostGIS test branch [2].

Perhaps the avoidance of cost so far is based on the relatively nebulous definition it has; about the only thing in the docs is: "If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary." So what about C functions then? Should a string comparison be 5 and a multiplication 1? An image histogram 1000?

>> As it stands now, it seems like out of the box PostGIS users will
>> actually not see much benefit from parallelism unless they manhandle
>> their configuration settings to force it.
>
> Does this concern apply to this patch, or to the general situation for 9.6?

Insofar as the patch is throttling how many parallel workers you get based solely on your relsize, it does concern this patch, but it's a general issue in the extreme and not-obviously-related costings needed to trip parallel sequence and parallel aggregate plans. The parallel join seems to not take function/operator costs into account at all [3]; at least, I couldn't plump up a high enough cost to trip it without also adjusting the global parallel tuple cost configuration.

I've seen a number of asides to the effect that "yes, costs are important, but we probably can't do anything about that for 9.6" in parallel patch threads, including this one, so I'm getting concerned that the core improvement we've been hoping for for years won't actually address our use cases when it is first released.
That may just be the way it is, c'est la vie, but it would be unfortunate.

P

[1] http://blog.cleverelephant.ca/2016/03/parallel-postgis.html
[2] https://github.com/pramsey/postgis/tree/parallel
[3] http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html
Re: [HACKERS] Choosing parallel_degree
On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas wrote:
> On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila wrote:
>> Other than that, patch looks good and I have marked it as Ready For
>> Committer. Hope we get this for 9.6.
>
> Committed. I think this is likely to make parallel query
> significantly more usable in 9.6.

I'm kind of worried that it will make it yet less usable for PostGIS, since approaches that ignore costs in favour of relpages will dramatically under-resource our queries. I can spin a query for multiple seconds on a table with less than 100K records, not even trying very hard.

Functions have very unequal CPU costs, and we're talking here about using CPUs more effectively; why are costs being given the see-no-evil treatment? This is as true in core as it is in PostGIS, even if our case is a couple of orders of magnitude more extreme: a filter based on a complex combination of regex queries will use an order of magnitude more CPU than one that does a little math, so why plan and execute them like they are the same?

As it stands now, it seems like out-of-the-box PostGIS users will actually not see much benefit from parallelism unless they manhandle their configuration settings to force it.

ATB,
P
Re: [HACKERS] Parallel Queries and PostGIS
On Tue, Mar 29, 2016 at 12:51 PM, Paul Ramsey <pram...@cleverelephant.ca> wrote: > On Tue, Mar 29, 2016 at 12:48 PM, Paul Ramsey <pram...@cleverelephant.ca> > wrote: > >>> On the join case, I wonder if it's possible that _st_intersects is not >>> marked parallel-safe? If that's not the problem, I don't have a >>> second guess, but the thing to do would be to figure out whether >>> consider_parallel is false for the RelOptInfo corresponding to either >>> of pd and pts, or whether it's true for both but false for the >>> joinrel's RelOptInfo, or whether it's true for all three of them but >>> you don't get the desired path anyway. >> >> _st_intersects is definitely marked parallel safe, and in fact will >> generate a parallel plan if used alone (without the operator though, >> it's impossibly slow). It's the && operator that is the issue... and I >> just noticed that the PROCEDURE bound to the && operator >> (geometry_overlaps) is *not* marked parallel safe: could be the >> problem? > > Asked and answered: marking the geometry_overlaps as parallel safe > gets me a parallel plan! Now to play with costs and see how it behaves > when force_parallel_mode is not set. For the record I can get a non-forced parallel join plan, *only* if I reduce the parallel_join_cost by a factor of 10, from 0.1 to 0.01. http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html This seems non-optimal. No amount of cranking up the underlying function COST seems to change this, perhaps because the join cost is entirely based on the number of expected tuples in the join relation? In general it seems like function COST values have been considered a relatively unimportant input to planning in the past, but with parallel processing it seems like they are now a lot more determinative about what makes a good plan. P.
Re: [HACKERS] Parallel Queries and PostGIS
On Tue, Mar 29, 2016 at 1:14 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Tue, Mar 29, 2016 at 3:48 PM, Paul Ramsey <pram...@cleverelephant.ca> > wrote: >>> I have no idea why the cost adjustments that you need are different >>> for the scan case and the aggregate case. That does seem problematic, >>> but I just don't know why it's happening. >> >> What might be a good way to debug it? Is there a piece of code I can >> look at to try and figure out the contribution of COST in either case? > > Well, the cost calculations are mostly in costsize.c, but I dunno how > much that helps. Maybe it would help if you posted some EXPLAIN > ANALYZE output for the different cases, with and without parallelism? > > One thing I noticed about this output (from your blog)...
>
>  Finalize Aggregate
>    (cost=16536.53..16536.79 rows=1 width=8)
>    (actual time=2263.638..2263.639 rows=1 loops=1)
>    ->  Gather
>          (cost=16461.22..16461.53 rows=3 width=32)
>          (actual time=754.309..757.204 rows=4 loops=1)
>          Number of Workers: 3
>          ->  Partial Aggregate
>                (cost=15461.22..15461.23 rows=1 width=32)
>                (actual time=676.738..676.739 rows=1 loops=4)
>                ->  Parallel Seq Scan on pd
>                      (cost=0.00..13856.38 rows=64 width=2311)
>                      (actual time=3.009..27.321 rows=42 loops=4)
>                      Filter: (fed_num = 47005)
>                      Rows Removed by Filter: 17341
>  Planning time: 0.219 ms
>  Execution time: 2264.684 ms
>
> ...is that the finalize aggregate phase is estimated to be very cheap, > but it's actually wicked expensive. We get the results from the > workers in only 750 ms, but it takes another second and a half to > aggregate those 4 rows??? This is probably a vivid example of the bad behaviour of the naive union approach. If we have worker states 1,2,3,4 and we go combine(combine(combine(1,2),3),4) then we get kind of a worst case complexity situation where we three times union an increasingly complex object on the left with a simpler object on the right. 
Also, if the objects went into the transfer functions in relatively non-spatially correlated order, the polygons coming out of the transfer functions could be quite complex, and each merge would only add complexity to the output until the final merge which melts away all the remaining internal boundaries. I'm surprised it's quite so awful at the end though, and less awful in the worker stage... how do the workers end up getting rows to work on? 1,2,3,4,1,2,3,4,1,2,3,4? or 1,1,1,2,2,2,3,3,3,4,4,4? The former could result in optimally inefficient unions, given a spatially correlated input (surprisingly common in load-once GIS tables) P. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Queries and PostGIS
On Tue, Mar 29, 2016 at 12:48 PM, Paul Ramsey <pram...@cleverelephant.ca> wrote: >> On the join case, I wonder if it's possible that _st_intersects is not >> marked parallel-safe? If that's not the problem, I don't have a >> second guess, but the thing to do would be to figure out whether >> consider_parallel is false for the RelOptInfo corresponding to either >> of pd and pts, or whether it's true for both but false for the >> joinrel's RelOptInfo, or whether it's true for all three of them but >> you don't get the desired path anyway. > > _st_intersects is definitely marked parallel safe, and in fact will > generate a parallel plan if used alone (without the operator though, > it's impossibly slow). It's the && operator that is the issue... and I > just noticed that the PROCEDURE bound to the && operator > (geometry_overlaps) is *not* marked parallel safe: could be the > problem? Asked and answered: marking the geometry_overlaps as parallel safe gets me a parallel plan! Now to play with costs and see how it behaves when force_parallel_mode is not set. P. > > Thanks, > > P
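The fix described above boils down to a single statement, sketched here for the record (the signature follows the PostGIS operator binding named in the thread; the PARALLEL markings are 9.6 ALTER FUNCTION syntax):

```sql
-- Mark the procedure bound to the && operator as safe to run in workers.
ALTER FUNCTION geometry_overlaps(geometry, geometry) PARALLEL SAFE;
```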
Re: [HACKERS] Parallel Queries and PostGIS
> First, I beg to differ with this statement: "Some of the execution > results output are wrong! " The point is that > line has loops=4, so as in any other case where loops>1, you're seeing > the number of rows divided by the number of loops. It is the > *average* number of rows that were processed by each loop - one loop > per worker, in this case. Thanks for the explanation, let my reaction be a guide to what the other unwashed will think :) > Now, on to your actual question: > > I have no idea why the cost adjustments that you need are different > for the scan case and the aggregate case. That does seem problematic, > but I just don't know why it's happening. What might be a good way to debug it? Is there a piece of code I can look at to try and figure out the contribution of COST in either case? > On the join case, I wonder if it's possible that _st_intersects is not > marked parallel-safe? If that's not the problem, I don't have a > second guess, but the thing to do would be to figure out whether > consider_parallel is false for the RelOptInfo corresponding to either > of pd and pts, or whether it's true for both but false for the > joinrel's RelOptInfo, or whether it's true for all three of them but > you don't get the desired path anyway. _st_intersects is definitely marked parallel safe, and in fact will generate a parallel plan if used alone (without the operator though, it's impossibly slow). It's the && operator that is the issue... and I just noticed that the PROCEDURE bound to the && operator (geometry_overlaps) is *not* marked parallel safe: could be the problem? Thanks, P
Re: [HACKERS] Parallel Queries and PostGIS
On Mon, Mar 28, 2016 at 9:18 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote: > Parallel join would be a huge win, so some help/pointers on figuring > out why it's not coming into play when our gist operators are in > effect would be helpful. Robert, do you have any pointers on what I should look for to figure out why the parallel join code doesn't fire if I add a GIST operator to my join condition? Thanks, P
Re: [HACKERS] Parallel Queries and PostGIS
On Mon, Mar 28, 2016 at 9:45 AM, Stephen Frost <sfr...@snowman.net> wrote: > Paul, > > * Paul Ramsey (pram...@cleverelephant.ca) wrote: >> I spent some time over the weekend trying out the different modes of >> parallel query (seq scan, aggregate, join) in combination with PostGIS >> and have written up the results here: >> >> http://blog.cleverelephant.ca/2016/03/parallel-postgis.html > > Neat! > > Regarding aggregate parallelism and the cascaded union approach, though > I imagine in other cases as well, it seems like having a > "final-per-worker" function for aggregates would be useful. > > Without actually looking at the code at all, it seems like that wouldn't > be terribly difficult to add. > > Would you agree that it'd be helpful to have for making the st_union() > work better in parallel? For our particular situation w/ ST_Union, yes, it would be ideal to be able to run a worker-side combine function as well as the master-side one. Although the cascaded union would be less effective spread out over N nodes, doing it only once per worker, rather than every N records, would minimize the loss of effectiveness. P
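For reference, the worker-side/leader-side split discussed here is expressed in the CREATE AGGREGATE syntax that eventually shipped in 9.6 via a combine function — a sketch, noting that the POC patches elsewhere in this thread used the earlier "cfunc" spelling, and that a true "final-per-worker" hook as proposed above would be a further step beyond this:

```sql
CREATE AGGREGATE ST_MemUnion (geometry) (
    sfunc       = ST_Union,   -- transition: runs per row inside each worker
    stype       = geometry,
    combinefunc = ST_Union,   -- merges per-worker partial states on the leader
    parallel    = safe
);
```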
[HACKERS] Parallel Queries and PostGIS
I spent some time over the weekend trying out the different modes of parallel query (seq scan, aggregate, join) in combination with PostGIS and have written up the results here: http://blog.cleverelephant.ca/2016/03/parallel-postgis.html The TL;DR is basically:

* With some adjustments to function COST, both parallel sequence scan and parallel aggregation deliver very good parallel performance results.
* The cost adjustments for sequence scan and aggregate scan are not consistent in magnitude.
* Parallel join does not seem to work for PostGIS indexes yet, but perhaps there is some magic to learn from PostgreSQL core on that.

The two findings at the end are ones that need input from parallel query masters... We recognize we'll have to adjust costs so that our particular use case (very CPU-intensive calculation per function) is planned better, but it seems like different query modes are interpreting costs in order-of-magnitude different ways in building plans. Parallel join would be a huge win, so some help/pointers on figuring out why it's not coming into play when our gist operators are in effect would be helpful. Happy Easter to you all, P
Re: [HACKERS] Parallel Aggregate
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley wrote: > On 14 March 2016 at 14:52, James Sewell wrote: >> One question - how is the upper limit of workers chosen? > > See create_parallel_paths() in allpaths.c. Basically the bigger the > relation (in pages) the more workers will be allocated, up until > max_parallel_degree. Does the cost of the aggregate function come into this calculation at all? In PostGIS land, much smaller numbers of rows can generate loads that would be effective to parallelize (worker time much >> startup cost). P
Re: [HACKERS] Expanded Objects and Order By
Thank the Maker, it is reproducible: returning an expanded header in the _in function is not appreciated in a very narrow number of cases. Edit arrayfuncs.c:array_in(), change the return line thus:

// PG_RETURN_ARRAYTYPE_P(retval);
PG_RETURN_DATUM(expand_array(PointerGetDatum(retval), CurrentMemoryContext, my_extra));

And here is a small test case that exercises it:

CREATE TABLE orderby_expanded (
  id integer,
  a integer[]
);
INSERT INTO orderby_expanded (id, a) VALUES (1, ARRAY[1,2]);
-- works
SELECT id, a FROM orderby_expanded ORDER BY a = '{1,2}'::integer[];
-- ERROR: could not find pathkey item to sort
SELECT id FROM orderby_expanded ORDER BY a = '{1,2}'::integer[];
-- works
SELECT id FROM orderby_expanded ORDER BY a = ARRAY[1,2];
[HACKERS] Expanded Objects and Order By
I'm starting to think this might not actually be my mistake, but be a very narrow issue w/ the expanded object code. So, I've added support for converting postgis in-memory objects into expanded outputs, and have overwritten the usual lwgeom_to_gserialized() function with one that creates an expanded object. I haven't done anything to actually handle expanded objects on input, but as I understand it, that's fine, everything is supposed to work as normal when handed expanded objects. And thus far, that has been true, almost all regression tests work fine. But here's my narrow case. This works fine with the old lwgeom_to_gserialized that uses standard flat varlena outputs. It does the following w/ expanded outputs:

SELECT gid FROM test_table ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;
ERROR: could not find pathkey item to sort

Tracing through the debugger, I see the lwgeom_to_gserialized() function get hit once for the geometry literal, via parse_analyze. After that, the error shows up shortly. If I change the query ever so slightly, adding a geometry output column:

SELECT gid, geom FROM test_table ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;

It runs through to completion, five records returned. As long as the query includes a geometry output on the output line, it works. A function that uses a geometry, but doesn't actually output it, will still fail, with the same pathkey error. This will fail, for example:

SELECT gid, ST_AsText(geom) FROM test_table ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;

So I'm wondering what I should debug next. My code is hardly being touched at all at this point, and I'm very confident it's memory clean, etc (the flattener raises hell if the output is shorter or longer than the expected allocation). Any pointers as to what to look for, much appreciated. P. 
Re: [HACKERS] Expanded Objects and Order By
I have a size/flatten callback setup (and they are very careful not to write outside their boundaries), so that’s all OK. Since you’re not seeing anything “aha” in the error pattern, I’ll go back to the mats on memory… is there a good page on valgrinding postgresql? I thought the memory manager papered over things so much that valgrind couldn’t see what was going on under the covers. Thanks! P > On Jan 18, 2016, at 8:36 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > Paul Ramsey <pram...@cleverelephant.ca> writes: >> So, I've added support for converting postgis in-memory objects into >> expanded outputs, and have overwritten the usual >> lwgeom_to_gserialized() function with one that creates an expanded >> object. I haven't done anything to actually handle expanded objects on >> input, but as I understand it, that's fine, everything is supposed to >> work as normal when handed expanded objects. And thus far, that has >> been true, almost all regression tests work fine. > > Hmm ... you do need to be able to flatten them again. In the given > example, the parser is going to want to form a Const node whose Datum > value is a geometry object, and that Const node needs to be copiable > by copyObject(), which means datumCopy() has to work, and if you look > at that it will exercise EOH_get_flat_size/EOH_flatten_into when the > input routine originally produced an expanded object. > > The error message is very strange; it's hard to see how toying with the > internal representation of Consts could cause that. I think the > hypothesis of a memory clobber is stronger than you give it credit for, > especially since you see the behavior changing for seemingly unrelated > reasons. Valgrind might be a useful tool here. > > regards, tom lane
[HACKERS] Expanded Object Header and Flat Cache
I've been working through the expanded object code to try and get a demonstration of it working with PostGIS (still having some problems, but it's a learning experience). On an unrelated front, I noticed in the expanded array code that the array implementation is managing its own copy of a cache of the flat representation and flat size, https://github.com/postgres/postgres/blob/cf7dfbf2d6c5892747cd6fca399350d86c16f00f/src/backend/utils/adt/array_expanded.c#L247-L253 This seems like generic code, which any implementor is going to end up copying (after seeing it, and seeing how often the flatten size callback is being hit while debugging code, it seems like an obvious next thing to add to my expanded representation, once I get things working). Why isn't caching the flat representation and size (and short circuiting when the cache is already filled) part of the generic functionality PgSQL provides? Should it be? I guess it would imply a required function to dirty the EOH cache when changes are made to the in-memory data, but that seems no worse as part of the generic API than in all the client code. P.
Re: [HACKERS] Parallel Aggregate
On December 21, 2015 at 2:33:56 AM, Haribabu Kommi (kommi.harib...@gmail.com) wrote: Yes the query is producing more groups according to the selectivity. For example - scan selectivity - 40, the number of groups - 400 Following is the query: SELECT tenpoCord, SUM(yokinZandaka) AS yokinZandakaxGOUKEI, SUM(kashikoshiZandaka) AS kashikoshiZandakaxGOUKEI, SUM(kouzasuu) AS kouzasuuxGOUKEI, SUM(sougouKouzasuu) AS sougouKouzasuuxGOUKEI FROM public.test01 WHERE tenpoCord <= '001' AND kamokuCord = '01' AND kouzaKatujyoutaiCord = '0' GROUP BY kinkoCord,tenpoCord; Shouldn’t parallel aggregate come into play regardless of scan selectivity? I know in PostGIS land there’s a lot of stuff like: SELECT ST_Union(geom) FROM t GROUP BY areacode; Basically, in the BI case, there’s often no filter at all. Hoping that’s considered a prime case for parallel agg :) P
Re: [HACKERS] parallel joins, and better parallel explain
On Wed, Dec 2, 2015 at 1:55 PM, Robert Haas wrote: > Oops. The new version I've attached should fix this. I've been trying to see if parallel join has any effect on PostGIS spatial join queries, which are commonly CPU bound. (My tests [1] on simple parallel scan were very positive, though quite limited in that they only parallelized such a small part of the work). Like Amit, I found the current patches are missing a change to src/include/nodes/relation.h, but just adding in "Relids extra_lateral_rels" to JoinPathExtraData allowed a warning-free build. The assumption in the parallel code generally is that setting up parallel workers is very costly compared to the work to be done, so to get PostGIS code to parallelize I've been reduced to shoving parallel_setup_cost very low (1) and parallel_tuple_cost as well. Otherwise I just end up with ordinary plans. I did redefine all the relevant functions as "parallel safe" and upped their declared costs significantly. I set up an 8000 record spatial table, with a spatial index, and did a self-join on it. 
explain analyze select a.gid, b.gid from vada a join vada b on st_intersects(a.geom, b.geom) where a.gid != b.gid;

With no parallelism, I got this:

set max_parallel_degree = 0;

                                 QUERY PLAN
 --------------------------------------------------------------------------
 Nested Loop
   (cost=0.15..227332.48 rows=1822243 width=8)
   (actual time=0.377..5528.461 rows=52074 loops=1)
   ->  Seq Scan on vada a
         (cost=0.00..1209.92 rows=8792 width=1189)
         (actual time=0.027..5.004 rows=8792 loops=1)
   ->  Index Scan using vada_gix on vada b
         (cost=0.15..25.71 rows=1 width=1189)
         (actual time=0.351..0.622 rows=6 loops=8792)
         Index Cond: (a.geom && geom)
         Filter: ((a.gid <> gid) AND _st_intersects(a.geom, geom))
         Rows Removed by Filter: 3
 Planning time: 3.976 ms
 Execution time: 5533.573 ms

With parallelism, I got this:

                                 QUERY PLAN
 --------------------------------------------------------------------------
 Nested Loop
   (cost=0.15..226930.05 rows=1822243 width=8)
   (actual time=0.840..5462.029 rows=52074 loops=1)
   ->  Gather
         (cost=0.00..807.49 rows=8792 width=1189)
         (actual time=0.335..39.326 rows=8792 loops=1)
         Number of Workers: 1
         ->  Parallel Seq Scan on vada a
               (cost=0.00..806.61 rows=5861 width=1189)
               (actual time=0.015..10.167 rows=4396 loops=2)
   ->  Index Scan using vada_gix on vada b
         (cost=0.15..25.71 rows=1 width=1189)
         (actual time=0.353..0.609 rows=6 loops=8792)
         Index Cond: (a.geom && geom)
         Filter: ((a.gid <> gid) AND _st_intersects(a.geom, geom))
         Rows Removed by Filter: 3
 Planning time: 4.019 ms
 Execution time: 5467.126 ms

Given that it's a CPU-bound process, I was hoping for something closer to the results of the sequence tests, about 50% time reduction, based on the two cores in my test machine. In general either the parallel planner is way too conservative (it seems), or we need to radically increase the costs of our PostGIS functions (right now, most "costly" functions are cost 100, but I had to push costs up into the 10 range to get parallelism to kick in sometimes). 
Some guidelines on cost setting would be useful, something that says, "this function run against this kind of data is cost level 1, compare the time your function takes on 'standard' data to the baseline function to get a cost ratio to use in the function definition" ATB, P. [1] https://gist.github.com/pramsey/84e7a39d83cccae692f8
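In the absence of such official guidelines, one improvised way to derive the ratio described above: time the candidate function and a cheap baseline over the same rows, then use the ratio as the COST value. The table and function names below are taken from the test above; the procedure itself is a suggestion, not established practice:

```sql
-- Baseline: a near-free comparison over the 8792-row table.
EXPLAIN ANALYZE SELECT count(*) FROM vada WHERE gid > 0;

-- Candidate: the expensive predicate over the same rows.
EXPLAIN ANALYZE SELECT count(*) FROM vada
 WHERE _st_intersects(geom, 'POINT(0 0)'::geometry);

-- If the candidate takes roughly K times longer per row, then:
-- ALTER FUNCTION _st_intersects(geometry, geometry) COST <K>;
```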
Re: [HACKERS] Parallel Aggregate
On Thu, Dec 10, 2015 at 10:42 PM, Haribabu Kommi wrote: > > Here I attached a POC patch of parallel aggregate based on combine > aggregate patch. This patch contains the combine aggregate changes > also. This patch generates and executes the parallel aggregate plan > as discussed in earlier threads. I tried this out using PostGIS with no great success. I used a very simple aggregate for geometry union, because the combine function is just the same as the transfer function for this case (I also marked ST_Area() as parallel safe, so that the planner will attempt to parallelize the query).

CREATE AGGREGATE ST_MemUnion (
  basetype = geometry,
  sfunc = ST_Union,
  cfunc = ST_Union,
  stype = geometry
);

Unfortunately attempting a test causes memory corruption and death.

select riding, st_area(st_memunion(geom)) from vada group by riding;

The explain looks OK:

                                   QUERY PLAN
 ---------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=220629.47..240380.26 rows=79 width=1189)
   Group Key: riding
   ->  Gather  (cost=0.00..807.49 rows=8792 width=1189)
         Number of Workers: 1
         ->  Partial HashAggregate  (cost=220628.59..220629.38 rows=79 width=1189)
               Group Key: riding
               ->  Parallel Seq Scan on vada  (cost=0.00..806.61 rows=8792 width=1189)

But the run dies.

NOTICE: SRID value -32897 converted to the officially unknown SRID value 0
ERROR: Unknown geometry type: 2139062143 - Invalid type

From the message it looks like geometry gets corrupted at some point, causing a read to fail on very screwed up metadata. P.
[HACKERS] [PATCH] PostGIS Doc Urls
The attached patch changes web references to PostGIS to point to the actual community site (postgis.net) which is active, rather than the historical site (postgis.org). P. -- Paul Ramsey http://cleverelephant.ca http://postgis.net postgis_doc.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
Thanks everyone for the help and feedback on this patch! -- Paul Ramsey http://cleverelephant.ca http://postgis.net On November 3, 2015 at 3:47:37 PM, Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas <robertmh...@gmail.com> writes: > On Tue, Nov 3, 2015 at 2:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Paul Ramsey <pram...@cleverelephant.ca> writes: >>> [ 20151006b_postgres_fdw_extensions.patch ] >> There might be a case for raising a WARNING during >> postgres_fdw_validator(), but no more than that, IMO. Certainly ERROR >> during regular use of the server is right out. > Agreed. I don't know whether it's better to emit a WARNING or some > lower-level message (INFO, DEBUG), but I think an ERROR will suck due > to the pg_dump issues you mention. I've committed this with a WARNING during validation and no comment otherwise. I left out the proposed regression tests because they fail in "make installcheck" mode, unless you've previously built and installed cube and seg, which seems like an unacceptable requirement to me. I don't think that leaving the code untested is a good final answer, of course. The idea I was toying with was to create a dummy extension for testing purposes by means of doing a direct INSERT into pg_extension --- which is ugly and would only work for superusers, but the latter is true of "CREATE EXTENSION cube" too. Anybody have a better idea? regards, tom lane
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Oct 6, 2015 at 8:52 AM, Andres Freund wrote: > I think it'd be good to add a test exercising two servers with different > extensions marked as shippable. Done, P 20151006b_postgres_fdw_extensions.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On October 4, 2015 at 9:56:10 PM, Michael Paquier (michael.paqu...@gmail.com(mailto:michael.paqu...@gmail.com)) wrote: > On Sun, Oct 4, 2015 at 11:40 AM, Paul Ramsey wrote: > > I put all changes relative to your review here if you want a nice colorized > > place to check > > > > https://github.com/pramsey/postgres/commit/ed33e7489601e659f436d6afda3cce28304eba50 > > - /* updatable is available on both server and table */ > + /* updatable option is available on both server and table */ > This is just noise (perhaps I am the one who introduced it, oops). No, I did, it matches the change to the comment below that addresses Andres note. The principle of least change is butting up against the principle of language usage consistency here. Can remove, of course. P -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Oct 6, 2015 at 5:32 AM, Andres Freund wrote: > The problem is basically that cache invalidations can arrive while > you're building new cache entries. Everytime you e.g. open a relation > cache invalidations can arrive. Assume you'd written the code like: > You're avoiding that by only entering into the hashtable *after* the > lookup. And I think that deserves a comment. Thanks, revised patch attached. P. 20151006_postgres_fdw_extensions.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On October 6, 2015 at 6:32:36 AM, Andres Freund (and...@anarazel.de(mailto:and...@anarazel.de)) wrote: > On 2015-10-06 06:28:34 -0700, Paul Ramsey wrote: > > +/* > > + * is_shippable > > + * Is this object (proc/op/type) shippable to foreign server? > > + * Check cache first, then look-up whether (proc/op/type) is > > + * part of a declared extension if it is not cached. > > + */ > > +bool > > +is_shippable(Oid objnumber, Oid classnumber, List *extension_list) > > +{ > > + ShippableCacheKey key; > > + ShippableCacheEntry *entry; > > + > > + /* Always return false if the user hasn't set the "extensions" option */ > > + if (extension_list == NIL) > > + return false; > > + > > + /* Find existing cache, if any. */ > > + if (!ShippableCacheHash) > > + InitializeShippableCache(); > > + > > + /* Zero out the key */ > > + memset(&key, 0, sizeof(key)); > > + > > + key.objid = objnumber; > > + key.classid = classnumber; > > + > > + entry = (ShippableCacheEntry *) > > + hash_search(ShippableCacheHash, > > + (void *) &key, > > + HASH_FIND, > > + NULL); > > + > > + /* Not found in ShippableCacheHash cache. Construct new entry. */ > > + if (!entry) > > + { > > + /* > > + * Right now "shippability" is exclusively a function of whether > > + * the obj (proc/op/type) is in an extension declared by the user. > > + * In the future we could additionally have a whitelist of functions > > + * declared one at a time. > > + */ > > + bool shippable = lookup_shippable(objnumber, classnumber, extension_list); > > + > > + /* > > + * Don't create a new hash entry until *after* we have the shippable > > + * result in hand, as the shippable lookup might trigger a cache > > + * invalidation. 
> > + */ > > + entry = (ShippableCacheEntry *) > > + hash_search(ShippableCacheHash, > > + (void *) &key, > > + HASH_ENTER, > > + NULL); > > + > > + entry->shippable = shippable; > > + } > > + > > + if (!entry) > > + return false; > > + else > > + return entry->shippable; > > +} > > Maybe I'm missing something major here. But given that you're looking up > solely based on Oid objnumber, Oid classnumber, how would this cache > work if there's two foreign servers with different extension lists? *sigh*, no you’re not missing anything. In moving to the cache and marking things just as “shippable” I’ve lost the test that ensures they are shippable for this *particular* server (which only happens in the lookup stage). So yeah, the cache will be wrong in cases where different servers have different extension options. Thanks, P. -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Oct 6, 2015 at 6:55 AM, Andres Freund <and...@anarazel.de> wrote: > On 2015-10-06 06:42:17 -0700, Paul Ramsey wrote: >> *sigh*, no you’re not missing anything. In moving to the cache and >> marking things just as “shippable” I’ve lost the test that ensures >> they are shippable for this *particular* server (which only happens in >> the lookup stage). So yeah, the cache will be wrong in cases where >> different servers have different extension options. > > Should be easy enough to fix - just add the server's oid to the key. > > But I guess you're already on that. Yep, just a bit chastened :) Thanks for the catch, here's the patch, P 20151006a_postgres_fdw_extensions.patch Description: Binary data
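For context, the scenario Andres flagged is easy to construct with the per-server extensions option this patch adds: two servers declaring different shippable extension lists, which is why the cache key needs the server OID in addition to the object's class and OID. A hypothetical setup (host and server names invented):

```sql
-- Each server declares which extensions' functions and operators
-- may be shipped to it for remote execution.
CREATE SERVER geo_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'geo.example.com', dbname 'gis', extensions 'postgis');
CREATE SERVER etl_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'etl.example.com', dbname 'etl', extensions 'cube, seg');
```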
Re: [HACKERS] [PATCH] postgres_fdw extension support
Andres, Thanks so much for the review! I put all changes relative to your review here if you want a nice colorized place to check https://github.com/pramsey/postgres/commit/ed33e7489601e659f436d6afda3cce28304eba50 On October 3, 2015 at 8:49:04 AM, Andres Freund (and...@anarazel.de) wrote: > + /* this must have already-installed extensions */ I don't understand that comment. Fixed, I hope. > + /* extensions is available on server */ > + {"extensions", ForeignServerRelationId, false}, awkward spelling in comment. Fixed, I hope. > + * throw up an error. > + */ s/throw up an error/throw an error/ or raise an error. But “throw up” is so evocative :) fixed. > + /* Optional extensions to support (list of oid) */ *oids Fixed. > + /* Always return false if we don't have any declared extensions */ Imo there's nothing declared here... Changed... > + if (extension_list == NIL) > + return false; > + > + /* We need this relation to scan */ Not sure what that means. Me neither, removed. > + if (foundDep->deptype == DEPENDENCY_EXTENSION && > + list_member_oid(extension_list, foundDep->refobjid)) > + { > + is_shippable = true; > + break; > + } > + } Hm. I think this “hm” is addressed lower down. > + /* Always return false if we don't have any declared extensions */ > + if (extension_list == NIL) > + return false; I again dislike declared here ;) Altered. > + key.objid = objnumber; Hm. Oids can conflict between different system relations. Shouldn't the key be over class (i.e. pg_proc, pg_type etc.) and object id? I’ve changed the lookup to use class/obj instead. I’m *hoping* I don’t get burned by it, but it regresses fine at least. Each call to is_shippable now has a hard-coded class oid in it depending on the context of the call. It seemed like the right way to do it. > + /* > + * Right now "shippability" is exclusively a function of whether > + * the obj (proc/op/type) is in an extension declared by the user. 
> + * In the future we could additionally have a whitelist of functions > + * declared one at a time. > + */ > + bool shippable = lookup_shippable(objnumber, extension_list); > + > + entry = (ShippableCacheEntry *) > + hash_search(ShippableCacheHash, > + (void *) &key, > + HASH_ENTER, > + NULL); > + > + entry->shippable = shippable; > + } I suggest adding a comment that we consciously are only HASH_ENTERing the key after doing the lookup_shippable(), to avoid the issue that the lookup might accept cache invalidations and thus delete the entry again. I’m not understanding this one. I only ever delete cache entries in bulk, when InvalidateShippableCacheCallback gets called on updates to the foreign server definition. I must not be quite understanding your gist. Thanks! P 20151003_postgres_fdw_extensions.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On September 30, 2015 at 7:06:58 AM, Tom Lane (t...@sss.pgh.pa.us) wrote: Paul Ramsey <pram...@cleverelephant.ca> writes: > Hm. Wouldn't it be just fine if only the server is able to define a > list of extensions then? It seems to me that all the use-cases of this > feature require to have a list of extensions defined per server, and > not per fdw type. This would remove a level of complexity in your > patch without impacting the feature usability as well. I would > personally go without it but I am fine to let a committer (Tom?) put a > final judgement stamp on this matter. Thoughts? Maybe I'm missing something, but I had envisioned the set of safe-to-transmit extensions as typically being defined at the foreign-server level. The reason being that you are really declaring two things: one is that the extension's operations are reproducible remotely, and the other is that the extension is in fact installed on this particular remote server. Perhaps there are use-cases for specifying it as an FDW option or per-table option, but per-server option seems by far the most plausible case. Fair enough. Declaring it for the whole database as an option to CREATE FOREIGN DATA WRAPPER was just a convenience really, so you could basically say “I expect this extension on all my servers”. But you’re right, logically “having the extension” is an attribute of the servers, so restricting it to the server definitions only has a nice simple logic to it. P. -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On September 30, 2015 at 3:32:21 PM, Michael Paquier (michael.paqu...@gmail.com) wrote: OK. Once you can get a new patch done with a reworked extractExtensionList, I'll get a new look at it in a timely fashion and then let's move it to a committer's hands. Done and thanks! P -- http://postgis.net http://cleverelephant.ca fdw-extension-support8.diff Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On September 30, 2015 at 12:54:44 AM, Michael Paquier (michael.paqu...@gmail.com) wrote: >> +extern bool extractExtensionList(char *extensionString, >> + List **extensionOids); >> What's the point of the boolean status in this new routine? The return >> value of extractExtensionList is never checked, and it would be more >> simple to just return the parsed list as return value, no? > > I started changing it, then found out why it is the way it is. During > the options parsing, the list of current extensionOids is passed in, > so that extra ones can be added, since both the wrapper and the server > can be declared with extensionOids. It's also doubling as a flag on > whether the function should bother to try and populate the list, or > just do a sanity check on the options string. I can change the > signature to > > extern List* extractExtensionList(char *extensionString, List > *currentExtensionOids, bool populateList); > to be more explicit if necessary. Hm. Wouldn't it be just fine if only the server is able to define a list of extensions then? It seems to me that all the use-cases of this feature require to have a list of extensions defined per server, and not per fdw type. This would remove a level of complexity in your patch without impacting the feature usability as well. I would personally go without it but I am fine to let a committer (Tom?) put a final judgement stamp on this matter. Thoughts? Simon wanted to be able to define extensions at the wrapper level as well as the server level. It’s a nice enough little feature to warrant a small amount of added complexity, IMO. I’m going to change the signature, since verbose clarity > terse obscurity for my wee brain (only took me a couple weeks to forget why that signature was what it was). >> +-- === >> +-- clean up >> +-- === >> Perhaps here you meant dropping the schema and the foreign tables >> created previously? 
> > I did, but since postgres_fdw doesn't clean up after itself, perhaps > just leaving the room messy is the appropriate behaviour? That's appropriate when keeping around objects that are aimed to be tested with for example pg_upgrade, some regression tests of src/test/regress do that on purpose for example. Now it is true that an extension regression database will be normally reused by another extension just after, and that there is no general rule on this. I’ll clean up then, I was just being lazy. P. -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Sat, Sep 26, 2015 at 5:41 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Sat, Sep 26, 2015 at 4:29 AM, Paul Ramsey wrote: >> On Thu, Aug 20, 2015 at 6:01 PM, Michael Paquier wrote: >> src/backend/utils/adt/format_type.c >> +/* >> + * This version allows a nondefault typemod to be specified and fully >> qualified. >> + */ >> +char * >> +format_type_with_typemod_qualified(Oid type_oid, int32 typemod) >> +{ >> + return format_type_internal(type_oid, typemod, true, false, true); >> +} > > Patch 0001 in this email has a routine called format_type_detailed > that I think is basically what we need for this stuff: > http://www.postgresql.org/message-id/CACACo5Q_UXYwF117LBhjZ3xaMPyrgqnqE=mxvrhefjj51ac...@mail.gmail.com > Could you look at it? I'm not sure it helps much. I'd still need a function to turn the output into a formatted string, but more importantly the big question for me to avoid breaking regression is: if it's built-in, don't schema qualitfy it; if it's extended do so. I'm not seeing why ignoring the typmod in the case of deparsing extended type constants is going to be a problem? All the old behaviour is untouched in the current patch. > + * shippable.c > + * Non-built-in objects cache management and utilities. > + * > + * Is a non-built-in shippable to the remote server? Only if > + * the object is in an extension declared by the user in the > + * OPTIONS of the wrapper or the server. > This is rather unclear. It would be more simple to describe that as > "Facility to track database objects shippable to a foreign server". Done > +extern bool extractExtensionList(char *extensionString, > + List **extensionOids); > What's the point of the boolean status in this new routine? The return > value of extractExtensionList is never checked, and it would be more > simple to just return the parsed list as return value, no? I started changing it, then found out why it is the way it is. 
During the options parsing, the list of current extensionOids is passed in, so that extra ones can be added, since both the wrapper and the server can be declared with extensionOids. It's also doubling as a flag on whether the function should bother to try and populate the list, or just do a sanity check on the options string. I can change the signature to extern List* extractExtensionList(char *extensionString, List *currentExtensionOids, bool populateList); to be more explicit if necessary. > -REGRESS = postgres_fdw > +REGRESS = postgres_fdw shippable > +EXTRA_INSTALL = contrib/seg > The order of the tests is important and should be mentioned, > shippable.sql using the loopback server created by postgres_fdw. Done > +-- === > +-- clean up > +-- === > Perhaps here you meant dropping the schema and the foreign tables > created previously? I did, but since postgres_fdw doesn't clean up after itself, perhaps just leaving the room messy is the appropriate behaviour? > +CREATE SCHEMA "SH 1"; > Is there a reason to use double-quoted relations? There is no real > reason to not use them, this is just to point out that this is not a > common practice in the other regression tests. Just following the pattern from postgres_fdw. And since I found things to be sensitive to full qualification of function names, etc, it seemed like a good idea to try out odd named tables/schemas since the pattern was there to follow. I also changed the extension being tested from 'seg' to 'cube', since cube had some functions I could test as well as operators. P. 
> -- > Michael diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index d2b98e1..f78fc64 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -1,7 +1,7 @@ # contrib/postgres_fdw/Makefile MODULE_big = postgres_fdw -OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES) +OBJS = postgres_fdw.o option.o deparse.o connection.o shippable.o $(WIN32RES) PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL" PG_CPPFLAGS = -I$(libpq_srcdir) @@ -10,7 +10,9 @@ SHLIB_LINK = $(libpq) EXTENSION = postgres_fdw DATA = postgres_fdw--1.0.sql -REGRESS = postgres_fdw +# Note: shippable tests depend on postgres_fdw tests setup +REGRESS = postgres_fdw shippable +EXTRA_INSTALL = contrib/cube ifdef USE_PGXS PG_CONFIG = pg_config diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 697de60..897ecdb 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -233,6 +233,9 @@ foreign_expr_walker(Node *node, Oid collation; FDWCollateStat
Re: [HACKERS] [PATCH] postgres_fdw extension support
Back from summer and conferencing, and finally responding, sorry for the delay... On Thu, Aug 20, 2015 at 6:01 PM, Michael Paquier wrote: > > > if (needlabel) > appendStringInfo(buf, "::%s", > - > format_type_with_typemod(node->consttype, > - > node->consttypmod)); > + > format_type_be_qualified(node->consttype)); > Pondering more about this one, I think that we are going to need a new > routine in format_type.c to be able to call format_type_internal as > format_type_internal(type_oid, typemod, true/false, false, true). If typemod > is -1, then typemod_given (the third argument) is false, otherwise > typemod_given is true. That's close to what the C function format_type at > the SQL level can do except that we want it to be qualified. Regression > tests will need an update as well. I ended up switching on whether the type being formatted was an extension type or not. Extension types need to be fully qualified or they won't get found by the remote. Conversely if you fully qualify the built-in types, the regression test for postgres_fdw isn't happy. This still isn't quite the best/final solution, perhaps something as simple as this would work, but I'm not sure: src/backend/utils/adt/format_type.c +/* + * This version allows a nondefault typemod to be specified and fully qualified. + */ +char * +format_type_with_typemod_qualified(Oid type_oid, int32 typemod) +{ + return format_type_internal(type_oid, typemod, true, false, true); +} > Comment format is incorrect, this should be written like that: Comments fixed. > + if (!extension_list) > + return false; > I would rather mark that as == NIL instead, NIL is what an empty list is. Done > +extern bool is_shippable(Oid procnumber, PgFdwRelationInfo *fpinfo); > There is no need to pass PgFdwRelationInfo to is_shippable. Only the list of > extension OIDs is fine. Done > That's nitpicking, but normally we avoid more than 80 characters per line of > code. Done. 
> When attempting to create a server when an extension is not installed we get > an appropriate error: > =# CREATE SERVER postgres_server > FOREIGN DATA WRAPPER postgres_fdw > OPTIONS (host 'localhost', port '5432', dbname 'postgres', extensions > 'seg'); > ERROR: 42601: the "seg" extension must be installed locally before it can > be used on a remote server > LOCATION: extractExtensionList, shippable.c:245 > However, it is possible to drop an extension listed in a postgres_fdw > server. Shouldn't we invalidate cache as well when pg_extension is updated? > Thoughts? For extensions with types, it's pretty hard to pull the extension out from under the FDW, because the tables using the type will depend on it. For simpler function-only extensions, it's possible, but as soon as you pull the extension out and run a query, your FDW will notice the extension is missing and complain. And then you'll have to update the foreign server or foreign table entries and the cache will get flushed. So there's no way to get a stale cache. > + list_free(extlist); > It does not matter much to call list_free here. The list is going to be > freed in the error context with ERROR. Done. > + foreach(ext, extension_list) > + { > + Oid extension_oid = (Oid) lfirst(ext); > + if (foundDep->refobjid == extension_oid) > + { > + nresults++; > + } > + } > You could just use list_member_oid here. And why not just breaking the loop > once there is a match? This is doing unnecessary work visibly Done. > +By default only built-in operators and functions will be sent from the > +local to the foreign server. This may be overridden using the following > option: > Missing a mention to "data types" here? Actually extension data types traverse the postgres_fdw without trouble without this patch, as long as both sides have the extension installed. So not strictly needed to mention data types. 
> It would be good to get some regression tests for this feature, which is > something doable with the flag EXTRA_INSTALL and some tests with EXPLAIN > VERBOSE. Note that you will need to use CREATE EXTENSION to make the > extension available in the new test, which should be I imagine a new file > like shippable.sql. I've put a very very small regression file in that tests the shippable feature, which can be fleshed out further as needed. Thanks so much! P. > Regards, > -- > Michael diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index d2b98e1..63576c4 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -1,7 +1,7 @@ # contrib/postgres_fdw/Makefile MODULE_big = postgres_fdw -OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES) +OBJS =
Re: [HACKERS] Extension upgrade and GUCs
On August 20, 2015 at 2:17:31 AM, Simon Riggs (si...@2ndquadrant.com) wrote: On 18 August 2015 at 21:03, Paul Ramsey wrote: So I need a way to either (a) notice when I already have a (old) copy of the library loaded and avoid trying to setup the GUC in that case or (b) set-up the GUC in a somewhat less brittle way than DefineCustomStringVariable() allows, something that can overwrite things instead of just erroring out. Are you trying to preserve the in-memory state across upgrade as well? It sounds unlikely we can support that directly in the general case. I’m not sure what you mean by this. Sounds like we need RedefineCustomStringVariable() Yes, if that had existed we would not have had any problems (as long as it delegated back to Define..() in the case where the variable hadn’t been created yet…, since one of the problems is knowing if/to-what-extent a custom variable has already been defined). We do now have a fix, which involved copying about 100 lines of core code (find_option, guc_var_compare, guc_name_compare) up, that does a low level search to see if there is a config_generic for a particular variable name, and if so whether it’s a placeholder or not. The presence of a non-placeholding definition is used as a “uh oh, there’s already a library in here” warning which keeps us from re-defining the variable and causing trouble. P. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Extension upgrade and GUCs
On August 20, 2015 at 7:21:10 AM, Tom Lane (t...@sss.pgh.pa.us) wrote: I'm not sure that the situation you describe can be expected to work reliably; the problems are far wider than just GUC variables. If two different .so's are exposing broadly the same set of C function names, how can we be sure which one will get called by the dynamic linker? Isn't it likely that we'd end up continuing to call some functions out of the old .so, probably leading to disaster? Well, when you put it that way it sounds pretty scary :) For whatever it’s worth, we’ve had versioned .so’s for a very long time, and every time an upgrade happens there is a period during which a backend will have two versions loaded simultaneously. But we only noticed recently when we got the GUC collision (our first GUC was only introduced in PostGIS 2.1). Perhaps because after upgrading most people disconnect, and then the next time they connect they only get the new library henceforth. In some cases (start a fresh backend and then do the upgrade immediately) it’s even possible to do an upgrade without triggering the double-load situation. I don't necessarily object to providing some solution for GUCs, but I think first we need to question whether that isn't just the tip of a large iceberg. There’s no doubt that the GUC issue is just a symptom of a potentially awful larger disease, but so far it’s the only symptom we’ve observed. Maybe because only a small % of functionality ever changes in a given release, combined with the relatively short lifespan of the double-loaded condition, and the fact the problem goes away immediately on re-connect, any problems have always just been ignored. P.
[HACKERS] Extension upgrade and GUCs
Hi hackers, I've been wrestling with this one for a while and gone down a couple blind alleys, so time to ask the experts. PostGIS has a couple things different from the extensions that live in contrib, - it has some GUCs - it has a versioned loadable library (postgis-2.1.so, postgis-2.2.so, etc) We've found that, when we run the following SQL, CREATE EXTENSION postgis VERSION '2.1.9'; ALTER EXTENSION postgis UPDATE TO '2.2.0'; The update fails, because of a collision in the GUC. When the extension is CREATEd, postgis-2.1.so is loaded, _PG_init() is called, and that in turn calls DefineCustomStringVariable() to create our GUC. When the ALTER is called, the first time a C function definition is called, the new library, postgis-2.2.so is loaded, the _PG_init() of *that* library is called, and DefineCustomStringVariable() is called, but this time it runs into the GUC definition from the first library load, and the EXTENSION update process stops as an ERROR is thrown. My initial attempt at avoiding this problem involved looking at GetConfigOption() before running DefineCustomStringVariable() to see if the GUC was already defined. This did not work, as it's possible to define a GUC before loading the library. So an otherwise innocent sequence of commands like: SET my_guc = 'foo'; -- no library loaded yet SELECT my_library_function(); -- causes library load and _PG_init() to fire Would now fail, as it hit my test for a pre-existing GUC. Unfortunately, the GUC we are using is not one where we simply read a value now and again. It switches the backend geometry library that various functions use, so performance is a big deal: instead of reading the GUC value, the code expects that GUC changes will flip a global variable, using the GUC assign callback. 
So I need a way to either (a) notice when I already have an (old) copy of the library loaded and avoid trying to set up the GUC in that case or (b) set up the GUC in a somewhat less brittle way than DefineCustomStringVariable() allows, something that can overwrite things instead of just erroring out. The ugly code in question is here https://github.com/postgis/postgis/blob/svn-trunk/postgis/lwgeom_backend_api.c#L105 Discussion is here https://trac.osgeo.org/postgis/ticket/2382 Thanks, P
Re: [HACKERS] [PATCH] postgres_fdw extension support
Thanks so much Michael! Let me know when you have further feedback I should incorporate. ATB, P. -- http://postgis.net http://cleverelephant.ca On July 25, 2015 at 4:52:11 AM, Michael Paquier (michael.paqu...@gmail.com) wrote: On Sat, Jul 25, 2015 at 2:19 AM, Paul Ramsey pram...@cleverelephant.ca wrote: On Thu, Jul 23, 2015 at 7:48 AM, Paul Ramsey pram...@cleverelephant.ca wrote: Here's an updated patch that clears the cache on changes to foreign wrappers and servers. Any chance one of you folks could be my official commitfest reviewer? Appreciate all the feedback so far! https://commitfest.postgresql.org/6/304/ That's an interesting topic, I will register as such. -- Michael
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Thu, Jul 23, 2015 at 7:48 AM, Paul Ramsey pram...@cleverelephant.ca wrote: Here's an updated patch that clears the cache on changes to foreign wrappers and servers. Any chance one of you folks could be my official commitfest reviewer? Appreciate all the feedback so far! https://commitfest.postgresql.org/6/304/ Thanks, P
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Wed, Jul 22, 2015 at 12:19 PM, Paul Ramsey pram...@cleverelephant.ca wrote: I’ll have a look at doing invalidation for the case of changes to the FDW wrappers and servers. Here's an updated patch that clears the cache on changes to foreign wrappers and servers. In testing it I came across an unrelated issue which could make it hard for users to manage the options on their wrappers/servers: fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis' ); ALTER SERVER fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis,seg' ); ERROR: option "extensions" provided more than once Once set, an option seems to be effectively immutable. diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index d2b98e1..3543312 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -1,7 +1,7 @@ # contrib/postgres_fdw/Makefile MODULE_big = postgres_fdw -OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES) +OBJS = postgres_fdw.o option.o deparse.o connection.o shippable.o $(WIN32RES) PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL" PG_CPPFLAGS = -I$(libpq_srcdir) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 81cb2b4..d6fff30 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -229,6 +229,9 @@ foreign_expr_walker(Node *node, Oid collation; FDWCollateState state; + /* Access extension metadata from fpinfo on baserel */ + PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *)(glob_cxt->foreignrel->fdw_private); + /* Need do nothing for empty subexpressions */ if (node == NULL) return true; @@ -361,7 +364,7 @@ foreign_expr_walker(Node *node, * can't be sent to remote because it might have incompatible * semantics on remote side. 
*/ - if (!is_builtin(fe->funcid)) + if (!is_builtin(fe->funcid) && !is_shippable(fe->funcid, fpinfo)) return false; /* @@ -407,7 +410,7 @@ foreign_expr_walker(Node *node, * (If the operator is, surely its underlying function is * too.) */ - if (!is_builtin(oe->opno)) + if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo)) return false; /* @@ -445,7 +448,7 @@ foreign_expr_walker(Node *node, /* * Again, only built-in operators can be sent to remote. */ - if (!is_builtin(oe->opno)) + if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo)) return false; /* @@ -591,7 +594,7 @@ foreign_expr_walker(Node *node, * If result type of given expression is not built-in, it can't be sent to * remote because it might have incompatible semantics on remote side. */ - if (check_type && !is_builtin(exprType(node))) + if (check_type && !is_builtin(exprType(node)) && !is_shippable(exprType(node), fpinfo)) return false; /* @@ -1404,8 +1407,7 @@ deparseConst(Const *node, deparse_expr_cxt *context) } if (needlabel) appendStringInfo(buf, "::%s", - format_type_with_typemod(node->consttype, - node->consttypmod)); + format_type_be_qualified(node->consttype)); } /* diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c index 7547ec2..9aeaf1a 100644 --- a/contrib/postgres_fdw/option.c +++ b/contrib/postgres_fdw/option.c @@ -15,6 +15,7 @@ #include "postgres_fdw.h" #include "access/reloptions.h" +#include "catalog/pg_foreign_data_wrapper.h" #include "catalog/pg_foreign_server.h" #include "catalog/pg_foreign_table.h" #include "catalog/pg_user_mapping.h" @@ -124,6 +125,10 @@ postgres_fdw_validator(PG_FUNCTION_ARGS) errmsg("%s requires a non-negative numeric value", def->defname))); } + else if (strcmp(def->defname, "extensions") == 0) + { + extractExtensionList(defGetString(def), NULL); + } } PG_RETURN_VOID(); @@ -153,6 +158,9 @@ InitPgFdwOptions(void) /* "updatable" is available on both server and table
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Thu, Jul 23, 2015 at 7:48 AM, Paul Ramsey pram...@cleverelephant.ca wrote: In testing it I came across an unrelated issue which could make it hard for users to manage the options on their wrappers/servers: fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis' ); ALTER SERVER fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis,seg' ); ERROR: option "extensions" provided more than once Once set, an option seems to be effectively immutable. Whoops, I see I just didn't read the full syntax for providing options; using ALTER SERVER foreign_server OPTIONS ( SET extensions 'postgis,seg' ); works just fine. Sorry for the noise, P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 22, 2015 at 12:15:14 PM, Andres Freund (and...@anarazel.de) wrote: It doesn't seem that unlikely that somebody does an ALTER SERVER OPTIONS SET .. to add an extension to be shippable while connections are already using the fdw. It'll be confusing if some clients are fast and some others are really slow. This seems more likely than anyone mucking around with extension stuff (adding new functions (and working with FDW in production at the same time?)) or adding/dropping whole extensions (you’ll have more problems than a stale cache, whole columns will disappear if you DROP EXTENSION postgis CASCADE), and has the added benefit of not needing me to muck into core stuff for my silly feature. I’ll have a look at doing invalidation for the case of changes to the FDW wrappers and servers. P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Jul 21, 2015 at 7:45 AM, Andres Freund and...@anarazel.de wrote: + + /* We need this relation to scan */ + depRel = heap_open(DependRelationId, RowExclusiveLock); + + /* Scan the system dependency table for all entries this operator */ + /* depends on, then iterate through and see if one of them */ + /* is a registered extension */ + ScanKeyInit(&key[0], + Anum_pg_depend_objid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(procnumber)); + + scan = systable_beginscan(depRel, DependDependerIndexId, true, + GetCatalogSnapshot(depRel->rd_id), nkeys, key); + + while (HeapTupleIsValid(tup = systable_getnext(scan))) + { + Form_pg_depend foundDep = (Form_pg_depend) GETSTRUCT(tup); + + if (foundDep->deptype == DEPENDENCY_EXTENSION) + { + List *extlist = fpinfo->extensions; + ListCell *ext; + + foreach(ext, extlist) + { + Oid extension_oid = (Oid) lfirst(ext); + if (foundDep->refobjid == extension_oid) + { + nresults++; + } + } + } + if (nresults > 0) break; + } + + systable_endscan(scan); + relation_close(depRel, RowExclusiveLock); + + return nresults > 0; +} Phew. That's mighty expensive to do at frequency. I guess it'll be more practical to expand this list once and then do a binary search on the result for the individual functions. So, right after reading the options in postgresGetForeignRelSize, expand the extension list into a list of all ops/functions, in a sorted list, and let that carry through to the deparsing instead? That would happen once per query, right? But the deparsing also happens once per query too, yes? Is the difference going to be that big? (I speak not knowing the overhead of things like systable_beginscan, etc) Thanks! P
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 21, 2015 at 11:07:36 AM, Tom Lane (t...@sss.pgh.pa.us) wrote: I'm inclined to think that it's not really necessary to worry about invalidating a per-connection cache of is this function safe to ship determinations. So: yes to a local cache of all forwardable functions/ops, populated in full the first time through (does that speak maybe to using a binary search on a sorted list instead of a hash, since I only pay the sort price once and am not doing any insertions?). And then we just hold it until the connection goes away. Yes? P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 21, 2015 at 8:26:31 AM, Andres Freund (and...@anarazel.de) wrote:

> On 2015-07-21 17:00:51 +0200, Andres Freund wrote:
>> On 2015-07-21 07:55:17 -0700, Paul Ramsey wrote:
>>> So, right after reading the options in postgresGetForeignRelSize,
>>> expand the extension list into a list of all ops/functions, in a
>>> sorted list, and let that carry through to the deparsing instead?
>>
>> I'd actually try to make it longer lived, i.e. permanently. And just
>> deallocate when a catcache callback says it needs to be invalidated;
>> IIRC there is a relevant cache.
>
> On second thought I'd not use a binary search but a hash table. If you
> choose the right key a single table is enough for the lookup.
> If you need references for invalidations you might want to look for
> CacheRegisterSyscacheCallback callers. E.g. attoptcache.c

On 2015-07-21 08:32:34 -0700, Paul Ramsey wrote:
>> Thanks! Reading some of the syscache callback stuff, one thing I'm not
>> sure of is which SysCacheIdentifier(s) I should be registering
>> callbacks against to ensure my list of funcs/procs that reside in
>> particular extensions is kept fresh. I don't see anything tracking the
>> dependencies there.
>
> FOREIGNSERVEROID, and a new syscache on pg_extension.oid should suffice
> I think. pg_foreign_server will be changed upon option changes and
> pg_extension.oid on extension upgrades. Since dependencies won't change
> independently of extension versions I don't think we need to care
> otherwise. There's ALTER EXTENSION ... ADD but I'm rather prepared to
> ignore that; if that's not ok it's trivial to make it emit an
> invalidation.
This hole just keeps getting deeper :) So,

- a HASH in my own code to hold all the functions that I consider "safe to forward" (which I'll derive by reading the contents of the extensions the users declare)
- callbacks in my code registered using CacheRegisterSyscacheCallback on FOREIGNSERVEROID and __see_below__ that refresh my cache when called
- since there is no syscache for extensions right now, a new syscache entry for pg_extension.oid (and I ape things in syscache.h and syscache.c and Magic Occurs?)
- which means I can also CacheRegisterSyscacheCallback on the new EXTENSIONOID syscache
- and finally change my is_in_extension() to efficiently check my HASH instead of querying the system catalog

Folks are going to be OK w/ me dropping in new syscache entries to support my niche little feature?

ATB,
P
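The control flow sketched in the list above — the registered callback does not rebuild anything, it only invalidates, and the next lookup repopulates — can be modeled outside the backend. This is a toy sketch: `on_syscache_invalidation`, `lookup`, and `build_cache` are hypothetical stand-ins for the function registered with CacheRegisterSyscacheCallback and the real HTAB rebuild, not actual PostgreSQL code.

```c
#include <stdbool.h>

/* Toy model of a lazily rebuilt local cache.  In the backend, the cache
 * would be an HTAB of "safe to forward" OIDs. */
static bool cache_built = false;
static int  cache_builds = 0;   /* counts how often the expensive rebuild ran */

static void build_cache(void)
{
    /* stand-in for the expensive catalog scan that fills the HTAB */
    cache_built = true;
    cache_builds++;
}

/* What the callback registered against FOREIGNSERVEROID (or the proposed
 * EXTENSIONOID syscache) would do: just mark the cache stale. */
void on_syscache_invalidation(void)
{
    cache_built = false;
}

/* Every lookup first makes sure the cache is fresh, rebuilding lazily. */
bool lookup(void)
{
    if (!cache_built)
        build_cache();
    return cache_built;
}
```

The point of the split is that invalidation callbacks can fire at awkward times, so they should do the minimum (flip a flag); the costly work happens on the next lookup, which is always a safe context.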
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 21, 2015 at 11:22:12 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:

>> So: yes to a local cache of all forwardable functions/ops, populated in
>> full the first time through (does that speak maybe to using a binary
>> search on a sorted list instead of a hash, since I only pay the sort
>> price once and am not doing any insertions?). And then we just hold it
>> until the connection goes away.
>
> No, *not* populated first-time-through, because that won't handle any of
> the CREATE, DROP, or UPGRADE cases. It's also doing a lot of work you
> might never need. I was thinking of populate on demand, that is, first
> time you need to know whether function X is shippable, you find that out
> and then cache the answer (whether it be positive or negative).

Roger that. Off to the races..

P
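Tom's populate-on-demand scheme caches both positive and negative answers, so the expensive catalog scan runs at most once per OID. A minimal self-contained sketch of that idea — the tri-state array and the fake catalog scan are stand-ins for the real HTAB and pg_depend lookup, and all thresholds here are invented:

```c
#include <stdbool.h>

/* Three states: never asked, known shippable, known not shippable. */
enum ship_state { UNKNOWN = 0, SHIPPABLE, NOT_SHIPPABLE };

#define MAX_OID 4096
static enum ship_state answers[MAX_OID]; /* zero-initialized => all UNKNOWN */
static int catalog_lookups = 0;          /* counts the "expensive" scans */

/* Stand-in for the pg_depend scan: pretend OIDs below 2000 belong to a
 * declared extension. */
static bool expensive_catalog_scan(unsigned oid)
{
    catalog_lookups++;
    return oid < 2000;
}

/* First ask does the scan; every later ask is answered from the cache,
 * whether the cached answer was positive or negative. */
bool is_shippable(unsigned oid)
{
    if (oid >= MAX_OID)
        return false;
    if (answers[oid] == UNKNOWN)
        answers[oid] = expensive_catalog_scan(oid) ? SHIPPABLE : NOT_SHIPPABLE;
    return answers[oid] == SHIPPABLE;
}
```

Caching the negative answers matters: a deparse pass that repeatedly hits a non-shippable operator would otherwise re-run the catalog scan on every encounter.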
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Jul 21, 2015 at 11:29 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> On July 21, 2015 at 11:22:12 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:
>> No, *not* populated first-time-through, because that won't handle any
>> of the CREATE, DROP, or UPGRADE cases. It's also doing a lot of work
>> you might never need. I was thinking of populate on demand, that is,
>> first time you need to know whether function X is shippable, you find
>> that out and then cache the answer (whether it be positive or
>> negative).
>
> Roger that. Off to the races..

Attached, reworked with a local cache. I felt a little dirty sticking the cache entry right in postgres_fdw.c, so I broke out all my nonsense into shippable.c.

Thanks!
P

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index d2b98e1..3543312 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -1,7 +1,7 @@
 # contrib/postgres_fdw/Makefile
 
 MODULE_big = postgres_fdw
-OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES)
+OBJS = postgres_fdw.o option.o deparse.o connection.o shippable.o $(WIN32RES)
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..d6fff30 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -229,6 +229,9 @@ foreign_expr_walker(Node *node,
 	Oid			collation;
 	FDWCollateState state;
 
+	/* Access extension metadata from fpinfo on baserel */
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
 		return true;
@@ -361,7 +364,7 @@ foreign_expr_walker(Node *node,
 				 * can't be sent to remote because it might have incompatible
 				 * semantics on remote side.
 				 */
-				if (!is_builtin(fe->funcid))
+				if (!is_builtin(fe->funcid) && !is_shippable(fe->funcid, fpinfo))
 					return false;
@@ -407,7 +410,7 @@ foreign_expr_walker(Node *node,
 				 * (If the operator is, surely its underlying function is
 				 * too.)
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo))
 					return false;
@@ -445,7 +448,7 @@ foreign_expr_walker(Node *node,
 				/*
 				 * Again, only built-in operators can be sent to remote.
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo))
 					return false;
@@ -591,7 +594,7 @@ foreign_expr_walker(Node *node,
 	 * If result type of given expression is not built-in, it can't be sent to
 	 * remote because it might have incompatible semantics on remote side.
 	 */
-	if (check_type && !is_builtin(exprType(node)))
+	if (check_type && !is_builtin(exprType(node)) && !is_shippable(exprType(node), fpinfo))
 		return false;
@@ -1404,8 +1407,7 @@ deparseConst(Const *node, deparse_expr_cxt *context)
 	}
 	if (needlabel)
 		appendStringInfo(buf, "::%s",
-						 format_type_with_typemod(node->consttype,
-												  node->consttypmod));
+						 format_type_be_qualified(node->consttype));
 }
 
/*
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..9aeaf1a 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -15,6 +15,7 @@
 #include "postgres_fdw.h"
 
 #include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
 #include "catalog/pg_foreign_server.h"
 #include "catalog/pg_foreign_table.h"
 #include "catalog/pg_user_mapping.h"
@@ -124,6 +125,10 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative numeric value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "extensions") == 0)
+		{
+			extractExtensionList(defGetString(def), NULL);
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -153,6 +158,9 @@ InitPgFdwOptions(void)
 		/* updatable
Re: [HACKERS] [PATCH] postgres_fdw extension support
Here's a third revision that allows the 'extensions' option on the wrapper as well, so that supported extensions can be declared once, in one place. Since the CREATE FOREIGN DATA WRAPPER statement is actually called inside the CREATE EXTENSION script for postgres_fdw, the way to get this option in is to alter the wrapper:

ALTER FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( extensions 'seg' );

Right now declaring extensions at different levels is additive. I didn't add the option to revoke an extension for a particular server definition (the 'cube,-postgis' entry, for example). If that's a deal-breaker I can add it too, but it felt like something that could wait for a user to positively declare "I must have that feature!"

P.

On Fri, Jul 17, 2015 at 5:58 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> On July 17, 2015 at 5:57:42 AM, Simon Riggs (si...@2ndquadrant.com) wrote:
>> Options already exist on CREATE FOREIGN DATA WRAPPER, so it should be
>> easy to support that. I'd rather add it once on the wrapper than be
>> forced to list all the options on every foreign server, unless required
>> to do so.
>
> Gotcha, that does make sense.
> P.
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..9c5136e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -34,11 +34,15 @@
 
 #include "postgres_fdw.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -49,8 +53,10 @@
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -136,6 +142,7 @@
 static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 				 deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 					   deparse_expr_cxt *context);
+static bool is_in_extension(Oid procid, PgFdwRelationInfo *fpinfo);
@@ -229,6 +236,9 @@ foreign_expr_walker(Node *node,
 	Oid			collation;
 	FDWCollateState state;
 
+	/* Access extension metadata from fpinfo on baserel */
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
 		return true;
@@ -361,7 +371,7 @@ foreign_expr_walker(Node *node,
 				 * can't be sent to remote because it might have incompatible
 				 * semantics on remote side.
 				 */
-				if (!is_builtin(fe->funcid))
+				if (!is_builtin(fe->funcid) && !is_in_extension(fe->funcid, fpinfo))
 					return false;
@@ -407,7 +417,7 @@ foreign_expr_walker(Node *node,
 				 * (If the operator is, surely its underlying function is
 				 * too.)
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_in_extension(oe->opno, fpinfo))
 					return false;
@@ -445,7 +455,7 @@ foreign_expr_walker(Node *node,
 				/*
 				 * Again, only built-in operators can be sent to remote.
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_in_extension(oe->opno, fpinfo))
 					return false;
@@ -591,7 +601,7 @@ foreign_expr_walker(Node *node,
 	 * If result type of given expression is not built-in, it can't be sent to
 	 * remote because it might have incompatible semantics on remote side.
 	 */
-	if (check_type && !is_builtin(exprType(node)))
+	if (check_type && !is_builtin(exprType(node)) && !is_in_extension(exprType(node), fpinfo))
 		return false;
@@ -669,6 +679,65 @@ is_builtin(Oid oid)
 
 /*
+ * Returns true if given operator/function is part of an extension declared in the
+ * server options.
+ */
+static bool
+is_in_extension(Oid procnumber, PgFdwRelationInfo *fpinfo)
+{
+	static int nkeys = 1;
+	ScanKeyData key[nkeys];
+	HeapTuple tup;
+	Relation depRel;
+	SysScanDesc scan
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 17, 2015 at 12:49:04 AM, Simon Riggs (si...@2ndquadrant.com) wrote:

> On 17 July 2015 at 01:23, Michael Paquier wrote:
>>> Well, as I see it there's three broad categories of behavior available:
>>>
>>> 1- Forward nothing non-built-in (current behavior)
>>> 2- Use options to forward only specified non-built-in things (either in
>>> chunks (extensions, as in this patch) or one-by-one (mark your desired
>>> functions/ops))
>>> 3- Forward everything if a "forward everything" option is set
>>
>> Then what you are describing here is a parameter able to do a switch
>> among this selection:
>> - nothing, which is to check on built-in stuff
>> - extension list
>> - all
>
> "all" seems to be a very blunt instrument but is certainly appropriate
> in some cases. I see an intermediate setting, giving four categories in
> total:
>
> 0. Nothing, as now
> 1. Extension list option on the Foreign Server
> 2. Extension list option on the Foreign Data Wrapper, applies to all
> Foreign Servers of that type
> 3. All extensions allowed

I feel like a lot of knobs are being discussed for maximum configurability, but without knowing if people are going to show up with the desire to fiddle them :) If marking extensions is not considered a good API, then I'd lean towards (a) being able to toggle global forwarding on and off, combined with (b) the ability to mark individual objects forwardable or not. Which, I see, is almost what you've described. There's no facility to add OPTIONS to an EXTENSION right now, so this capability seems to be very much server-by-server (adding a FDW-specific capability to the EXTENSION mechanism seems like overkill for a niche feature addition).

But within the bounds of the SERVER, being able to build combinations of allowed/restricted forwarding is certainly powerful:

CREATE SERVER foo FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', dbname 'foo', forward 'all -postgis' );
CREATE SERVER foo FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', dbname 'foo', forward 'none +postgis' );
CREATE SERVER foo FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', dbname 'foo', forward 'none +&&' );

Once we get to individual functions or operators it breaks down, since of course '&&' refers to piles of different operators for different types. Their descriptions would be unworkably verbose once you have more than a tiny handful.

I'm less concerned with configurability than just the appropriateness of forwarding particular functions. In an earlier thread on this topic the problem of forwarding arbitrary user-defined PL/PgSQL functions was brought up. In passing it was mentioned that maybe VOLATILE functions should *never* be forwarded, ever. Or that only IMMUTABLE functions should *ever* be forwarded. Since PostGIS includes a generous mix of all kinds of functions, discussing whether that kind of policy is required for any kind of additional forwarding would be interesting. Maybe IMMUTABLE/STABLE/VOLATILE doesn't actually capture what it is that makes a function/op FORWARDABLE or not.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 17, 2015 at 5:57:42 AM, Simon Riggs (si...@2ndquadrant.com) wrote:

> Options already exist on CREATE FOREIGN DATA WRAPPER, so it should be
> easy to support that. I'd rather add it once on the wrapper than be
> forced to list all the options on every foreign server, unless required
> to do so.

Gotcha, that does make sense.

P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
Michael, thanks so much for the review!

On July 15, 2015 at 7:35:11 PM, Michael Paquier (michael.paqu...@gmail.com) wrote:

> This patch includes some diff noise, it would be better to remove that.

Done.

> -    if (!is_builtin(fe->funcid))
> +    if (!is_builtin(fe->funcid) && (!is_in_extension(fe->funcid, fpinfo)))
> Extra parentheses are not needed.

OK, will remove.

> +    if ( (!is_builtin(oe->opno)) && (!is_in_extension(oe->opno, fpinfo)) )
> ... And this does not respect the project code format. See here for more
> details for example:
> http://www.postgresql.org/docs/devel/static/source.html

I'm sorry, that link doesn't clarify for me what's stylistically wrong here (it's almost all about error messages), could you help? (Is it the padding around the conditionals? Removed that.)

> +    /* PostGIS metadata */
> +    List *extensions;
> +    bool use_postgis;
> +    Oid postgis_oid;
> This addition in PgFdwRelationInfo is surprising. What's the point of
> keeping use_postgis and postgis_oid that are actually used nowhere?

Whoops, a couple of old pieces from my proof-of-concept survived the conversion to a generic feature. Removed.

>     appendStringInfo(buf, "::%s",
> -                     format_type_with_typemod(node->consttype,
> -                                              node->consttypmod));
> +                     format_type_be_qualified(node->consttype));
> What's the reason for this change?

Good question. As I recall, the very sparse search path the FDW connection sets up can sometimes leave remote functions failing to find other functions they need, so we want to force the calls to be schema-qualified. Unfortunately there's no perfect public call for that; ideally it would be "return format_type_internal(type_oid, typemod, true, false, true)", but there's no function like that, so I settled for format_type_be_qualified(), which forces qualification at the expense of ignoring the typmod.

> Thinking a bit wider, why is this just limited to extensions? You may as
> well have other objects defined locally and remotely, like functions or
> operators that are not defined in extensions, but as normal objects.
> Hence I think that what you are looking for is not this option, but a
> boolean option enforcing the behavior of code paths using is_builtin()
> in foreign_expr_walker such that the sanity checks on existing objects
> are not limited to FirstBootstrapObjectId but extended to other objects
> in the catalogs. That's a risky option because it could lead to
> inconsistencies among objects, so obviously the default is false, but by
> documenting correctly the risks of using this option we may be able to
> get something integrated (aka "be sure that each object is defined
> consistently across the servers queried or you'll have surprising
> results!"). In short, it seems to me that you are taking the wrong
> approach.

Well, as I see it there's three broad categories of behavior available:

1- Forward nothing non-built-in (current behavior)
2- Use options to forward only specified non-built-in things (either in chunks (extensions, as in this patch) or one-by-one (mark your desired functions/ops))
3- Forward everything if a "forward everything" option is set

I hadn't actually considered the possibility of option 3, but for my purposes it would work just as well, with the added efficiency bonus of not having to check whether particular funcs/ops are inside declared extensions. Both the current state of FDW and the patch I'm providing expect a *bit* of smarts on the part of users, to make sure their remote/local environments are in sync (in particular, versions of pgsql and of extensions). Option 3 just ups the ante on that requirement. I'd be fine w/ this; it makes the patch very simple and fast.

For option 2, marking things one at a time really isn't practical for a package like PostGIS, with several hundred functions and operators. Using the larger block of "extension" makes more sense. I think in general, expecting users to itemize every func/op they want to forward, particularly if they just want an extension to "work" over FDW, is too big an expectation. That's not to minimize the utility of being able to mark individual functions/ops for forwarding, but I think that's a separate use case that doesn't eliminate the need for extension-level forwarding.

Thanks again for the review!
P.

-- fdw-extension-support-2.diff
Description: Binary data
[HACKERS] [PATCH] postgres_fdw extension support
Hi all,

Attached is a patch that implements the extension support discussed at PgCon this year during the FDW unconference session. Highlights:

* Pass extension operators and functions to the foreign server
* Only send ops/funcs if the foreign server is declared to support the relevant extension, for example:

CREATE SERVER foreign_server
	FOREIGN DATA WRAPPER postgres_fdw
	OPTIONS (host '127.0.0.1', port '5432', dbname 'my_db', extensions 'cube, seg');

Github branch is here:
https://github.com/pramsey/postgres/tree/fdw-extension-suppport

Synthetic pull request for easy browsing/commenting is here:
https://github.com/pramsey/postgres/pull/1

Thanks!
Paul

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..bbe3c9d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -34,11 +34,15 @@
 
 #include "postgres_fdw.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -49,8 +53,10 @@
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -136,6 +142,7 @@
 static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 				 deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 					   deparse_expr_cxt *context);
+static bool is_in_extension(Oid procid, PgFdwRelationInfo *fpinfo);
@@ -167,6 +174,7 @@ classifyConditions(PlannerInfo *root,
 	}
 }
 
+
 /*
  * Returns true if given expr is safe to evaluate on the foreign server.
  */
@@ -177,7 +185,7 @@ is_foreign_expr(PlannerInfo *root,
 {
 	foreign_glob_cxt glob_cxt;
 	foreign_loc_cxt loc_cxt;
-
+	
 	/*
 	 * Check that the expression consists of nodes that are safe to execute
 	 * remotely.
@@ -207,6 +215,8 @@ is_foreign_expr(PlannerInfo *root,
 	return true;
 }
 
+
+
 /*
  * Check if expression is safe to execute remotely, and return true if so.
  *
@@ -229,6 +239,9 @@ foreign_expr_walker(Node *node,
 	Oid			collation;
 	FDWCollateState state;
 
+	/* Access extension metadata from fpinfo on baserel */
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
 		return true;
@@ -361,7 +374,7 @@ foreign_expr_walker(Node *node,
 				 * can't be sent to remote because it might have incompatible
 				 * semantics on remote side.
 				 */
-				if (!is_builtin(fe->funcid))
+				if (!is_builtin(fe->funcid) && (!is_in_extension(fe->funcid, fpinfo)))
 					return false;
@@ -407,7 +420,7 @@ foreign_expr_walker(Node *node,
 				 * (If the operator is, surely its underlying function is
 				 * too.)
 				 */
-				if (!is_builtin(oe->opno))
+				if ( (!is_builtin(oe->opno)) && (!is_in_extension(oe->opno, fpinfo)) )
 					return false;
@@ -445,7 +458,7 @@ foreign_expr_walker(Node *node,
 				/*
 				 * Again, only built-in operators can be sent to remote.
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && (!is_in_extension(oe->opno, fpinfo)))
 					return false;
@@ -591,7 +604,7 @@ foreign_expr_walker(Node *node,
 	 * If result type of given expression is not built-in, it can't be sent to
 	 * remote because it might have incompatible semantics on remote side.
 	 */
-	if (check_type && !is_builtin(exprType(node)))
+	if (check_type && !is_builtin(exprType(node)) && (!is_in_extension(exprType(node), fpinfo)) )
 		return false;
@@ -643,6 +656,8 @@ foreign_expr_walker(Node *node,
 	return true;
 }
 
+
+
 /*
  * Return true if given object is one of PostgreSQL's built-in objects.
  *
@@ -669,6 +684,67 @@ is_builtin(Oid oid)
 
 /*
+ * Returns true if given operator/function is part of an extension declared in the
Re: [HACKERS] Hashable custom types
On July 8, 2015 at 1:36:49 PM, Tom Lane (t...@sss.pgh.pa.us) wrote:

> Paul Ramsey <pram...@cleverelephant.ca> writes:
>>> UNION will preferentially glom onto the btree equality operator, if
>>> memory serves. If that isn't also the hash equality operator, things
>>> won't work pleasantly.
>>
>> So… what does that mean for types that have both btree and hash
>> equality operators? Don't all the built-ins also have this problem?
>
> What I'm asking is why it would possibly be sensible to have different
> notions of equality for hash and btree. In every existing type that has
> both btree and hash opclasses, the equality members of those opclasses
> are *the same operator*. You don't really want UNION giving different
> answers depending on which implementation the planner happens to pick,
> do you?

Well, that's where things get pretty darn ugly. For reasons of practicality at the time, our equality btree operator '=' got tied to 'bounding box equals' way back in the mists of pre-history. (Basically the reasoning was "all our index work is about bounding boxes!" I must admit, given my understanding of the zen of PostgreSQL now (and the features that PostgreSQL now has), that would not happen today.)

So… yeah. Probably '=' should be something else, probably something quite expensive but logically defensible like a geometric equality test, but it's not, and we have lots of third-party stuff layered on top of us that expects the existing semantics. If getting the objects to UNION means that the btree and hash ops have to be the same, then that just means I'm SOL and will revert to my hack of just casting the objects to bytea to force them to UNION in the recursive CTE.

P.
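The constraint Tom is pointing at is mechanical: hash-based UNION dedup only works if any two values the equality operator calls equal always hash to the same code. A toy model of the PostGIS bind — a '=' that compares only bounding boxes, against a hash of the exact coordinates — shows which hash functions are even admissible. All types, fields, and names here are invented for illustration, not PostGIS code:

```c
#include <stdbool.h>

/* A fake "geometry": a bounding-box id plus exact coordinate data,
 * collapsed to single ints to keep the model tiny. */
typedef struct { int bbox; int coords; } geom;

/* btree-style equality as PostGIS defines it: bounding-box equals. */
bool geom_eq(geom a, geom b) { return a.bbox == b.bbox; }

/* A hash over the exact coordinates: NOT compatible with geom_eq,
 * because two geometries can be "equal" yet hash differently. */
unsigned geom_hash_exact(geom g) { return (unsigned) g.coords * 2654435761u; }

/* A hash over the bounding box: compatible with geom_eq, so this is
 * the only kind of hash a matching hash opclass could legally use. */
unsigned geom_hash_bbox(geom g) { return (unsigned) g.bbox * 2654435761u; }
```

If the planner hashes with the incompatible function, two rows that '=' calls duplicates land in different hash buckets and both survive the UNION, while a sort-based plan would merge them: exactly the "different answers depending on implementation" Tom warns about.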
Re: [HACKERS] Hashable custom types
>> It still says I lack the secret sauce...
>>
>> ERROR: could not implement recursive UNION
>> DETAIL: All column datatypes must be hashable.
>
> UNION will preferentially glom onto the btree equality operator, if
> memory serves. If that isn't also the hash equality operator, things
> won't work pleasantly.

So… what does that mean for types that have both btree and hash equality operators? Don't all the built-ins also have this problem?
Re: [HACKERS] Hashable custom types
On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>> Is it possible to make custom types hashable? There's no hook in the
>> CREATE TYPE call for a hash function, but can one be hooked up
>> somewhere else? In an operator?
>
> See "35.14.6. System Dependencies on Operator Classes"

Coming back to this, I created an appropriate opclass...

CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry)
	RETURNS boolean
	AS '$libdir/postgis-2.2', 'lwgeom_hash_eq'
	LANGUAGE 'c' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION geometry_hash(geom1 geometry)
	RETURNS integer
	AS '$libdir/postgis-2.2', 'lwgeom_hash'
	LANGUAGE 'c' IMMUTABLE STRICT;

-- Availability: 0.9.0
CREATE OPERATOR == (
	LEFTARG = geometry, RIGHTARG = geometry,
	PROCEDURE = geometry_hash_eq,
	COMMUTATOR = '==',
	RESTRICT = contsel, JOIN = contjoinsel
);

CREATE OPERATOR CLASS hash_geometry_ops
	DEFAULT FOR TYPE geometry USING hash AS
	OPERATOR 1 == (geometry, geometry),
	FUNCTION 1 geometry_hash(geometry);

I even tested it as an index!

create index hashidx on points using hash ( the_geom_webmercator );
CREATE INDEX

But when I run my recursive query...

WITH RECURSIVE find_cluster(cartodb_id, cluster_id, geom) AS (
	(SELECT
		points.cartodb_id,
		points.cartodb_id as cluster_id,
		points.the_geom_webmercator as geom
	FROM points
	WHERE points.cartodb_id in (select cartodb_id from points))
	UNION
	(SELECT
		pts.cartodb_id, n.cluster_id,
		pts.the_geom_webmercator as geom
	FROM points pts
	JOIN find_cluster n
	ON ST_DWithin(n.geom, pts.the_geom_webmercator, 2)
	WHERE n.cartodb_id <> pts.cartodb_id)
)
SELECT * FROM find_cluster;

It still says I lack the secret sauce...

ERROR: could not implement recursive UNION
DETAIL: All column datatypes must be hashable.

What's the sauce? Thanks!

P
[HACKERS] Extension support for postgres_fdw
I would like to enhance the postgres_fdw to allow more complete support for user-defined types. Right now, postgres_fdw already does a good job of passing user-defined type data back and forth, which is pretty nice. However, it will not pass functions or operators that use user-defined types to the remote host. For an extension like PostGIS, that means that spatial filters cannot be executed on remote servers, which makes FDW not so useful for PostGIS.

I think the postgres_fdw extension should pass user-defined functions and operators, but only when it knows those functions and operators exist at the remote. One way would be to ask the remote what extensions it has, but the overhead of doing that is a bit high. A simpler way would be to just have the DBA declare what extensions the remote will have, when she creates the server definition, for example:

CREATE SERVER fire_department_server
	FOREIGN DATA WRAPPER postgres_fdw
	OPTIONS (
		host 'fire.city.gov',
		dbname 'infrastructure',
		port '5432',
		extensions 'postgis, seg'
	);

Once the local host knows what extensions to expect on the remote side, it can retain functions and operators in those extensions in the set of remote restrictions and deparse them for use in the query of the remote. Basically, everywhere there is a call to is_builtin(Oid oid) now, there'd also be a call to is_allowed_extension() (or somesuch) as well.

There is a PostGIS-specific implementation of this concept here:
https://github.com/pramsey/postgres/blob/9.4-postgres-fdw-postgis/contrib/postgres_fdw

If the approach above sounds OK, I'll genericize my PostGIS-specific code to handle arbitrary extensions and submit a patch.

Thanks!
Paul

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net
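The shape of the proposed check can be sketched in plain C. In the 9.4-era sources, postgres_fdw's is_builtin() is an OID-range test against FirstBootstrapObjectId; the whitelist half below (`is_allowed_extension` over a flat OID array) is a hypothetical stand-in for the real extension-membership lookup, which would consult pg_depend:

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Value from src/include/access/transam.h (9.4 era): OIDs below this
 * are manually assigned to built-in objects. */
#define FirstBootstrapObjectId 10000

/* Mirrors postgres_fdw's is_builtin(): anything in the manually assigned
 * OID range is assumed to exist, with identical semantics, on the remote. */
static bool is_builtin(Oid oid)
{
    return oid < FirstBootstrapObjectId;
}

/* Hypothetical stand-in for the extension-membership check: the real
 * code would scan pg_depend for DEPENDENCY_EXTENSION entries. */
static bool is_allowed_extension(Oid oid, const Oid *ext_member_oids, int n)
{
    for (int i = 0; i < n; i++)
        if (ext_member_oids[i] == oid)
            return true;
    return false;
}

/* The proposed rule: shippable if built-in OR declared-extension member. */
bool shippable(Oid oid, const Oid *ext_member_oids, int n)
{
    return is_builtin(oid) || is_allowed_extension(oid, ext_member_oids, n);
}
```

The key point is that both branches are local decisions: neither requires a round-trip to the remote server, which is why declaring the extension list up front is cheaper than interrogating the remote.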
[HACKERS] GiST KNN Crasher
I'm implementing the recheck functionality for PostGIS so we can support it when 9.5 comes out, and came across this fun little crasher. This works: select id, name from geonames order by geom - 'SRID=4326;POINT(-75.6163 39.746)'::geometry limit 10; This crashes (just reversing the argument order to the - operator): select id, name from geonames order by 'SRID=4326;POINT(-75.6163 39.746)'::geometry - geom limit 10; The stack trace on crash looks like this: * thread #1: tid = 0x8d2bb, 0x000100455247 postgres`create_indexscan_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600, tlist=0x7fbf8d0088d0, scan_clauses=0x, indexonly='\0') + 1063 at createplan.c:1354, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x24) * frame #0: 0x000100455247 postgres`create_indexscan_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600, tlist=0x7fbf8d0088d0, scan_clauses=0x, indexonly='\0') + 1063 at createplan.c:1354 frame #1: 0x0001004518c9 postgres`create_scan_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600) + 377 at createplan.c:360 frame #2: 0x00010044e749 postgres`create_plan_recurse(root=0x7fbf8d005c80, best_path=0x7fbf8b823600) + 73 at createplan.c:248 frame #3: 0x00010044e691 postgres`create_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600) + 113 at createplan.c:209 frame #4: 0x000100460770 postgres`grouping_planner(root=0x7fbf8d005c80, tuple_fraction=0.048008487900660838) + 4560 at planner.c:1736 frame #5: 0x00010045e1dd postgres`subquery_planner(glob=0x7fbf8b823950, parse=0x7fbf8b823060, parent_root=0x, hasRecursion='\0', tuple_fraction=0, subroot=0x7fff5faf7028) + 2461 at planner.c:619 frame #6: 0x00010045d4a2 postgres`standard_planner(parse=0x7fbf8b823060, cursorOptions=0, boundParams=0x) + 450 at planner.c:229 frame #7: 0x00010045d2d1 postgres`planner(parse=0x7fbf8b823060, cursorOptions=0, boundParams=0x) + 81 at planner.c:157 frame #8: 0x00010054ab6c postgres`pg_plan_query(querytree=0x7fbf8b823060, cursorOptions=0, boundParams=0x) + 140 
at postgres.c:809 frame #9: 0x00010054ac43 postgres`pg_plan_queries(querytrees=0x7fbf8d006230, cursorOptions=0, boundParams=0x) + 115 at postgres.c:868 frame #10: 0x00010054d920 postgres`exec_simple_query(query_string=0x7fbf8b821a38) + 800 at postgres.c:1033 frame #11: 0x00010054cda2 postgres`PostgresMain(argc=1, argv=0x7fbf8b8066d0, dbname=0x7fbf8b806538, username=0x7fbf8b806518) + 2546 at postgres.c:4025 frame #12: 0x0001004b17fe postgres`BackendRun(port=0x7fbf8b4079d0) + 686 at postmaster.c:4162 frame #13: 0x0001004b0d90 postgres`BackendStartup(port=0x7fbf8b4079d0) + 384 at postmaster.c:3838 frame #14: 0x0001004ad497 postgres`ServerLoop + 663 at postmaster.c:1594 frame #15: 0x0001004aab9c postgres`PostmasterMain(argc=3, argv=0x7fbf8b407760) + 5644 at postmaster.c:1241 frame #16: 0x0001003e649d postgres`main(argc=3, argv=0x7fbf8b407760) + 749 at main.c:221 frame #17: 0x7fff8ca915fd libdyld.dylib`start + 1 And my SQL declarations look like this: #if POSTGIS_PGSQL_VERSION = 95 -- Availability: 2.2.0 CREATE OR REPLACE FUNCTION geography_knn_distance(geography, geography) RETURNS float8 AS 'MODULE_PATHNAME','geography_distance' LANGUAGE 'c' IMMUTABLE STRICT COST 100; -- Availability: 2.2.0 CREATE OPERATOR - ( LEFTARG = geography, RIGHTARG = geography, PROCEDURE = geography_knn_distance, COMMUTATOR = '-' ); -- Availability: 2.2.0 CREATE OR REPLACE FUNCTION geography_gist_distance(internal, geography, int4) RETURNS float8 AS 'MODULE_PATHNAME' ,'gserialized_gist_distance' LANGUAGE 'c'; #endif -- Availability: 1.5.0 CREATE OPERATOR CLASS gist_geography_ops DEFAULT FOR TYPE geography USING GIST AS STORAGE gidx, OPERATOR3 , -- OPERATOR6~= , -- OPERATOR7~ , -- OPERATOR8@ , #if POSTGIS_PGSQL_VERSION = 95 -- Availability: 2.2.0 OPERATOR13 - FOR ORDER BY pg_catalog.float_ops, FUNCTION8geography_gist_distance (internal, geography, int4), #endif FUNCTION1geography_gist_consistent (internal, geography, int4), FUNCTION2geography_gist_union (bytea, internal), 
  FUNCTION 3 geography_gist_compress (internal),
  FUNCTION 4 geography_gist_decompress (internal),
  FUNCTION 5 geography_gist_penalty (internal, internal, internal),
  FUNCTION 6 geography_gist_picksplit (internal, internal),
  FUNCTION 7 geography_gist_same (box2d, box2d, internal);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] compress method for spgist - 2
On Wed, Feb 25, 2015 at 6:13 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In the original post on this, you mentioned that the PostGIS guys planned to use this to store polygons, as bounding boxes (http://www.postgresql.org/message-id/5447b3ff.2080...@sigaev.ru). Any idea how that would work? Poorly: by hanging boxes that straddle dividing lines off the parent node in a big linear list. The hope would be that the case is sufficiently rare, compared to the overall volume of data, not to be an issue. Oddly enough, this big hammer has worked at least passably well in other implementations (https://github.com/mapserver/mapserver/blob/branch-7-0/maptree.c#L261) P.
[HACKERS] Hashable custom types
When trying to write a recursive CTE using the PostGIS geometry type, I was told this: ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? Thanks, P -- Paul Ramsey http://cleverelephant.ca http://postgis.net
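For context on what the planner means by "hashable" here: the type needs a default hash operator class (declared with CREATE OPERATOR CLASS ... USING hash) whose support function maps values to a 32-bit hash. Below is a self-contained sketch of the kind of hash such a support function might compute over a type's canonical byte representation. This is purely illustrative — FNV-1a rather than PostgreSQL's own hash_any, and the function name is hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical body of a hash support function for a custom type:
 * hash the type's canonical (detoasted) byte representation with
 * FNV-1a.  In a real extension this would be wrapped in a V1-calling-
 * convention function and wired up via an operator class declared
 * with CREATE OPERATOR CLASS ... USING hash. */
static uint32_t
custom_type_hash_bytes(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= data[i];                  /* mix in one byte */
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```

The important property for hashed UNION/DISTINCT is that values the type's equality operator considers equal must hash identically, so the hash has to be computed over a canonical form.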
Re: [HACKERS] Traffic jams in fn_extra
On Sunday, November 24, 2013 at 4:42 PM, Tom Lane wrote: The real question of course is whether transaction-level caching is appropriate for what they're storing. If they want only statement-level caching then using fn_extra is often the right thing. I'm not certain it is... we get some great effects out of the statement-level stuff, and it really works great, except for those cases where somebody else is already taking the slot (SRF_*). Also note that having the cache go away is the easy part. The hard part is knowing whether you've created it yet in the current transaction, and finding it if you have. The usual method is to keep a static variable pointing to it, and plugging into the transaction cleanup callback mechanism with a routine that'll reset the pointer to NULL at transaction end. For examples, look for callers of RegisterXactCallback(). I'm glad you said that, because I felt too stoopid to ask :) My previous spelunking through memory contexts showed that it's easy to get the memory, hard to find it again. Which is why fn_extra is so appealing: it's just sitting there, and the parent context goes away at the end of the query, so it's wonderfully simple to use... if there's not already someone in it. Thanks for the advice; it could be very helpful for my pointcloud work. Having access to a cache object during SRF_* processing could improve performance quite a bit, so the complexity trade-off of using the transactional context could be well worth it. P.
Re: [HACKERS] Traffic jams in fn_extra
Hi Simon, We do the dance because it’s how we always have, and we don’t know any other way, any better way. :) The usual explanation. Is there any place you can point to that demonstrates your technique? Thanks! P -- Paul Ramsey http://cleverelephant.ca/ http://postgis.net/ On Sunday, November 24, 2013 at 8:21 AM, Simon Riggs wrote: On 19 November 2013 23:08, Paul Ramsey pram...@cleverelephant.ca wrote: On the solution, I wasn't suggesting another void* slot, but rather a slot that holds a hash table, so that an arbitrary number of things can be stuffed in. Overkill, really, since in 99.9% of times only one thing would be in there, and in the other 0.1% of times two things. In our own GenericCacheCollection, we just statically allocate 16 slots. Why do you need to do this dance with fn_extra? It's possible to allocate a hash table in a Transaction-lifetime memory context on first call into a function then cache things there. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Traffic jams in fn_extra
As an extension with a lot of CPU load, we (PostGIS) tend to use flinfo->fn_extra a lot, for caching things that are intensive to calculate at the start of a query and reuse throughout subsequent function calls:

- coordinate projection objects
- indexes of the edges of large geometries
- other kinds of indexes of the edges of large geometries :)
- schemas of lidar pointcloud collections

As we've added different kinds of caching in our own project, we've banged up against problems of multiple functions trying to stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to hold the different kinds of stuff we might want to store, a GenericCacheCollection: https://github.com/postgis/postgis/blob/svn-trunk/libpgcommon/lwgeom_cache.c#L46-L48 As (by now) a connoisseur of fn_extra caching, I've noticed while reading bits of PgSQL code that there are far more places that stuff state into fn_extra than I ever knew, and that they do so without any substantial concern that other state might already be there. (Well, that's not true; they usually check for NULL, and they give up if fn_extra is already in use.) The range types, I was surprised to find, do some performance caching in fn_extra. The set-returning function macros use it to hold state. And many others, I'm sure. Would it be good/wise to add another, better-managed slot to flinfo, one that isn't just a void* but is a hash? (Or has some management macros to handle it and make it a hash* if it's NULL, whatever API makes sense.) That way multiple bits of code could cache state over function calls without banging into one another. flinfo->fn_extra_hash perhaps? If this sounds OK, I'd be honored to try and make it my first submission to PgSQL. P. -- Paul Ramsey http://cleverelephant.ca http://postgis.net
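The GenericCacheCollection idea above can be sketched in miniature. This is not PostGIS's actual code — just an illustration of spending the single fn_extra pointer on a small container of slots so that independent caches stop colliding; the slot count and names are assumptions:

```c
#include <stddef.h>

/* Illustrative sketch: one container holds several independent cache
 * entries, indexed by a small slot id, so multiple pieces of code can
 * share a single opaque pointer (in PostgreSQL, flinfo->fn_extra,
 * allocated in the per-query memory context on first call). */
#define CACHE_SLOTS 16

typedef struct GenericCacheCollection
{
    void *slot[CACHE_SLOTS];
} GenericCacheCollection;

/* Look up the cache entry in a slot; NULL if empty or out of range. */
static void *
cache_get(const GenericCacheCollection *c, int slot_id)
{
    if (slot_id < 0 || slot_id >= CACHE_SLOTS)
        return NULL;
    return c->slot[slot_id];
}

/* Install a cache entry in a slot, ignoring out-of-range ids. */
static void
cache_set(GenericCacheCollection *c, int slot_id, void *entry)
{
    if (slot_id >= 0 && slot_id < CACHE_SLOTS)
        c->slot[slot_id] = entry;
}
```

The point of the fixed array rather than a hash table is exactly what the thread says: in practice only one or two slots are ever occupied, so the simplest possible container wins.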
Re: [HACKERS] Traffic jams in fn_extra
On Tue, Nov 19, 2013 at 7:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Paul Ramsey pram...@cleverelephant.ca writes: As we've added different kinds of caching, in our own project, we've banged up against problems of multiple functions trying to stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to hold the different kinds of stuff we might want to store, a GenericCacheCollection TBH, I fail to understand what you're on about here. Any one function owns the value of fn_extra in calls to it, and is solely responsible for its usage; furthermore, there's no way for any other code to mangle that pointer unless the owner explicitly makes it available. So where is the problem? And if there is a problem, how does adding another field of exactly the same kind make it better? Right, sorry, I'm reasoning overly aggressively from the specific to the general. The specific problems have been - two lines of geometry caching code, either of which can be called within a single function, depending on the inputs, which mostly didn't result in conflicts, except when they did, which was eventually rectified by layering a GenericCacheCollection into the function - a cached lidar schema object which would have been very useful to have around in an SRF, but couldn't because the SRF needed the fn_extra slot The first case is an application-specific problem, and since we've coded around it, the only advantage to a pgsql-specific fix would be to allow others who also need to cache several independent things to not reinvent that wheel. The second case is one of the instances where the pgsql code itself is getting in the way and cannot be worked around at the application level. My solution was just not to do caching for that function. So, that's what I perceive the problem to be. Now that you point it out to me, yes, it's pretty small bore stuff. 
On the solution, I wasn't suggesting another void* slot, but rather a slot that holds a hash table, so that an arbitrary number of things can be stuffed in. Overkill, really, since in 99.9% of times only one thing would be in there, and in the other 0.1% of times two things. In our own GenericCacheCollection, we just statically allocate 16 slots. P.
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I just got out of a meeting that included Oracle Spatial folks, who were boasting of big performance increases from enabling parallel query on their spatial queries. Basically the workloads on things like big spatial joins are entirely CPU bound, so they are seeing that adding 15 processors makes things 15x faster. Spatial folks would love love love to see parallel query execution. -- Paul Ramsey http://cleverelephant.ca http://postgis.net
Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash
Further notes, from Andrew (RhodiumToad) on IRC about the cause of this crasher:

[12:03pm] RhodiumToad: what happens is this
[12:04pm] RhodiumToad: postquel_start knows this statement doesn't return the result, so it supplies None_Receiver as the dest-receiver for the query
[12:04pm] RhodiumToad: however, it knows it's a plannedStmt, so it fires up the full executor to run it
[12:05pm] RhodiumToad: and the executor allocates a new destreceiver in its own memory context, replaces es->qd->dest with it,
[12:05pm] RhodiumToad: (the new destreceiver is the one that writes tuples to the created table)
[12:06pm] RhodiumToad: then at executorEnd (called from postquel_end), executor shutdown closes the new rel, _and then frees the executor's memory context, including the destreceiver it created
[12:07pm] RhodiumToad: postquel_end doesn't know that its setting of ->dest was clobbered, so it goes to try and destroy it again, and gets garbage (if assertions are on)
[12:07pm] RhodiumToad: if assertions weren't on, then the rDestroy call is harmless
[12:07pm] RhodiumToad: well, mostly harmless
[12:07pm] RhodiumToad: sneaky one, that
[12:09pm] RhodiumToad: you can confirm it by tracing through that second call to postquel_end and confirming that it's the call to ExecutorEnd that stomps the content of qd->dest
[12:12pm] pramsey: confirmed, the pass through ExecutorEnd has clobbered the value so there's garbage when it arrives at line 638
[12:14pm] RhodiumToad: if you trace through ExecutorEnd itself, it should be the FreeExecutorState that does it
[12:15pm] RhodiumToad: wonder how far back this bug goes
[12:16pm] RhodiumToad: actually not very far
[12:17pm] RhodiumToad: older versions just figured that qd->dest was always None_Receiver and therefore did not need an rDestroy call
[12:17pm] RhodiumToad: (which is a no-op for None_Receiver)
[12:17pm] pramsey: kills my 8.4
[12:17pm] RhodiumToad: so this is broken in 8.4+
[12:17pm] pramsey: ah
[12:18pm] RhodiumToad: 8.4 introduced the lazy-eval of selects in sql functions
[12:19pm] RhodiumToad: prior to that they were always run immediately to completion
[12:19pm] RhodiumToad: that requires juggling the destreceiver a bit, hence the bug
[12:20pm] RhodiumToad: btw, the first statement of the function shouldn't be needed
[12:21pm] RhodiumToad: just ... as $f$ create table foo as select 1 as x; $f$; should be enough to break it
[12:31pm] RhodiumToad: there's no trivial fix

On Wed, Jan 4, 2012 at 11:32 AM, Paul Ramsey pram...@cleverelephant.ca wrote: One extra detail: my PostgreSQL is compiled with --enable-cassert. This is required to set off the killer function. On Wed, Jan 04, 2012 at 07:17:17PM +, pram...@cleverelephant.ca wrote: The following bug has been logged on the website:

Bug reference: 6379
Logged by: Paul Ramsey
Email address: pram...@cleverelephant.ca
PostgreSQL version: 9.1.2
Operating system: OSX 10.6.8
Description:

CREATE OR REPLACE FUNCTION kill_backend()
RETURNS VOID
AS $$
  DROP TABLE IF EXISTS foo;
  CREATE TABLE foo AS SELECT * FROM pg_class LIMIT 1;
$$ LANGUAGE 'sql';

SELECT kill_backend();
Re: [HACKERS] knngist - 0.8
(And, if we are going to break everything in sight, now would be a good time to think about changing typmod to something more flexible than one int32.) As someone who is jamming geometry type, spatial reference number and dimensionality into said 32bit typmod, let me say emphatically ... Amen! P
Re: [HACKERS] knngist - 0.8
On Sat, Oct 16, 2010 at 10:17 AM, Peter Eisentraut pete...@gmx.net wrote: On lör, 2010-10-16 at 09:23 -0700, Paul Ramsey wrote: (And, if we are going to break everything in sight, now would be a good time to think about changing typmod to something more flexible than one int32.) As someone who is jamming geometry type, spatial reference number and dimensionality into said 32bit typmod, let me say emphatically ... Amen! So what kind of data structure would you like for a typmod? I'm a primitive enough beast that just having 64-bits would make me happy. As a general matter though, a bytea? P
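The "jamming" described above is ordinary bit-packing. Here is a hedged illustration of how quickly a 32-bit typmod fills up — the field widths and names below are invented for the example, not PostGIS's actual layout:

```c
#include <stdint.h>

/* Illustrative only: pack a spatial reference id, a geometry type
 * code and a dimension count into one int32 typmod.  Field widths
 * are invented; the point is how fast 32 bits run out. */
static int32_t
typmod_pack(int32_t srid, int32_t geomtype, int32_t dims)
{
    return ((srid & 0xFFFFF) << 8)   /* 20 bits of SRID           */
         | ((geomtype & 0x3F) << 2)  /*  6 bits of geometry type  */
         |  (dims & 0x3);            /*  2 bits of dimensionality */
}

/* Matching unpackers: shift the field down and mask it off. */
static int32_t typmod_srid(int32_t tm) { return (tm >> 8) & 0xFFFFF; }
static int32_t typmod_type(int32_t tm) { return (tm >> 2) & 0x3F; }
static int32_t typmod_dims(int32_t tm) { return tm & 0x3; }
```

With only 20 bits for the SRID, spatial reference ids above ~1M simply don't fit — which is the kind of pressure behind the wish for a 64-bit (or bytea) typmod.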
Re: [HACKERS] KNNGiST for knn-search (WIP)
I'm sure whatever conclusion -hackers comes to in the end will be the best for pgsql, and I'll be supportive. But until then, let me note from the PostGIS point-of-view: sure would be great to get this in for 8.5 :) P. On Thu, Dec 31, 2009 at 4:26 AM, Greg Stark gsst...@mit.edu wrote: On Wed, Dec 30, 2009 at 4:56 PM, Robert Haas robertmh...@gmail.com wrote: From my point of view, what makes a patch invasive is the likelihood that it might break something other than itself. For example, your patch touches the core planner code and the core GIST code, so it seems possible that adding support for this feature might break something else in one of those areas. It doesn't seem obvious to me that this is a high-risk patch. It's touching the planner which is tricky but it's not the kind of massive overhaul that touches every module that HOT or HS were. I'm really glad HS got in before the end because lots of people with different areas of expertise and different use cases are going to get to exercise it in the time remaining. This patch I would expect relatively few people to need to try it out before any oversights are caught. -- greg
Re: [HACKERS] CLUSTER in 8.3.5 on GIST indexes loses all rows
FYI, this is not limited to PostGIS GIST, it appears to be all GIST (gid is an integer column and I have loaded btree_gist):

pramsey=# create table ttt as select * from counties;
SELECT
pramsey=# create index gid_bix on ttt using gist (gid);
CREATE INDEX
pramsey=# select count(*) from ttt;
 count
-------
  3141
(1 row)

pramsey=# cluster ttt using gid_bix;
CLUSTER
pramsey=# select count(*) from ttt;
 count
-------
     0
(1 row)

pramsey=# select version();
 PostgreSQL 8.3.5 on i386-apple-darwin9.5.0, compiled by GCC i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465)
(1 row)

On Fri, Dec 5, 2008 at 6:05 AM, Gregory Stark [EMAIL PROTECTED] wrote: Mark Cave-Ayland [EMAIL PROTECTED] writes: So in other words, the contents of the temporary table has just disappeared :( Uhm. That rather sucks. I was able to reproduce it too. It seems to happen after I pause for a bit, and not when I run the script in fast succession. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
[HACKERS] Access to Row ID information in Functions
In PostGIS, we have a problem, in that spatial operations are very costly, CPU-wise. We have hit on a nifty enhancement recently, which was to recognize that when processing multiple rows, in joins or with literal arguments, for most functions of the form GeometryOperation(A, B), A (or B) tended to remain constant while the other argument changed. That meant that we could build an optimized form of the more-constant argument (using internal index structures on the object segments) that allows testing the changing argument much more quickly. The optimized form gets cached and retrieved from a memory context. Each time the function is run within a statement, it checks the cache and sees if one of its arguments is the same as the last time around. If so, it uses the prepared version of that argument. If not, it builds a new prepared version and caches that. The key here is being able to check the identity of the arguments... is this argument A the same as the one we processed last time? One way is to do a memcmp. But it seems likely that PgSQL knows exactly whether it is running a nested loop, or a literal, and could tell somehow that argument A is the same with each call. For lack of a better term, if we knew what the row id of each argument was as the function was called, we could skip the memcmp testing of geometric identity (which gets more expensive precisely when our optimization gets more effective, for large arguments) and just trust the row id as the guide for when to build and cache new optimized representations. Any guidance? Thanks, Paul
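The caching scheme described above can be sketched without any PostgreSQL machinery. Names and structure here are hypothetical; the point is the memcmp identity test, whose cost grows with argument size:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a "same argument as last call?" cache: remember the bytes
 * of the last prepared argument, and rebuild the prepared form only
 * when the incoming bytes differ.  The length check short-circuits
 * most of the memcmp work. */
typedef struct ArgCache
{
    unsigned char *last;     /* copy of the last argument's bytes */
    size_t         last_len;
    int            rebuilds; /* how many times we (re)prepared */
} ArgCache;

/* Returns 1 if the prepared form had to be rebuilt, 0 if reused. */
static int
arg_cache_check(ArgCache *c, const unsigned char *arg, size_t len)
{
    if (c->last != NULL && c->last_len == len &&
        memcmp(c->last, arg, len) == 0)
        return 0;            /* identical bytes: reuse prepared form */

    /* Different argument: remember its bytes and rebuild.  (A real
     * implementation would also build the optimized structure here,
     * and check the malloc result.) */
    free(c->last);
    c->last = malloc(len);
    memcpy(c->last, arg, len);
    c->last_len = len;
    c->rebuilds++;
    return 1;
}
```

In a nested-loop join where the outer geometry repeats, this degenerates to one rebuild per distinct outer value — exactly the behavior the post describes, minus the memcmp cost it would like to avoid.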
Re: [HACKERS] Access to Row ID information in Functions
Thanks Tom, Yes, we've discussed adding some kind of optional identity information to the object; it remains a potential course of action. Paul On Tue, Apr 1, 2008 at 2:37 PM, Tom Lane [EMAIL PROTECTED] wrote: Paul Ramsey [EMAIL PROTECTED] writes: The optimized form gets cached and retrieved from a memory context. Each time the function is run within a statement it checks the cache, and sees if one of its arguments is the same as the last time around. If so, it uses the prepared version of that argument. If not, it builds a new prepared version and caches that. The key here is being able to check the identity of the arguments... is this argument A the same as the one we processed last time? One way is to do a memcmp. But it seems likely that PgSQL knows exactly whether it is running a nested loop, or a literal, and could tell somehow that argument A is the same with each call. Not really. Certainly there's no way that that information would propagate into function calls. In the special case where your argument is a literal constant, I think there is enough information available to detect that that's the case (look at get_fn_expr_argtype). But if it's not, there's no very good way to know whether it's the same as last time. Perhaps it would be worth changing your on-disk storage format to allow cheaper checking? For instance include a hash value. regards, tom lane
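Tom's suggestion — keep a hash value in the serialized format so identity checks get cheap — might look like this in miniature (the layout is hypothetical, not PostGIS's actual serialization):

```c
#include <stdint.h>

/* Hypothetical serialized header: a hash of the payload, computed
 * once at serialization time, rides along with the data.  "Same
 * argument as last call?" then usually costs two integer compares
 * instead of a memcmp over a large geometry. */
typedef struct SerializedGeom
{
    uint32_t hash;  /* precomputed over the payload bytes */
    uint32_t len;   /* payload length */
    /* payload bytes follow in the real format */
} SerializedGeom;

/* Cheap identity pre-check: differing hash or length proves the
 * values differ; matching values still deserve a full compare to
 * rule out hash collisions. */
static int
geom_probably_same(const SerializedGeom *a, const SerializedGeom *b)
{
    return a->hash == b->hash && a->len == b->len;
}
```

The asymmetry is the useful part: a mismatch is definitive and costs almost nothing, and only the (rare) match path falls back to the expensive byte-wise comparison.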
Re: [HACKERS] Where to find kind code for STATISTIC_KIND GEOMETRY?
Just take 150-199, and submit a patch to HACKERS that updates the comment in pg_statistic appropriately. I am sure it will be some time before we invent another 49 kinds of selectivity statistic. P Ale Raza wrote: Tom, What numbers can you reserve for our geometry type, 200 - 299? Ale. -Original Message- From: Paul Ramsey [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 01, 2007 3:58 PM To: Ale Raza; pgsql-hackers@postgresql.org Cc: John Baleja; PostGIS Development Discussion Subject: Re: [HACKERS] Where to find kind code for STATISTIC_KIND GEOMETRY? For all that Tom reserved 100 numbers for us, we're only using one right now. lwgeom_estimate.c:47:#define STATISTIC_KIND_GEOMETRY 100 Paul Alvaro Herrera wrote: Ale Raza wrote: Simon, I am forwarding this to pgsql-hackers. I can send the requirements once I get the right contact. This is it, but as Simon stated, you probably want to get the PostGIS guys involved too, so that duplicates can be sorted out. -- Paul Ramsey Refractions Research http://www.refractions.net [EMAIL PROTECTED] Phone: 250-383-3022 Cell: 250-885-0632 ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Modifying and solidifying contrib
Jim Nasby wrote: In addition to Martijn's tsearch case, there's also PostGIS. And I believe this is a pretty big pain for them. Hear hear! It would be nice to dump from an old PostgreSQL/PostGIS combination and restore to a new version combination, without taking all the function definitions along for a ride in the dump process. What we really want is just the data. -- Paul Ramsey Refractions Research http://www.refractions.net [EMAIL PROTECTED] Phone: 250-383-3022 Cell: 250-885-0632
Re: [HACKERS] uuid type for postgres
Just an FYI: We also ended up rolling our own uuid type, against libuuid. It seems that uuid is a widespread enough concept that implementors *will* be asked to support it, moderately often. We *could* roll our own (we're capable); others are not so lucky, and would have to point out the lack of a uuid type as a limitation of pgsql. Which is too bad, given how relatively simple they are. That said:

- linking against libuuid is fine for a contrib/ extension, but no good for a built-in type. A real uuid would have to do a proper independent implementation of uuid creation within pgsql.
- we cannot snarf libuuid code, it is LGPL (though perhaps the author would re-license; if that is the *only* objection, it is well worth asking)

I think having a built-in uuid type is something that a large number of people will use. Whether they use it well or badly is not our problem. It is possible to build crappy databases with all the types that already exist; adding uuid is hardly going to bring the walls down. Having uuid removes another excuse for people not doing a pgsql implementation. I am not sure if I heard clearly from the core team that a self-contained, BSD-licensed uuid would be accepted(able)? If so, I'll contact the libuuid author about a re-license (shortest path from A to B). P. On 6-Sep-05, at 6:50 AM, nathan wagner wrote: I have been in need of a uuid type and ran across the pguuid download by Xiongjian (Mike) Wang. This wasn't really useful to me for two reasons: first, it is GPLed and I would prefer a more liberal license; secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a SIOCGETIFHWADDR (? i think, i can get the exact name if you want it) ioctl() under darwin. While I could dike out the code that calls it, that seems like a suboptimal solution. So after a bit of poking around the interweb I ran across Ralf Engelschall's ossp uuid library. This compiled with minimal effort on mac os.
Some reading, and an evening later, I've made a server plugin with supporting SQL that implements an 'ossp_uuid' type. Now I have four questions:

1: Is it feasible for this to be included in the contrib section of the regular download? The uuid library is a notice-of-copyright style license, and I am willing to put my own code into the public domain.

2: Would just calling the type 'uuid' be better than 'ossp_uuid'? It's certainly a nicer name.

3: Would it be possible to include such a type as a postgres extension to the usual SQL types? It seems to me that having an officially supported type would be better than a user-contributed type, on the grounds that you could then rely on it being available if postgres was. In particular, installing it as an extension would require the cooperation of the DBA, which may be infeasible in some environments.

-- Nathan Wagner
[HACKERS] $libdir and 8.0
Short bug report: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. Paul
Re: [HACKERS] $libdir and 8.0
The expansions in the build scripts all seem correct, and in fact all the libraries are installed in the right place; both the system stuff (plpgsql.so) and the contrib stuff (libpostgis.so) end up in /opt/foo/lib/postgresql. It is the actual binaries that seem to not know where $libdir is supposed to be.

prefix := /home/pramsey/pgtest/8.0
exec_prefix := ${prefix}
libdir := ${exec_prefix}/lib
pkglibdir = $(libdir)

ifeq "$(findstring pgsql, $(pkglibdir))" ""
ifeq "$(findstring postgres, $(pkglibdir))" ""
override pkglibdir := $(pkglibdir)/postgresql
endif
endif

Bruce Momjian wrote: Devrim GUNDUZ wrote: Hi, On Wed, 18 Aug 2004, Paul Ramsey wrote: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. I'm not sure but if you add /opt/foo/lib/postgresql to /etc/ld.so.conf and run ldconfig, it might work. I checked in the code and $libdir should expand to $(pkglibdir) as determined by configure. What value do you show for that in your Makefile.global?
Re: [HACKERS] $libdir and 8.0
Check this out!

[EMAIL PROTECTED] bin]$ ./pg_config --pkglibdir
/home/pramsey/pgtest/8.0/bin/lib/postgresql
                        ^^^
And yet:

./port/pg_config_paths.h:#define PKGLIBDIR "/home/pramsey/pgtest/8.0/lib/postgresql"

Could the problem be here? (port/path.c):

/*
 * get_pkglib_path
 */
void
get_pkglib_path(const char *my_exec_path, char *ret_path)
{
	const char *p;

	if ((p = relative_path(PGBINDIR, PKGLIBDIR)))
		make_relative(my_exec_path, p, ret_path);
	else
		StrNCpy(ret_path, PKGLIBDIR, MAXPGPATH);
	canonicalize_path(ret_path);
}

Bruce Momjian wrote: Devrim GUNDUZ wrote: Hi, On Wed, 18 Aug 2004, Paul Ramsey wrote: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. I'm not sure but if you add /opt/foo/lib/postgresql to /etc/ld.so.conf and run ldconfig, it might work. I checked in the code and $libdir should expand to $(pkglibdir) as determined by configure. What value do you show for that in your Makefile.global?
Re: [HACKERS] $libdir and 8.0
I am afraid it gets crazier... I put /home/pramsey/pgtest/8.0/bin on my $PATH. Now:

[EMAIL PROTECTED] 8.0]$ pg_config --pkglibdir
/home/pramsey/pgtest/8.0/lib/postgresql

Hm, correct.

[EMAIL PROTECTED] 8.0]$ ./bin/pg_config --pkglibdir
/home/pramsey/pgtest/8.0/./lib/postgresql

Also correct, but getting strangely relative.

[EMAIL PROTECTED] 8.0]$ cd bin
[EMAIL PROTECTED] bin]$ ./pg_config --pkglibdir
/home/pramsey/pgtest/8.0/bin/lib/postgresql

Now incorrect. The answer depends on where I am asking from.

[EMAIL PROTECTED] bin]$ ./pg_config --bindir
/home/pramsey/pgtest/8.0/bin/.

Using the direct request. Using the $PATH assisted request:

[EMAIL PROTECTED] bin]$ pg_config --bindir
/home/pramsey/pgtest/8.0/bin

Entirely correct. Problem solved? No...

[EMAIL PROTECTED] bin]$ createlang plpgsql test
ERROR: could not access file "$libdir/plpgsql": No such file or directory
createlang: language installation failed: ERROR: could not access file "$libdir/plpgsql": No such file or directory

If I copy plpgsql.so into /home/pramsey/pgtest/8.0/lib then things work.

Bruce Momjian wrote: Ah, what is your $bindir? Is it /home/pramsey/pgtest/8.0/bin/postgresql? --- Paul Ramsey wrote: Check this out! [EMAIL PROTECTED] bin]$ ./pg_config --pkglibdir /home/pramsey/pgtest/8.0/bin/lib/postgresql ^^^ And yet: ./port/pg_config_paths.h:#define PKGLIBDIR "/home/pramsey/pgtest/8.0/lib/postgresql" Could the problem be here? (port/path.c): /* * get_pkglib_path */ void get_pkglib_path(const char *my_exec_path, char *ret_path) { const char *p; if ((p = relative_path(PGBINDIR, PKGLIBDIR))) make_relative(my_exec_path, p, ret_path); else StrNCpy(ret_path, PKGLIBDIR, MAXPGPATH); canonicalize_path(ret_path); } Bruce Momjian wrote: Devrim GUNDUZ wrote: Hi, On Wed, 18 Aug 2004, Paul Ramsey wrote: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. I'm not sure but if you add /opt/foo/lib/postgresql to /etc/ld.so.conf and run ldconfig, it might work. I checked in the code and $libdir should expand to $(pkglibdir) as determined by configure. What value do you show for that in your Makefile.global?
Re: [HACKERS] $libdir and 8.0
Sounds like we are getting close. From strace:

stat64("/home/pramsey/pgtest/8.0/bin/lib/postgresql/plpgsql", 0xbfffe480) = -1 ENOENT (No such file or directory)
stat64("/home/pramsey/pgtest/8.0/bin/lib/postgresql/plpgsql.so", 0xbfffe480) = -1 ENOENT (No such file or directory)
stat64("$libdir/plpgsql", 0xbfffe540) = -1 ENOENT (No such file or directory)

I started the database from /home/pramsey/pgtest/8.0/bin using:

./pg_ctl start -D /home/pramsey/pgtest/8.0/data

Judging from the pg_config --pkglibdir behavior I have posted to the list, if I had started it from any *other* location at all, things would have worked.

Paul

Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> ... It is the actual binaries that seem to not know where $libdir is
>> supposed to be.
> Where do they think it is? A useful way to check would be to strace the
> backend while you're executing createlang or another command that tries to
> load a dynamic library. The file names that it tries to stat() would be
> good evidence. Also, how are you starting the postmaster? I believe that
> current sources will try to locate $libdir via a relative path from where
> the postmaster executable is. We've fixed some bugs in that area recently,
> but maybe there are more.
>
> regards, tom lane
Re: [HACKERS] $libdir and 8.0
Things definitely do not like to be run from within bin. I added /home/pramsey/pgtest/8.0/bin to my PATH and then ran everything without a direct path reference:

[EMAIL PROTECTED] 8.0]$ pg_ctl start -D /home/pramsey/pgtest/8.0/data
postmaster starting
LOG:  database system was shut down at 2004-08-18 22:00:32 PDT
LOG:  checkpoint record is at 0/B2797C
LOG:  redo record is at 0/B2797C; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 628; next OID: 17263
LOG:  database system is ready
[EMAIL PROTECTED] 8.0]$ createdb test
CREATE DATABASE
[EMAIL PROTECTED] 8.0]$ createlang plpgsql test
[EMAIL PROTECTED] 8.0]$

Magically, everything now works :)

Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> ... It is the actual binaries that seem to not know where $libdir is
>> supposed to be.
> Where do they think it is? A useful way to check would be to strace the
> backend while you're executing createlang or another command that tries to
> load a dynamic library. The file names that it tries to stat() would be
> good evidence. Also, how are you starting the postmaster? I believe that
> current sources will try to locate $libdir via a relative path from where
> the postmaster executable is. We've fixed some bugs in that area recently,
> but maybe there are more.
>
> regards, tom lane
Re: [HACKERS] $libdir and 8.0
Thanks Tom. Yes, this is beta1, not the CVS tip, so all is golden.

Paul

Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> I started the database from /home/pramsey/pgtest/8.0/bin using
>> ./pg_ctl start -D /home/pramsey/pgtest/8.0/data
> Ah-hah ... and this is 8.0beta1, right, not anything later? I fixed some
> problems associated with "." and ".." in the executable's path last week.
> I think if you pull the current nightly snapshot or CVS you will find the
> problem gone.
>
> regards, tom lane
Re: [HACKERS] Default Locale in initdb
Just because it is not new does not mean that it is good. When this new behavior was introduced and I migrated our databases to the new PgSQL version (dump/restore), the locale of all my databases was silently changed from C to en_US. This broke one application in a very subtle way, because of slightly different sort behavior in the new locale; tracking it down was quite tricky. PgSQL was just a little too helpful in this case.

Andrew Dunstan wrote:
> [EMAIL PROTECTED] wrote:
>> Is it me, or has the default locale of created databases changed at some
>> point? Currently, on Linux, if one does not specify a locale, the locale
>> is taken from the system environment, and it is not C. While I can see
>> both sides of a discussion, I think that choosing a locale without one
>> being specified is a bad idea, even if it is the locale of the machine.
>> The reason it is a bad idea is that certain features of the database
>> which only work correctly with a locale of C will not work by default.
> This is not new behaviour. (Why are you the only person who posts here
> who is nameless?)
>
> cheers
> andrew

--
Paul Ramsey | Refractions Research
Email: [EMAIL PROTECTED] | Phone: (250) 885-0632
Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow
Tom Lane wrote:
> ... of dead tuples followed by a lot of pages worth of live tuples. Plain
> VACUUM cannot do much to fix this since it doesn't move rows around. VACUUM
> FULL will fix it, but its index-update overhead is high enough that CLUSTER
> is a better deal.

Tom: I was interested in performance improvements from CLUSTER, so I tried to cluster a table on a spatial index:

dra_working=# \d geomtest
        Table "public.geomtest"
     Column     |   Type   | Modifiers
----------------+----------+-----------
 rd_segment_id  | integer  |
 admit_date     | date     |
 retire_date    | date     |
 most_recent    | boolean  |
 lineargeometry | geometry |
Indexes: geomtest_idx gist (lineargeometry)

dra_working=# cluster geomtest_idx on geomtest;
ERROR:  CLUSTER: cannot cluster when index access method does not handle nulls
You may be able to work around this by marking column lineargeometry NOT NULL

dra_working=# select version();
PostgreSQL 7.3.6 on i686-pc-linux-gnu

As of quite a while ago (7.2?), the GiST access method was made null-safe by Teodor and Oleg, I think. Is this a safety check left over from before that upgrade to GiST?

--
Paul Ramsey | Refractions Research
Email: [EMAIL PROTECTED] | Phone: (250) 885-0632
Re: [HACKERS] PostgreSQL pre-fork speedup
sdv mailer wrote:
> Instead, there's a big need to create a new connection on every query, and
> with PostgreSQL needing to fork on every incoming connection this can be
> quite slow.

Really? My general experience has been that forking/connection setup times are very good with PgSQL. Do not assume your Oracle experience transfers directly over -- Oracle has very large connection-time overheads, PgSQL does not.

> This could be a big win since even a moderate improvement at the connection
> level will affect almost every user. Any chance of that happening for 7.5?

Only if you do it yourself, probably. The calculation of the developers appears to be that the time the database spends on fork/connect will generally be dwarfed by the time it spends actually doing work (this being a database, the workloads required of the backend are much higher than, say, for a web server). So the operational benefit of adding the complexity of a pre-fork system is not very high. And if you have the rare workload where a pre-fork actually *would* speed things up a great deal, you can solve the problem yourself with connection-pooling middleware.

--
Paul Ramsey | Refractions Research
Re: PostGIS dropgeometrycolumn function (Was: Re: [HACKERS] [7.4]
Bitter experience... I am going to cc Dave here, because I could swear we went through many conniptions trying to make this work. And yet I just did this:

create view mytables as
  select relname from pg_class
  where relam = 0 and relname not like 'pg_%';

And it seems to work fine.

Oh, now I remember: the deal was not views, it was triggers. Since our geometry_columns contains some information not available via a query on existing data, a trigger was what we wanted, so we could harvest the information from a variety of places and have some spare columns for things like the geometry selectivity stats.

Paul

On Tuesday, February 3, 2004, at 11:00 AM, Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> In an ideal world though, we would construct the thing as a view, so that
>> when you did a CREATE TABLE that included a geometry type, you would
>> automatically get a row in geometry_columns. That requires a view on
>> system tables though, and that just does not work. :/
> Uh, what makes you say it doesn't work?
>
> regards, tom lane

Paul Ramsey
Refractions Research
Email: [EMAIL PROTECTED]
Phone: (250) 885-0632
Re: [HACKERS] Multicolumn Indexing using R-Tree
Try using the GiST rtree implementation (examples in contrib); GiST supports multi-key indexes.

On Tuesday, February 3, 2004, at 06:56 AM, Marcio Caetano wrote:
> I'm using PostgreSQL 7.3.2 and I need to create an R-Tree index that uses
> more than one column in a table. When I run the instruction, this message
> appears:
>
> DefineIndex: access method rtree does not support multi-column indexes
>
> How can I solve this problem? Is it a limitation of PostgreSQL or of the
> R-Tree concept? Thank you in advance.
>
> Márcio Caetano
[HACKERS] PostGIS Integration
Actually, in my wet dream, we stored everything in system tables. Dimensionality and SRID became parameters of the geometry, the selectivity stats lived in the system stats table (as Mark's patch should hopefully do), and the geometry_columns view just pulled everything together into one user-convenient location:

CREATE TABLE foo ( mygeom POLYGON(4326) );
CREATE TABLE bar ( mygeom MULTILINESTRING(20711, 2) );

I think we had this discussion before, though, and parameterized types, like varchar(256), were not available for extended types, like our geometries.

P.

On Tuesday, February 3, 2004, at 12:06 PM, Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> Oh, now I remember. The deal was not views, it was triggers.
> Oh, okay. You're right, we don't do triggers on system tables. But couldn't
> you combine a view on the system tables with storage of additional data
> outside?
>
> regards, tom lane

Paul Ramsey
Refractions Research
Email: [EMAIL PROTECTED]
Phone: (250) 885-0632