[HACKERS] Inlining functions with "expensive" parameters
All,

As we try to make PostGIS more "parallel sensitive" we have been adding costs to our functions, so that their relative CPU cost is more accurately reflected in parallel plans. This has had an odd side effect: some of our "wrapper" functions stop giving index scans in plans [1]. This is a problem!

An example of a "wrapper" is ST_Intersects(geom1, geom2). It combines an index operation (geom1 && geom2) with an exact spatial test (_ST_Intersects(geom1, geom2)). This is primarily for user convenience, and it has worked well for us for more than a decade. Having this construct stop working is definitely a problem.

As we add costs to our functions, the odds increase that one of the parameters to a wrapper might be a costed function. It's not uncommon to see:

    ST_Intersects(geom, ST_SetSRID('POLYGON(...)', 4326))

It's fair to say that we really do depend on our wrappers getting inlined basically all the time. They are simple functions that do nothing other than 'SELECT func1() AND func2() AND arg1 && arg2'. However, once costs are added to the parameters, the inlining can be turned off relatively quickly. Here's a PgSQL-native example:

-- Create data table and index. Analyze.
DROP TABLE IF EXISTS boxen;
CREATE TABLE boxen AS
SELECT row_number() OVER() AS gid,
       box(point(x, y), point(x+1, y+1)) AS b,
       x, y
FROM generate_series(-100,100) AS y,
     generate_series(-100,100) AS x;

CREATE INDEX idx_b_geom_gist ON boxen USING gist(b);
ANALYZE boxen;

-- An inlined function.
-- When declared 'STRICT' it breaks index access,
-- but 'IMMUTABLE' doesn't seem to bother it.
CREATE OR REPLACE FUNCTION good_box(box, box)
RETURNS boolean
AS 'SELECT $1 OPERATOR(&&) $2 AND length(lseg(point($1), point($2))) < 3'
LANGUAGE 'sql';

-- Start with a low-cost circle()
ALTER FUNCTION circle(point, double precision) COST 1;

-- [A] Query plan hits the index
EXPLAIN SELECT gid FROM boxen
WHERE good_box(boxen.b, box(circle(point(20.5, 20.5), 2)));

-- [B] Query plan hits the index
EXPLAIN SELECT gid
FROM boxen,
     (SELECT x, y FROM boxen WHERE x < 0 AND y < 0) AS c
WHERE good_box(boxen.b, box(circle(point(c.x, c.y), 2)));

-- Increase the cost of circle()
ALTER FUNCTION circle(point, double precision) COST 100;

-- [B] Query plan no longer hits the index!
EXPLAIN SELECT gid
FROM boxen,
     (SELECT x, y FROM boxen WHERE x < 0 AND y < 0) AS c
WHERE good_box(boxen.b, box(circle(point(c.x, c.y), 2)));

The inlining is getting tossed out on a test of how expensive the function parameters are [2]. As a result, we lose what is really the correct plan, and get a sequence scan instead of an index scan. The test of parameter cost seems quite old (15+ years) and perhaps didn't anticipate highly variable individual function costs (or maybe it did). As it stands, though, PostGIS is currently stuck choosing between having costs on our functions and having our inlined wrappers, because we cannot have both at the same time.

Any thoughts?

Thanks!
P.

[1] https://trac.osgeo.org/postgis/ticket/3675#comment:18
[2] https://github.com/postgres/postgres/blob/ae20b23a9e7029f31ee902da08a464d968319f56/src/backend/optimizer/util/clauses.c#L4581-L4584
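For context, the wrapper pattern described here has roughly this shape. This is a sketch of the idea, not the exact PostGIS source:

```sql
-- A sketch of the wrapper pattern: an index-assisted operator ANDed
-- with an exact test, in a trivially inlinable SQL function.
-- (Illustrative only; the real PostGIS definition differs in detail.)
CREATE OR REPLACE FUNCTION ST_Intersects(geom1 geometry, geom2 geometry)
RETURNS boolean
AS 'SELECT $1 OPERATOR(&&) $2 AND _ST_Intersects($1, $2)'
LANGUAGE 'sql' IMMUTABLE;
```

When the planner inlines this, the `&&` clause becomes visible to index path generation; when inlining is refused, the whole call is an opaque function and the index is lost.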
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
From my perspective, this is much much better. For sufficiently large tables, I get parallel behaviour without jimmying with the defaults on parallel_setup_cost and parallel_tuple_cost. *And* the parallel behaviour *is* sensitive to the costs of functions in target lists, so reasonably chosen costs will flip us into a parallel mode for expensive functions against smaller tables too.

Hopefully some variant of this finds its way into core! Is there any way I can productively help?

P.

On Sat, Nov 4, 2017 at 10:02 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Paul Ramsey <pram...@cleverelephant.ca> writes:
> >>> Whether I get a parallel aggregate seems entirely determined by the number
> >>> of rows, not the cost of preparing those rows.
> >
> >> This is true, as far as I can tell and unfortunate. Feeding tables with
> >> 100ks of rows, I get parallel plans, feeding 10ks of rows, never do, no
> >> matter how costly the work going on within. That's true of changing costs
> >> on the subquery select list, and on the aggregate transfn.
> >
> > This sounds like it might be the same issue being discussed in
> >
> > https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com
> >
> I have rebased the patch being discussed on that thread.
>
> Paul, you might want to once check with the recent patch [1] posted on
> the thread mentioned by Tom.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2B1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
On Sat, Nov 4, 2017 at 10:02 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Paul Ramsey <pram...@cleverelephant.ca> writes:
> >>> Whether I get a parallel aggregate seems entirely determined by the number
> >>> of rows, not the cost of preparing those rows.
> >
> >> This is true, as far as I can tell and unfortunate. Feeding tables with
> >> 100ks of rows, I get parallel plans, feeding 10ks of rows, never do, no
> >> matter how costly the work going on within. That's true of changing costs
> >> on the subquery select list, and on the aggregate transfn.
> >
> > This sounds like it might be the same issue being discussed in
> >
> > https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com

Thanks Tom, Amit; yes, this issue (expensive things in target lists not affecting plans) seems like what I'm talking about in this particular case, and something that shows up a lot in PostGIS use cases: a function on a target list like ST_Buffer() or ST_Intersection() will be a couple of orders of magnitude more expensive than anything in the filters.

> I have rebased the patch being discussed on that thread.
>
> Paul, you might want to once check with the recent patch [1] posted on
> the thread mentioned by Tom.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2B1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com

Awesome! I will compare and report back. Thanks much!
P

> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
Just clarifying myself a little, since I made a dumb error partway through.

On Thu, Nov 2, 2017 at 10:09 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> I'm working on a custom aggregate that generates a serialized data
> format. The preparation of the geometry before being formatted is pretty
> intense, so it is probably a good thing for that work to be done in
> parallel, in partial aggregates. Here's an example SQL call:
>
> EXPLAIN ANALYZE
> SELECT length(ST_AsMVT(a)) FROM (
>   SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
>   FROM pts_10 p
>   WHERE p.geom && ::geometry_literal
>   AND p.geom IS NOT NULL
> ) a;
>
> The ST_AsMVTGeom() function can be comically expensive, so it's really good
> when it's in partial aggregates. But the cost of the function seems to be
> ignored.
>
> (First note that, in order to consistently get parallel plans, I have to
> brutally suppress parallel_tuple_cost, as described here:
> http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html)
>
> Whether I get a parallel aggregate seems entirely determined by the number
> of rows, not the cost of preparing those rows.

This is true, as far as I can tell, and unfortunate. Feeding tables with 100ks of rows, I get parallel plans; feeding 10ks of rows, I never do, no matter how costly the work going on within. That's true of changing costs on the subquery select list, and on the aggregate transfn.

> When changing the number of rows in the subquery, with a LIMIT, I can
> change from a seq scan to a parallel seq scan and finally to a parallel
> aggregate, as the number of rows goes up.

I see now that as soon as I brought the LIMIT in, the plans had to go sequential, just due to the nature of a LIMIT in a subquery. Ignore the below, sorry.

Thanks!
P

> An odd effect: when I have enough rows to get a parallel seq scan, I can
> flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That
> seems odd and backwards.
> Is there anywhere a guide or rough description of how costs are used in
> determining parallel plans? The empirical approach starts to wear one down
> after a while :)
>
> P.
[HACKERS] Parallel Plans and Cost of non-filter functions
I'm working on a custom aggregate that generates a serialized data format. The preparation of the geometry before being formatted is pretty intense, so it is probably a good thing for that work to be done in parallel, in partial aggregates. Here's an example SQL call:

EXPLAIN ANALYZE
SELECT length(ST_AsMVT(a)) FROM (
  SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
  FROM pts_10 p
  WHERE p.geom && ::geometry_literal
  AND p.geom IS NOT NULL
) a;

The ST_AsMVTGeom() function can be comically expensive, so it's really good when it's in partial aggregates. But the cost of the function seems to be ignored.

(First note that, in order to consistently get parallel plans, I have to brutally suppress parallel_tuple_cost, as described here: http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html)

Whether I get a parallel aggregate seems entirely determined by the number of rows, not the cost of preparing those rows. When changing the number of rows in the subquery, with a LIMIT, I can change from a seq scan to a parallel seq scan and finally to a parallel aggregate, as the number of rows goes up.

An odd effect: when I have enough rows to get a parallel seq scan, I can flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That seems odd and backwards.

Is there anywhere a guide or rough description of how costs are used in determining parallel plans? The empirical approach starts to wear one down after a while :)

P.
Re: [HACKERS] generate_series regression 9.6->10
Thanks Tom. This showed up in a regression test of ours that built the test data using generate_series, so it's not a critical production issue or anything, just a surprising change in behaviour.

P.

On Wed, May 24, 2017 at 10:28 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Paul Ramsey <pram...@cleverelephant.ca> writes:
> > The behaviour of generate_series seems to have changed a little, at least
> > in conjunction w/ CTEs.
>
> What's changed is the behavior of multiple SRFs in a SELECT's targetlist, cf
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=69f4b9c85f168ae006929eec44fc44d569e846b9
>
> specifically this comment:
>
>     While moving SRF evaluation to ProjectSet would allow to retain the old
>     "least common multiple" behavior when multiple SRFs are present in one
>     targetlist (i.e. continue returning rows until all SRFs are at the end of
>     their input at the same time), we decided to instead only return rows till
>     all SRFs are exhausted, returning NULL for already exhausted ones. We
>     deemed the previous behavior to be too confusing, unexpected and actually
>     not particularly useful.
>
> I see the current v10 release notes have failed miserably at documenting
> this :-(. Will try to improve that.
>
> regards, tom lane
[HACKERS] generate_series regression 9.6->10
The behaviour of generate_series seems to have changed a little, at least in conjunction w/ CTEs. Under 9.6 (and prior) this query returns 2127 rows, with no nulls:

with ij as (
  select i, j
  from generate_series(1, 10) i,
       generate_series(1, 10) j
),
iijj as (
  select generate_series(1, i) as a,
         generate_series(1, j) b
  from ij
)
select a, b from iijj;

Under 10, it returns only 715 rows, with many nulls.
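The change can be seen with a minimal pair of SRFs in one target list (a small sketch, following the behavior change described in Tom's reply above):

```sql
SELECT generate_series(1, 2) AS a, generate_series(1, 3) AS b;

-- 9.6 and earlier: SRFs cycle to their least common multiple,
-- so this returns 6 rows.
-- 10: SRFs advance in lockstep and exhausted ones return NULL,
-- so this returns 3 rows: (1,1), (2,2), (NULL,3).
```

The nulls in the larger query above come from the same lockstep rule, applied per-row of the CTE with unequal series lengths.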
Re: [HACKERS] Retiring from the Core Team
On Wed, Jan 11, 2017 at 4:29 PM, Josh Berkus wrote:
>
> For that reason, as of today, I am stepping down from the PostgreSQL
> Core Team.

Massive thanks, Josh; you've been a great advocate and a wonderful bridge into the PgSQL community for those of us finding our way towards the warm centre of PgSQL.

ATB,
P
Re: [HACKERS] User-defined Operator Pushdown and Collations
On Sun, Nov 27, 2016 at 11:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Paul Ramsey <pram...@cleverelephant.ca> writes:
> > On Sun, Nov 27, 2016 at 9:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Why doesn't hs_fdw.h have a collation?
>
> > I think I'm missing something, I cannot find a file like that anywhere.
>
> I was referring to the variable shown in your EXPLAIN.

Ah, right. Why would hs_fdw.h have a collation? It's an hstore. It's not declared with a collation (the CREATE TYPE call doesn't set the COLLATABLE attribute to true). Again my ignorance is running ahead of me: does every object in the database necessarily have a collation?

CREATE FOREIGN TABLE hs_fdw (
  id integer,
  h hstore collate "en_CA.UTF-8"
) server foreign_server OPTIONS (table_name 'hs');
ERROR:  collations are not supported by type hstore

> > With respect to this particular example, is this a case of a very large
> > collation hammer getting in the way? Both '->' and '=' are operators that
> > would be unaffected by collation, right? They are both just equality-based
> > tests. But all operators are getting tested for coherent collation
> > behaviour, so they get caught up in the net?
>
> Yeah, we don't know whether the operator actually cares about its input
> collation. It'd be possible to be much more liberal if we knew it did
> not, but such labeling was not included in the design for the collation
> facility. That might've been a mistake ...

In this case the hammer seems very large, since only one side of the '->' operator is even collatable. Mind you, if it *did* work it would still bubble up to a case of 'text = text' at the top node, so the problem would still remain. Although it seems unfair: I can definitely declare a table with a text column and run a query with Const = Var and it'll ship that OpExpr over, which seems no more fishy than what I'm asking the hstore to do.
hstore=# explain (verbose) select * from txt_fdw where txt = 'this';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Foreign Scan on public.txt_fdw  (cost=100.00..127.20 rows=7 width=36)
   Output: id, txt
   Remote SQL: SELECT id, txt FROM public.txt WHERE ((txt = 'this'::text))
(3 rows)

The fdw table can't know much about what the remote collation is; it only knows what I've told it locally, which is that it's the default, so everything matches up. Sounds like the problem is hstore lacking a collation? Or that lacking a collation is not considered equivalent to the default in the testing code.

P

> regards, tom lane
Re: [HACKERS] User-defined Operator Pushdown and Collations
On Sun, Nov 27, 2016 at 9:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Paul Ramsey <pram...@cleverelephant.ca> writes:
> > On Fri, Nov 25, 2016 at 11:30 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> >> I've been trying to figure out an issue with operators not being pushed
> >> down for user defined types, in this case "hstore". TL;DR:
> >>
> >> hstore=# explain (verbose) select * from hs_fdw where h -> 'a' = '1';
> >>                              QUERY PLAN
> >> ----------------------------------------------------------------------
> >>  Foreign Scan on public.hs_fdw  (cost=100.00..157.78 rows=7 width=36)
> >>    Output: id, h
> >>    Filter: ((hs_fdw.h -> 'a'::text) = '1'::text)
> >>    Remote SQL: SELECT id, h FROM public.hs
> >> (4 rows)
> >>
> >> In terms of "shippability" the "->" operator passes fine. It ends up not
> >> being shipped because its collation bubbles up as FDW_COLLATE_NONE, and
> >> gets kicked back as not deparseable around here:
> >>
> >> https://github.com/postgres/postgres/blob/4e026b32d4024b03856b4981b26c747b7fef7afb/contrib/postgres_fdw/deparse.c#L499
>
> > I'm finding this piece of code a little suspect, but that may just be my
> > not fully understanding why/what determines when a collation is shippable.
>
> > In the case of my example above, the OpExpr '->' has an input collation of
> > 100 (DEFAULT_COLLATION_ID). The Var below has a collation of 0 (InvalidOid)
> > and state of FDW_COLLATE_NONE, and the Const has a collation of 100
> > (DEFAULT_COLLATION_ID) and state of FDW_COLLATE_NONE.
>
> Why doesn't hs_fdw.h have a collation?

I think I'm missing something, I cannot find a file like that anywhere.

> The intuition behind the rules in this area is that we'll only push down
> expressions whose collation is traceable to a foreign Var. Assuming that
> you've correctly declared your foreign table with column collations that
> match the column collations of the real table on the remote server, this
> should ensure that you get the same collation-dependent behavior as you
> would have gotten locally.
> In this example, the expression's collation behavior would be per
> DEFAULT_COLLATION_ID on both servers ... but they might have different
> default collations.

OK, so there's a potential workaround w/ explicitly declared collations, I am hearing?

With respect to this particular example, is this a case of a very large collation hammer getting in the way? Both '->' and '=' are operators that would be unaffected by collation, right? They are both just equality-based tests. But all operators are getting tested for coherent collation behaviour, so they get caught up in the net?

Thanks!
P

> So without this rule the expression would be pushed down and could then
> give different results.
>
> regards, tom lane
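For a collatable column type like text, the explicit-collation workaround being discussed would look roughly like this. This is a sketch (the collation name and server setup are assumptions, and, as noted elsewhere in the thread, attempting this on a non-collatable type like hstore raises an error):

```sql
-- Declare the foreign column with an explicit collation matching the
-- remote table, so expressions over it carry a collation traceable to
-- a foreign Var and remain candidates for pushdown.
-- (Illustrative sketch; collation and server names are assumptions.)
CREATE FOREIGN TABLE txt_fdw (
    id integer,
    txt text COLLATE "en_CA.UTF-8"
) SERVER foreign_server OPTIONS (table_name 'txt');
```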
Re: [HACKERS] User-defined Operator Pushdown and Collations
On Fri, Nov 25, 2016 at 11:30 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> I've been trying to figure out an issue with operators not being pushed
> down for user defined types, in this case "hstore". TL;DR:
>
> hstore=# explain (verbose) select * from hs_fdw where h -> 'a' = '1';
>                              QUERY PLAN
> ----------------------------------------------------------------------
>  Foreign Scan on public.hs_fdw  (cost=100.00..157.78 rows=7 width=36)
>    Output: id, h
>    Filter: ((hs_fdw.h -> 'a'::text) = '1'::text)
>    Remote SQL: SELECT id, h FROM public.hs
> (4 rows)
>
> In terms of "shippability" the "->" operator passes fine. It ends up not
> being shipped because its collation bubbles up as FDW_COLLATE_NONE, and
> gets kicked back as not deparseable around here:
>
> https://github.com/postgres/postgres/blob/4e026b32d4024b03856b4981b26c747b7fef7afb/contrib/postgres_fdw/deparse.c#L499

I'm finding this piece of code a little suspect, but that may just be my not fully understanding why/what determines when a collation is shippable.

In the case of my example above, the OpExpr '->' has an input collation of 100 (DEFAULT_COLLATION_ID). The Var below has a collation of 0 (InvalidOid) and state of FDW_COLLATE_NONE, and the Const has a collation of 100 (DEFAULT_COLLATION_ID) and state of FDW_COLLATE_NONE. In other parts of the code, the default collation and collation 0 are both treated as leading to a state of FDW_COLLATE_NONE. But the OpExpr, instead of bubbling FDW_COLLATE_NONE up the chain, returns false and ends the shippability of the Node.

What are the issues around shipping nodes of different collations to the remote? All the nodes in my example are foreign Vars or local Consts, and all are either collation 0 or DEFAULT_COLLATION_ID. Surely it should be shippable?

P

> I'm still working at wrapping my head around why this is good or not, but
> if there's an obvious explanation and/or workaround, I'd love to know.
>
> Thanks!
>
> P
[HACKERS] User-defined Operator Pushdown and Collations
I've been trying to figure out an issue with operators not being pushed down for user defined types, in this case "hstore". TL;DR:

hstore=# explain (verbose) select * from hs_fdw where h -> 'a' = '1';
                             QUERY PLAN
----------------------------------------------------------------------
 Foreign Scan on public.hs_fdw  (cost=100.00..157.78 rows=7 width=36)
   Output: id, h
   Filter: ((hs_fdw.h -> 'a'::text) = '1'::text)
   Remote SQL: SELECT id, h FROM public.hs
(4 rows)

In terms of "shippability" the "->" operator passes fine. It ends up not being shipped because its collation bubbles up as FDW_COLLATE_NONE, and gets kicked back as not deparseable around here:

https://github.com/postgres/postgres/blob/4e026b32d4024b03856b4981b26c747b7fef7afb/contrib/postgres_fdw/deparse.c#L499

I'm still working at wrapping my head around why this is good or not, but if there's an obvious explanation and/or workaround, I'd love to know.

Thanks!

P
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 8:59 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 6/1/16 10:03 AM, Paul Ramsey wrote:
>> We don't depend on these, we have our own :/
>> The real answer for a GIS system is to have an explicit tolerance
>> parameter for calculations like distance/touching/containment, but
>> unfortunately we didn't do that, so now we have our own
>> compatibility/boil-the-ocean problem if we ever wanted/were funded to
>> add one.
>
> Well it sounds like what's currently happening in Postgres is probably going
> to change, so how might we structure that to help PostGIS? Would simply
> lopping off the last few bits of the significand/mantissa work, or is that
> not enough when different GRSes are involved?

PostGIS doesn't look at all at what the PgSQL geotypes do, so go forward w/o fear.

Tolerance in the geo world is more than vertex rounding, though; it's things like saying that when distance(pt, line) < epsilon then distance(pt, line) == 0, and similarly for shape touching, etc. One of the things people find annoying about PostGIS is that ST_Intersects(ST_Intersection(a, b), a) can come out as false (a derived point at a crossing of lines may not exactly intersect either of the input lines), which is a direct result of our use of exact math for the boolean intersects test.

Anyways, go forth and do whatever makes sense for PgSQL.

P

> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 7:52 AM, Jim Nasby wrote:
>
> I suspect another wrinkle here is that in the GIS world a single point can
> be represented in multiple reference/coordinate systems, and it would have
> different values in each of them. AIUI the transforms between those systems
> can be rather complicated if different projection methods are involved. I
> don't know if PostGIS depends on what these macros are doing or not. If it
> doesn't, perhaps it would be sufficient to lop off the last few bits of the
> significand. ISTM that'd be much better than what the macros currently do.

We don't depend on these, we have our own :/

The real answer for a GIS system is to have an explicit tolerance parameter for calculations like distance/touching/containment, but unfortunately we didn't do that, so now we have our own compatibility/boil-the-ocean problem if we ever wanted/were funded to add one.

P.

> BTW, I suspect the macro values were chosen specifically for dealing with
> LAT/LONG.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Parallel Queries and PostGIS
On Fri, Apr 22, 2016 at 11:44 AM, Stephen Frost <sfr...@snowman.net> wrote:
> Paul,
>
> * Paul Ramsey (pram...@cleverelephant.ca) wrote:
>> On Mon, Mar 28, 2016 at 9:45 AM, Stephen Frost <sfr...@snowman.net> wrote:
>> > Would you agree that it'd be helpful to have for making the st_union()
>> > work better in parallel?
>>
>> For our particular situation w/ ST_Union, yes, it would be ideal to be
>> able to run a worker-side combine function as well as the master-side
>> one. Although the cascaded union would be less effective spread out
>> over N nodes, doing it only once per worker, rather than every N
>> records, would minimize the loss of effectiveness.
>
> I chatted with Robert a bit about this and he had an interesting
> suggestion. I'm not sure that it would work for you, but the
> serialize/deserialize functions are used to transfer the results from
> the worker process to the main process. You could possibly do the
> per-worker finalize work in the serialize function to get the benefit of
> running that in parallel.
>
> You'll need to mark the aggtranstype as 'internal' to have the
> serialize/deserialize code called. Hopefully that's not too much of an
> issue.

Thanks Stephen. We were actually thinking that it might make more sense to just do the parallel processing in our own threads in the finalfunc. Not as elegant and magical as bolting into the PgSQL infra, but if we're doing something hacky anyways, might as well be our own hacky.

ATB,
P

> Thanks!
>
> Stephen
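For reference, the shape of a parallel aggregate declaration with the serialize/deserialize hooks Stephen mentions looks roughly like this. All the function names here are hypothetical placeholders, not PostGIS code:

```sql
-- Sketch only: every my_union_* name below is a hypothetical placeholder.
-- With STYPE = internal, SERIALFUNC/DESERIALFUNC are used to move
-- partial transition states from workers back to the leader process.
CREATE AGGREGATE my_union(geometry) (
    SFUNC        = my_union_transfn,
    STYPE        = internal,
    COMBINEFUNC  = my_union_combinefn,
    SERIALFUNC   = my_union_serialfn,
    DESERIALFUNC = my_union_deserialfn,
    FINALFUNC    = my_union_finalfn,
    PARALLEL     = SAFE
);
```

Robert's suggestion amounts to folding the per-worker finalize step into `my_union_serialfn`, since that is the last code that runs inside each worker.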
Re: [HACKERS] Protocol buffer support for Postgres
On Tue, Apr 26, 2016 at 6:40 AM, Tom Lane wrote:
> Craig Ringer writes:
>> On 26 April 2016 at 14:06, 陈天舟 wrote:
>>> (1) Since each protocol buffer column requires a schema, I am not sure
>>> where the best place is to store that schema info. Should it be in a
>>> CONSTRAINT (but I am not able to find the doc referring to any custom
>>> constraint), or should it be in the COMMENT or somewhere else?
>
>> I can't really imagine how you'd do that without adding a new catalog like
>> we have for enum members. A typmod isn't sufficient since you need a whole
>> lot more than an integer, and typmods aren't tracked throughout the server
>> that well.
>
>> That'll make it hard to do it with an extension.
>
> PostGIS manages to reference quite a lot of schema-like information via
> a geometry column's typmod. Maybe there's a reason why their approach
> wouldn't be a good fit for this, but it'd be worth investigating.

We pack a short type number, two flags and 24 bits of SRID number into an integer. The SRID number is in turn a foreign key into the spatial_ref_sys table, where the fully spelled-out spatial reference definition lives. It's not very nice, and it's quite breakable, since there's no foreign key integrity between the typmod and the spatial_ref_sys pk.

P.
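The packing described can be sketched with plain integer bit operations. The exact field layout below (SRID in the high 24 bits, type number and flags in the low byte) is an illustrative assumption, not PostGIS's actual encoding:

```sql
-- Pack SRID 4326, type number 3, and one flag bit into one integer.
-- Bit layout is illustrative only, not the real PostGIS typmod format.
SELECT (4326 << 8) | (3 << 2) | 1 AS packed;   -- 1107469

-- Unpack the fields again.
SELECT 1107469 >> 8        AS srid,        -- 4326
       (1107469 >> 2) & 63 AS type_num,    -- 3
       1107469 & 3         AS flags;       -- 1
```

The point of the trick is that all of this fits in the single integer a typmod gives you, with the wide SRID field deferring the real detail to a lookup table.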
Re: [HACKERS] Choosing parallel_degree
On Fri, Apr 8, 2016 at 9:06 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 8 April 2016 at 17:00, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>> On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> > On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> >> Other than that, patch looks good and I have marked it as Ready For
>> >> Committer. Hope, we get this for 9.6.
>> >
>> > Committed. I think this is likely to make parallel query
>> > significantly more usable in 9.6.
>>
>> I'm kind of worried that it will make it yet less usable for PostGIS,
>> since approaches that ignore costs in favour of relpages will
>> dramatically under-resource our queries. I can spin a query for
>> multiple seconds on a table with less than 100K records, not even
>> trying very hard.
>
> Doesn't sound good.

I admit it's not a "usual" database thing, but it's right in the meaty middle of use cases that parallelism can crushingly, awesomely defeat. It's also probably not too unusual for extension use cases, where complex data are held in user-defined types, whether they be image fragments, music samples, genetic data, raster data or LIDAR point clouds. PostGIS is just one voice of many in the Symphony of Crazy Shit in the Database.

>> Functions have very unequal CPU costs, and we're talking here about
>> using CPUs more effectively, why are costs being given the see-no-evil
>> treatment? This is as true in core as it is in PostGIS, even if our
>> case is a couple orders of magnitude more extreme: a filter based on a
>> complex combination of regex queries will use an order of magnitude
>> more CPU than one that does a little math, why plan and execute them
>> like they are the same?
>
> Functions have user assignable costs.

We have done a relatively bad job of globally costing our functions thus far, because it mostly didn't make any difference.
In my testing [1], I found that costing could push better plans for parallel sequence scans and parallel aggregates, though only at very extreme cost values (1000 for sequence scans and 1 for aggregates). Obviously, if costs can make a difference for 9.6 and parallelism, we'll rigorously ensure we have good, useful costs. I've already costed many functions in my parallel PostGIS test branch [2].

Perhaps the avoidance of cost so far is based on the relatively nebulous definition it has; about the only thing in the docs is: "If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary." So what about C functions then? Should a string comparison be 5 and a multiplication 1? An image histogram 1000?

>> As it stands now, it seems like out of the box PostGIS users will
>> actually not see much benefit from parallelism unless they manhandle
>> their configuration settings to force it.
>
> Does this concern apply to this patch, or to the general situation for 9.6?

Insofar as the patch is throttling how many parallel workers you get based solely on your relsize, it does concern this patch, but it's a general issue in the extreme and not-obviously-related costings needed to trip parallel sequence and parallel aggregate plans. The parallel join seems to not take function/operator costs into account at all [3]; at least, I couldn't plump up a high enough cost to trip it without also adjusting the global parallel tuple cost configuration.

I've seen a number of asides to the effect that "yes, costs are important, but we probably can't do anything about that for 9.6" in parallel patch threads, including this one, so I'm getting concerned that the core improvement we've been hoping for for years won't actually address our use cases when it is first released.
That may just be the way it is, c'est la vie, but it would be unfortunate.

P

[1] http://blog.cleverelephant.ca/2016/03/parallel-postgis.html
[2] https://github.com/pramsey/postgis/tree/parallel
[3] http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html
Re: [HACKERS] Choosing parallel_degree
On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas wrote:
> On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila wrote:
>> Other than that, patch looks good and I have marked it as Ready For
>> Committer. Hope we get this for 9.6.
>
> Committed. I think this is likely to make parallel query
> significantly more usable in 9.6.

I'm kind of worried that it will make it yet less usable for PostGIS, since approaches that ignore costs in favour of relpages will dramatically under-resource our queries. I can spin a query for multiple seconds on a table with less than 100K records, not even trying very hard.

Functions have very unequal CPU costs, and we're talking here about using CPUs more effectively; why are costs being given the see-no-evil treatment? This is as true in core as it is in PostGIS, even if our case is a couple of orders of magnitude more extreme: a filter based on a complex combination of regex queries will use an order of magnitude more CPU than one that does a little math, so why plan and execute them like they are the same?

As it stands now, it seems like out-of-the-box PostGIS users will actually not see much benefit from parallelism unless they manhandle their configuration settings to force it.

ATB,
P
Re: [HACKERS] Parallel Queries and PostGIS
On Tue, Mar 29, 2016 at 12:51 PM, Paul Ramsey <pram...@cleverelephant.ca> wrote: > On Tue, Mar 29, 2016 at 12:48 PM, Paul Ramsey <pram...@cleverelephant.ca> > wrote: > >>> On the join case, I wonder if it's possible that _st_intersects is not >>> marked parallel-safe? If that's not the problem, I don't have a >>> second guess, but the thing to do would be to figure out whether >>> consider_parallel is false for the RelOptInfo corresponding to either >>> of pd and pts, or whether it's true for both but false for the >>> joinrel's RelOptInfo, or whether it's true for all three of them but >>> you don't get the desired path anyway. >> >> _st_intersects is definitely marked parallel safe, and in fact will >> generate a parallel plan if used alone (without the operator though, >> it's impossibly slow). It's the && operator that is the issue... and I >> just noticed that the PROCEDURE bound to the && operator >> (geometry_overlaps) is *not* marked parallel safe: could be the >> problem? > > Asked and answered: marking the geometry_overlaps as parallel safe > gets me a parallel plan! Now to play with costs and see how it behaves > when force_parallel_mode is not set. For the record I can get a non-forced parallel join plan, *only* if I reduce the parallel_join_cost by a factor of 10, from 0.1 to 0.01. http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html This seems non-optimal. No amount of cranking up the underlying function COST seems to change this, perhaps because the join cost is entirely based on the number of expected tuples in the join relation? In general it seems like function COST values have been considered a relatively unimportant input to planning in the past, but with parallel processing it seems like they are now a lot more determinative about what makes a good plan. P.
Re: [HACKERS] Parallel Queries and PostGIS
On Tue, Mar 29, 2016 at 1:14 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Tue, Mar 29, 2016 at 3:48 PM, Paul Ramsey <pram...@cleverelephant.ca> > wrote: >>> I have no idea why the cost adjustments that you need are different >>> for the scan case and the aggregate case. That does seem problematic, >>> but I just don't know why it's happening. >> >> What might be a good way to debug it? Is there a piece of code I can >> look at to try and figure out the contribution of COST in either case? > > Well, the cost calculations are mostly in costsize.c, but I dunno how > much that helps. Maybe it would help if you posted some EXPLAIN > ANALYZE output for the different cases, with and without parallelism? > > One thing I noticed about this output (from your blog)...
>
>  Finalize Aggregate
>    (cost=16536.53..16536.79 rows=1 width=8)
>    (actual time=2263.638..2263.639 rows=1 loops=1)
>    ->  Gather
>          (cost=16461.22..16461.53 rows=3 width=32)
>          (actual time=754.309..757.204 rows=4 loops=1)
>          Number of Workers: 3
>          ->  Partial Aggregate
>                (cost=15461.22..15461.23 rows=1 width=32)
>                (actual time=676.738..676.739 rows=1 loops=4)
>                ->  Parallel Seq Scan on pd
>                      (cost=0.00..13856.38 rows=64 width=2311)
>                      (actual time=3.009..27.321 rows=42 loops=4)
>                      Filter: (fed_num = 47005)
>                      Rows Removed by Filter: 17341
>  Planning time: 0.219 ms
>  Execution time: 2264.684 ms
>
> ...is that the finalize aggregate phase is estimated to be very cheap, > but it's actually wicked expensive. We get the results from the > workers in only 750 ms, but it takes another second and a half to > aggregate those 4 rows??? This is probably a vivid example of the bad behaviour of the naive union approach. If we have worker states 1,2,3,4 and we go combine(combine(combine(1,2),3),4) then we get kind of a worst case complexity situation where we three times union an increasingly complex object on the left with a simpler object on the right. 
Also, if the objects went into the transfer functions in relatively non-spatially correlated order, the polygons coming out of the transfer functions could be quite complex, and each merge would only add complexity to the output until the final merge which melts away all the remaining internal boundaries. I'm surprised it's quite so awful at the end though, and less awful in the worker stage... how do the workers end up getting rows to work on? 1,2,3,4,1,2,3,4,1,2,3,4? or 1,1,1,2,2,2,3,3,3,4,4,4? The former could result in optimally inefficient unions, given a spatially correlated input (surprisingly common in load-once GIS tables) P. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Queries and PostGIS
On Tue, Mar 29, 2016 at 12:48 PM, Paul Ramsey <pram...@cleverelephant.ca> wrote: >> On the join case, I wonder if it's possible that _st_intersects is not >> marked parallel-safe? If that's not the problem, I don't have a >> second guess, but the thing to do would be to figure out whether >> consider_parallel is false for the RelOptInfo corresponding to either >> of pd and pts, or whether it's true for both but false for the >> joinrel's RelOptInfo, or whether it's true for all three of them but >> you don't get the desired path anyway. > > _st_intersects is definitely marked parallel safe, and in fact will > generate a parallel plan if used alone (without the operator though, > it's impossibly slow). It's the && operator that is the issue... and I > just noticed that the PROCEDURE bound to the && operator > (geometry_overlaps) is *not* marked parallel safe: could be the > problem? Asked and answered: marking the geometry_overlaps as parallel safe gets me a parallel plan! Now to play with costs and see how it behaves when force_parallel_mode is not set. P. > > Thanks, > > P
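The fix described above boils down to a single statement, sketched here for the record (the signature follows the PostGIS operator binding named in the thread; the PARALLEL markings are 9.6 ALTER FUNCTION syntax):

```sql
-- Mark the procedure bound to the && operator as safe to run in workers.
ALTER FUNCTION geometry_overlaps(geometry, geometry) PARALLEL SAFE;
```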
Re: [HACKERS] Parallel Queries and PostGIS
> First, I beg to differ with this statement: "Some of the execution > results output are wrong! " The point is that > line has loops=4, so as in any other case where loops>1, you're seeing > the number of rows divided by the number of loops. It is the > *average* number of rows that were processed by each loop - one loop > per worker, in this case. Thanks for the explanation, let my reaction be a guide to what the other unwashed will think :) > Now, on to your actual question: > > I have no idea why the cost adjustments that you need are different > for the scan case and the aggregate case. That does seem problematic, > but I just don't know why it's happening. What might be a good way to debug it? Is there a piece of code I can look at to try and figure out the contribution of COST in either case? > On the join case, I wonder if it's possible that _st_intersects is not > marked parallel-safe? If that's not the problem, I don't have a > second guess, but the thing to do would be to figure out whether > consider_parallel is false for the RelOptInfo corresponding to either > of pd and pts, or whether it's true for both but false for the > joinrel's RelOptInfo, or whether it's true for all three of them but > you don't get the desired path anyway. _st_intersects is definitely marked parallel safe, and in fact will generate a parallel plan if used alone (without the operator though, it's impossibly slow). It's the && operator that is the issue... and I just noticed that the PROCEDURE bound to the && operator (geometry_overlaps) is *not* marked parallel safe: could be the problem? Thanks, P
Re: [HACKERS] Parallel Queries and PostGIS
On Mon, Mar 28, 2016 at 9:18 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote: > Parallel join would be a huge win, so some help/pointers on figuring > out why it's not coming into play when our gist operators are in > effect would be helpful. Robert, do you have any pointers on what I should look for to figure out why the parallel join code doesn't fire if I add a GIST operator to my join condition? Thanks, P
Re: [HACKERS] Parallel Queries and PostGIS
On Mon, Mar 28, 2016 at 9:45 AM, Stephen Frost <sfr...@snowman.net> wrote: > Paul, > > * Paul Ramsey (pram...@cleverelephant.ca) wrote: >> I spent some time over the weekend trying out the different modes of >> parallel query (seq scan, aggregate, join) in combination with PostGIS >> and have written up the results here: >> >> http://blog.cleverelephant.ca/2016/03/parallel-postgis.html > > Neat! > > Regarding aggregate parallelism and the cascaded union approach, though > I imagine in other cases as well, it seems like having a > "final-per-worker" function for aggregates would be useful. > > Without actually looking at the code at all, it seems like that wouldn't > be terribly difficult to add. > > Would you agree that it'd be helpful to have for making the st_union() > work better in parallel? For our particular situation w/ ST_Union, yes, it would be ideal to be able to run a worker-side combine function as well as the master-side one. Although the cascaded union would be less effective spread out over N nodes, doing it only once per worker, rather than every N records, would minimize the loss of effectiveness. P
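For reference, the worker-side/leader-side split discussed here is expressed in the CREATE AGGREGATE syntax that eventually shipped in 9.6 via a combine function — a sketch, noting that the POC patches elsewhere in this thread used the earlier "cfunc" spelling, and that a true "final-per-worker" hook as proposed above would be a further step beyond this:

```sql
CREATE AGGREGATE ST_MemUnion (geometry) (
    sfunc       = ST_Union,   -- transition: runs per row inside each worker
    stype       = geometry,
    combinefunc = ST_Union,   -- merges per-worker partial states on the leader
    parallel    = safe
);
```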
[HACKERS] Parallel Queries and PostGIS
I spent some time over the weekend trying out the different modes of parallel query (seq scan, aggregate, join) in combination with PostGIS and have written up the results here: http://blog.cleverelephant.ca/2016/03/parallel-postgis.html The TL;DR is basically:

* With some adjustments to function COST, both parallel sequence scan and parallel aggregation deliver very good parallel performance results.
* The cost adjustments for sequence scan and aggregate scan are not consistent in magnitude.
* Parallel join does not seem to work for PostGIS indexes yet, but perhaps there is some magic to learn from PostgreSQL core on that.

The two findings at the end are ones that need input from parallel query masters... We recognize we'll have to adjust costs so that our particular use case (very CPU-intensive calculation per function) is planned better, but it seems like different query modes are interpreting costs in order-of-magnitude different ways in building plans. Parallel join would be a huge win, so some help/pointers on figuring out why it's not coming into play when our gist operators are in effect would be helpful. Happy Easter to you all, P
Re: [HACKERS] Parallel Aggregate
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley wrote: > On 14 March 2016 at 14:52, James Sewell wrote: >> One question - how is the upper limit of workers chosen? > > See create_parallel_paths() in allpaths.c. Basically the bigger the > relation (in pages) the more workers will be allocated, up until > max_parallel_degree. Does the cost of the aggregate function come into this calculation at all? In PostGIS land, much smaller numbers of rows can generate loads that would be effective to parallelize (worker time much >> startup cost). P
Re: [HACKERS] Expanded Objects and Order By
Thank the Maker, it is reproducible: returning an expanded header in the _in function is not appreciated in a very narrow number of cases. Edit arrayfuncs.c:array_in(), change the return line thus:

// PG_RETURN_ARRAYTYPE_P(retval);
PG_RETURN_DATUM(expand_array(PointerGetDatum(retval), CurrentMemoryContext, my_extra));

And here is a small test case that exercises it:

CREATE TABLE orderby_expanded (
  id integer,
  a integer[]
);
INSERT INTO orderby_expanded (id, a) VALUES (1, ARRAY[1,2]);
-- works
SELECT id, a FROM orderby_expanded ORDER BY a = '{1,2}'::integer[];
-- ERROR: could not find pathkey item to sort
SELECT id FROM orderby_expanded ORDER BY a = '{1,2}'::integer[];
-- works
SELECT id FROM orderby_expanded ORDER BY a = ARRAY[1,2];
[HACKERS] Expanded Objects and Order By
I'm starting to think this might not actually be my mistake, but be a very narrow issue w/ the expanded object code. So, I've added support for converting postgis in-memory objects into expanded outputs, and have overwritten the usual lwgeom_to_gserialized() function with one that creates an expanded object. I haven't done anything to actually handle expanded objects on input, but as I understand it, that's fine, everything is supposed to work as normal when handed expanded objects. And thus far, that has been true, almost all regression tests work fine. But here's my narrow case. This works fine with the old lwgeom_to_gserialized that uses standard flat varlena outputs. It does the following w/ expanded outputs:

SELECT gid FROM test_table ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;
ERROR: could not find pathkey item to sort

Tracing through the debugger, I see the lwgeom_to_gserialized() function get hit once for the geometry literal, via parse_analyze. After that, the error shows up shortly. If I change the query ever so slightly, adding a geometry output column:

SELECT gid, geom FROM test_table ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;

It runs through to completion, five records returned. As long as the query includes a geometry output on the output line, it works. A function that uses a geometry, but doesn't actually output it, will still fail, with the same pathkey error. This will fail, for example:

SELECT gid, ST_AsText(geom) FROM test_table ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;

So I'm wondering what I should debug next. My code is hardly being touched at all at this point, and I'm very confident it's memory clean, etc (the flattener raises hell if the output is shorter or longer than the expected allocation). Any pointers as to what to look for, much appreciated. P. 
Re: [HACKERS] Expanded Objects and Order By
I have a size/flatten callback setup (and they are very careful not to write outside their boundaries), so that’s all OK. Since you’re not seeing anything “aha” in the error pattern, I’ll go back to the mats on memory… is there a good page on valgrinding postgresql? I thought the memory manager papered over things so much that valgrind couldn’t see what was going on under the covers. Thanks! P > On Jan 18, 2016, at 8:36 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > Paul Ramsey <pram...@cleverelephant.ca> writes: >> So, I've added support for converting postgis in-memory objects into >> expanded outputs, and have overwritten the usual >> lwgeom_to_gserialized() function with one that creates an expanded >> object. I haven't done anything to actually handle expanded objects on >> input, but as I understand it, that's fine, everything is supposed to >> work as normal when handed expanded objects. And thus far, that has >> been true, almost all regression tests work fine. > > Hmm ... you do need to be able to flatten them again. In the given > example, the parser is going to want to form a Const node whose Datum > value is a geometry object, and that Const node needs to be copiable > by copyObject(), which means datumCopy() has to work, and if you look > at that it will exercise EOH_get_flat_size/EOH_flatten_into when the > input routine originally produced an expanded object. > > The error message is very strange; it's hard to see how toying with the > internal representation of Consts could cause that. I think the > hypothesis of a memory clobber is stronger than you give it credit for, > especially since you see the behavior changing for seemingly unrelated > reasons. Valgrind might be a useful tool here. > > regards, tom lane
[HACKERS] Expanded Object Header and Flat Cache
I've been working through the expanded object code to try and get a demonstration of it working with PostGIS (still having some problems, but it's a learning experience). On an unrelated front, I noticed in the expanded array code that the array implementation is managing its own copy of a cache of the flat representation and flat size, https://github.com/postgres/postgres/blob/cf7dfbf2d6c5892747cd6fca399350d86c16f00f/src/backend/utils/adt/array_expanded.c#L247-L253 This seems like generic code, which any implementor is going to end up copying (after seeing it, and seeing how often the flatten size callback is being hit while debugging code, it seems like an obvious next thing to add to my expanded representation, once I get things working). Why isn't caching the flat representation and size (and short circuiting when the cache is already filled) part of the generic functionality PgSQL provides? Should it be? I guess it would imply a required function to dirty the EOH cache when changes are made to the in-memory data, but that seems no worse as part of the generic API than in all the client code. P.
Re: [HACKERS] Parallel Aggregate
On December 21, 2015 at 2:33:56 AM, Haribabu Kommi (kommi.harib...@gmail.com) wrote: Yes the query is producing more groups according to the selectivity. For example - scan selectivity - 40, the number of groups - 400 Following is the query: SELECT tenpoCord, SUM(yokinZandaka) AS yokinZandakaxGOUKEI, SUM(kashikoshiZandaka) AS kashikoshiZandakaxGOUKEI, SUM(kouzasuu) AS kouzasuuxGOUKEI, SUM(sougouKouzasuu) AS sougouKouzasuuxGOUKEI FROM public.test01 WHERE tenpoCord <= '001' AND kamokuCord = '01' AND kouzaKatujyoutaiCord = '0' GROUP BY kinkoCord,tenpoCord; Shouldn’t parallel aggregate come into play regardless of scan selectivity? I know in PostGIS land there’s a lot of stuff like: SELECT ST_Union(geom) FROM t GROUP BY areacode; Basically, in the BI case, there’s often no filter at all. Hoping that’s considered a prime case for parallel agg :) P
Re: [HACKERS] parallel joins, and better parallel explain
On Wed, Dec 2, 2015 at 1:55 PM, Robert Haas wrote: > Oops. The new version I've attached should fix this. I've been trying to see if parallel join has any effect on PostGIS spatial join queries, which are commonly CPU bound. (My tests [1] on simple parallel scan were very positive, though quite limited in that they only parallelized such a small part of the work). Like Amit, I found the current patches are missing a change to src/include/nodes/relation.h, but just adding in "Relids extra_lateral_rels" to JoinPathExtraData allowed a warning-free build. The assumption in the parallel code generally is that setting up parallel workers is very costly compared to the work to be done, so to get PostGIS code to parallelize I've been reduced to shoving parallel_setup_cost very low (1) and parallel_tuple_cost as well. Otherwise I just end up with ordinary plans. I did redefine all the relevant functions as "parallel safe" and upped their declared costs significantly. I set up an 8000 record spatial table, with a spatial index, and did a self-join on it. 
explain analyze select a.gid, b.gid from vada a join vada b on st_intersects(a.geom, b.geom) where a.gid != b.gid;

With no parallelism, I got this:

set max_parallel_degree = 0;

                                 QUERY PLAN
 --------------------------------------------------------------------------
 Nested Loop
   (cost=0.15..227332.48 rows=1822243 width=8)
   (actual time=0.377..5528.461 rows=52074 loops=1)
   ->  Seq Scan on vada a
         (cost=0.00..1209.92 rows=8792 width=1189)
         (actual time=0.027..5.004 rows=8792 loops=1)
   ->  Index Scan using vada_gix on vada b
         (cost=0.15..25.71 rows=1 width=1189)
         (actual time=0.351..0.622 rows=6 loops=8792)
         Index Cond: (a.geom && geom)
         Filter: ((a.gid <> gid) AND _st_intersects(a.geom, geom))
         Rows Removed by Filter: 3
 Planning time: 3.976 ms
 Execution time: 5533.573 ms

With parallelism, I got this:

                                 QUERY PLAN
 --------------------------------------------------------------------------
 Nested Loop
   (cost=0.15..226930.05 rows=1822243 width=8)
   (actual time=0.840..5462.029 rows=52074 loops=1)
   ->  Gather
         (cost=0.00..807.49 rows=8792 width=1189)
         (actual time=0.335..39.326 rows=8792 loops=1)
         Number of Workers: 1
         ->  Parallel Seq Scan on vada a
               (cost=0.00..806.61 rows=5861 width=1189)
               (actual time=0.015..10.167 rows=4396 loops=2)
   ->  Index Scan using vada_gix on vada b
         (cost=0.15..25.71 rows=1 width=1189)
         (actual time=0.353..0.609 rows=6 loops=8792)
         Index Cond: (a.geom && geom)
         Filter: ((a.gid <> gid) AND _st_intersects(a.geom, geom))
         Rows Removed by Filter: 3
 Planning time: 4.019 ms
 Execution time: 5467.126 ms

Given that it's a CPU-bound process, I was hoping for something closer to the results of the sequence tests, about 50% time reduction, based on the two cores in my test machine. In general either the parallel planner is way too conservative (it seems), or we need to radically increase the costs of our PostGIS functions (right now, most "costly" functions are cost 100, but I had to push costs up into the 10 range to get parallelism to kick in sometimes). 
Some guidelines on cost setting would be useful, something that says, "this function run against this kind of data is cost level 1, compare the time your function takes on 'standard' data to the baseline function to get a cost ratio to use in the function definition" ATB, P. [1] https://gist.github.com/pramsey/84e7a39d83cccae692f8
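In the absence of such official guidelines, one improvised way to derive the ratio described above: time the candidate function and a cheap baseline over the same rows, then use the ratio as the COST value. The table and function names below are taken from the test above; the procedure itself is a suggestion, not established practice:

```sql
-- Baseline: a near-free comparison over the 8792-row table.
EXPLAIN ANALYZE SELECT count(*) FROM vada WHERE gid > 0;

-- Candidate: the expensive predicate over the same rows.
EXPLAIN ANALYZE SELECT count(*) FROM vada
 WHERE _st_intersects(geom, 'POINT(0 0)'::geometry);

-- If the candidate takes roughly K times longer per row, then:
-- ALTER FUNCTION _st_intersects(geometry, geometry) COST <K>;
```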
Re: [HACKERS] Parallel Aggregate
On Thu, Dec 10, 2015 at 10:42 PM, Haribabu Kommi wrote: > > Here I attached a POC patch of parallel aggregate based on combine > aggregate patch. This patch contains the combine aggregate changes > also. This patch generates and executes the parallel aggregate plan > as discussed in earlier threads. I tried this out using PostGIS with no great success. I used a very simple aggregate for geometry union, because the combine function is just the same as the transfer function for this case (I also marked ST_Area() as parallel safe, so that the planner will attempt to parallelize the query).

CREATE AGGREGATE ST_MemUnion (
  basetype = geometry,
  sfunc = ST_Union,
  cfunc = ST_Union,
  stype = geometry
);

Unfortunately attempting a test causes memory corruption and death.

select riding, st_area(st_memunion(geom)) from vada group by riding;

The explain looks OK:

                                   QUERY PLAN
 ---------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=220629.47..240380.26 rows=79 width=1189)
   Group Key: riding
   ->  Gather  (cost=0.00..807.49 rows=8792 width=1189)
         Number of Workers: 1
         ->  Partial HashAggregate  (cost=220628.59..220629.38 rows=79 width=1189)
               Group Key: riding
               ->  Parallel Seq Scan on vada  (cost=0.00..806.61 rows=8792 width=1189)

But the run dies.

NOTICE: SRID value -32897 converted to the officially unknown SRID value 0
ERROR: Unknown geometry type: 2139062143 - Invalid type

From the message it looks like geometry gets corrupted at some point, causing a read to fail on very screwed up metadata. P.
[HACKERS] [PATCH] PostGIS Doc Urls
The attached patch changes web references to PostGIS to point to the actual community site (postgis.net) which is active, rather than the historical site (postgis.org). P. -- Paul Ramsey http://cleverelephant.ca http://postgis.net postgis_doc.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
Thanks everyone for the help and feedback on this patch! -- Paul Ramsey http://cleverelephant.ca http://postgis.net On November 3, 2015 at 3:47:37 PM, Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas <robertmh...@gmail.com> writes: > On Tue, Nov 3, 2015 at 2:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Paul Ramsey <pram...@cleverelephant.ca> writes: >>> [ 20151006b_postgres_fdw_extensions.patch ] >> There might be a case for raising a WARNING during >> postgres_fdw_validator(), but no more than that, IMO. Certainly ERROR >> during regular use of the server is right out. > Agreed. I don't know whether it's better to emit a WARNING or some > lower-level message (INFO, DEBUG), but I think an ERROR will suck due > to the pg_dump issues you mention. I've committed this with a WARNING during validation and no comment otherwise. I left out the proposed regression tests because they fail in "make installcheck" mode, unless you've previously built and installed cube and seg, which seems like an unacceptable requirement to me. I don't think that leaving the code untested is a good final answer, of course. The idea I was toying with was to create a dummy extension for testing purposes by means of doing a direct INSERT into pg_extension --- which is ugly and would only work for superusers, but the latter is true of "CREATE EXTENSION cube" too. Anybody have a better idea? regards, tom lane
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Oct 6, 2015 at 8:52 AM, Andres Freund wrote: > I think it'd be good to add a test exercising two servers with different > extensions marked as shippable. Done, P 20151006b_postgres_fdw_extensions.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On October 4, 2015 at 9:56:10 PM, Michael Paquier (michael.paqu...@gmail.com(mailto:michael.paqu...@gmail.com)) wrote: > On Sun, Oct 4, 2015 at 11:40 AM, Paul Ramsey wrote: > > I put all changes relative to your review here if you want a nice colorized > > place to check > > > > https://github.com/pramsey/postgres/commit/ed33e7489601e659f436d6afda3cce28304eba50 > > - /* updatable is available on both server and table */ > + /* updatable option is available on both server and table */ > This is just noise (perhaps I am the one who introduced it, oops). No, I did, it matches the change to the comment below that addresses Andres note. The principle of least change is butting up against the principle of language usage consistency here. Can remove, of course. P -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Oct 6, 2015 at 5:32 AM, Andres Freund wrote: > The problem is basically that cache invalidations can arrive while > you're building new cache entries. Everytime you e.g. open a relation > cache invalidations can arrive. Assume you'd written the code like: > You're avoiding that by only entering into the hashtable *after* the > lookup. And I think that deserves a comment. Thanks, revised patch attached. P. 20151006_postgres_fdw_extensions.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On October 6, 2015 at 6:32:36 AM, Andres Freund (and...@anarazel.de(mailto:and...@anarazel.de)) wrote: > On 2015-10-06 06:28:34 -0700, Paul Ramsey wrote: > > +/* > > + * is_shippable > > + * Is this object (proc/op/type) shippable to foreign server? > > + * Check cache first, then look-up whether (proc/op/type) is > > + * part of a declared extension if it is not cached. > > + */ > > +bool > > +is_shippable(Oid objnumber, Oid classnumber, List *extension_list) > > +{ > > + ShippableCacheKey key; > > + ShippableCacheEntry *entry; > > + > > + /* Always return false if the user hasn't set the "extensions" option */ > > + if (extension_list == NIL) > > + return false; > > + > > + /* Find existing cache, if any. */ > > + if (!ShippableCacheHash) > > + InitializeShippableCache(); > > + > > + /* Zero out the key */ > > + memset(&key, 0, sizeof(key)); > > + > > + key.objid = objnumber; > > + key.classid = classnumber; > > + > > + entry = (ShippableCacheEntry *) > > + hash_search(ShippableCacheHash, > > + (void *) &key, > > + HASH_FIND, > > + NULL); > > + > > + /* Not found in ShippableCacheHash cache. Construct new entry. */ > > + if (!entry) > > + { > > + /* > > + * Right now "shippability" is exclusively a function of whether > > + * the obj (proc/op/type) is in an extension declared by the user. > > + * In the future we could additionally have a whitelist of functions > > + * declared one at a time. > > + */ > > + bool shippable = lookup_shippable(objnumber, classnumber, extension_list); > > + > > + /* > > + * Don't create a new hash entry until *after* we have the shippable > > + * result in hand, as the shippable lookup might trigger a cache > > + * invalidation. 
> > + */ > > + entry = (ShippableCacheEntry *) > > + hash_search(ShippableCacheHash, > > + (void *) &key, > > + HASH_ENTER, > > + NULL); > > + > > + entry->shippable = shippable; > > + } > > + > > + if (!entry) > > + return false; > > + else > > + return entry->shippable; > > +} > > Maybe I'm missing something major here. But given that you're looking up > solely based on Oid objnumber, Oid classnumber, how would this cache > work if there's two foreign servers with different extension lists? *sigh*, no you’re not missing anything. In moving to the cache and marking things just as “shippable” I’ve lost the test that ensures they are shippable for this *particular* server (which only happens in the lookup stage). So yeah, the cache will be wrong in cases where different servers have different extension options. Thanks, P. -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Oct 6, 2015 at 6:55 AM, Andres Freund <and...@anarazel.de> wrote: > On 2015-10-06 06:42:17 -0700, Paul Ramsey wrote: >> *sigh*, no you’re not missing anything. In moving to the cache and >> marking things just as “shippable” I’ve lost the test that ensures >> they are shippable for this *particular* server (which only happens in >> the lookup stage). So yeah, the cache will be wrong in cases where >> different servers have different extension options. > > Should be easy enough to fix - just add the server's oid to the key. > > But I guess you're already on that. Yep, just a bit chastened :) Thanks for the catch, here's the patch, P 20151006a_postgres_fdw_extensions.patch Description: Binary data
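For context, the scenario Andres flagged is easy to construct with the per-server extensions option this patch adds: two servers declaring different shippable extension lists, which is why the cache key needs the server OID in addition to the object's class and OID. A hypothetical setup (host and server names invented):

```sql
-- Each server declares which extensions' functions and operators
-- may be shipped to it for remote execution.
CREATE SERVER geo_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'geo.example.com', dbname 'gis', extensions 'postgis');
CREATE SERVER etl_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'etl.example.com', dbname 'etl', extensions 'cube, seg');
```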
Re: [HACKERS] [PATCH] postgres_fdw extension support
Andres, Thanks so much for the review! I put all changes relative to your review here if you want a nice colorized place to check https://github.com/pramsey/postgres/commit/ed33e7489601e659f436d6afda3cce28304eba50 On October 3, 2015 at 8:49:04 AM, Andres Freund (and...@anarazel.de) wrote: > + /* this must have already-installed extensions */ I don't understand that comment. Fixed, I hope. > + /* extensions is available on server */ > + {"extensions", ForeignServerRelationId, false}, awkward spelling in comment. Fixed, I hope. > + * throw up an error. > + */ s/throw up an error/throw an error/ or raise an error. But “throw up” is so evocative :) fixed. > + /* Optional extensions to support (list of oid) */ *oids Fixed. > + /* Always return false if we don't have any declared extensions */ Imo there's nothing declared here... Changed... > + if (extension_list == NIL) > + return false; > + > + /* We need this relation to scan */ Not sure what that means. Me neither, removed. > + if (foundDep->deptype == DEPENDENCY_EXTENSION && > + list_member_oid(extension_list, foundDep->refobjid)) > + { > + is_shippable = true; > + break; > + } > + } Hm. I think this “hm” is addressed lower down. > + /* Always return false if we don't have any declared extensions */ > + if (extension_list == NIL) > + return false; I again dislike declared here ;) Altered. > + key.objid = objnumber; Hm. Oids can conflict between different system relations. Shouldn't the key be over class (i.e. pg_proc, pg_type etc.) and object id? I’ve changed the lookup to use class/obj instead. I’m *hoping* I don’t get burned by it, but it regresses fine at least. Each call to is_shippable now has a hard-coded class oid in it depending on the context of the call. It seemed like the right way to do it. > + /* > + * Right now "shippability" is exclusively a function of whether > + * the obj (proc/op/type) is in an extension declared by the user. 
> + * In the future we could additionally have a whitelist of functions > + * declared one at a time. > + */ > + bool shippable = lookup_shippable(objnumber, extension_list); > + > + entry = (ShippableCacheEntry *) > + hash_search(ShippableCacheHash, > + (void *) &key, > + HASH_ENTER, > + NULL); > + > + entry->shippable = shippable; > + } I suggest adding a comment that we consciously are only HASH_ENTERing the key after doing the lookup_shippable(), to avoid the issue that the lookup might accept cache invalidations and thus delete the entry again. I’m not understanding this one. I only ever delete cache entries in bulk, when InvalidateShippableCacheCallback gets called on updates to the foreign server definition. I must not be quite understanding your gist. Thanks! P 20151003_postgres_fdw_extensions.patch Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On September 30, 2015 at 7:06:58 AM, Tom Lane (t...@sss.pgh.pa.us) wrote: Paul Ramsey <pram...@cleverelephant.ca> writes: > Hm. Wouldn't it be just fine if only the server is able to define a > list of extensions then? It seems to me that all the use-cases of this > feature require to have a list of extensions defined per server, and > not per fdw type. This would remove a level of complexity in your > patch without impacting the feature usability as well. I would > personally go without it but I am fine to let a committer (Tom?) put a > final judgement stamp on this matter. Thoughts? Maybe I'm missing something, but I had envisioned the set of safe-to-transmit extensions as typically being defined at the foreign-server level. The reason being that you are really declaring two things: one is that the extension's operations are reproducible remotely, and the other is that the extension is in fact installed on this particular remote server. Perhaps there are use-cases for specifying it as an FDW option or per-table option, but per-server option seems by far the most plausible case. Fair enough. Declaring it for the whole database as an option to CREATE FOREIGN DATA WRAPPER was just a convenience really, so you could basically say “I expect this extension on all my servers”. But you’re right, logically “having the extension” is an attribute of the servers, so restricting it to the server definitions only has a nice simple logic to it. P. -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On September 30, 2015 at 3:32:21 PM, Michael Paquier (michael.paqu...@gmail.com) wrote: OK. Once you can get a new patch done with a reworked extractExtensionList, I'll get a new look at it in a timely fashion and then let's move it to a committer's hands. Done and thanks! P -- http://postgis.net http://cleverelephant.ca fdw-extension-support8.diff Description: Binary data
Re: [HACKERS] [PATCH] postgres_fdw extension support
On September 30, 2015 at 12:54:44 AM, Michael Paquier (michael.paqu...@gmail.com) wrote: >> +extern bool extractExtensionList(char *extensionString, >> + List **extensionOids); >> What's the point of the boolean status in this new routine? The return >> value of extractExtensionList is never checked, and it would be more >> simple to just return the parsed list as return value, no? > > I started changing it, then found out why it is the way it is. During > the options parsing, the list of current extensionOids is passed in, > so that extra ones can be added, since both the wrapper and the server > can be declared with extensionOids. It's also doubling as a flag on > whether the function should bother to try and populate the list, or > just do a sanity check on the options string. I can change the > signature to > > extern List* extractExtensionList(char *extensionString, List > *currentExtensionOids, bool populateList); > to be more explicit if necessary. Hm. Wouldn't it be just fine if only the server is able to define a list of extensions then? It seems to me that all the use-cases of this feature require to have a list of extensions defined per server, and not per fdw type. This would remove a level of complexity in your patch without impacting the feature usability as well. I would personally go without it but I am fine to let a committer (Tom?) put a final judgement stamp on this matter. Thoughts? Simon wanted to be able to define extensions at the wrapper level as well as the server level. It’s a nice enough little feature to warrant a small amount of added complexity, IMO. I’m going to change the signature, since verbose clarity > terse obscurity for my wee brain (only took me a couple weeks to forget why that signature was what it was). >> +-- === >> +-- clean up >> +-- === >> Perhaps here you meant dropping the schema and the foreign tables >> created previously? 
> > I did, but since postgres_fdw doesn't clean up after itself, perhaps > just leaving the room messy is the appropriate behaviour? That's appropriate when keeping around objects that are aimed to be tested with for example pg_upgrade, some regression tests of src/test/regress do that on purpose for example. Now it is true that an extension regression database will be normally reused by another extension just after, and that there is no general rule on this. I’ll clean up then, I was just being lazy. P. -- http://postgis.net http://cleverelephant.ca
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Sat, Sep 26, 2015 at 5:41 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Sat, Sep 26, 2015 at 4:29 AM, Paul Ramsey wrote: >> On Thu, Aug 20, 2015 at 6:01 PM, Michael Paquier wrote: >> src/backend/utils/adt/format_type.c >> +/* >> + * This version allows a nondefault typemod to be specified and fully >> qualified. >> + */ >> +char * >> +format_type_with_typemod_qualified(Oid type_oid, int32 typemod) >> +{ >> + return format_type_internal(type_oid, typemod, true, false, true); >> +} > > Patch 0001 in this email has a routine called format_type_detailed > that I think is basically what we need for this stuff: > http://www.postgresql.org/message-id/CACACo5Q_UXYwF117LBhjZ3xaMPyrgqnqE=mxvrhefjj51ac...@mail.gmail.com > Could you look at it? I'm not sure it helps much. I'd still need a function to turn the output into a formatted string, but more importantly the big question for me to avoid breaking regression is: if it's built-in, don't schema qualitfy it; if it's extended do so. I'm not seeing why ignoring the typmod in the case of deparsing extended type constants is going to be a problem? All the old behaviour is untouched in the current patch. > + * shippable.c > + * Non-built-in objects cache management and utilities. > + * > + * Is a non-built-in shippable to the remote server? Only if > + * the object is in an extension declared by the user in the > + * OPTIONS of the wrapper or the server. > This is rather unclear. It would be more simple to describe that as > "Facility to track database objects shippable to a foreign server". Done > +extern bool extractExtensionList(char *extensionString, > + List **extensionOids); > What's the point of the boolean status in this new routine? The return > value of extractExtensionList is never checked, and it would be more > simple to just return the parsed list as return value, no? I started changing it, then found out why it is the way it is. 
During the options parsing, the list of current extensionOids is passed in, so that extra ones can be added, since both the wrapper and the server can be declared with extensionOids. It's also doubling as a flag on whether the function should bother to try and populate the list, or just do a sanity check on the options string. I can change the signature to extern List* extractExtensionList(char *extensionString, List *currentExtensionOids, bool populateList); to be more explicit if necessary. > -REGRESS = postgres_fdw > +REGRESS = postgres_fdw shippable > +EXTRA_INSTALL = contrib/seg > The order of the tests is important and should be mentioned, > shippable.sql using the loopback server created by postgres_fdw. Done > +-- === > +-- clean up > +-- === > Perhaps here you meant dropping the schema and the foreign tables > created previously? I did, but since postgres_fdw doesn't clean up after itself, perhaps just leaving the room messy is the appropriate behaviour? > +CREATE SCHEMA "SH 1"; > Is there a reason to use double-quoted relations? There is no real > reason to not use them, this is just to point out that this is not a > common practice in the other regression tests. Just following the pattern from postgres_fdw. And since I found things to be sensitive to full qualification of function names, etc, it seemed like a good idea to try out odd named tables/schemas since the pattern was there to follow. I also changed the extension being tested from 'seg' to 'cube', since cube had some functions I could test as well as operators. P. 
> -- > Michael diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index d2b98e1..f78fc64 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -1,7 +1,7 @@ # contrib/postgres_fdw/Makefile MODULE_big = postgres_fdw -OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES) +OBJS = postgres_fdw.o option.o deparse.o connection.o shippable.o $(WIN32RES) PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL" PG_CPPFLAGS = -I$(libpq_srcdir) @@ -10,7 +10,9 @@ SHLIB_LINK = $(libpq) EXTENSION = postgres_fdw DATA = postgres_fdw--1.0.sql -REGRESS = postgres_fdw +# Note: shippable tests depend on postgres_fdw tests setup +REGRESS = postgres_fdw shippable +EXTRA_INSTALL = contrib/cube ifdef USE_PGXS PG_CONFIG = pg_config diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 697de60..897ecdb 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -233,6 +233,9 @@ foreign_expr_walker(Node *node, Oid collation; FDWCollateStat
Re: [HACKERS] [PATCH] postgres_fdw extension support
Back from summer and conferencing, and finally responding, sorry for the delay... On Thu, Aug 20, 2015 at 6:01 PM, Michael Paquier wrote: > > > if (needlabel) > appendStringInfo(buf, "::%s", > - > format_type_with_typemod(node->consttype, > - > node->consttypmod)); > + > format_type_be_qualified(node->consttype)); > Pondering more about this one, I think that we are going to need a new > routine in format_type.c to be able to call format_type_internal as > format_type_internal(type_oid, typemod, true/false, false, true). If typemod > is -1, then typemod_given (the third argument) is false, otherwise > typemod_given is true. That's close to what the C function format_type at > the SQL level can do except that we want it to be qualified. Regression > tests will need an update as well. I ended up switching on whether the type being formatted was an extension type or not. Extension types need to be fully qualified or they won't get found by the remote. Conversely if you fully qualify the built-in types, the regression test for postgres_fdw isn't happy. This still isn't quite the best/final solution, perhaps something as simple as this would work, but I'm not sure: src/backend/utils/adt/format_type.c +/* + * This version allows a nondefault typemod to be specified and fully qualified. + */ +char * +format_type_with_typemod_qualified(Oid type_oid, int32 typemod) +{ + return format_type_internal(type_oid, typemod, true, false, true); +} > Comment format is incorrect, this should be written like that: Comments fixed. > + if (!extension_list) > + return false; > I would rather mark that as == NIL instead, NIL is what an empty list is. Done > +extern bool is_shippable(Oid procnumber, PgFdwRelationInfo *fpinfo); > There is no need to pass PgFdwRelationInfo to is_shippable. Only the list of > extension OIDs is fine. Done > That's nitpicking, but normally we avoid more than 80 characters per line of > code. Done. 
> When attempting to create a server when an extension is not installed we get > an appropriate error: > =# CREATE SERVER postgres_server > FOREIGN DATA WRAPPER postgres_fdw > OPTIONS (host 'localhost', port '5432', dbname 'postgres', extensions > 'seg'); > ERROR: 42601: the "seg" extension must be installed locally before it can > be used on a remote server > LOCATION: extractExtensionList, shippable.c:245 > However, it is possible to drop an extension listed in a postgres_fdw > server. Shouldn't we invalidate cache as well when pg_extension is updated? > Thoughts? For extensions with types, it's pretty hard to pull the extension out from under the FDW, because the tables using the type will depend on it. For simpler function-only extensions, it's possible, but as soon as you pull the extension out and run a query, your FDW will notice the extension is missing and complain. And then you'll have to update the foreign server or foreign table entries and the cache will get flushed. So there's no way to get a stale cache. > + list_free(extlist); > It does not matter much to call list_free here. The list is going to be > freed in the error context with ERROR. Done. > + foreach(ext, extension_list) > + { > + Oid extension_oid = (Oid) lfirst(ext); > + if (foundDep->refobjid == extension_oid) > + { > + nresults++; > + } > + } > You could just use list_member_oid here. And why not just breaking the loop > once there is a match? This is doing unnecessary work visibly Done. > +By default only built-in operators and functions will be sent from the > +local to the foreign server. This may be overridden using the following > option: > Missing a mention to "data types" here? Actually extension data types traverse the postgres_fdw without trouble without this patch, as long as both sides have the extension installed. So not strictly needed to mention data types. 
> It would be good to get some regression tests for this feature, which is > something doable with the flag EXTRA_INSTALL and some tests with EXPLAIN > VERBOSE. Note that you will need to use CREATE EXTENSION to make the > extension available in the new test, which should be I imagine a new file > like shippable.sql. I've put a very very small regression file in that tests the shippable feature, which can be fleshed out further as needed. Thanks so much! P. > Regards, > -- > Michael diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index d2b98e1..63576c4 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -1,7 +1,7 @@ # contrib/postgres_fdw/Makefile MODULE_big = postgres_fdw -OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES) +OBJS =
Re: [HACKERS] Extension upgrade and GUCs
On August 20, 2015 at 2:17:31 AM, Simon Riggs (si...@2ndquadrant.com) wrote: On 18 August 2015 at 21:03, Paul Ramsey wrote: So I need a way to either (a) notice when I already have a (old) copy of the library loaded and avoid trying to setup the GUC in that case or (b) set-up the GUC in a somewhat less brittle way than DefineCustomStringVariable() allows, something that can overwrite things instead of just erroring out. Are you trying to preserve the in-memory state across upgrade as well? It sounds unlikely we can support that directly in the general case. I’m not sure what you mean by this. Sounds like we need RedefineCustomStringVariable() Yes, if that had existed we would not have had any problems (as long as it delegated back to Define..() in the case where the variable hadn’t been created yet…, since one of the problems is knowing if/to-what-extent a custom variable has already been defined). We do now have a fix, which involved copying about 100 lines of core code (find_option, guc_var_compare, guc_name_compare) up, that does a low level search to see if there is a config_generic for a particular variable name, and if so whether it’s a placeholder or not. The presence of a non-placeholding definition is used as a “uh oh, there’s already a library in here” warning which keeps us from re-defining the variable and causing trouble. P. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Extension upgrade and GUCs
On August 20, 2015 at 7:21:10 AM, Tom Lane (t...@sss.pgh.pa.us) wrote: I'm not sure that the situation you describe can be expected to work reliably; the problems are far wider than just GUC variables. If two different .so's are exposing broadly the same set of C function names, how can we be sure which one will get called by the dynamic linker? Isn't it likely that we'd end up continuing to call some functions out of the old .so, probably leading to disaster? Well, when you put it that way it sounds pretty scary :) For whatever it’s worth, we’ve had versioned .so’s for a very long time, and every time an upgrade happens there is a period during which a backend will have two versions loaded simultaneously. But we only noticed recently when we got the GUC collision (our first GUC was only introduced in PostGIS 2.1). Perhaps because after upgrading most people disconnect, and then the next time they connect they only get the new library henceforth. In some cases (start a fresh backend and then do the upgrade immediately) it’s even possible to do an upgrade without triggering the double-load situation. I don't necessarily object to providing some solution for GUCs, but I think first we need to question whether that isn't just the tip of a large iceberg. There’s no doubt that the GUC issue is just a symptom of a potentially awful larger disease, but so far it’s the only symptom we’ve observed. Maybe because only a small % of functionality ever changes in a given release, combined with the relatively short lifespan of the double-loaded condition, and the fact the problem goes away immediately on re-connect, any problems have always just been ignored. P.
[HACKERS] Extension upgrade and GUCs
Hi hackers, I've been wrestling with this one for a while and gone down a couple blind alleys, so time to ask the experts. PostGIS has a couple things different from the extensions that live in contrib, - it has some GUCs - it has a versioned loadable library (postgis-2.1.so, postgis-2.2.so, etc) We've found that, when we run the following SQL, CREATE EXTENSION postgis VERSION '2.1.9'; ALTER EXTENSION postgis UPDATE TO '2.2.0'; The update fails, because of a collision in the GUC. When the extension is CREATEd, postgis-2.1.so is loaded, _PG_init() is called, and that in turn calls DefineCustomStringVariable() to create our GUC. When the ALTER is called, the first time a C function definition is called, the new library, postgis-2.2.so is loaded, the _PG_init() of *that* library is called, and DefineCustomStringVariable() is called, but this time it runs into the GUC definition from the first library load, and the EXTENSION update process stops as an ERROR is thrown. My initial attempt at avoiding this problem involved looking at GetConfigOption() before running DefineCustomStringVariable() to see if the GUC was already defined. This did not work, as it's possible to define a GUC before loading the library. So an otherwise innocent sequence of commands like: SET my_guc = 'foo'; -- no library loaded yet SELECT my_library_function(); -- causes library load and _PG_init() to fire Would now fail, as it hit my test for a pre-existing GUC. Unfortunately, the GUC we are using is not one where we simply read a value now and again. It switches the backend geometry library that various functions use, so performance is a big deal: instead of reading the GUC value, the code expects that GUC changes will flip a global variable, using the GUC assign callback. 
So I need a way to either (a) notice when I already have an (old) copy of the library loaded and avoid trying to set up the GUC in that case or (b) set up the GUC in a somewhat less brittle way than DefineCustomStringVariable() allows, something that can overwrite things instead of just erroring out. The ugly code in question is here https://github.com/postgis/postgis/blob/svn-trunk/postgis/lwgeom_backend_api.c#L105 Discussion is here https://trac.osgeo.org/postgis/ticket/2382 Thanks, P
Re: [HACKERS] [PATCH] postgres_fdw extension support
Thanks so much Michael! Let me know when you have further feedback I should incorporate. ATB, P. -- http://postgis.net http://cleverelephant.ca On July 25, 2015 at 4:52:11 AM, Michael Paquier (michael.paqu...@gmail.com) wrote: On Sat, Jul 25, 2015 at 2:19 AM, Paul Ramsey pram...@cleverelephant.ca wrote: On Thu, Jul 23, 2015 at 7:48 AM, Paul Ramsey pram...@cleverelephant.ca wrote: Here's an updated patch that clears the cache on changes to foreign wrappers and servers. Any chance one of you folks could be my official commitfest reviewer? Appreciate all the feedback so far! https://commitfest.postgresql.org/6/304/ That's an interesting topic, I will register as such. -- Michael
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Thu, Jul 23, 2015 at 7:48 AM, Paul Ramsey pram...@cleverelephant.ca wrote: Here's an updated patch that clears the cache on changes to foreign wrappers and servers. Any chance one of you folks could be my official commitfest reviewer? Appreciate all the feedback so far! https://commitfest.postgresql.org/6/304/ Thanks, P
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Wed, Jul 22, 2015 at 12:19 PM, Paul Ramsey pram...@cleverelephant.ca wrote: I’ll have a look at doing invalidation for the case of changes to the FDW wrappers and servers. Here's an updated patch that clears the cache on changes to foreign wrappers and servers. In testing it I came across an unrelated issue which could make it hard for users to manage the options on their wrappers/servers: fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis' ); ALTER SERVER fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis,seg' ); ERROR: option "extensions" provided more than once Once set, an option seems to be effectively immutable. diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index d2b98e1..3543312 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -1,7 +1,7 @@ # contrib/postgres_fdw/Makefile MODULE_big = postgres_fdw -OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES) +OBJS = postgres_fdw.o option.o deparse.o connection.o shippable.o $(WIN32RES) PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL" PG_CPPFLAGS = -I$(libpq_srcdir) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 81cb2b4..d6fff30 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -229,6 +229,9 @@ foreign_expr_walker(Node *node, Oid collation; FDWCollateState state; + /* Access extension metadata from fpinfo on baserel */ + PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *)(glob_cxt->foreignrel->fdw_private); + /* Need do nothing for empty subexpressions */ if (node == NULL) return true; @@ -361,7 +364,7 @@ foreign_expr_walker(Node *node, * can't be sent to remote because it might have incompatible * semantics on remote side. 
*/ - if (!is_builtin(fe->funcid)) + if (!is_builtin(fe->funcid) && !is_shippable(fe->funcid, fpinfo)) return false; /* @@ -407,7 +410,7 @@ foreign_expr_walker(Node *node, * (If the operator is, surely its underlying function is * too.) */ - if (!is_builtin(oe->opno)) + if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo)) return false; /* @@ -445,7 +448,7 @@ foreign_expr_walker(Node *node, /* * Again, only built-in operators can be sent to remote. */ - if (!is_builtin(oe->opno)) + if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo)) return false; /* @@ -591,7 +594,7 @@ foreign_expr_walker(Node *node, * If result type of given expression is not built-in, it can't be sent to * remote because it might have incompatible semantics on remote side. */ - if (check_type && !is_builtin(exprType(node))) + if (check_type && !is_builtin(exprType(node)) && !is_shippable(exprType(node), fpinfo)) return false; /* @@ -1404,8 +1407,7 @@ deparseConst(Const *node, deparse_expr_cxt *context) } if (needlabel) appendStringInfo(buf, "::%s", - format_type_with_typemod(node->consttype, - node->consttypmod)); + format_type_be_qualified(node->consttype)); } /* diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c index 7547ec2..9aeaf1a 100644 --- a/contrib/postgres_fdw/option.c +++ b/contrib/postgres_fdw/option.c @@ -15,6 +15,7 @@ #include "postgres_fdw.h" #include "access/reloptions.h" +#include "catalog/pg_foreign_data_wrapper.h" #include "catalog/pg_foreign_server.h" #include "catalog/pg_foreign_table.h" #include "catalog/pg_user_mapping.h" @@ -124,6 +125,10 @@ postgres_fdw_validator(PG_FUNCTION_ARGS) errmsg("%s requires a non-negative numeric value", def->defname))); } + else if (strcmp(def->defname, "extensions") == 0) + { + extractExtensionList(defGetString(def), NULL); + } } PG_RETURN_VOID(); @@ -153,6 +158,9 @@ InitPgFdwOptions(void) /* "updatable" is available on both server and table
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Thu, Jul 23, 2015 at 7:48 AM, Paul Ramsey pram...@cleverelephant.ca wrote: In testing it I came across an unrelated issue which could make it hard for users to manage the options on their wrappers/servers: fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis' ); ALTER SERVER fdw=# ALTER SERVER foreign_server OPTIONS ( extensions 'postgis,seg' ); ERROR: option "extensions" provided more than once Once set, an option seems to be effectively immutable. Whoops, I see I just didn't read the full syntax for providing options; using ALTER SERVER foreign_server OPTIONS ( SET extensions 'postgis,seg' ); works just fine. Sorry for the noise, P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 22, 2015 at 12:15:14 PM, Andres Freund (and...@anarazel.de) wrote: It doesn't seem that unlikely that somebody does an ALTER SERVER OPTIONS SET .. to add an extension to be shippable while connections are already using the fdw. It'll be confusing if some clients are fast and some others are really slow. This seems more likely than anyone mucking around with extension stuff (adding new functions (and working with FDW in production at the same time?)) or adding/dropping whole extensions (you’ll have more problems than a stale cache, whole columns will disappear if you DROP EXTENSION postgis CASCADE), and has the added benefit of not needing me to muck into core stuff for my silly feature. I’ll have a look at doing invalidation for the case of changes to the FDW wrappers and servers. P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Jul 21, 2015 at 7:45 AM, Andres Freund and...@anarazel.de wrote: + + /* We need this relation to scan */ + depRel = heap_open(DependRelationId, RowExclusiveLock); + + /* Scan the system dependency table for all entries this operator */ + /* depends on, then iterate through and see if one of them */ + /* is a registered extension */ + ScanKeyInit(&key[0], + Anum_pg_depend_objid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(procnumber)); + + scan = systable_beginscan(depRel, DependDependerIndexId, true, + GetCatalogSnapshot(depRel->rd_id), nkeys, key); + + while (HeapTupleIsValid(tup = systable_getnext(scan))) + { + Form_pg_depend foundDep = (Form_pg_depend) GETSTRUCT(tup); + + if (foundDep->deptype == DEPENDENCY_EXTENSION) + { + List *extlist = fpinfo->extensions; + ListCell *ext; + + foreach(ext, extlist) + { + Oid extension_oid = (Oid) lfirst(ext); + if (foundDep->refobjid == extension_oid) + { + nresults++; + } + } + } + if (nresults > 0) break; + } + + systable_endscan(scan); + relation_close(depRel, RowExclusiveLock); + + return nresults > 0; +} Phew. That's mighty expensive to do at frequency. I guess it'll be more practical to expand this list once and then do a binary search on the result for the individual functions. So, right after reading the options in postgresGetForeignRelSize, expand the extension list into a list of all ops/functions, in a sorted list, and let that carry through to the deparsing instead? That would happen once per query, right? But the deparsing also happens once per query too, yes? Is the difference going to be that big? (I speak not knowing the overhead of things like systable_beginscan, etc) Thanks! P
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 21, 2015 at 11:07:36 AM, Tom Lane (t...@sss.pgh.pa.us) wrote: I'm inclined to think that it's not really necessary to worry about invalidating a per-connection cache of is this function safe to ship determinations. So: yes to a local cache of all forwardable functions/ops, populated in full the first time through (does that speak maybe to using a binary search on a sorted list instead of a hash, since I only pay the sort price once and am not doing any insertions?). And then we just hold it until the connection goes away. Yes? P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 21, 2015 at 8:26:31 AM, Andres Freund (and...@anarazel.de) wrote:

> On 2015-07-21 17:00:51 +0200, Andres Freund wrote:
>> On 2015-07-21 07:55:17 -0700, Paul Ramsey wrote:
>>> So, right after reading the options in postgresGetForeignRelSize,
>>> expand the extension list into a list of all ops/functions, in a
>>> sorted list, and let that carry through to the deparsing instead?
>>
>> I'd actually try to make it longer lived, i.e. permanently. And just
>> deallocate when a catcache callback says it needs to be invalidated;
>> IIRC there is a relevant cache.
>
> On second thought I'd not use a binary search but a hash table. If you
> choose the right key a single table is enough for the lookup.
> If you need references for invalidations you might want to look for
> CacheRegisterSyscacheCallback callers. E.g. attoptcache.c

On 2015-07-21 08:32:34 -0700, Paul Ramsey wrote:
>> Thanks! Reading some of the syscache callback stuff, one thing I'm not
>> sure of is which SysCacheIdentifier(s) I should be registering
>> callbacks against to ensure my list of funcs/procs that reside in
>> particular extensions is kept fresh. I don't see anything tracking the
>> dependencies there.
>
> FOREIGNSERVEROID, and a new syscache on pg_extension.oid should suffice
> I think. pg_foreign_server will be changed upon option changes and
> pg_extension.oid on extension upgrades. Since dependencies won't change
> independently of extension versions I don't think we need to care
> otherwise. There's ALTER EXTENSION ... ADD but I'm rather prepared to
> ignore that; if that's not ok it's trivial to make it emit an
> invalidation.
This hole just keeps getting deeper :) So,

- a HASH in my own code to hold all the functions that I consider "safe to forward" (which I'll derive by reading the contents of the extensions the users declare)
- callbacks in my code registered using CacheRegisterSyscacheCallback on FOREIGNSERVEROID and __see_below__ that refresh my cache when called
- since there is no syscache for extensions right now, a new syscache entry for pg_extension.oid (and I ape things in syscache.h and syscache.c and Magic Occurs?)
- which means I can also CacheRegisterSyscacheCallback on the new EXTENSIONOID syscache
- and finally change my is_in_extension() to efficiently check my HASH instead of querying the system catalog

Folks are going to be OK w/ me dropping in new syscache entries to support my niche little feature?

ATB,
P
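The control flow sketched in the list above — the registered callback does not rebuild anything, it only invalidates, and the next lookup repopulates — can be modeled outside the backend. This is a toy sketch: `on_syscache_invalidation`, `lookup`, and `build_cache` are hypothetical stand-ins for the function registered with CacheRegisterSyscacheCallback and the real HTAB rebuild, not actual PostgreSQL code.

```c
#include <stdbool.h>

/* Toy model of a lazily rebuilt local cache.  In the backend, the cache
 * would be an HTAB of "safe to forward" OIDs. */
static bool cache_built = false;
static int  cache_builds = 0;   /* counts how often the expensive rebuild ran */

static void build_cache(void)
{
    /* stand-in for the expensive catalog scan that fills the HTAB */
    cache_built = true;
    cache_builds++;
}

/* What the callback registered against FOREIGNSERVEROID (or the proposed
 * EXTENSIONOID syscache) would do: just mark the cache stale. */
void on_syscache_invalidation(void)
{
    cache_built = false;
}

/* Every lookup first makes sure the cache is fresh, rebuilding lazily. */
bool lookup(void)
{
    if (!cache_built)
        build_cache();
    return cache_built;
}
```

The point of the split is that invalidation callbacks can fire at awkward times, so they should do the minimum (flip a flag); the costly work happens on the next lookup, which is always a safe context.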
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 21, 2015 at 11:22:12 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:

>> So: yes to a local cache of all forwardable functions/ops, populated in
>> full the first time through (does that speak maybe to using a binary
>> search on a sorted list instead of a hash, since I only pay the sort
>> price once and am not doing any insertions?). And then we just hold it
>> until the connection goes away.
>
> No, *not* populated first-time-through, because that won't handle any of
> the CREATE, DROP, or UPGRADE cases. It's also doing a lot of work you
> might never need. I was thinking of populate on demand, that is, first
> time you need to know whether function X is shippable, you find that out
> and then cache the answer (whether it be positive or negative).

Roger that. Off to the races..

P
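Tom's populate-on-demand scheme caches both positive and negative answers, so the expensive catalog scan runs at most once per OID. A minimal self-contained sketch of that idea — the tri-state array and the fake catalog scan are stand-ins for the real HTAB and pg_depend lookup, and all thresholds here are invented:

```c
#include <stdbool.h>

/* Three states: never asked, known shippable, known not shippable. */
enum ship_state { UNKNOWN = 0, SHIPPABLE, NOT_SHIPPABLE };

#define MAX_OID 4096
static enum ship_state answers[MAX_OID]; /* zero-initialized => all UNKNOWN */
static int catalog_lookups = 0;          /* counts the "expensive" scans */

/* Stand-in for the pg_depend scan: pretend OIDs below 2000 belong to a
 * declared extension. */
static bool expensive_catalog_scan(unsigned oid)
{
    catalog_lookups++;
    return oid < 2000;
}

/* First ask does the scan; every later ask is answered from the cache,
 * whether the cached answer was positive or negative. */
bool is_shippable(unsigned oid)
{
    if (oid >= MAX_OID)
        return false;
    if (answers[oid] == UNKNOWN)
        answers[oid] = expensive_catalog_scan(oid) ? SHIPPABLE : NOT_SHIPPABLE;
    return answers[oid] == SHIPPABLE;
}
```

Caching the negative answers matters: a deparse pass that repeatedly hits a non-shippable operator would otherwise re-run the catalog scan on every encounter.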
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Tue, Jul 21, 2015 at 11:29 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> On July 21, 2015 at 11:22:12 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:
>> No, *not* populated first-time-through, because that won't handle any
>> of the CREATE, DROP, or UPGRADE cases. It's also doing a lot of work
>> you might never need. I was thinking of populate on demand, that is,
>> first time you need to know whether function X is shippable, you find
>> that out and then cache the answer (whether it be positive or
>> negative).
>
> Roger that. Off to the races..

Attached, reworked with a local cache. I felt a little dirty sticking the cache entry right in postgres_fdw.c, so I broke out all my nonsense into shippable.c.

Thanks!
P

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index d2b98e1..3543312 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -1,7 +1,7 @@
 # contrib/postgres_fdw/Makefile
 
 MODULE_big = postgres_fdw
-OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES)
+OBJS = postgres_fdw.o option.o deparse.o connection.o shippable.o $(WIN32RES)
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..d6fff30 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -229,6 +229,9 @@ foreign_expr_walker(Node *node,
 	Oid			collation;
 	FDWCollateState state;
 
+	/* Access extension metadata from fpinfo on baserel */
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
 		return true;
@@ -361,7 +364,7 @@ foreign_expr_walker(Node *node,
 				 * can't be sent to remote because it might have incompatible
 				 * semantics on remote side.
 				 */
-				if (!is_builtin(fe->funcid))
+				if (!is_builtin(fe->funcid) && !is_shippable(fe->funcid, fpinfo))
 					return false;
@@ -407,7 +410,7 @@ foreign_expr_walker(Node *node,
 				 * (If the operator is, surely its underlying function is
 				 * too.)
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo))
 					return false;
@@ -445,7 +448,7 @@ foreign_expr_walker(Node *node,
 				/*
 				 * Again, only built-in operators can be sent to remote.
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo))
 					return false;
@@ -591,7 +594,7 @@ foreign_expr_walker(Node *node,
 	 * If result type of given expression is not built-in, it can't be sent to
 	 * remote because it might have incompatible semantics on remote side.
 	 */
-	if (check_type && !is_builtin(exprType(node)))
+	if (check_type && !is_builtin(exprType(node)) && !is_shippable(exprType(node), fpinfo))
 		return false;
@@ -1404,8 +1407,7 @@ deparseConst(Const *node, deparse_expr_cxt *context)
 	}
 	if (needlabel)
 		appendStringInfo(buf, "::%s",
-						 format_type_with_typemod(node->consttype,
-												  node->consttypmod));
+						 format_type_be_qualified(node->consttype));
 }
 
/*
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..9aeaf1a 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -15,6 +15,7 @@
 #include "postgres_fdw.h"
 
 #include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
 #include "catalog/pg_foreign_server.h"
 #include "catalog/pg_foreign_table.h"
 #include "catalog/pg_user_mapping.h"
@@ -124,6 +125,10 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative numeric value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "extensions") == 0)
+		{
+			extractExtensionList(defGetString(def), NULL);
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -153,6 +158,9 @@ InitPgFdwOptions(void)
 		/* updatable
Re: [HACKERS] [PATCH] postgres_fdw extension support
Here's a third revision that allows the 'extensions' option on the wrapper as well, so that supported extensions can be declared once, in one place. Since the CREATE FOREIGN DATA WRAPPER statement is actually called inside the CREATE EXTENSION script for postgres_fdw, the way to get this option in is to alter the wrapper:

ALTER FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( extensions 'seg' );

Right now declaring extensions at different levels is additive. I didn't add the option to revoke an extension for a particular server definition (the 'cube,-postgis' entry, for example). If that's a deal-breaker I can add it too, but it felt like something that could wait for a user to positively declare "I must have that feature!"

P.

On Fri, Jul 17, 2015 at 5:58 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> On July 17, 2015 at 5:57:42 AM, Simon Riggs (si...@2ndquadrant.com) wrote:
>> Options already exist on CREATE FOREIGN DATA WRAPPER, so it should be
>> easy to support that. I'd rather add it once on the wrapper than be
>> forced to list all the options on every foreign server, unless required
>> to do so.
>
> Gotcha, that does make sense.
> P.
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..9c5136e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -34,11 +34,15 @@
 
 #include "postgres_fdw.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -49,8 +53,10 @@
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -136,6 +142,7 @@
 static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 				 deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 					   deparse_expr_cxt *context);
+static bool is_in_extension(Oid procid, PgFdwRelationInfo *fpinfo);
@@ -229,6 +236,9 @@ foreign_expr_walker(Node *node,
 	Oid			collation;
 	FDWCollateState state;
 
+	/* Access extension metadata from fpinfo on baserel */
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
 		return true;
@@ -361,7 +371,7 @@ foreign_expr_walker(Node *node,
 				 * can't be sent to remote because it might have incompatible
 				 * semantics on remote side.
 				 */
-				if (!is_builtin(fe->funcid))
+				if (!is_builtin(fe->funcid) && !is_in_extension(fe->funcid, fpinfo))
 					return false;
@@ -407,7 +417,7 @@ foreign_expr_walker(Node *node,
 				 * (If the operator is, surely its underlying function is
 				 * too.)
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_in_extension(oe->opno, fpinfo))
 					return false;
@@ -445,7 +455,7 @@ foreign_expr_walker(Node *node,
 				/*
 				 * Again, only built-in operators can be sent to remote.
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_in_extension(oe->opno, fpinfo))
 					return false;
@@ -591,7 +601,7 @@ foreign_expr_walker(Node *node,
 	 * If result type of given expression is not built-in, it can't be sent to
 	 * remote because it might have incompatible semantics on remote side.
 	 */
-	if (check_type && !is_builtin(exprType(node)))
+	if (check_type && !is_builtin(exprType(node)) && !is_in_extension(exprType(node), fpinfo))
 		return false;
@@ -669,6 +679,65 @@ is_builtin(Oid oid)
 
 /*
+ * Returns true if given operator/function is part of an extension declared in the
+ * server options.
+ */
+static bool
+is_in_extension(Oid procnumber, PgFdwRelationInfo *fpinfo)
+{
+	static int nkeys = 1;
+	ScanKeyData key[nkeys];
+	HeapTuple tup;
+	Relation depRel;
+	SysScanDesc scan
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 17, 2015 at 12:49:04 AM, Simon Riggs (si...@2ndquadrant.com) wrote:

> On 17 July 2015 at 01:23, Michael Paquier wrote:
>>> Well, as I see it there's three broad categories of behavior available:
>>>
>>> 1- Forward nothing non-built-in (current behavior)
>>> 2- Use options to forward only specified non-built-in things (either in
>>> chunks (extensions, as in this patch) or one-by-one (mark your desired
>>> functions/ops))
>>> 3- Forward everything if a "forward everything" option is set
>>
>> Then what you are describing here is a parameter able to do a switch
>> among this selection:
>> - nothing, which is to check on built-in stuff
>> - extension list
>> - all
>
> "all" seems to be a very blunt instrument but is certainly appropriate
> in some cases. I see an intermediate setting, giving four categories in
> total:
>
> 0. Nothing, as now
> 1. Extension list option on the Foreign Server
> 2. Extension list option on the Foreign Data Wrapper, applies to all
> Foreign Servers of that type
> 3. All extensions allowed

I feel like a lot of knobs are being discussed for maximum configurability, but without knowing if people are going to show up with the desire to fiddle them :) If marking extensions is not considered a good API, then I'd lean towards (a) being able to toggle global forwarding on and off, combined with (b) the ability to mark individual objects forwardable or not. Which, I see, is almost what you've described. There's no facility to add OPTIONS to an EXTENSION right now, so this capability seems to be very much server-by-server (adding a FDW-specific capability to the EXTENSION mechanism seems like overkill for a niche feature addition).

But within the bounds of the SERVER, being able to build combinations of allowed/restricted forwarding is certainly powerful:

CREATE SERVER foo FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', dbname 'foo', forward 'all -postgis' );
CREATE SERVER foo FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', dbname 'foo', forward 'none +postgis' );
CREATE SERVER foo FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'localhost', dbname 'foo', forward 'none +&&' );

Once we get to individual functions or operators it breaks down, since of course '&&' refers to piles of different operators for different types. Their descriptions would be unworkably verbose once you have more than a tiny handful.

I'm less concerned with configurability than just the appropriateness of forwarding particular functions. In an earlier thread on this topic the problem of forwarding arbitrary user-defined PL/PgSQL functions was brought up. In passing it was mentioned that maybe VOLATILE functions should *never* be forwarded, ever. Or that only IMMUTABLE functions should *ever* be forwarded. Since PostGIS includes a generous mix of all kinds of functions, discussing whether that kind of policy is required for any kind of additional forwarding would be interesting. Maybe IMMUTABLE/STABLE/VOLATILE doesn't actually capture what it is that makes a function/op FORWARDABLE or not.
Re: [HACKERS] [PATCH] postgres_fdw extension support
On July 17, 2015 at 5:57:42 AM, Simon Riggs (si...@2ndquadrant.com) wrote:

> Options already exist on CREATE FOREIGN DATA WRAPPER, so it should be
> easy to support that. I'd rather add it once on the wrapper than be
> forced to list all the options on every foreign server, unless required
> to do so.

Gotcha, that does make sense.

P.
Re: [HACKERS] [PATCH] postgres_fdw extension support
Michael, thanks so much for the review!

On July 15, 2015 at 7:35:11 PM, Michael Paquier (michael.paqu...@gmail.com) wrote:

> This patch includes some diff noise, it would be better to remove that.

Done.

> -    if (!is_builtin(fe->funcid))
> +    if (!is_builtin(fe->funcid) && (!is_in_extension(fe->funcid, fpinfo)))
> Extra parentheses are not needed.

OK, will remove.

> +    if ( (!is_builtin(oe->opno)) && (!is_in_extension(oe->opno, fpinfo)) )
> ... And this does not respect the project code format. See here for more
> details for example:
> http://www.postgresql.org/docs/devel/static/source.html

I'm sorry, that link doesn't clarify for me what's stylistically wrong here (it's almost all about error messages), could you help? (Is it the padding around the conditionals? Removed that.)

> +    /* PostGIS metadata */
> +    List *extensions;
> +    bool use_postgis;
> +    Oid postgis_oid;
> This addition in PgFdwRelationInfo is surprising. What's the point of
> keeping use_postgis and postgis_oid that are actually used nowhere?

Whoops, a couple of old pieces from my proof-of-concept survived the conversion to a generic feature. Removed.

>     appendStringInfo(buf, "::%s",
> -                     format_type_with_typemod(node->consttype,
> -                                              node->consttypmod));
> +                     format_type_be_qualified(node->consttype));
> What's the reason for this change?

Good question. As I recall, the very sparse search path the FDW connection sets up can sometimes leave remote functions failing to find other functions they need, so we want to force the calls to be schema-qualified. Unfortunately there's no perfect public call for that; ideally it would be "return format_type_internal(type_oid, typemod, true, false, true)", but there's no function like that, so I settled for format_type_be_qualified(), which forces qualification at the expense of ignoring the typmod.

> Thinking a bit wider, why is this just limited to extensions? You may as
> well have other objects defined locally and remotely, like functions or
> operators that are not defined in extensions, but as normal objects.
> Hence I think that what you are looking for is not this option, but a
> boolean option enforcing the behavior of code paths using is_builtin()
> in foreign_expr_walker such that the sanity checks on existing objects
> are not limited to FirstBootstrapObjectId but extended to other objects
> in the catalogs. That's a risky option because it could lead to
> inconsistencies among objects, so obviously the default is false, but by
> documenting correctly the risks of using this option we may be able to
> get something integrated (aka "be sure that each object is defined
> consistently across the servers queried or you'll have surprising
> results!"). In short, it seems to me that you are taking the wrong
> approach.

Well, as I see it there's three broad categories of behavior available:

1- Forward nothing non-built-in (current behavior)
2- Use options to forward only specified non-built-in things (either in chunks (extensions, as in this patch) or one-by-one (mark your desired functions/ops))
3- Forward everything if a "forward everything" option is set

I hadn't actually considered the possibility of option 3, but for my purposes it would work just as well, with the added efficiency bonus of not having to check whether particular funcs/ops are inside declared extensions. Both the current state of FDW and the patch I'm providing expect a *bit* of smarts on the part of users, to make sure their remote/local environments are in sync (in particular, versions of pgsql and of extensions). Option 3 just ups the ante on that requirement. I'd be fine w/ this; it makes the patch very simple and fast.

For option 2, marking things one at a time really isn't practical for a package like PostGIS, with several hundred functions and operators. Using the larger block of "extension" makes more sense. I think in general, expecting users to itemize every func/op they want to forward, particularly if they just want an extension to "work" over FDW, is too big an expectation. That's not to minimize the utility of being able to mark individual functions/ops for forwarding, but I think that's a separate use case that doesn't eliminate the need for extension-level forwarding.

Thanks again for the review!
P.

-- fdw-extension-support-2.diff
Description: Binary data
[HACKERS] [PATCH] postgres_fdw extension support
Hi all,

Attached is a patch that implements the extension support discussed at PgCon this year during the FDW unconference session. Highlights:

* Pass extension operators and functions to the foreign server
* Only send ops/funcs if the foreign server is declared to support the relevant extension, for example:

CREATE SERVER foreign_server
	FOREIGN DATA WRAPPER postgres_fdw
	OPTIONS (host '127.0.0.1', port '5432', dbname 'my_db', extensions 'cube, seg');

Github branch is here:
https://github.com/pramsey/postgres/tree/fdw-extension-suppport

Synthetic pull request for easy browsing/commenting is here:
https://github.com/pramsey/postgres/pull/1

Thanks!
Paul

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..bbe3c9d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -34,11 +34,15 @@
 
 #include "postgres_fdw.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -49,8 +53,10 @@
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -136,6 +142,7 @@
 static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 				 deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 					   deparse_expr_cxt *context);
+static bool is_in_extension(Oid procid, PgFdwRelationInfo *fpinfo);
@@ -167,6 +174,7 @@ classifyConditions(PlannerInfo *root,
 	}
 }
 
+
 /*
  * Returns true if given expr is safe to evaluate on the foreign server.
  */
@@ -177,7 +185,7 @@ is_foreign_expr(PlannerInfo *root,
 {
 	foreign_glob_cxt glob_cxt;
 	foreign_loc_cxt loc_cxt;
-
+	
 	/*
 	 * Check that the expression consists of nodes that are safe to execute
 	 * remotely.
@@ -207,6 +215,8 @@ is_foreign_expr(PlannerInfo *root,
 	return true;
 }
 
+
+
 /*
  * Check if expression is safe to execute remotely, and return true if so.
  *
@@ -229,6 +239,9 @@ foreign_expr_walker(Node *node,
 	Oid			collation;
 	FDWCollateState state;
 
+	/* Access extension metadata from fpinfo on baserel */
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
 		return true;
@@ -361,7 +374,7 @@ foreign_expr_walker(Node *node,
 				 * can't be sent to remote because it might have incompatible
 				 * semantics on remote side.
 				 */
-				if (!is_builtin(fe->funcid))
+				if (!is_builtin(fe->funcid) && (!is_in_extension(fe->funcid, fpinfo)))
 					return false;
@@ -407,7 +420,7 @@ foreign_expr_walker(Node *node,
 				 * (If the operator is, surely its underlying function is
 				 * too.)
 				 */
-				if (!is_builtin(oe->opno))
+				if ( (!is_builtin(oe->opno)) && (!is_in_extension(oe->opno, fpinfo)) )
 					return false;
@@ -445,7 +458,7 @@ foreign_expr_walker(Node *node,
 				/*
 				 * Again, only built-in operators can be sent to remote.
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && (!is_in_extension(oe->opno, fpinfo)))
 					return false;
@@ -591,7 +604,7 @@ foreign_expr_walker(Node *node,
 	 * If result type of given expression is not built-in, it can't be sent to
 	 * remote because it might have incompatible semantics on remote side.
 	 */
-	if (check_type && !is_builtin(exprType(node)))
+	if (check_type && !is_builtin(exprType(node)) && (!is_in_extension(exprType(node), fpinfo)) )
 		return false;
@@ -643,6 +656,8 @@ foreign_expr_walker(Node *node,
 	return true;
 }
 
+
+
 /*
  * Return true if given object is one of PostgreSQL's built-in objects.
  *
@@ -669,6 +684,67 @@ is_builtin(Oid oid)
 
 /*
+ * Returns true if given operator/function is part of an extension declared in the
Re: [HACKERS] Hashable custom types
On July 8, 2015 at 1:36:49 PM, Tom Lane (t...@sss.pgh.pa.us) wrote:

> Paul Ramsey <pram...@cleverelephant.ca> writes:
>>> UNION will preferentially glom onto the btree equality operator, if
>>> memory serves. If that isn't also the hash equality operator, things
>>> won't work pleasantly.
>>
>> So… what does that mean for types that have both btree and hash
>> equality operators? Don't all the built-ins also have this problem?
>
> What I'm asking is why it would possibly be sensible to have different
> notions of equality for hash and btree. In every existing type that has
> both btree and hash opclasses, the equality members of those opclasses
> are *the same operator*. You don't really want UNION giving different
> answers depending on which implementation the planner happens to pick,
> do you?

Well, that's where things get pretty darn ugly. For reasons of practicality at the time, our equality btree operator '=' got tied to 'bounding box equals' way back in the mists of pre-history. (Basically the reasoning was "all our index work is about bounding boxes!" I must admit, given my understanding of the zen of PostgreSQL now (and the features that PostgreSQL now has), that would not happen today.)

So… yeah. Probably '=' should be something else, probably something quite expensive but logically defensible like a geometric equality test, but it's not, and we have lots of third-party stuff layered on top of us that expects the existing semantics. If getting the objects to UNION means that the btree and hash ops have to be the same, then that just means I'm SOL and will revert to my hack of just casting the objects to bytea to force them to UNION in the recursive CTE.

P.
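The constraint Tom is pointing at is mechanical: hash-based UNION dedup only works if any two values the equality operator calls equal always hash to the same code. A toy model of the PostGIS bind — a '=' that compares only bounding boxes, against a hash of the exact coordinates — shows which hash functions are even admissible. All types, fields, and names here are invented for illustration, not PostGIS code:

```c
#include <stdbool.h>

/* A fake "geometry": a bounding-box id plus exact coordinate data,
 * collapsed to single ints to keep the model tiny. */
typedef struct { int bbox; int coords; } geom;

/* btree-style equality as PostGIS defines it: bounding-box equals. */
bool geom_eq(geom a, geom b) { return a.bbox == b.bbox; }

/* A hash over the exact coordinates: NOT compatible with geom_eq,
 * because two geometries can be "equal" yet hash differently. */
unsigned geom_hash_exact(geom g) { return (unsigned) g.coords * 2654435761u; }

/* A hash over the bounding box: compatible with geom_eq, so this is
 * the only kind of hash a matching hash opclass could legally use. */
unsigned geom_hash_bbox(geom g) { return (unsigned) g.bbox * 2654435761u; }
```

If the planner hashes with the incompatible function, two rows that '=' calls duplicates land in different hash buckets and both survive the UNION, while a sort-based plan would merge them: exactly the "different answers depending on implementation" Tom warns about.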
Re: [HACKERS] Hashable custom types
>> It still says I lack the secret sauce...
>>
>> ERROR: could not implement recursive UNION
>> DETAIL: All column datatypes must be hashable.
>
> UNION will preferentially glom onto the btree equality operator, if
> memory serves. If that isn't also the hash equality operator, things
> won't work pleasantly.

So… what does that mean for types that have both btree and hash equality operators? Don't all the built-ins also have this problem?
Re: [HACKERS] Hashable custom types
On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>> Is it possible to make custom types hashable? There's no hook in the
>> CREATE TYPE call for a hash function, but can one be hooked up
>> somewhere else? In an operator?
>
> See "35.14.6. System Dependencies on Operator Classes"

Coming back to this, I created an appropriate opclass...

CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry)
	RETURNS boolean
	AS '$libdir/postgis-2.2', 'lwgeom_hash_eq'
	LANGUAGE 'c' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION geometry_hash(geom1 geometry)
	RETURNS integer
	AS '$libdir/postgis-2.2', 'lwgeom_hash'
	LANGUAGE 'c' IMMUTABLE STRICT;

-- Availability: 0.9.0
CREATE OPERATOR == (
	LEFTARG = geometry, RIGHTARG = geometry,
	PROCEDURE = geometry_hash_eq,
	COMMUTATOR = '==',
	RESTRICT = contsel, JOIN = contjoinsel
);

CREATE OPERATOR CLASS hash_geometry_ops
	DEFAULT FOR TYPE geometry USING hash AS
	OPERATOR 1 == (geometry, geometry),
	FUNCTION 1 geometry_hash(geometry);

I even tested it as an index!

create index hashidx on points using hash ( the_geom_webmercator );
CREATE INDEX

But when I run my recursive query...

WITH RECURSIVE find_cluster(cartodb_id, cluster_id, geom) AS (
	(SELECT
		points.cartodb_id,
		points.cartodb_id as cluster_id,
		points.the_geom_webmercator as geom
	FROM points
	WHERE points.cartodb_id in (select cartodb_id from points))
	UNION
	(SELECT
		pts.cartodb_id, n.cluster_id,
		pts.the_geom_webmercator as geom
	FROM points pts
	JOIN find_cluster n
	ON ST_DWithin(n.geom, pts.the_geom_webmercator, 2)
	WHERE n.cartodb_id <> pts.cartodb_id)
)
SELECT * FROM find_cluster;

It still says I lack the secret sauce...

ERROR: could not implement recursive UNION
DETAIL: All column datatypes must be hashable.

What's the sauce? Thanks!

P
[HACKERS] Extension support for postgres_fdw
I would like to enhance the postgres_fdw to allow more complete support for user-defined types. Right now, postgres_fdw already does a good job of passing user-defined type data back and forth, which is pretty nice. However, it will not pass functions or operators that use user-defined types to the remote host. For an extension like PostGIS, that means that spatial filters cannot be executed on remote servers, which makes FDW not so useful for PostGIS.

I think the postgres_fdw extension should pass user-defined functions and operators, but only when it knows those functions and operators exist at the remote. One way would be to ask the remote what extensions it has, but the overhead of doing that is a bit high. A simpler way would be to just have the DBA declare what extensions the remote will have, when she creates the server definition, for example:

CREATE SERVER fire_department_server
	FOREIGN DATA WRAPPER postgres_fdw
	OPTIONS (
		host 'fire.city.gov',
		dbname 'infrastructure',
		port '5432',
		extensions 'postgis, seg'
	);

Once the local host knows what extensions to expect on the remote side, it can retain functions and operators in those extensions in the set of remote restrictions and deparse them for use in the query of the remote. Basically, everywhere there is a call to is_builtin(Oid oid) now, there'd also be a call to is_allowed_extension() (or somesuch) as well.

There is a PostGIS-specific implementation of this concept here:
https://github.com/pramsey/postgres/blob/9.4-postgres-fdw-postgis/contrib/postgres_fdw

If the approach above sounds OK, I'll genericize my PostGIS-specific code to handle arbitrary extensions and submit a patch.

Thanks!
Paul

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net
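The shape of the proposed check can be sketched in plain C. In the 9.4-era sources, postgres_fdw's is_builtin() is an OID-range test against FirstBootstrapObjectId; the whitelist half below (`is_allowed_extension` over a flat OID array) is a hypothetical stand-in for the real extension-membership lookup, which would consult pg_depend:

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Value from src/include/access/transam.h (9.4 era): OIDs below this
 * are manually assigned to built-in objects. */
#define FirstBootstrapObjectId 10000

/* Mirrors postgres_fdw's is_builtin(): anything in the manually assigned
 * OID range is assumed to exist, with identical semantics, on the remote. */
static bool is_builtin(Oid oid)
{
    return oid < FirstBootstrapObjectId;
}

/* Hypothetical stand-in for the extension-membership check: the real
 * code would scan pg_depend for DEPENDENCY_EXTENSION entries. */
static bool is_allowed_extension(Oid oid, const Oid *ext_member_oids, int n)
{
    for (int i = 0; i < n; i++)
        if (ext_member_oids[i] == oid)
            return true;
    return false;
}

/* The proposed rule: shippable if built-in OR declared-extension member. */
bool shippable(Oid oid, const Oid *ext_member_oids, int n)
{
    return is_builtin(oid) || is_allowed_extension(oid, ext_member_oids, n);
}
```

The key point is that both branches are local decisions: neither requires a round-trip to the remote server, which is why declaring the extension list up front is cheaper than interrogating the remote.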
[HACKERS] GiST KNN Crasher
I'm implementing the recheck functionality for PostGIS so we can support it when 9.5 comes out, and came across this fun little crasher. This works: select id, name from geonames order by geom - 'SRID=4326;POINT(-75.6163 39.746)'::geometry limit 10; This crashes (just reversing the argument order to the - operator): select id, name from geonames order by 'SRID=4326;POINT(-75.6163 39.746)'::geometry - geom limit 10; The stack trace on crash looks like this: * thread #1: tid = 0x8d2bb, 0x000100455247 postgres`create_indexscan_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600, tlist=0x7fbf8d0088d0, scan_clauses=0x, indexonly='\0') + 1063 at createplan.c:1354, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x24) * frame #0: 0x000100455247 postgres`create_indexscan_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600, tlist=0x7fbf8d0088d0, scan_clauses=0x, indexonly='\0') + 1063 at createplan.c:1354 frame #1: 0x0001004518c9 postgres`create_scan_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600) + 377 at createplan.c:360 frame #2: 0x00010044e749 postgres`create_plan_recurse(root=0x7fbf8d005c80, best_path=0x7fbf8b823600) + 73 at createplan.c:248 frame #3: 0x00010044e691 postgres`create_plan(root=0x7fbf8d005c80, best_path=0x7fbf8b823600) + 113 at createplan.c:209 frame #4: 0x000100460770 postgres`grouping_planner(root=0x7fbf8d005c80, tuple_fraction=0.048008487900660838) + 4560 at planner.c:1736 frame #5: 0x00010045e1dd postgres`subquery_planner(glob=0x7fbf8b823950, parse=0x7fbf8b823060, parent_root=0x, hasRecursion='\0', tuple_fraction=0, subroot=0x7fff5faf7028) + 2461 at planner.c:619 frame #6: 0x00010045d4a2 postgres`standard_planner(parse=0x7fbf8b823060, cursorOptions=0, boundParams=0x) + 450 at planner.c:229 frame #7: 0x00010045d2d1 postgres`planner(parse=0x7fbf8b823060, cursorOptions=0, boundParams=0x) + 81 at planner.c:157 frame #8: 0x00010054ab6c postgres`pg_plan_query(querytree=0x7fbf8b823060, cursorOptions=0, boundParams=0x) + 140 
at postgres.c:809 frame #9: 0x00010054ac43 postgres`pg_plan_queries(querytrees=0x7fbf8d006230, cursorOptions=0, boundParams=0x) + 115 at postgres.c:868 frame #10: 0x00010054d920 postgres`exec_simple_query(query_string=0x7fbf8b821a38) + 800 at postgres.c:1033 frame #11: 0x00010054cda2 postgres`PostgresMain(argc=1, argv=0x7fbf8b8066d0, dbname=0x7fbf8b806538, username=0x7fbf8b806518) + 2546 at postgres.c:4025 frame #12: 0x0001004b17fe postgres`BackendRun(port=0x7fbf8b4079d0) + 686 at postmaster.c:4162 frame #13: 0x0001004b0d90 postgres`BackendStartup(port=0x7fbf8b4079d0) + 384 at postmaster.c:3838 frame #14: 0x0001004ad497 postgres`ServerLoop + 663 at postmaster.c:1594 frame #15: 0x0001004aab9c postgres`PostmasterMain(argc=3, argv=0x7fbf8b407760) + 5644 at postmaster.c:1241 frame #16: 0x0001003e649d postgres`main(argc=3, argv=0x7fbf8b407760) + 749 at main.c:221 frame #17: 0x7fff8ca915fd libdyld.dylib`start + 1 And my SQL declarations look like this: #if POSTGIS_PGSQL_VERSION = 95 -- Availability: 2.2.0 CREATE OR REPLACE FUNCTION geography_knn_distance(geography, geography) RETURNS float8 AS 'MODULE_PATHNAME','geography_distance' LANGUAGE 'c' IMMUTABLE STRICT COST 100; -- Availability: 2.2.0 CREATE OPERATOR - ( LEFTARG = geography, RIGHTARG = geography, PROCEDURE = geography_knn_distance, COMMUTATOR = '-' ); -- Availability: 2.2.0 CREATE OR REPLACE FUNCTION geography_gist_distance(internal, geography, int4) RETURNS float8 AS 'MODULE_PATHNAME' ,'gserialized_gist_distance' LANGUAGE 'c'; #endif -- Availability: 1.5.0 CREATE OPERATOR CLASS gist_geography_ops DEFAULT FOR TYPE geography USING GIST AS STORAGE gidx, OPERATOR3 , -- OPERATOR6~= , -- OPERATOR7~ , -- OPERATOR8@ , #if POSTGIS_PGSQL_VERSION = 95 -- Availability: 2.2.0 OPERATOR13 - FOR ORDER BY pg_catalog.float_ops, FUNCTION8geography_gist_distance (internal, geography, int4), #endif FUNCTION1geography_gist_consistent (internal, geography, int4), FUNCTION2geography_gist_union (bytea, internal), 
  FUNCTION 3 geography_gist_compress (internal),
  FUNCTION 4 geography_gist_decompress (internal),
  FUNCTION 5 geography_gist_penalty (internal, internal, internal),
  FUNCTION 6 geography_gist_picksplit (internal, internal),
  FUNCTION 7 geography_gist_same (box2d, box2d, internal);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] compress method for spgist - 2
On Wed, Feb 25, 2015 at 6:13 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In the original post on this, you mentioned that the PostGIS guys planned to use this to store polygons, as bounding boxes (http://www.postgresql.org/message-id/5447b3ff.2080...@sigaev.ru). Any idea how that would work? Poorly: by hanging boxes that straddle dividing lines off the parent node in a big linear list. The hope would be that the case is sufficiently rare, compared to the overall volume of data, not to be an issue. Oddly enough, this big hammer has worked at least passably well in other implementations (https://github.com/mapserver/mapserver/blob/branch-7-0/maptree.c#L261) P.
[HACKERS] Hashable custom types
When trying to write a recursive CTE using the PostGIS geometry type, I was told this: ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? Thanks, P -- Paul Ramsey http://cleverelephant.ca http://postgis.net
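For context on what the planner means by "hashable" here: the type needs a default hash operator class (declared with CREATE OPERATOR CLASS ... USING hash) whose support function maps values to a 32-bit hash. Below is a self-contained sketch of the kind of hash such a support function might compute over a type's canonical byte representation. This is purely illustrative — FNV-1a rather than PostgreSQL's own hash_any, and the function name is hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical body of a hash support function for a custom type:
 * hash the type's canonical (detoasted) byte representation with
 * FNV-1a.  In a real extension this would be wrapped in a V1-calling-
 * convention function and wired up via an operator class declared
 * with CREATE OPERATOR CLASS ... USING hash. */
static uint32_t
custom_type_hash_bytes(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= data[i];                  /* mix in one byte */
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```

The important property for hashed UNION/DISTINCT is that values the type's equality operator considers equal must hash identically, so the hash has to be computed over a canonical form.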
Re: [HACKERS] Traffic jams in fn_extra
On Sunday, November 24, 2013 at 4:42 PM, Tom Lane wrote: The real question of course is whether transaction-level caching is appropriate for what they're storing. If they want only statement-level caching then using fn_extra is often the right thing. I'm not certain it is... we get some great effects out of the statement-level stuff, and it really works great, except for those cases where somebody else is already taking the slot (SRF_*). Also note that having the cache go away is the easy part. The hard part is knowing whether you've created it yet in the current transaction, and finding it if you have. The usual method is to keep a static variable pointing to it, and plugging into the transaction cleanup callback mechanism with a routine that'll reset the pointer to NULL at transaction end. For examples, look for callers of RegisterXactCallback(). I'm glad you said that, because I felt too stoopid to ask :) My previous spelunking through memory contexts showed that it's easy to get the memory, hard to find it again. Which is why fn_extra is so appealing: it's just sitting there, and the parent context goes away at the end of the query, so it's wonderfully simple to use... if there's not already someone in it. Thanks for the advice; it could be very helpful for my pointcloud work. Having access to a cache object during SRF_* processing could improve performance quite a bit, so the complexity trade-off of using the transactional context could be well worth it. P.
Re: [HACKERS] Traffic jams in fn_extra
Hi Simon, We do the dance because it’s how we always have, and we don’t know any other way, any better way. :) The usual explanation. Is there any place you can point to that demonstrates your technique? Thanks! P -- Paul Ramsey http://cleverelephant.ca/ http://postgis.net/ On Sunday, November 24, 2013 at 8:21 AM, Simon Riggs wrote: On 19 November 2013 23:08, Paul Ramsey pram...@cleverelephant.ca wrote: On the solution, I wasn't suggesting another void* slot, but rather a slot that holds a hash table, so that an arbitrary number of things can be stuffed in. Overkill, really, since in 99.9% of times only one thing would be in there, and in the other 0.1% of times two things. In our own GenericCacheCollection, we just statically allocate 16 slots. Why do you need to do this dance with fn_extra? It's possible to allocate a hash table in a Transaction-lifetime memory context on first call into a function then cache things there. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Traffic jams in fn_extra
As an extension with a lot of CPU load, we (PostGIS) tend to use flinfo->fn_extra a lot, for caching things that are intensive to calculate at the start of a query and reuse throughout subsequent function calls:

- coordinate projection objects
- indexes of the edges of large geometries
- other kinds of indexes of the edges of large geometries :)
- schemas of lidar pointcloud collections

As we've added different kinds of caching in our own project, we've banged up against problems of multiple functions trying to stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to hold the different kinds of stuff we might want to store, a GenericCacheCollection: https://github.com/postgis/postgis/blob/svn-trunk/libpgcommon/lwgeom_cache.c#L46-L48 As (by now) a connoisseur of fn_extra caching, I've noticed while reading bits of PgSQL code that there are far more places that stuff state into fn_extra than I ever knew, and that they do so without any substantial concern that other state might already be there. (Well, that's not true; they usually check for NULL, and they give up if fn_extra is already in use.) The range types, I was surprised to find, do some performance caching in fn_extra. The set-returning function macros use it to hold state. And many others, I'm sure. Would it be good/wise to add another, better-managed slot to flinfo, one that isn't just a void* but is a hash? (Or has some management macros to handle it and make it a hash* if it's NULL, whatever API makes sense.) That way multiple bits of code could cache state over function calls without banging into one another. flinfo->fn_extra_hash perhaps? If this sounds OK, I'd be honored to try and make it my first submission to PgSQL. P. -- Paul Ramsey http://cleverelephant.ca http://postgis.net
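The GenericCacheCollection idea above can be sketched in miniature. This is not PostGIS's actual code — just an illustration of spending the single fn_extra pointer on a small container of slots so that independent caches stop colliding; the slot count and names are assumptions:

```c
#include <stddef.h>

/* Illustrative sketch: one container holds several independent cache
 * entries, indexed by a small slot id, so multiple pieces of code can
 * share a single opaque pointer (in PostgreSQL, flinfo->fn_extra,
 * allocated in the per-query memory context on first call). */
#define CACHE_SLOTS 16

typedef struct GenericCacheCollection
{
    void *slot[CACHE_SLOTS];
} GenericCacheCollection;

/* Look up the cache entry in a slot; NULL if empty or out of range. */
static void *
cache_get(const GenericCacheCollection *c, int slot_id)
{
    if (slot_id < 0 || slot_id >= CACHE_SLOTS)
        return NULL;
    return c->slot[slot_id];
}

/* Install a cache entry in a slot, ignoring out-of-range ids. */
static void
cache_set(GenericCacheCollection *c, int slot_id, void *entry)
{
    if (slot_id >= 0 && slot_id < CACHE_SLOTS)
        c->slot[slot_id] = entry;
}
```

The point of the fixed array rather than a hash table is exactly what the thread says: in practice only one or two slots are ever occupied, so the simplest possible container wins.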
Re: [HACKERS] Traffic jams in fn_extra
On Tue, Nov 19, 2013 at 7:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Paul Ramsey pram...@cleverelephant.ca writes: As we've added different kinds of caching, in our own project, we've banged up against problems of multiple functions trying to stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to hold the different kinds of stuff we might want to store, a GenericCacheCollection TBH, I fail to understand what you're on about here. Any one function owns the value of fn_extra in calls to it, and is solely responsible for its usage; furthermore, there's no way for any other code to mangle that pointer unless the owner explicitly makes it available. So where is the problem? And if there is a problem, how does adding another field of exactly the same kind make it better? Right, sorry, I'm reasoning overly aggressively from the specific to the general. The specific problems have been - two lines of geometry caching code, either of which can be called within a single function, depending on the inputs, which mostly didn't result in conflicts, except when they did, which was eventually rectified by layering a GenericCacheCollection into the function - a cached lidar schema object which would have been very useful to have around in an SRF, but couldn't because the SRF needed the fn_extra slot The first case is an application-specific problem, and since we've coded around it, the only advantage to a pgsql-specific fix would be to allow others who also need to cache several independent things to not reinvent that wheel. The second case is one of the instances where the pgsql code itself is getting in the way and cannot be worked around at the application level. My solution was just not to do caching for that function. So, that's what I perceive the problem to be. Now that you point it out to me, yes, it's pretty small bore stuff. 
On the solution, I wasn't suggesting another void* slot, but rather a slot that holds a hash table, so that an arbitrary number of things can be stuffed in. Overkill, really, since in 99.9% of times only one thing would be in there, and in the other 0.1% of times two things. In our own GenericCacheCollection, we just statically allocate 16 slots. P.
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I just got out of a meeting that included Oracle Spatial folks, who were boasting of big performance increases from enabling parallel query on their spatial queries. Basically the workloads on things like big spatial joins are entirely CPU bound, so they are seeing that adding 15 processors makes things 15x faster. Spatial folks would love love love to see parallel query execution. -- Paul Ramsey http://cleverelephant.ca http://postgis.net
Re: [HACKERS] [BUGS] BUG #6379: SQL Function Causes Back-end Crash
Further notes, from Andrew (RhodiumToad) on IRC about the cause of this crasher:

[12:03pm] RhodiumToad: what happens is this
[12:04pm] RhodiumToad: postquel_start knows this statement doesn't return the result, so it supplies None_Receiver as the dest-receiver for the query
[12:04pm] RhodiumToad: however, it knows it's a plannedStmt, so it fires up the full executor to run it
[12:05pm] RhodiumToad: and the executor allocates a new destreceiver in its own memory context, replaces es->qd->dest with it,
[12:05pm] RhodiumToad: (the new destreceiver is the one that writes tuples to the created table)
[12:06pm] RhodiumToad: then at executorEnd (called from postquel_end), executor shutdown closes the new rel, _and then frees the executor's memory context, including the destreceiver it created
[12:07pm] RhodiumToad: postquel_end doesn't know that its setting of ->dest was clobbered, so it goes to try and destroy it again, and gets garbage (if assertions are on)
[12:07pm] RhodiumToad: if assertions weren't on, then the rDestroy call is harmless
[12:07pm] RhodiumToad: well, mostly harmless
[12:07pm] RhodiumToad: sneaky one, that
[12:09pm] RhodiumToad: you can confirm it by tracing through that second call to postquel_end and confirming that it's the call to ExecutorEnd that stomps the content of qd->dest
[12:12pm] pramsey: confirmed, the pass through ExecutorEnd has clobbered the value so there's garbage when it arrives at line 638
[12:14pm] RhodiumToad: if you trace through ExecutorEnd itself, it should be the FreeExecutorState that does it
[12:15pm] RhodiumToad: wonder how far back this bug goes
[12:16pm] RhodiumToad: actually not very far
[12:17pm] RhodiumToad: older versions just figured that qd->dest was always None_Receiver and therefore did not need an rDestroy call
[12:17pm] RhodiumToad: (which is a no-op for None_Receiver)
[12:17pm] pramsey: kills my 8.4
[12:17pm] RhodiumToad: so this is broken in 8.4+
[12:17pm] pramsey: ah
[12:18pm] RhodiumToad: 8.4 introduced the lazy-eval of selects in sql functions
[12:19pm] RhodiumToad: prior to that they were always run immediately to completion
[12:19pm] RhodiumToad: that requires juggling the destreceiver a bit, hence the bug
[12:20pm] RhodiumToad: btw, the first statement of the function shouldn't be needed
[12:21pm] RhodiumToad: just ... as $f$ create table foo as select 1 as x; $f$; should be enough to break it
[12:31pm] RhodiumToad: there's no trivial fix

On Wed, Jan 4, 2012 at 11:32 AM, Paul Ramsey pram...@cleverelephant.ca wrote: One extra detail: my PostgreSQL is compiled with --enable-cassert. This is required to set off the killer function. On Wed, Jan 04, 2012 at 07:17:17PM +, pram...@cleverelephant.ca wrote: The following bug has been logged on the website:

Bug reference: 6379
Logged by: Paul Ramsey
Email address: pram...@cleverelephant.ca
PostgreSQL version: 9.1.2
Operating system: OSX 10.6.8
Description:

CREATE OR REPLACE FUNCTION kill_backend()
RETURNS VOID
AS $$
  DROP TABLE IF EXISTS foo;
  CREATE TABLE foo AS SELECT * FROM pg_class LIMIT 1;
$$ LANGUAGE 'sql';

SELECT kill_backend();
Re: [HACKERS] knngist - 0.8
(And, if we are going to break everything in sight, now would be a good time to think about changing typmod to something more flexible than one int32.) As someone who is jamming geometry type, spatial reference number and dimensionality into said 32bit typmod, let me say emphatically ... Amen! P
Re: [HACKERS] knngist - 0.8
On Sat, Oct 16, 2010 at 10:17 AM, Peter Eisentraut pete...@gmx.net wrote: On lör, 2010-10-16 at 09:23 -0700, Paul Ramsey wrote: (And, if we are going to break everything in sight, now would be a good time to think about changing typmod to something more flexible than one int32.) As someone who is jamming geometry type, spatial reference number and dimensionality into said 32bit typmod, let me say emphatically ... Amen! So what kind of data structure would you like for a typmod? I'm a primitive enough beast that just having 64-bits would make me happy. As a general matter though, a bytea? P
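The "jamming" described above is ordinary bit-packing. Here is a hedged illustration of how quickly a 32-bit typmod fills up — the field widths and names below are invented for the example, not PostGIS's actual layout:

```c
#include <stdint.h>

/* Illustrative only: pack a spatial reference id, a geometry type
 * code and a dimension count into one int32 typmod.  Field widths
 * are invented; the point is how fast 32 bits run out. */
static int32_t
typmod_pack(int32_t srid, int32_t geomtype, int32_t dims)
{
    return ((srid & 0xFFFFF) << 8)   /* 20 bits of SRID           */
         | ((geomtype & 0x3F) << 2)  /*  6 bits of geometry type  */
         |  (dims & 0x3);            /*  2 bits of dimensionality */
}

/* Matching unpackers: shift the field down and mask it off. */
static int32_t typmod_srid(int32_t tm) { return (tm >> 8) & 0xFFFFF; }
static int32_t typmod_type(int32_t tm) { return (tm >> 2) & 0x3F; }
static int32_t typmod_dims(int32_t tm) { return tm & 0x3; }
```

With only 20 bits for the SRID, spatial reference ids above ~1M simply don't fit — which is the kind of pressure behind the wish for a 64-bit (or bytea) typmod.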
Re: [HACKERS] KNNGiST for knn-search (WIP)
I'm sure whatever conclusion -hackers comes to in the end will be the best for pgsql, and I'll be supportive. But until then, let me note from the PostGIS point-of-view: sure would be great to get this in for 8.5 :) P. On Thu, Dec 31, 2009 at 4:26 AM, Greg Stark gsst...@mit.edu wrote: On Wed, Dec 30, 2009 at 4:56 PM, Robert Haas robertmh...@gmail.com wrote: From my point of view, what makes a patch invasive is the likelihood that it might break something other than itself. For example, your patch touches the core planner code and the core GIST code, so it seems possible that adding support for this feature might break something else in one of those areas. It doesn't seem obvious to me that this is a high-risk patch. It's touching the planner which is tricky but it's not the kind of massive overhaul that touches every module that HOT or HS were. I'm really glad HS got in before the end because lots of people with different areas of expertise and different use cases are going to get to exercise it in the time remaining. This patch I would expect relatively few people to need to try it out before any oversights are caught. -- greg
Re: [HACKERS] CLUSTER in 8.3.5 on GIST indexes loses all rows
FYI, this is not limited to PostGIS GIST, it appears to be all GIST (gid is an integer column and I have loaded btree_gist):

pramsey=# create table ttt as select * from counties;
SELECT
pramsey=# create index gid_bix on ttt using gist (gid);
CREATE INDEX
pramsey=# select count(*) from ttt;
 count
-------
  3141
(1 row)

pramsey=# cluster ttt using gid_bix;
CLUSTER
pramsey=# select count(*) from ttt;
 count
-------
     0
(1 row)

pramsey=# select version();
 PostgreSQL 8.3.5 on i386-apple-darwin9.5.0, compiled by GCC i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465)
(1 row)

On Fri, Dec 5, 2008 at 6:05 AM, Gregory Stark [EMAIL PROTECTED] wrote: Mark Cave-Ayland [EMAIL PROTECTED] writes: So in other words, the contents of the temporary table has just disappeared :( Uhm. That rather sucks. I was able to reproduce it too. It seems to happen after I pause for a bit, and not when I run the script in fast succession. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
[HACKERS] Access to Row ID information in Functions
In PostGIS, we have a problem, in that spatial operations are very costly, CPU-wise. We have hit on a nifty enhancement recently, which was to recognize that when processing multiple rows, in joins or with literal arguments, for most functions of the form GeometryOperation(A, B), A (or B) tended to remain constant while the other argument changed. That meant that we could build an optimized form of the more-constant argument (using internal index structures on the object segments) that allows testing the changing argument much more quickly. The optimized form gets cached and retrieved from a memory context. Each time the function is run within a statement, it checks the cache and sees if one of its arguments is the same as the last time around. If so, it uses the prepared version of that argument. If not, it builds a new prepared version and caches that. The key here is being able to check the identity of the arguments... is this argument A the same as the one we processed last time? One way is to do a memcmp. But it seems likely that PgSQL knows exactly whether it is running a nested loop, or a literal, and could tell somehow that argument A is the same with each call. For lack of a better term, if we knew what the row id of each argument was as the function was called, we could skip the memcmp testing of geometric identity (which gets more expensive precisely when our optimization gets more effective, for large arguments) and just trust the row id as the guide for when to build and cache new optimized representations. Any guidance? Thanks, Paul
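The caching scheme described above can be sketched without any PostgreSQL machinery. Names and structure here are hypothetical; the point is the memcmp identity test, whose cost grows with argument size:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a "same argument as last call?" cache: remember the bytes
 * of the last prepared argument, and rebuild the prepared form only
 * when the incoming bytes differ.  The length check short-circuits
 * most of the memcmp work. */
typedef struct ArgCache
{
    unsigned char *last;     /* copy of the last argument's bytes */
    size_t         last_len;
    int            rebuilds; /* how many times we (re)prepared */
} ArgCache;

/* Returns 1 if the prepared form had to be rebuilt, 0 if reused. */
static int
arg_cache_check(ArgCache *c, const unsigned char *arg, size_t len)
{
    if (c->last != NULL && c->last_len == len &&
        memcmp(c->last, arg, len) == 0)
        return 0;            /* identical bytes: reuse prepared form */

    /* Different argument: remember its bytes and rebuild.  (A real
     * implementation would also build the optimized structure here,
     * and check the malloc result.) */
    free(c->last);
    c->last = malloc(len);
    memcpy(c->last, arg, len);
    c->last_len = len;
    c->rebuilds++;
    return 1;
}
```

In a nested-loop join where the outer geometry repeats, this degenerates to one rebuild per distinct outer value — exactly the behavior the post describes, minus the memcmp cost it would like to avoid.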
Re: [HACKERS] Access to Row ID information in Functions
Thanks Tom, Yes, we've discussed adding some kind of optional identity information to the object; it remains a potential course of action. Paul On Tue, Apr 1, 2008 at 2:37 PM, Tom Lane [EMAIL PROTECTED] wrote: Paul Ramsey [EMAIL PROTECTED] writes: The optimized form gets cached and retrieved from a memory context. Each time the function is run within a statement it checks the cache, and sees if one of its arguments is the same as the last time around. If so, it uses the prepared version of that argument. If not, it builds a new prepared version and caches that. The key here is being able to check the identity of the arguments... is this argument A the same as the one we processed last time? One way is to do a memcmp. But it seems likely that PgSQL knows exactly whether it is running a nested loop, or a literal, and could tell somehow that argument A is the same with each call. Not really. Certainly there's no way that that information would propagate into function calls. In the special case where your argument is a literal constant, I think there is enough information available to detect that that's the case (look at get_fn_expr_argtype). But if it's not, there's no very good way to know whether it's the same as last time. Perhaps it would be worth changing your on-disk storage format to allow cheaper checking? For instance include a hash value. regards, tom lane
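Tom's suggestion — keep a hash value in the serialized format so identity checks get cheap — might look like this in miniature (the layout is hypothetical, not PostGIS's actual serialization):

```c
#include <stdint.h>

/* Hypothetical serialized header: a hash of the payload, computed
 * once at serialization time, rides along with the data.  "Same
 * argument as last call?" then usually costs two integer compares
 * instead of a memcmp over a large geometry. */
typedef struct SerializedGeom
{
    uint32_t hash;  /* precomputed over the payload bytes */
    uint32_t len;   /* payload length */
    /* payload bytes follow in the real format */
} SerializedGeom;

/* Cheap identity pre-check: differing hash or length proves the
 * values differ; matching values still deserve a full compare to
 * rule out hash collisions. */
static int
geom_probably_same(const SerializedGeom *a, const SerializedGeom *b)
{
    return a->hash == b->hash && a->len == b->len;
}
```

The asymmetry is the useful part: a mismatch is definitive and costs almost nothing, and only the (rare) match path falls back to the expensive byte-wise comparison.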
Re: [HACKERS] Where to find kind code for STATISTIC_KIND GEOMETRY?
Just take 150-199, and submit a patch to HACKERS that updates the comment in pg_statistic appropriately. I am sure it will be some time before we invent another 49 kinds of selectivity statistic. P Ale Raza wrote: Tom, What numbers can you reserve for our geometry type, 200 - 299? Ale. -Original Message- From: Paul Ramsey [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 01, 2007 3:58 PM To: Ale Raza; pgsql-hackers@postgresql.org Cc: John Baleja; PostGIS Development Discussion Subject: Re: [HACKERS] Where to find kind code for STATISTIC_KIND GEOMETRY? For all that Tom reserved 100 numbers for us, we're only using one right now. lwgeom_estimate.c:47:#define STATISTIC_KIND_GEOMETRY 100 Paul Alvaro Herrera wrote: Ale Raza wrote: Simon, I am forwarding this to pgsql-hackers. I can send the requirements once I get the right contact. This is it, but as Simon stated, you probably want to get the PostGIS guys involved too, so that duplicates can be sorted out. -- Paul Ramsey Refractions Research http://www.refractions.net [EMAIL PROTECTED] Phone: 250-383-3022 Cell: 250-885-0632 ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Modifying and solidifying contrib
Jim Nasby wrote: In addition to Martijn's tsearch case, there's also PostGIS. And I believe this is a pretty big pain for them. Hear hear! It would be nice to dump from an old PostgreSQL/PostGIS combination and restore to a new version combination, without taking all the function definitions along for a ride in the dump process. What we really want is just the data. -- Paul Ramsey Refractions Research http://www.refractions.net [EMAIL PROTECTED] Phone: 250-383-3022 Cell: 250-885-0632
Re: [HACKERS] uuid type for postgres
Just an FYI: We also ended up rolling our own uuid type, against libuuid. It seems that uuid is a widespread enough concept that implementors *will* be asked to support it, moderately often. We *could* roll our own (we're capable); others are not so lucky, and would have to point out the lack of a uuid type as a limitation of pgsql. Which is too bad, given how relatively simple they are. That said:

- linking against libuuid is fine for a contrib/ extension, but no good for a built-in type. A real uuid would have to do a proper independent implementation of uuid creation within pgsql.
- we cannot snarf libuuid code, it is LGPL (though perhaps the author would re-license; if that is the *only* objection, it is well worth asking)

I think having a built-in uuid type is something that a large number of people will use. Whether they use it well or badly is not our problem. It is possible to build crappy databases with all the types that already exist; adding uuid is hardly going to bring the walls down. Having uuid removes another excuse for people not doing a pgsql implementation. I am not sure if I heard clearly from the core team that a self-contained, BSD-licensed uuid would be accepted(able)? If so, I'll contact the libuuid author about a re-license (shortest path from A to B). P. On 6-Sep-05, at 6:50 AM, nathan wagner wrote: I have been in need of a uuid type and ran across the pguuid download by Xiongjian (Mike) Wang. This wasn't really useful to me for two reasons: first, it is GPLed and I would prefer a more liberal license; secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a SIOCGETIFHWADDR (? i think, i can get the exact name if you want it) ioctl() under darwin. While I could dike out the code that calls it, that seems like a suboptimal solution. So after a bit of poking around the interweb I ran across Ralf Engelschall's ossp uuid library. This compiled with minimal effort on mac os.
Some reading, and an evening later, I've made a server plugin with supporting SQL that implements an 'ossp_uuid' type. Now I have four questions:

1: Is it feasible for this to be included in the contrib section of the regular download? The uuid library is a notice-of-copyright style license, and I am willing to put my own code into the public domain.

2: Would just calling the type 'uuid' be better than 'ossp_uuid'? It's certainly a nicer name.

3: Would it be possible to include such a type as a postgres extension to the usual SQL types? It seems to me that having an officially supported type would be better than a user-contributed type, on the grounds that you could then rely on it being available if postgres was. In particular, installing it as an extension would require the cooperation of the DBA, which may be infeasible in some environments.

-- Nathan Wagner
[HACKERS] $libdir and 8.0
Short bug report: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. Paul
Re: [HACKERS] $libdir and 8.0
The expansions in the build scripts all seem correct, and in fact all the libraries are installed in the right place; both the system stuff (plpgsql.so) and the contrib stuff (libpostgis.so) end up in /opt/foo/lib/postgresql. It is the actual binaries that seem to not know where $libdir is supposed to be.

prefix := /home/pramsey/pgtest/8.0
exec_prefix := ${prefix}
libdir := ${exec_prefix}/lib
pkglibdir = $(libdir)

ifeq "$(findstring pgsql, $(pkglibdir))" ""
ifeq "$(findstring postgres, $(pkglibdir))" ""
override pkglibdir := $(pkglibdir)/postgresql
endif
endif

Bruce Momjian wrote: Devrim GUNDUZ wrote: Hi, On Wed, 18 Aug 2004, Paul Ramsey wrote: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. I'm not sure but if you add /opt/foo/lib/postgresql to /etc/ld.so.conf and run ldconfig, it might work. I checked in the code and $libdir should expand to $(pkglibdir) as determined by configure. What value do you show for that in your Makefile.global?
Re: [HACKERS] $libdir and 8.0
Check this out!

[EMAIL PROTECTED] bin]$ ./pg_config --pkglibdir
/home/pramsey/pgtest/8.0/bin/lib/postgresql
                        ^^^
And yet:

./port/pg_config_paths.h:#define PKGLIBDIR "/home/pramsey/pgtest/8.0/lib/postgresql"

Could the problem be here? (port/path.c):

/*
 * get_pkglib_path
 */
void
get_pkglib_path(const char *my_exec_path, char *ret_path)
{
	const char *p;

	if ((p = relative_path(PGBINDIR, PKGLIBDIR)))
		make_relative(my_exec_path, p, ret_path);
	else
		StrNCpy(ret_path, PKGLIBDIR, MAXPGPATH);
	canonicalize_path(ret_path);
}

Bruce Momjian wrote: Devrim GUNDUZ wrote: Hi, On Wed, 18 Aug 2004, Paul Ramsey wrote: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. I'm not sure but if you add /opt/foo/lib/postgresql to /etc/ld.so.conf and run ldconfig, it might work. I checked in the code and $libdir should expand to $(pkglibdir) as determined by configure. What value do you show for that in your Makefile.global?
Re: [HACKERS] $libdir and 8.0
I am afraid it gets crazier... I put /home/pramsey/pgtest/8.0/bin on my $PATH. Now:

[EMAIL PROTECTED] 8.0]$ pg_config --pkglibdir
/home/pramsey/pgtest/8.0/lib/postgresql

Hm, correct.

[EMAIL PROTECTED] 8.0]$ ./bin/pg_config --pkglibdir
/home/pramsey/pgtest/8.0/./lib/postgresql

Also correct, but getting strangely relative.

[EMAIL PROTECTED] 8.0]$ cd bin
[EMAIL PROTECTED] bin]$ ./pg_config --pkglibdir
/home/pramsey/pgtest/8.0/bin/lib/postgresql

Now incorrect. The answer depends on where I am asking from.

[EMAIL PROTECTED] bin]$ ./pg_config --bindir
/home/pramsey/pgtest/8.0/bin/.

Using the direct request. Using the $PATH assisted request:

[EMAIL PROTECTED] bin]$ pg_config --bindir
/home/pramsey/pgtest/8.0/bin

Entirely correct. Problem solved? No...

[EMAIL PROTECTED] bin]$ createlang plpgsql test
ERROR: could not access file "$libdir/plpgsql": No such file or directory
createlang: language installation failed: ERROR: could not access file "$libdir/plpgsql": No such file or directory

If I copy plpgsql.so into /home/pramsey/pgtest/8.0/lib then things work.

Bruce Momjian wrote: Ah, what is your $bindir? Is it /home/pramsey/pgtest/8.0/bin/postgresql? --- Paul Ramsey wrote: Check this out! [EMAIL PROTECTED] bin]$ ./pg_config --pkglibdir /home/pramsey/pgtest/8.0/bin/lib/postgresql ^^^ And yet: ./port/pg_config_paths.h:#define PKGLIBDIR "/home/pramsey/pgtest/8.0/lib/postgresql" Could the problem be here? (port/path.c): /* * get_pkglib_path */ void get_pkglib_path(const char *my_exec_path, char *ret_path) { const char *p; if ((p = relative_path(PGBINDIR, PKGLIBDIR))) make_relative(my_exec_path, p, ret_path); else StrNCpy(ret_path, PKGLIBDIR, MAXPGPATH); canonicalize_path(ret_path); } Bruce Momjian wrote: Devrim GUNDUZ wrote: Hi, On Wed, 18 Aug 2004, Paul Ramsey wrote: When installing PgSQL into a non-standard location (like /opt/foo) the configure script decides to install all the contrib libraries and plpgsql into /opt/foo/lib/postgresql. This would be fine, except that the backend does not recognize this directory as a place to be searched for $libdir (perhaps it is referencing the $libdir macro instead of the $pkglibdir macro?). So tools like 'createlang' fail, and loading .sql files that reference things like $libdir/libfoo.so also fail. I'm not sure but if you add /opt/foo/lib/postgresql to /etc/ld.so.conf and run ldconfig, it might work. I checked in the code and $libdir should expand to $(pkglibdir) as determined by configure. What value do you show for that in your Makefile.global?
Re: [HACKERS] $libdir and 8.0
Sounds like we are getting close. From strace:

stat64("/home/pramsey/pgtest/8.0/bin/lib/postgresql/plpgsql", 0xbfffe480) = -1 ENOENT (No such file or directory)
stat64("/home/pramsey/pgtest/8.0/bin/lib/postgresql/plpgsql.so", 0xbfffe480) = -1 ENOENT (No such file or directory)
stat64("$libdir/plpgsql", 0xbfffe540) = -1 ENOENT (No such file or directory)

I started the database from /home/pramsey/pgtest/8.0/bin using:

./pg_ctl start -D /home/pramsey/pgtest/8.0/data

Judging from the pg_config --pkglibdir behavior I have posted to the list, if I had started it from any *other* location at all, things would have worked.

Paul

Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> ... It is the actual binaries that seem to not know where $libdir is
>> supposed to be.
> Where do they think it is? A useful way to check would be to strace the
> backend while you're executing createlang or another command that tries to
> load a dynamic library. The file names that it tries to stat() would be
> good evidence. Also, how are you starting the postmaster? I believe that
> current sources will try to locate $libdir via a relative path from where
> the postmaster executable is. We've fixed some bugs in that area recently,
> but maybe there are more.
>
> regards, tom lane
Re: [HACKERS] $libdir and 8.0
Things definitely do not like to be run from within bin. I added /home/pramsey/pgtest/8.0/bin to my PATH and then ran everything without a direct path reference:

[EMAIL PROTECTED] 8.0]$ pg_ctl start -D /home/pramsey/pgtest/8.0/data
postmaster starting
LOG:  database system was shut down at 2004-08-18 22:00:32 PDT
LOG:  checkpoint record is at 0/B2797C
LOG:  redo record is at 0/B2797C; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 628; next OID: 17263
LOG:  database system is ready
[EMAIL PROTECTED] 8.0]$ createdb test
CREATE DATABASE
[EMAIL PROTECTED] 8.0]$ createlang plpgsql test
[EMAIL PROTECTED] 8.0]$

Magically, everything now works :)

Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> ... It is the actual binaries that seem to not know where $libdir is
>> supposed to be.
> Where do they think it is? A useful way to check would be to strace the
> backend while you're executing createlang or another command that tries to
> load a dynamic library. The file names that it tries to stat() would be
> good evidence. Also, how are you starting the postmaster? I believe that
> current sources will try to locate $libdir via a relative path from where
> the postmaster executable is. We've fixed some bugs in that area recently,
> but maybe there are more.
>
> regards, tom lane
Re: [HACKERS] $libdir and 8.0
Thanks Tom. Yes, this is beta1, not the CVS tip, so all is golden.

Paul

Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> I started the database from /home/pramsey/pgtest/8.0/bin using
>> ./pg_ctl start -D /home/pramsey/pgtest/8.0/data
> Ah-hah ... and this is 8.0beta1, right, not anything later? I fixed some
> problems associated with "." and ".." in the executable's path last week.
> I think if you pull the current nightly snapshot or CVS you will find the
> problem gone.
>
> regards, tom lane
Re: [HACKERS] Default Locale in initdb
Just because it is not new does not mean that it is good. When this new behavior was introduced and I migrated our databases to the new PgSQL version (dump/restore), the locale of all my databases was silently changed from C to en_US. This broke one application in a very subtle way, because of slightly different sort behavior in the new locale; tracking it down was quite tricky. PgSQL was just a little too helpful in this case.

Andrew Dunstan wrote:
> [EMAIL PROTECTED] wrote:
>> Is it me, or has the default locale of created databases changed at some
>> point? Currently, on Linux, if one does not specify a locale, the locale
>> is taken from the system environment, and it is not C. While I can see
>> both sides of a discussion, I think that choosing a locale without one
>> being specified is a bad idea, even if it is the locale of the machine.
>> The reason it is a bad idea is that certain features of the database
>> which only work correctly with a locale of C will not work by default.
> This is not new behaviour. (Why are you the only person who posts here
> who is nameless?)
>
> cheers
> andrew

--
Paul Ramsey | Refractions Research
Email: [EMAIL PROTECTED] | Phone: (250) 885-0632
Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow
Tom Lane wrote:
> ... of dead tuples followed by a lot of pages worth of live tuples. Plain
> VACUUM cannot do much to fix this since it doesn't move rows around. VACUUM
> FULL will fix it, but its index-update overhead is high enough that CLUSTER
> is a better deal.

Tom: I was interested in performance improvements from CLUSTER, so I tried to cluster a table on a spatial index:

dra_working=# \d geomtest
        Table "public.geomtest"
     Column     |   Type   | Modifiers
----------------+----------+-----------
 rd_segment_id  | integer  |
 admit_date     | date     |
 retire_date    | date     |
 most_recent    | boolean  |
 lineargeometry | geometry |
Indexes: geomtest_idx gist (lineargeometry)

dra_working=# cluster geomtest_idx on geomtest;
ERROR:  CLUSTER: cannot cluster when index access method does not handle nulls
You may be able to work around this by marking column lineargeometry NOT NULL

dra_working=# select version();
PostgreSQL 7.3.6 on i686-pc-linux-gnu

As of quite a while ago (7.2?), the GiST access method was made null-safe by Teodor and Oleg, I think. Is this a safety check left over from before that upgrade to GiST?

--
Paul Ramsey | Refractions Research
Email: [EMAIL PROTECTED] | Phone: (250) 885-0632
Re: [HACKERS] PostgreSQL pre-fork speedup
sdv mailer wrote:
> Instead, there's a big need to create a new connection on every query, and
> with PostgreSQL needing to fork on every incoming connection this can be
> quite slow.

Really? My general experience has been that forking/connection setup times are very good with PgSQL. Do not assume your Oracle experience transfers directly over -- Oracle has very large connection-time overheads, PgSQL does not.

> This could be a big win since even a moderate improvement at the connection
> level will affect almost every user. Any chance of that happening for 7.5?

Only if you do it yourself, probably. The calculation of the developers appears to be that the time the database spends on fork/connect will generally be dwarfed by the time it spends actually doing work (this being a database, the workloads required of the backend are much higher than, say, for a web server). So the operational benefit of adding the complexity of a pre-fork system is not very high. And if you have the rare workload where a pre-fork actually *would* speed things up a great deal, you can solve the problem yourself with connection-pooling middleware.

--
Paul Ramsey | Refractions Research
Re: PostGIS dropgeometrycolumn function (Was: Re: [HACKERS] [7.4]
Bitter experience... I am going to cc Dave here, because I could swear we went through many conniptions trying to make this work. And yet I just did this:

create view mytables as
  select relname from pg_class
  where relam = 0 and relname not like 'pg_%';

And it seems to work fine.

Oh, now I remember: the deal was not views, it was triggers. Since our geometry_columns contains some information not available via a query on existing data, a trigger was what we wanted, so we could harvest the information from a variety of places and have some spare columns for things like the geometry selectivity stats.

Paul

On Tuesday, February 3, 2004, at 11:00 AM, Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> In an ideal world though, we would construct the thing as a view, so that
>> when you did a CREATE TABLE that included a geometry type, you would
>> automatically get a row in geometry_columns. That requires a view on
>> system tables though, and that just does not work. :/
> Uh, what makes you say it doesn't work?
>
> regards, tom lane

Paul Ramsey
Refractions Research
Email: [EMAIL PROTECTED]
Phone: (250) 885-0632
Re: [HACKERS] Multicolumn Indexing using R-Tree
Try using the GiST rtree implementation (examples in contrib); GiST supports multi-key indexes.

On Tuesday, February 3, 2004, at 06:56 AM, Marcio Caetano wrote:
> I'm using PostgreSQL 7.3.2 and I need to create an R-Tree index that uses
> more than one column in a table. When I run the instruction, this message
> appears:
>
> DefineIndex: access method rtree does not support multi-column indexes
>
> How can I solve this problem? Is it a limitation of PostgreSQL or of the
> R-Tree concept? Thank you in advance.
>
> Márcio Caetano
[HACKERS] PostGIS Integration
Actually, in my wet dream, we stored everything in system tables. Dimensionality and SRID became parameters of the geometry, the selectivity stats lived in the system stats table (as Mark's patch should hopefully do), and the geometry_columns view just pulled everything together into one user-convenient location:

CREATE TABLE foo ( mygeom POLYGON(4326) );
CREATE TABLE bar ( mygeom MULTILINESTRING(20711, 2) );

I think we had this discussion before, though, and parameterized types, like varchar(256), were not available for extended types, like our geometries.

P.

On Tuesday, February 3, 2004, at 12:06 PM, Tom Lane wrote:
> Paul Ramsey [EMAIL PROTECTED] writes:
>> Oh, now I remember. The deal was not views, it was triggers.
> Oh, okay. You're right, we don't do triggers on system tables. But couldn't
> you combine a view on the system tables with storage of additional data
> outside?
>
> regards, tom lane

Paul Ramsey
Refractions Research
Email: [EMAIL PROTECTED]
Phone: (250) 885-0632