Hi Regina,
Unfortunately, I don't think your solution works. I performed some
tests extracting points from a linestring (instead of a geometry from a
collection - generate_series has the same performance problems).
I took three linestrings, with ~5000, ~10000, and ~20000 points and
tried to extract the coordinates.
Here are my results:
-- A linestring with 5718 points:
EXPLAIN ANALYZE select pointn(the_geom, generate_series(1,
numpoints(the_geom))) from kneufeld.tmp;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on tmp (cost=0.00..31.53 rows=1230 width=32) (actual
time=1.518..3936.619 rows=5718 loops=1)
Total runtime: 3958.204 ms
Total runtime: 3961.324 ms
Total runtime: 3940.620 ms
-- A linestring with 10532 points
Total runtime: 18170.791 ms
Total runtime: 18179.784 ms
Total runtime: 18189.938 ms
-- A linestring with 21227 points
Total runtime: 77720.651 ms
Total runtime: 77740.417 ms
Total runtime: 77710.720 ms
As you can see the generate_series way does not scale well.
Here are the same three linestrings using your technique (good try by
the way!)
-- A linestring with 5718 points:
EXPLAIN ANALYZE
SELECT pointn(t.the_geom, g.n) As poly
FROM kneufeld.tmp t CROSS JOIN pgis_gs g
WHERE g.n <= numpoints(the_geom);
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..431332.72 rows=12300000 width=36) (actual
time=1.817..4278.020 rows=5718 loops=1)
-> Seq Scan on tmp t (cost=0.00..22.30 rows=1230 width=32) (actual
time=0.008..0.013 rows=1 loops=1)
-> Index Scan using pgis_gs_pkey on pgis_gs g (cost=0.00..175.66
rows=10000 width=4) (actual time=0.094..30.803 rows=5718 loops=1)
Index Cond: (g.n <= numpoints(t.the_geom))
Total runtime: 4301.425 ms
Total runtime: 4316.355 ms
Total runtime: 4351.617 ms
-- A linestring with 10532 points
Total runtime: 18170.791 ms
Total runtime: 18179.784 ms
Total runtime: 18189.938 ms
-- A linestring with 21227 points
Total runtime: 77833.419 ms
Total runtime: 77826.653 ms
Total runtime: 77793.823 ms
I think the problem has anything to do with the result key being indexed
or not. pg_class reveals that even the largest of the three geometries
fit in one page ... so the problem is also not related to toast tables.
I may have something to do with way PostgreSQL passes parameters
around... do you know if it passes parameters by value or by reference.
Does it make a copy of "the_geom" for every call to numpoints,
generate_series, and pointn? This would explain the exponential
computation time we're seeing.
Cheers,
Kevin
-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7
Phone: (250) 383-3022
Email: [EMAIL PROTECTED]
Obe, Regina wrote:
I know there has been some discussion in the past about how
generate_series doesn't scale well when you are dealing with large
numbers of geometries. My suspicion was that it was because
generate_series has to materialize the result before outputting it and
that the result key is not indexed.
Unfortunately I don't have geometries large enough to test the theory
out, but does seem to be faster for me even for structures where I have
more than 10 or geometries per multipolygon. Can some people try this
out. The below will take each multi geom and break it out into single
geom structures and will handle in theory a MULTI structure with up to
500,000 geoms and can easily handle more by increasing the number of
records in the materialized generate series table.
Can you compare this:
SELECT g.n
INTO pgis_gs
FROM generate_series(1,500000) g(n);
ALTER TABLE pgis_gs
ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);
SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
FROM somtable t CROSS JOIN pgis_gs g
WHERE g.n <= ST_NumGeometries(the_geom);
With these:
--this will handle at most 5000 (for small sets of geometries - incurs a
huge penalty)
SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
FROM somtable t CROSS JOIN generate_series(1,5000) g(n)
WHERE g.n <= ST_NumGeometries(the_geom);
-this can handle more but I suspect gets really bad for large numbers
SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As poly
FROM somtable t;
Thanks,
Regina
-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users