It could be the reentry and prepare of the geom that is the culprit. I haven't looked at the code, but from thinking about it, would think there would be some repetitive setup involved in each case (even if passing were efficient) and then pulling out a single point from a reprepared array of points.
Given that PostgreSQL has this side effect behavior of allowing set returning functions in the SELECT part. Couldn't we then create a SPLIT BY geom function similar to DUMP, but returns a set of geoms instead of a dump object. Then Kevin's query would be reduced to SELECT t.gid, Geom_SplitBy(the_geom, 'POINT') as the_geom FROM kneufeld.tmp t You could even have an overloaded allows a range of geoms to pull that does SELECT t.gid, Geom_SplitBy(the_geom, 'POINT', 1,5) as the_geom FROM kneufeld.tmp t Types would map to our GeomN type functions like PointN, GeometryN, InteriorRingN. This would avoid any pass by value inefficiencies since the_geom would be passed only once and would allow for internal efficiencies of pulling multiple geometries. As a side note: A colony of mice is much more fun to play with than a puppy :) Thanks, Regina -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Davis Sent: Friday, April 18, 2008 5:25 PM To: PostGIS Users Discussion Subject: Re: [postgis-users] Does anyone have multipolygons withlarge numbersof polygons Yes, I absolutely agree - SPLIT BY should take "Inverse Aggregate" functions, which can be provided for *any* complex object. The example they give in the Manifold doc is classic - split a string into "chunks" at given characters (or regexes...). It's funny that they give that example - but it isn't actually supported by Manifold! So it looks like it's the pass-by-value semantics which are slowing things down, not the generate_series idiom. Hopefully a SPLIT BY facility would be able to avoid this problem - or would it? While we're waiting for that, Regina, I suggest maybe getting a puppy? 8^) Obe, Regina wrote: > I remember looking at Manifold's SPLIT BY function and thinking - > "that's what I'm missing in my life - a inverse GROUP BY." > > I was disappointed to find out that there SPLIT BY only supports 3 > split functions which makes it much more limiting than the > generate_series approach. It makes nice syntactic sugar at the very > least and perhaps if PostgreSQL could implement SPLIT TYPE functions > which would be the inverse of the AGGREGATE functions, then this > would be much more interesting. > > Looking at Kevin's results I am beginning to think the bottleneck may > be in the GeomN type functions and maybe he is right that its doing a > copy call for each or something silly like that or a memory leak (I > wasn't looking is it linear or quadratic as the number of geometries > increases?). > > I'm relieved the sub select thing came up with the same results as if > it was better it would have shattered some of my beliefs on how things > work. > > Kevin, > What happens if you leave out the PointN call altogether just to see > what kind of penalty is being incurred there. > e.g. > SELECT t.gid, generate_series(1, ST_NumPoints(the_geom)) FROM > kneufeld.tmp t > > Thanks, > Regina > > > > -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Martin Davis > Sent: Friday, April 18, 2008 2:22 PM > To: PostGIS Users Discussion > Subject: Re: [postgis-users] Does anyone have multipolygons with large > numbersof polygons > > This is an interesting problem. I think the general pattern can be > stated as something like: "Split a complex column value across > multiple rows, along with other columns from the original record". (A > nice way of thinking of this is that it's the inverse of GROUP BY). > > This is surprisingly awkward to do in standard SQL. It's easier in > PostgreSQL since it has the generate_series() enhancement, but as you > are seeing it's still not a great general solution to the problem. > > The annoying thing is is that the underlying processing required to > compute the desired result is actually pretty trivial. This is one of > those situations where the fairly limited declaritive semantics of SQL > get in your way. > > Manifold SQL has an extension which adds a SPLIT BY clause to handle > exactly this problem. I suspect they have a pretty efficient > implementation, since it is implemented natively in the engine. > > http://www.manifold.net/doc/queries_and_geoms.htm > > Maybe it's time to lobby to get this added to PostgreSQL as well? > > Obe, Regina wrote: > >> I know there has been some discussion in the past about how >> generate_series doesn't scale well when you are dealing with large >> numbers of geometries. My suspicion was that it was because >> generate_series has to materialize the result before outputting it >> and that the result key is not indexed. >> >> Unfortunately I don't have geometries large enough to test the theory >> out, but does seem to be faster for me even for structures where I >> > have > >> more than 10 or geometries per multipolygon. Can some people try >> this out. The below will take each multi geom and break it out into >> single geom structures and will handle in theory a MULTI structure >> with up to 500,000 geoms and can easily handle more by increasing the >> number of records in the materialized generate series table. >> >> Can you compare this: >> >> SELECT g.n >> INTO pgis_gs >> FROM generate_series(1,500000) g(n); >> >> ALTER TABLE pgis_gs >> ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n); >> >> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly FROM somtable t >> CROSS JOIN pgis_gs g WHERE g.n <= ST_NumGeometries(the_geom); >> >> With these: >> >> --this will handle at most 5000 (for small sets of geometries - >> incurs >> > a > >> huge penalty) >> >> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly FROM somtable t >> CROSS JOIN generate_series(1,5000) g(n) WHERE g.n <= >> ST_NumGeometries(the_geom); >> >> -this can handle more but I suspect gets really bad for large numbers >> SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As >> > poly > >> FROM somtable t; >> >> Thanks, >> Regina >> ----------------------------------------- >> The substance of this message, including any attachments, may be >> confidential, legally privileged and/or exempt from disclosure >> pursuant to Massachusetts law. It is intended solely for the >> addressee. If you received this in error, please contact the sender >> and delete the material from any computer. >> >> _______________________________________________ >> postgis-users mailing list >> postgis-users@postgis.refractions.net >> http://postgis.refractions.net/mailman/listinfo/postgis-users >> >> >> > > -- Martin Davis Senior Technical Architect Refractions Research, Inc. (250) 383-3022 _______________________________________________ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users