Alessandro,
A few other ideas/issues:
Zooming out is the worst case because you have to look at all of the data, so anything you can do to organize the data ahead of time will improve performance. For example (a rough SQL sketch of these ideas follows below):
Create separate tables for the different mtfcc codes that you want to
display, and have them inherit from a master table. That way, when you
zoom out and only want the major roads, you can scan just the major-roads
table rather than all of the other roads, and as you zoom in you can
switch to the master table, which covers all of the inherited road tables.
You can also create a separate geometry column holding a simplified copy
of each geometry, for use when you are zoomed out.
Build your GiST indexes and CLUSTER on them. That helps performance when
you are zoomed in because the data is stored spatially clustered on disk,
which means fewer seeks and fewer page loads.
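A rough sketch of those three ideas in SQL (the table and column names, the mtfcc codes, and the simplification tolerance are only illustrative, not taken from your setup):

  -- 1) A master table plus a child table holding only the major-road codes.
  --    Queries against edges_all see every child table; queries against
  --    edges_major scan just the major roads.
  CREATE TABLE edges_all (
      gid      serial PRIMARY KEY,
      mtfcc    varchar(5),
      the_geom geometry
  );

  CREATE TABLE edges_major (
      CHECK (mtfcc IN ('S1100', 'S1200'))
  ) INHERITS (edges_all);

  -- 2) A second, simplified geometry column for zoomed-out rendering
  --    (tolerance is in degrees, since the data is in EPSG:4269).
  ALTER TABLE edges_all ADD COLUMN the_geom_simple geometry;
  UPDATE edges_all SET the_geom_simple = ST_Simplify(the_geom, 0.01);

  -- 3) GiST index the geometry, then physically reorder the table on it.
  --    (On PostgreSQL before 8.3 the syntax is: CLUSTER edges_major_gist ON edges_major)
  CREATE INDEX edges_major_gist ON edges_major USING GIST (the_geom);
  CLUSTER edges_major USING edges_major_gist;
  ANALYZE edges_major;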
-Steve W
http://imaptools.com/
Greg Williamson wrote:
Alessandro --
You might want to
a) post some details on your underlying file system (the part that
postgres uses, anyway). Postgres can only use one CPU per query, so for
this sort of testing disk I/O is probably more of a bottleneck than CPU.
b) and on changes you have made to the postgres config file [if none,
then that right there is a hint] ... generally some tweaks need to be
made, since the default settings are minimal and not suitable for any
real processing (work_mem in particular, but also other knobs which
depend on your version of postgres ...)
c) the output from postgis_full_version() and the details on postgres
(version/rev level) and the OS.
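For (c), and to check a couple of the knobs from (b), something like this from psql would do (shared_buffers and work_mem are just the usual suspects here, not specific recommendations):

  SELECT postgis_full_version();   -- PostGIS / GEOS / PROJ versions
  SELECT version();                -- PostgreSQL version and build
  SHOW shared_buffers;             -- two settings that are most often left
  SHOW work_mem;                   -- at their minimal defaults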
The estimate predicts about twice as many rows as there are, so I'll
ask: have you run ANALYZE on the table since loading / changing /
deleting data? A better estimate won't help this query directly but
might help in other areas. How many rows are in that table, and how many
with "mtfcc" = 'S1100'? If you have run a lot of updates or deletes,
make sure to run a VACUUM as well (and check the logs to make sure that
there are no complaints); this will eliminate dead rows that the index
might currently be seeing.
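Assuming the table is named "edges" as in your query, those checks would look something like:

  VACUUM ANALYZE edges;            -- reclaim dead rows, refresh planner statistics
  SELECT count(*) FROM edges;      -- total rows
  SELECT count(*) FROM edges WHERE mtfcc = 'S1100';   -- rows this query wants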
That said, a few seconds for such a large data set might not be
unreasonable -- how big is that bounding box ? (I don't have access to
any real data currently so I can't really look for myself ...)
If it covers a large enough area, the planner may be deciding to use the
S1100 condition and then filter the results through the geometry test
(that's how I read the output, but I'm rusty at this stuff). Wading
through lots of potential rows takes time for the backend, and passing
lots of rows back takes yet more. If this bounding box is the whole US
then the planner is wisely ignoring your spatial index; try smaller
scales and you should see the GiST index being used.
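To check that, you could re-run the EXPLAIN with a much smaller box (the coordinates below are just an arbitrary small area, not taken from your logs) and see whether the plan switches to the GiST index on "the_geom":

  EXPLAIN ANALYZE
  SELECT gid, mtfcc
    FROM public.edges
   WHERE the_geom && GeometryFromText(
           'POLYGON((-77.2 38.8, -77.2 39.0, -76.9 39.0, -76.9 38.8, -77.2 38.8))',
           4269)
     AND mtfcc = 'S1100';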
As you point out, abstraction of this data would be mandatory for any
use that is zoomed out to any significant degree (even a single county
can have a huge number of street blocks). Creating other tables with
just highways, consolidated polygons with just the perimeter, etc. will
all help enormously. When I had to deal with street data we made a table
with only two or three of the highest-level street codes (A1, A2 at
least); for parcel data we actually used a form of partitioning,
especially to deal with requirements like "I want to see any street with
'HECH' in its name".
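A minimal sketch of the highways-only idea, assuming your single "edges" table (the new table name and the mtfcc codes chosen are just examples):

  -- Materialize only the major-road feature classes into their own table,
  -- then index it; zoomed-out layers read from this instead of edges.
  CREATE TABLE edges_highways AS
  SELECT gid, mtfcc, the_geom
    FROM edges
   WHERE mtfcc IN ('S1100', 'S1200');

  CREATE INDEX edges_highways_gist ON edges_highways USING GIST (the_geom);
  ANALYZE edges_highways;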
HTH,
Greg Williamson
*From:* Alessandro Ferrucci <[email protected]>
*To:* [email protected]
*Sent:* Friday, June 5, 2009 7:29:54 AM
*Subject:* [postgis-users] Tiger 2008 edges optimizations
Hello,
I've loaded the entire roads data set into one PostGIS table. I've
created an SLD and am using GeoServer as the middle man. Rendering the
map, as you can imagine, is quite slow at high zoom levels. I've added a
GiST index on the the_geom column and a btree index on the mtfcc column
(which is a string column). I've already made postgres optimizations
in postgresql.conf and now I'm analyzing query times. I'm very much a
newbie in the postgres/postgis arena.
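(For reference, the two indexes were created with statements roughly like the following; the index names here are placeholders, not the real ones:)

  CREATE INDEX edges_the_geom_gist ON edges USING GIST (the_geom);
  CREATE INDEX edges_mtfcc_idx ON edges (mtfcc);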
the box specs are:
dual-core system with two Intel 5160 @ 3.00 GHz
16 GB RAM
CentOS
I've turned on query logging on the postgres instance, took one of the
queries, and ran EXPLAIN ANALYZE on it:
the query is:
SELECT "gid", "mtfcc",
encode(asBinary(force_2d("the_geom"),'XDR'),'base64') FROM
"public"."edges" WHERE ("the_geom" && GeometryFromText('POLYGON
((-177.51278882324738 -13.626259663701076, -177.51278882324738
103.63932637870316, 132.62382689416376 103.63932637870316,
132.62382689416376 -13.626259663701076, -177.51278882324738
-13.626259663701076))', 4269) AND "mtfcc" = 'S1100')
The analyze came back as:
"Bitmap Heap Scan on edges (cost=9602.70..992914.21 rows=725231
width=821) (actual time=926.943..3785.268 rows=369827 loops=1)"
" Recheck Cond: ((mtfcc)::text = 'S1100'::text)"
" Filter: (the_geom &&
'0103000020AD100000010000000500000042331BC4683066C0D7CC4C1BA5402BC042331BC4683066C0030030B9EAE859408E99D163F6936040030030B9EAE859408E99D163F6936040D7CC4C1BA5402BC042331BC4683066C0D7CC4C1BA5402BC0'::geometry)"
" -> Bitmap Index Scan on roads_idx (cost=0.00..9421.39 rows=725231
width=0) (actual time=840.086..840.086 rows=369827 loops=1)"
" Index Cond: ((mtfcc)::text = 'S1100'::text)"
"Total runtime: 3848.870 ms"
There are a lot of these queries run when I'm rendering the roads from a
high zoom level (one that encompasses the entire U.S.). I will make
optimization changes (like line reduction) as well, but for now I want to
make sure that my indexes are being used properly. The "Bitmap Heap
Scan" looks like it's doing a table scan on 725231 rows ...
any assistance?
thanks
--
Signed,
Alessandro Ferrucci
------------------------------------------------------------------------
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users