On Sat, Oct 16, 2010 at 9:41 PM, Jochen Topf <joc...@remote.org> wrote:

> On Sat, Oct 16, 2010 at 05:33:37PM +1100, Brett Henderson wrote:
> >    - Updated the pgsql schema (now version 6) to move all tags into hstore
> >    columns, add a ways.nodes column and CLUSTER the nodes and ways tables.
> >    Significant performance increases.
>
> Is there some magic involved or do you just call CLUSTER after the import?
> When I tested CLUSTER it helped with reads, but creating the cluster was
> very expensive.
>

No magic unfortunately.  The CLUSTER step is the most expensive part of the
process because it changes the location on disk of every record in the
database.  It took several days for a full planet when I tested it out.
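
For illustration only, the post-import step could be scripted roughly as
below.  This is a sketch, not the actual Osmosis schema script: the index
names are hypothetical placeholders, and the helper only builds the SQL text
rather than running it.  The `CLUSTER table USING index` form is standard
PostgreSQL, and running ANALYZE afterwards is advisable so the planner sees
the new physical ordering.

```python
def cluster_statements(tables_to_indexes):
    """Build a CLUSTER statement followed by an ANALYZE for each table.

    tables_to_indexes maps a table name to the index whose order the
    table should be physically rewritten in.
    """
    statements = []
    for table, index in tables_to_indexes.items():
        # CLUSTER rewrites the whole table in index order (slow, takes a lock).
        statements.append(f"CLUSTER {table} USING {index};")
        # Refresh planner statistics for the newly ordered table.
        statements.append(f"ANALYZE {table};")
    return statements


# Hypothetical index names; check the real schema before using them.
PLAN = {"nodes": "idx_nodes_geom", "ways": "idx_ways_linestring"}

for statement in cluster_statements(PLAN):
    print(statement)
```

The printed statements can be pasted into psql once the import has finished.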

I was performing bbox queries (similar to ROMA or TRAPI, but with my own
implementation that uses true way geometries instead of just node locations),
and 1x1 degree queries for heavily populated areas (e.g. Munich, London) were
taking up to an hour.  After removing the tags tables and running the
CLUSTER statements, the time dropped to somewhere between 5 and 10 minutes
due to much reduced disk seeking.  The database is kept up to date with
hourly diffs so the CLUSTER step only has to be performed once (or once
every few months perhaps) but the read benefits are ongoing.  If you're
performing regular full imports it may not make sense.

If the overhead of performing the CLUSTER step outweighs the read benefits
(depends on the lifetime of your database) then you can skip the CLUSTER
step.  It makes no functional difference that I'm aware of.
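
As a rough way to judge that trade-off, here is a back-of-the-envelope
sketch.  The query times are the ones quoted above; the 72 hours is only my
assumption standing in for "several days", and the function name is made up
for this example.

```python
def break_even_queries(cluster_hours, minutes_before, minutes_after):
    """Number of queries after which the one-off CLUSTER cost is repaid.

    cluster_hours  -- time spent running CLUSTER (assumed, not measured)
    minutes_before -- query time on the unclustered database
    minutes_after  -- query time on the clustered database
    """
    saved_per_query = minutes_before - minutes_after
    if saved_per_query <= 0:
        raise ValueError("CLUSTER gives no per-query saving")
    return cluster_hours * 60 / saved_per_query


# Assumed 72 h for CLUSTER; a heavy 1x1 degree query going from 60 to 10 minutes.
print(break_even_queries(72, 60, 10))
```

Under those assumptions the step pays for itself after roughly 86 such heavy
queries, so a database rebuilt from scratch every week or so may never get
there, while one kept alive on diffs for months easily will.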

Brett
_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev
