[osmosis-dev] Re: Changes to Osmosis Pgsql Schema

Sarah Hoffmann Tue, 14 Sep 2010 07:47:06 -0700

Hi Brett,

as you were talking about making a new release, may I ask what the status of 
the implementation of the new schema is? Is the version in SVN something
I could already play with?


Sarah

Brett Henderson wrote:
> Hi All,
> 
> I'm currently working on some changes to the Osmosis "simple" schema which
> may be of interest to others.  I'd be interested to hear if anybody has any
> major issues with this, or any better suggestions.
> 
> The current schema performs poorly, largely due to the data for typical
> queries being spread across the disk.  It is well indexed, but retrieving
> large numbers of rows requires huge numbers of disk seeks.  Performance
> would be better if data was physically grouped according to geospatial
> location.  I am planning several changes to address this:
> 
>    - CLUSTER the nodes table by the geom column index, and ways column by
>    the (optional) linestring column index.  I've already tested this out for
>    bbox style queries and it makes queries on these tables significantly
>    quicker.  It takes a long time to perform the CLUSTER operation, but
>    subsequent queries are then improved.
>    - Move the tags tables into hstore tags columns on the nodes, ways and
>    relations tables.  This will avoid the need to join to external tables, and
>    will allow the tags data to also be clustered geospatially by the 
> geospatial
>    indexes.  For entities with large numbers of tags or large tags the data 
> may
>    be stored externally (
>    http://www.postgresql.org/docs/8.4/interactive/storage-toast.html), but
>    this should be the exception and most tags should fit inline in the table.
>    - Create a nodes column on the ways table.  This will contain an array
>    which holds only the ids of nodes that make up the way.  For typical
>    bounding box style queries this will allow "completeWays" style
>    functionality to be performed more efficiently without having to join to
>    large numbers of rows in the way_nodes table.  For bbox style queries in
>    some use cases it will also be possible to create synthetic node entities
>    (without tag or user info) for missing nodes lying outside the bounding box
>    which will further improve performance.
> 
> So far I've written a migration script for moving tags data into hstore
> columns, and I've figured out how to get Java and JDBC playing nicely with
> hstore columns.  The next step is to update existing tasks to use these
> columns.  As part of this change I will also change the way the bounding box
> queries work so that they store more data in the temporary tables to avoid
> having to join back to the main data tables.  Again, this will significantly
> reduce disk seeking.
> 
> I'll move onto the addition of a way.nodes column after I've finished the
> tags changes.
> 
> I'm not sure when I'll find time to finish all of this, but it's the main
> thing I'm working on.
> 
> Brett


_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev

[osmosis-dev] Re: Changes to Osmosis Pgsql Schema

Reply via email to