Hi Sarah,

The pgsql changes are complete and available in SVN.  For bbox style
queries with the optional linestring script installed, the performance
improvement is substantial: queries run approximately 5-10 times faster on
large datasets.
http://svn.openstreetmap.org/applications/utils/osmosis/trunk

The new scripts are available in the package/script directory.  The
pgsql_simple.txt file describes what each script is for.  You will typically
want pgsql_simple_schema_0.6.sql and possibly
pgsql_simple_schema_0.6_linestring.sql.  The database requires the postgis
and hstore extensions to be installed.  The docs could probably be improved
here; I haven't had much time to spend on them lately.
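
As a rough sketch of loading those two scripts (this is not taken from the
Osmosis docs): the database name "osm", the OSM_DB and SCRIPT_DIR variables
and the load_simple_schema helper are all hypothetical names chosen for
illustration.  It assumes psql is on your PATH, that you run it from the top
of the checkout, and that postgis and hstore are already installed in the
target database.

```shell
# Hypothetical database name; substitute your own.
OSM_DB="${OSM_DB:-osm}"
# Location of the schema scripts, relative to the top of the checkout.
SCRIPT_DIR="${SCRIPT_DIR:-package/script}"

# load_simple_schema [--with-linestring]
# Hypothetical helper: loads the core schema, then optionally the
# linestring script.  Returns non-zero at the first failing psql call.
load_simple_schema() {
    psql -v ON_ERROR_STOP=1 -d "$OSM_DB" \
        -f "$SCRIPT_DIR/pgsql_simple_schema_0.6.sql" || return 1
    if [ "${1:-}" = "--with-linestring" ]; then
        psql -v ON_ERROR_STOP=1 -d "$OSM_DB" \
            -f "$SCRIPT_DIR/pgsql_simple_schema_0.6_linestring.sql" || return 1
    fi
}
```

Calling load_simple_schema on its own loads just the core schema; adding
--with-linestring also loads the optional linestring script.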

If you check out from SVN, type "ant publish" to build Osmosis.  That
requires a working ant installation.  You can then run osmosis with the
package/bin/osmosis launch script (or osmosis.bat if you're running
Windows).
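
Put together, the checkout and build steps look roughly like the sketch
below.  The URL is the one given above; the osmosis-trunk directory name, the
OSMOSIS_DIR variable and the build_osmosis helper are hypothetical, and it
assumes both svn and ant are on your PATH.

```shell
# SVN URL from this email.
OSMOSIS_SVN_URL="http://svn.openstreetmap.org/applications/utils/osmosis/trunk"
# Hypothetical checkout directory name; any name will do.
OSMOSIS_DIR="${OSMOSIS_DIR:-osmosis-trunk}"

# Hypothetical helper: check out trunk, then build it with "ant publish".
# The build runs in a subshell so the caller's working directory is unchanged.
build_osmosis() {
    svn checkout "$OSMOSIS_SVN_URL" "$OSMOSIS_DIR" || return 1
    ( cd "$OSMOSIS_DIR" && ant publish )
}
```

Once build_osmosis succeeds, the launch script is package/bin/osmosis inside
the checkout directory.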

Scott Crosby is still working on new binary tasks.  Once they're ready, a
proper 0.37 binary release can be created.  I probably won't be around much
over the next week or two, so I doubt it will be released then unless
somebody else wants to do it.

Cheers,
Brett


On Wed, Sep 15, 2010 at 12:39 AM, Sarah Hoffmann <[email protected]> wrote:

> Hi Brett,
>
> as you were talking about making a new release, may I ask what the status
> of the implementation of the new schema is? Is the version in SVN something
> I could already play with?
>
> Sarah
>
> Brett Henderson wrote:
> > Hi All,
> >
> > I'm currently working on some changes to the Osmosis "simple" schema which
> > may be of interest to others.  I'd be interested to hear if anybody has any
> > major issues with this, or any better suggestions.
> >
> > The current schema performs poorly, largely due to the data for typical
> > queries being spread across the disk.  It is well indexed, but retrieving
> > large numbers of rows requires huge numbers of disk seeks.  Performance
> > would be better if data was physically grouped according to geospatial
> > location.  I am planning several changes to address this:
> >
> >    - CLUSTER the nodes table by the geom column index, and the ways table
> >    by the (optional) linestring column index.  I've already tested this
> >    out for bbox style queries and it makes queries on these tables
> >    significantly quicker.  It takes a long time to perform the CLUSTER
> >    operation, but subsequent queries are then improved.
> >    - Move the tags tables into hstore tags columns on the nodes, ways and
> >    relations tables.  This will avoid the need to join to external tables,
> >    and will allow the tags data to also be clustered geospatially by the
> >    geospatial indexes.  For entities with large numbers of tags or large
> >    tags the data may be stored externally
> >    (http://www.postgresql.org/docs/8.4/interactive/storage-toast.html), but
> >    this should be the exception and most tags should fit inline in the
> >    table.
> >    - Create a nodes column on the ways table.  This will contain an array
> >    which holds only the ids of nodes that make up the way.  For typical
> >    bounding box style queries this will allow "completeWays" style
> >    functionality to be performed more efficiently without having to join
> >    to large numbers of rows in the way_nodes table.  For bbox style
> >    queries in some use cases it will also be possible to create synthetic
> >    node entities (without tag or user info) for missing nodes lying
> >    outside the bounding box, which will further improve performance.
> >
> > So far I've written a migration script for moving tags data into hstore
> > columns, and I've figured out how to get Java and JDBC playing nicely with
> > hstore columns.  The next step is to update existing tasks to use these
> > columns.  As part of this change I will also change the way the bounding
> > box queries work so that they store more data in the temporary tables to
> > avoid having to join back to the main data tables.  Again, this will
> > significantly reduce disk seeking.
> >
> > I'll move on to the addition of a way.nodes column after I've finished the
> > tags changes.
> >
> > I'm not sure when I'll find time to finish all of this, but it's the main
> > thing I'm working on.
> >
> > Brett
>
>
> _______________________________________________
> osmosis-dev mailing list
> [email protected]
> http://lists.openstreetmap.org/listinfo/osmosis-dev
>
