Re: [osmosis-dev] Changes to Osmosis Pgsql Schema

Sarah Hoffmann Tue, 14 Sep 2010 23:55:13 -0700

Hi Brett,

thanks for the extensive info. Don't rush the release, I'm perfectly
happy with using the SVN version.


Cheers,
  Sarah

On Wed, Sep 15, 2010 at 09:29:49AM +1000, Brett Henderson wrote:
> Hi Sarah,
> 
> The pgsql changes are complete and are available in SVN.  For bbox style
> queries with the optional linestring script installed, the performance
> improvement is quite drastic: approx 5-10 times faster on large
> datasets.
> http://svn.openstreetmap.org/applications/utils/osmosis/trunk
> 
> The new scripts are available in the package/script directory.  The
> pgsql_simple.txt describes what each script is for.  You will typically want
> pgsql_simple_schema_0.6.sql and possibly
> pgsql_simple_schema_0.6_linestring.sql.  The database requires the postgis
> and hstore extensions to be installed.  Docs could probably be improved here,
> I haven't had much time to spend on it lately.
> 
> If you check out from SVN, type "ant publish" to build Osmosis.  That
> requires a working ant installation.  You can then run osmosis with the
> package/bin/osmosis launch script (or osmosis.bat if you're running
> Windows).
> 
> Scott Crosby is still working on new binary tasks.  Once they're ready a
> proper 0.37 binary release can be created.  I probably won't be around much
> over the next week or two so I doubt if it will be released then unless
> somebody else wants to do it.
> 
> Cheers,
> Brett
> 
> 
> On Wed, Sep 15, 2010 at 12:39 AM, Sarah Hoffmann <[email protected]> wrote:
> 
> > Hi Brett,
> >
> > as you were talking about making a new release, may I ask what the status of
> > the implementation of the new schema is? Is the version in SVN something
> > I could already play with?
> >
> > Sarah
> >
> > Brett Henderson wrote:
> > > Hi All,
> > >
> > > I'm currently working on some changes to the Osmosis "simple" schema which
> > > may be of interest to others.  I'd be interested to hear if anybody has any
> > > major issues with this, or any better suggestions.
> > >
> > > The current schema performs poorly, largely due to the data for typical
> > > queries being spread across the disk.  It is well indexed, but retrieving
> > > large numbers of rows requires huge numbers of disk seeks.  Performance
> > > would be better if data was physically grouped according to geospatial
> > > location.  I am planning several changes to address this:
> > >
> > >    - CLUSTER the nodes table by the geom column index, and the ways table by
> > >    the (optional) linestring column index.  I've already tested this out for
> > >    bbox style queries and it makes queries on these tables significantly
> > >    quicker.  It takes a long time to perform the CLUSTER operation, but
> > >    subsequent queries are then improved.
> > >    - Move the tags tables into hstore tags columns on the nodes, ways and
> > >    relations tables.  This will avoid the need to join to external tables, and
> > >    will allow the tags data to also be clustered geospatially by the geospatial
> > >    indexes.  For entities with large numbers of tags or large tags the data may
> > >    be stored externally
> > >    (http://www.postgresql.org/docs/8.4/interactive/storage-toast.html), but
> > >    this should be the exception and most tags should fit inline in the table.
> > >    - Create a nodes column on the ways table.  This will contain an array
> > >    which holds only the ids of nodes that make up the way.  For typical
> > >    bounding box style queries this will allow "completeWays" style
> > >    functionality to be performed more efficiently without having to join to
> > >    large numbers of rows in the way_nodes table.  For bbox style queries in
> > >    some use cases it will also be possible to create synthetic node entities
> > >    (without tag or user info) for missing nodes lying outside the bounding box
> > >    which will further improve performance.
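
A rough illustration of the way.nodes idea from the list above, written in
Python rather than SQL.  This is only an in-memory sketch under stated
assumptions, not Osmosis code: the Node and Way classes, the bbox_query
function and the sample data are all made up for the example.  The dict
fields stand in for hstore tags columns and the list stands in for the
proposed node id array on the ways table.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lon: float
    lat: float
    tags: dict = field(default_factory=dict)  # stands in for an hstore column


@dataclass
class Way:
    id: int
    nodes: list  # ordered node ids, stands in for the proposed array column
    tags: dict = field(default_factory=dict)  # stands in for an hstore column


def bbox_query(nodes, ways, left, bottom, right, top):
    """Return (node ids in box, ids of ways touching box, missing node ids)."""
    inside = {
        n.id
        for n in nodes.values()
        if left <= n.lon <= right and bottom <= n.lat <= top
    }
    # A way is selected if any id in its own node array lies in the box,
    # so no separate way_nodes lookup is needed.
    selected = [w for w in ways.values() if inside.intersection(w.nodes)]
    # "completeWays" style: ids referenced by selected ways but outside the box.
    missing = sorted({nid for w in selected for nid in w.nodes} - inside)
    return sorted(inside), [w.id for w in selected], missing


# Hypothetical sample data: way 10 crosses the box edge, way 11 is outside.
sample_nodes = {
    1: Node(1, 0.5, 0.5, {"highway": "crossing"}),
    2: Node(2, 0.9, 0.9),
    3: Node(3, 2.0, 2.0),
    4: Node(4, 5.0, 5.0),
}
sample_ways = {
    10: Way(10, [1, 2, 3], {"highway": "residential"}),
    11: Way(11, [3, 4]),
}

result = bbox_query(sample_nodes, sample_ways, 0.0, 0.0, 1.0, 1.0)
print(result)
```

Here the missing ids are the nodes that would either be fetched by id or
replaced by synthetic node entities, as the last list item suggests.
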
> > >
> > > So far I've written a migration script for moving tags data into hstore
> > > columns, and I've figured out how to get Java and JDBC playing nicely with
> > > hstore columns.  The next step is to update existing tasks to use these
> > > columns.  As part of this change I will also change the way the bounding box
> > > queries work so that they store more data in the temporary tables to avoid
> > > having to join back to the main data tables.  Again, this will significantly
> > > reduce disk seeking.
> > >
> > > I'll move on to the addition of a way.nodes column after I've finished the
> > > tags changes.
> > >
> > > I'm not sure when I'll find time to finish all of this, but it's the main
> > > thing I'm working on.
> > >
> > > Brett
> >
> >
> > _______________________________________________
> > osmosis-dev mailing list
> > [email protected]
> > http://lists.openstreetmap.org/listinfo/osmosis-dev
> >

