Hi Sarah, The pgsql changes are complete and are available in SVN. For bbox style queries, with the linestring optional script installed the performance improvements are quite drastic running approx 5-10 times faster on large datasets. http://svn.openstreetmap.org/applications/utils/osmosis/trunk
The new scripts are available in the package/script directory. The pgsql_simple.txt describes what each script is for. You will typically want pgsql_simple_schema_0.6.sql and possibly pgsql_simple_schema_0.6_linestring.sql. The database requires that postgis and hstore extensions are installed. Docs could probably be improved here, I haven't had much time to spend on it lately. If you checkout from SVN, type "ant publish" to build Osmosis. That requires a working ant installation. You can then run osmosis with the package/bin/osmosis launch script (or osmosis.bat if you're running Windows). Scott Crosby is still working on new binary tasks. Once they're ready a proper 0.37 binary release can be created. I probably won't be around much over the next week or two so I doubt if it will be released then unless somebody else wants to do it. Cheers, Brett On Wed, Sep 15, 2010 at 12:39 AM, Sarah Hoffmann <[email protected]> wrote: > Hi Brett, > > as you were talking about making a new release, may I ask what the status > of > the implementation of the new schema is? Is the version in SVN something > I could already play with? > > Sarah > > Brett Henderson wrote: > > Hi All, > > > > I'm currently working on some changes to the Osmosis "simple" schema > which > > may be of interest to others. I'd be interested to hear if anybody has > any > > major issues with this, or any better suggestions. > > > > The current schema performs poorly, largely due to the data for typical > > queries being spread across the disk. It is well indexed, but retrieving > > large numbers of rows requires huge numbers of disk seeks. Performance > > would be better if data was physically grouped according to geospatial > > location. I am planning several changes to address this: > > > > - CLUSTER the nodes table by the geom column index, and ways column by > > the (optional) linestring column index. I've already tested this out > for > > bbox style queries and it makes queries on these tables significantly > > quicker. It takes a long time to perform the CLUSTER operation, but > > subsequent queries are then improved. > > - Move the tags tables into hstore tags columns on the nodes, ways and > > relations tables. This will avoid the need to join to external > tables, and > > will allow the tags data to also be clustered geospatially by the > geospatial > > indexes. For entities with large numbers of tags or large tags the > data may > > be stored externally ( > > http://www.postgresql.org/docs/8.4/interactive/storage-toast.html), > but > > this should be the exception and most tags should fit inline in the > table. > > - Create a nodes column on the ways table. This will contain an array > > which holds only the ids of nodes that make up the way. For typical > > bounding box style queries this will allow "completeWays" style > > functionality to be performed more efficiently without having to join > to > > large numbers of rows in the way_nodes table. For bbox style queries > in > > some use cases it will also be possible to create synthetic node > entities > > (without tag or user info) for missing nodes lying outside the > bounding box > > which will further improve performance. > > > > So far I've written a migration script for moving tags data into hstore > > columns, and I've figured out how to get Java and JDBC playing nicely > with > > hstore columns. The next step is to update existing tasks to use these > > columns. As part of this change I will also change the way the bounding > box > > queries work so that they store more data in the temporary tables to > avoid > > having to join back to the main data tables. Again, this will > significantly > > reduce disk seeking. > > > > I'll move onto the addition of a way.nodes column after I've finished the > > tags changes. > > > > I'm not sure when I'll find time to finish all of this, but it's the main > > thing I'm working on. > > > > Brett > > > _______________________________________________ > osmosis-dev mailing list > [email protected] > http://lists.openstreetmap.org/listinfo/osmosis-dev >
_______________________________________________ osmosis-dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/osmosis-dev
