Hi Brett, as you were talking about making a new release, may I ask what the status of the implementation of the new schema is? Is the version in SVN something I could already play with?
Sarah Brett Henderson wrote: > Hi All, > > I'm currently working on some changes to the Osmosis "simple" schema which > may be of interest to others. I'd be interested to hear if anybody has any > major issues with this, or any better suggestions. > > The current schema performs poorly, largely due to the data for typical > queries being spread across the disk. It is well indexed, but retrieving > large numbers of rows requires huge numbers of disk seeks. Performance > would be better if data was physically grouped according to geospatial > location. I am planning several changes to address this: > > - CLUSTER the nodes table by the geom column index, and ways column by > the (optional) linestring column index. I've already tested this out for > bbox style queries and it makes queries on these tables significantly > quicker. It takes a long time to perform the CLUSTER operation, but > subsequent queries are then improved. > - Move the tags tables into hstore tags columns on the nodes, ways and > relations tables. This will avoid the need to join to external tables, and > will allow the tags data to also be clustered geospatially by the > geospatial > indexes. For entities with large numbers of tags or large tags the data > may > be stored externally ( > http://www.postgresql.org/docs/8.4/interactive/storage-toast.html), but > this should be the exception and most tags should fit inline in the table. > - Create a nodes column on the ways table. This will contain an array > which holds only the ids of nodes that make up the way. For typical > bounding box style queries this will allow "completeWays" style > functionality to be performed more efficiently without having to join to > large numbers of rows in the way_nodes table. For bbox style queries in > some use cases it will also be possible to create synthetic node entities > (without tag or user info) for missing nodes lying outside the bounding box > which will further improve performance. > > So far I've written a migration script for moving tags data into hstore > columns, and I've figured out how to get Java and JDBC playing nicely with > hstore columns. The next step is to update existing tasks to use these > columns. As part of this change I will also change the way the bounding box > queries work so that they store more data in the temporary tables to avoid > having to join back to the main data tables. Again, this will significantly > reduce disk seeking. > > I'll move onto the addition of a way.nodes column after I've finished the > tags changes. > > I'm not sure when I'll find time to finish all of this, but it's the main > thing I'm working on. > > Brett _______________________________________________ osmosis-dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/osmosis-dev
