Hi Brett, thanks for the extensive info. Don't rush the release, I'm perfectly happy with using the SVN version.
Cheers,
Sarah

On Wed, Sep 15, 2010 at 09:29:49AM +1000, Brett Henderson wrote:
> Hi Sarah,
>
> The pgsql changes are complete and are available in SVN. For bbox style
> queries, with the linestring optional script installed the performance
> improvements are quite drastic running approx 5-10 times faster on large
> datasets.
> http://svn.openstreetmap.org/applications/utils/osmosis/trunk
>
> The new scripts are available in the package/script directory. The
> pgsql_simple.txt describes what each script is for. You will typically want
> pgsql_simple_schema_0.6.sql and possibly
> pgsql_simple_schema_0.6_linestring.sql. The database requires that postgis
> and hstore extensions are installed. Docs could probably be improved here,
> I haven't had much time to spend on it lately.
>
> If you checkout from SVN, type "ant publish" to build Osmosis. That
> requires a working ant installation. You can then run osmosis with the
> package/bin/osmosis launch script (or osmosis.bat if you're running
> Windows).
>
> Scott Crosby is still working on new binary tasks. Once they're ready a
> proper 0.37 binary release can be created. I probably won't be around much
> over the next week or two so I doubt if it will be released then unless
> somebody else wants to do it.
>
> Cheers,
> Brett
>
>
> On Wed, Sep 15, 2010 at 12:39 AM, Sarah Hoffmann <[email protected]> wrote:
>
> > Hi Brett,
> >
> > as you were talking about making a new release, may I ask what the status
> > of the implementation of the new schema is? Is the version in SVN something
> > I could already play with?
> >
> > Sarah
> >
> > Brett Henderson wrote:
> > > Hi All,
> > >
> > > I'm currently working on some changes to the Osmosis "simple" schema which
> > > may be of interest to others. I'd be interested to hear if anybody has any
> > > major issues with this, or any better suggestions.
> > >
> > > The current schema performs poorly, largely due to the data for typical
> > > queries being spread across the disk. It is well indexed, but retrieving
> > > large numbers of rows requires huge numbers of disk seeks. Performance
> > > would be better if data was physically grouped according to geospatial
> > > location. I am planning several changes to address this:
> > >
> > > - CLUSTER the nodes table by the geom column index, and the ways table by
> > >   the (optional) linestring column index. I've already tested this out for
> > >   bbox style queries and it makes queries on these tables significantly
> > >   quicker. It takes a long time to perform the CLUSTER operation, but
> > >   subsequent queries are then improved.
> > > - Move the tags tables into hstore tags columns on the nodes, ways and
> > >   relations tables. This will avoid the need to join to external tables, and
> > >   will allow the tags data to also be clustered geospatially by the
> > >   geospatial indexes. For entities with large numbers of tags or large tags
> > >   the data may be stored externally
> > >   (http://www.postgresql.org/docs/8.4/interactive/storage-toast.html), but
> > >   this should be the exception and most tags should fit inline in the table.
> > > - Create a nodes column on the ways table. This will contain an array
> > >   which holds only the ids of nodes that make up the way. For typical
> > >   bounding box style queries this will allow "completeWays" style
> > >   functionality to be performed more efficiently without having to join to
> > >   large numbers of rows in the way_nodes table. For bbox style queries in
> > >   some use cases it will also be possible to create synthetic node entities
> > >   (without tag or user info) for missing nodes lying outside the bounding
> > >   box which will further improve performance.
> > >
> > > So far I've written a migration script for moving tags data into hstore
> > > columns, and I've figured out how to get Java and JDBC playing nicely with
> > > hstore columns. The next step is to update existing tasks to use these
> > > columns. As part of this change I will also change the way the bounding box
> > > queries work so that they store more data in the temporary tables to avoid
> > > having to join back to the main data tables. Again, this will significantly
> > > reduce disk seeking.
> > >
> > > I'll move onto the addition of a way.nodes column after I've finished the
> > > tags changes.
> > >
> > > I'm not sure when I'll find time to finish all of this, but it's the main
> > > thing I'm working on.
> > >
> > > Brett
> > >
> > _______________________________________________
> > osmosis-dev mailing list
> > [email protected]
> > http://lists.openstreetmap.org/listinfo/osmosis-dev
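
For readers following the thread, here is a rough in-memory sketch of the "completeWays" idea from Brett's original message: when each way carries its own array of node ids, a bbox query can find the ways touching the box and the node ids lying outside it without consulting a separate way_nodes table. This is only an illustration in Python, not Osmosis code. The names Node, Way and bbox_complete_ways are hypothetical, and the real implementation works against PostgreSQL tables rather than Python lists.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    id: int
    lon: float
    lat: float


@dataclass(frozen=True)
class Way:
    id: int
    # Ordered node ids, standing in for the proposed nodes array column
    # on the ways table (hypothetical in-memory equivalent).
    nodes: Tuple[int, ...]


def bbox_complete_ways(nodes: List[Node], ways: List[Way],
                       left: float, bottom: float,
                       right: float, top: float):
    """Return (nodes_inside, selected_ways, missing_node_ids) for a bbox."""
    # Step 1: nodes that lie inside the bounding box.
    inside = {n.id for n in nodes
              if left <= n.lon <= right and bottom <= n.lat <= top}

    # Step 2: ways with at least one node inside the box. Only the way's
    # own id array is consulted, no per-node join rows.
    selected = [w for w in ways if any(nid in inside for nid in w.nodes)]

    # Step 3: node ids referenced by the selected ways but outside the box.
    # These are the candidates for synthetic node entities.
    referenced = {nid for w in selected for nid in w.nodes}
    missing = sorted(referenced - inside)

    nodes_inside = [n for n in nodes if n.id in inside]
    return nodes_inside, selected, missing


if __name__ == "__main__":
    demo_nodes = [Node(1, 0.0, 0.0), Node(2, 1.0, 1.0), Node(3, 5.0, 5.0)]
    demo_ways = [Way(10, (1, 3)), Way(11, (3,))]
    print(bbox_complete_ways(demo_nodes, demo_ways, -1.0, -1.0, 2.0, 2.0))
```

In this toy data set, way 10 is selected because node 1 is inside the box, and node 3 is reported as missing because it lies outside. Way 11 is skipped since none of its nodes fall inside.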
