Hi, I've been trying to keep up to date with the dumps and diffs from http://planet.openstreetmap.org/ , and I'm running into a number of bugs related to cutoff dates.
In keeping my Bay Area tiles (http://mike.teczno.com/notes/cascadenik-openstreetmap.html ) up to date, I've been grabbing complete planet.osm dumps about once per month, and filling in the intervening time with daily diffs. I've noticed some misalignments between the data in the dumps and the osm2pgsql importer that lead to unavoidable holes in the data. It seems that they could be fixed in either osm2pgsql, the planet files, or both.

First, the final event in each weekly planet dump does not fall on an even day boundary. In the case of the most recent Oct. 22nd planet.osm, it was necessary to experiment with hourly diffs from that day to find that the boundary was approx. 2:00pm. Hourlies up to and including 2008102213-2008102214.osc.gz failed; hourlies after that succeeded. I could go more granular here, checking the minute diffs as well for a more precise breakpoint, but it seems odd that the planet dump does not break cleanly on a midnight boundary so that it's possible to pick up the differences moving forward.

Second, osm2pgsql itself notifies the user of inconsistencies by failing. I can see that effort has been put into making it more resilient (e.g. http://trac.openstreetmap.org/changeset/10464 ). Does osm2pgsql have something like a `--force` switch? I haven't been able to find one. In looking at the diff files, it seems that it should be possible to ignore possible conflicts by simply overwriting whatever's in the DB with whatever's in the .osc file.

Finally, the boundaries between the hourlies and dailies seem misaligned. After running the remaining hourlies for the 22nd, I attempted to pick up on the 23rd with a daily. The final hourly I used was 2008102223-2008102300.osc.gz. It's my expectation that I should be able to immediately follow that with 20081023-20081024.osc.gz, but this led to a duplicate key violation, suggesting that there's an overlap between the two files. Continuing with hourlies *works*, but is tedious and I suspect slower than the dailies.

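
To make the sequence I'm expecting concrete, here is a rough Python sketch that lists the diff files I would want to apply after a dump: hourlies from the dump's cutoff hour up to the next midnight, then dailies from there. The filename patterns are taken from the files mentioned above; the function names and the assumption that the cutoff falls exactly on an hour are mine, for illustration only.

```python
from datetime import datetime, timedelta

def hourly_diff_names(start, end):
    """Yield one hourly diff filename per hour in [start, end)."""
    t = start
    while t < end:
        nxt = t + timedelta(hours=1)
        yield f"{t:%Y%m%d%H}-{nxt:%Y%m%d%H}.osc.gz"
        t = nxt

def daily_diff_names(start, end):
    """Yield one daily diff filename per day in [start, end)."""
    t = start
    while t < end:
        nxt = t + timedelta(days=1)
        yield f"{t:%Y%m%d}-{nxt:%Y%m%d}.osc.gz"
        t = nxt

def catch_up_plan(dump_cutoff, target_day):
    """Hourlies from the (assumed hour-aligned) dump cutoff to the next
    midnight, then dailies up to target_day (a midnight datetime)."""
    next_midnight = datetime(dump_cutoff.year, dump_cutoff.month,
                             dump_cutoff.day) + timedelta(days=1)
    plan = list(hourly_diff_names(dump_cutoff, next_midnight))
    plan += list(daily_diff_names(next_midnight, target_day))
    return plan

if __name__ == "__main__":
    # Cutoff of approx. 2:00pm on Oct. 22nd, catching up to Oct. 25th.
    for name in catch_up_plan(datetime(2008, 10, 22, 14),
                              datetime(2008, 10, 25)):
        print(name)
```

With the current files, the step from the last hourly in that list to the first daily is exactly where the duplicate key violation shows up.
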
My sense from reading other people's experiences has been that it's a common pattern to rely solely on the weekly planet dumps, incurring the substantial overhead of parsing and importing the full 5GB dump once every week, and then re-rendering the complete set of tiles. My hope has been to proceed in a more incremental fashion, since this makes it possible to track what specific tiles need to be re-rendered on a near-constant schedule, based on actual content or activity, vs. simple cache expiration. Right now I'm doing this daily; I'd like to do it as often as hourly.

I can see a few possible solutions.

The cutoff times for files on planet.openstreetmap.org could behave more consistently. A weekly dump should end at 11:59pm so that dailies can immediately pick up user activity. Hourly and daily dumps should be synchronized. This seems more difficult.

Or, osm2pgsql could be more fault-tolerant, so that potentially overlapping .osm and .osc files can be safely used. As long as they are applied in chronological order, repetitions should be idempotent. Is this just a matter of futzing with the SQL commands to suppress index key collisions?

-mike.

----------------------------------------------------------------
michal migurski- [EMAIL PROTECTED]
                 415.558.1610
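
P.S. Here is a rough sketch of what I mean by idempotent, written in Python against an in-memory SQLite table rather than PostgreSQL. The one-table `nodes` schema and the `apply_change` / `apply_all` helpers are made up for illustration and are not osm2pgsql's actual schema or code. The idea is only that creates and modifies overwrite by key and deletes tolerate missing rows, so replaying the same ordered changes leaves the same result.

```python
import sqlite3

def apply_change(db, action, node_id, lat=None, lon=None):
    """Apply one change so that repeating it is harmless."""
    if action in ("create", "modify"):
        # Overwrite by primary key instead of failing on a duplicate.
        db.execute(
            "INSERT OR REPLACE INTO nodes (id, lat, lon) VALUES (?, ?, ?)",
            (node_id, lat, lon),
        )
    elif action == "delete":
        # Deleting a row that is already gone affects zero rows, no error.
        db.execute("DELETE FROM nodes WHERE id = ?", (node_id,))
    else:
        raise ValueError(f"unknown action: {action}")

def apply_all(db, changes):
    """Apply changes in the order given (assumed chronological)."""
    for change in changes:
        apply_change(db, *change)
    return db.execute("SELECT id, lat, lon FROM nodes ORDER BY id").fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")

# Hypothetical changes, as if two overlapping diff files were concatenated.
changes = [
    ("create", 1, 37.80, -122.30),
    ("modify", 1, 37.81, -122.31),
    ("create", 2, 37.90, -122.40),
    ("delete", 2),
]

first = apply_all(db, changes)
second = apply_all(db, changes)  # replay the same changes in order
print(first == second)
```

This only holds when the replay is itself in chronological order; applying an older overlapping file after newer ones would roll rows back to stale values.
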

