David Earl wrote:
> Would it really be that much slower: yes it is more work, but OTOH, it
> is fewer disk writes?

The daily diffs are approximately 6MB of data to write but take a couple of minutes to produce. The disk overhead is negligible. But this also means that the compression overhead is probably also negligible and wouldn't add much to the overall time. When I first set this up we were dealing with TIGER imports and much larger daily files; the volumes are much smaller at the moment.
I'll set up a daily diff using the hourly/minute mechanism and see if any problems occur.

>
> I rely on these for the Namefinder updates, and I've always been
> worried that they may not form a continuous sequence, especially if
> something goes wrong, the consequence of which is to repair it I'd
> have to do a full database import which takes a week or so to run.

Understood; my aim is for this to be a reliable means of keeping in sync.

>
> It would be a simple matter to switch to gzip, so long as I know when
> it is to change.

I'll set up the new one in parallel for a little while.

>
> I noticed that the day after the empty file, the file was larger than
> usual. Did it in fact catch the diffs since the previous file, or just
> in the previous day?

I've just had a quick look; I'm fairly sure data has been missed. The larger file is presumably because people had queued uploads.

>
> From the Namefinder POV, if I miss a file, I catch up with it later
> (but it did break when you changed the convention to span two days a
> while back after a failure, but I fixed that). But if there is a gap
> in the sequence that's very hard to repair because I'd already have
> applied later updates.
>
> David

I have just regenerated the file for the 19th-20th:
http://planet.openstreetmap.org/daily/daily-20080619-20080620.osc.bz2

If you wish to re-import, just import the files in sequence again. Assuming your import scripts won't break in some unexpected way, this will ensure that all updates are applied in the correct order.

Note that osmosis now has a task called --read-change-interval which can download all hourly or minute diffs since the last invocation, merge them into a single changeset, and send it to subsequent tasks in the osmosis pipeline. The consuming task can be an XML change file writer or a database writing task if you have one.
It tracks the latest downloaded timestamp in a timestamp file, generates an empty changeset if no updates are available yet, and aborts completely if the planet server can't be reached.

_______________________________________________
talk mailing list
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/talk
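The worry raised in this thread, that the daily files might not form a continuous sequence, can be checked mechanically before anything is applied. Below is a minimal sketch in Python, assuming the daily-YYYYMMDD-YYYYMMDD.osc.bz2 naming seen in the URL in the message, and assuming that each file's second date should equal the next file's first date. Neither assumption is confirmed as a formal rule; the function names parse_span and find_breaks are hypothetical, and this is not part of osmosis or the planet server scripts.

```python
import re
from datetime import date, datetime

# Assumption: daily diffs follow the naming seen in this thread,
# daily-YYYYMMDD-YYYYMMDD.osc.bz2, where the dates are the start and end of
# the span. The compression suffix is ignored, so .gz names would also match.
DAILY_RE = re.compile(r"daily-(\d{8})-(\d{8})\.osc")


def parse_span(filename: str) -> tuple[date, date]:
    """Return the (start, end) dates encoded in a daily diff filename."""
    m = DAILY_RE.search(filename)
    if m is None:
        raise ValueError(f"not a daily diff filename: {filename}")
    start, end = (datetime.strptime(s, "%Y%m%d").date() for s in m.groups())
    if end <= start:
        raise ValueError(f"empty or reversed span: {filename}")
    return start, end


def find_breaks(filenames: list[str]) -> list[tuple[date, date]]:
    """Return (previous_end, next_start) pairs where consecutive files don't join up.

    Files are ordered by their start date. A file covering more than one day
    (as after a failure) is accepted as long as it starts where the previous
    file ended; a gap or an overlap is reported as a break.
    """
    spans = sorted(parse_span(f) for f in filenames)
    breaks = []
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start != prev_end:
            breaks.append((prev_end, next_start))
    return breaks


if __name__ == "__main__":
    # Hypothetical listing; only the 20080619-20080620 name comes from the thread.
    files = [
        "daily-20080619-20080620.osc.bz2",
        "daily-20080618-20080619.osc.bz2",
        "daily-20080621-20080622.osc.bz2",
    ]
    for prev_end, next_start in find_breaks(files):
        # prints: break: previous file ends 2008-06-20, next starts 2008-06-21
        print(f"break: previous file ends {prev_end}, next starts {next_start}")
```

A consumer such as the Namefinder import could run a check like this over the files it has not yet applied. If a break is reported, the missing span would need to be fetched, or regenerated on the server as was done for the 19th-20th, before any later file is applied. As noted above, a gap is hard to repair once later updates are already in.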

