Hi Frederik,

On 24 January 2013 09:02, Frederik Ramm <[email protected]> wrote:
> Hi,
>
> I'm toying with the idea of offering regionalised diffs - i.e. a series
> of daily diffs for every regional extract that download.geofabrik.de has
> to offer. To make it easy for consumers to keep their extracts up to date,
> I thought about making an Osmosis-style directory for each extract, e.g.
> something like
>
> download.geofabrik.de/openstreetmap/europe/germany/nordrhein-westfalen/000/000/001.osc.gz
>
> or so. Just to be safe: What are the conventions that I will have to
> follow so that this works seamlessly with existing clients? Simply have a
> xxx.osc.gz and matching xxx.state.txt in the leaf directory, count from 000
> to 999 then wrap to the next directory, and have the most recent state.txt
> file at the root directory as well - anything else?

That sounds about right.

Each state file should only need the sequenceNumber and timestamp fields (existing hour and day replication files only provide these two fields). The sequenceNumber is the most important (and easiest) to get right. The timestamp should be greater than or equal to the timestamp of the latest entity in the change file, but this is only critical when identifying a replication start point.

The order in which you write files is important to avoid race conditions and cope with software failures. I always write the osc.gz file, then the state.txt file in the leaf directory, then finally the state file in the root directory. If I encounter any failures during processing and the root state.txt file isn't created, I simply start again and overwrite any existing osc.gz and state.txt files in the leaf directory. The state.txt file in the root directory is used to identify the current sequence number at the start of processing.

It is critical to ensure that only a single process writes to the replication directory at a time.
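As a rough illustration of the layout and write order described above, here is a minimal Python sketch. It is not Osmosis code. The function names, the choice to start numbering at 1 when no root state.txt exists, and the write-to-temporary-file-then-rename step are all assumptions made for the example. The timestamp is written with escaped colons, as Java properties files usually do; check the exact format against an existing state.txt before relying on it.

```python
import gzip
import os
from pathlib import Path


def seq_to_path(seq: int) -> str:
    """Map a sequence number to its directory path, e.g. 1 -> '000/000/001'."""
    digits = f"{seq:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def read_current_seq(root: Path) -> int:
    """Read sequenceNumber from the root state.txt (0 if absent: an assumption)."""
    state = root / "state.txt"
    if not state.exists():
        return 0
    for line in state.read_text().splitlines():
        if line.startswith("sequenceNumber="):
            return int(line.split("=", 1)[1])
    raise ValueError("root state.txt has no sequenceNumber field")


def state_text(seq: int, timestamp: str) -> str:
    """Build the two-field state file; colon escaping is an assumed convention."""
    escaped = timestamp.replace(":", "\\:")
    return f"sequenceNumber={seq}\ntimestamp={escaped}\n"


def write_file(path: Path, data: bytes) -> None:
    """Write via a temporary file and rename, so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)


def publish_diff(root: Path, osc_xml: bytes, timestamp: str) -> int:
    """Publish one diff: osc.gz first, leaf state.txt second, root state.txt last.

    The caller must pass a timestamp that is >= the newest entity in the diff.
    If a run fails before the root state.txt is replaced, the next run computes
    the same sequence number and overwrites the leftover leaf files.
    """
    seq = read_current_seq(root) + 1
    leaf = root / seq_to_path(seq)
    state = state_text(seq, timestamp).encode("ascii")
    write_file(leaf.with_name(leaf.name + ".osc.gz"), gzip.compress(osc_xml))
    write_file(leaf.with_name(leaf.name + ".state.txt"), state)
    write_file(root / "state.txt", state)
    return seq
```

The sketch does no locking, so it still relies on only one process writing to the directory at a time.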
>
> If the frequency wasn't exactly daily - if, say, because of some sort of
> glitch there was no extract for one day and therefore the diff is missing, or
> if there were two extracts in one day - would that matter?

So long as the sequence number always increases by one you should be fine. Osmosis (not sure about other clients) bases most of its processing off the sequence number and doesn't care how far apart each time interval is. For example, the existing minute replication sometimes has gaps much larger than one minute if the replication process is halted for any reason.

Brett
_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
