Re: [OSM-dev] Timestamp in PBF files
Hi,

On 12/05/2012 05:18 AM, Scott Crosby wrote:
> I've merged the pull request and I think I've added the right changelog, maven, and debian glue for the new version and pushed the v1.3.0 tag to github.

Thank you.

As of today, the Geofabrik .osm.pbf downloads contain replication information in the PBF header. The software most of you use to read PBF files is unlikely to make use of that information yet, but will not be harmed by it.

I've built preliminary support for these new fields into Osmium and you can get my version here: https://github.com/woodpeck/osmium/tree/timestamp - if you build the osmium_debug tool in examples it will dump the new headers to stdout. If you plan to read/write replication information in your own programs I suggest that you wait until support is available in Jochen's Osmium version, as the interface is likely to change slightly.

I expect that Marqqs' osmupdate utility either already supports these new fields or will do so in the very near future.

It would be great if someone were to add support to Osmosis, which is likely to be a bit tricky as you have to shove replication information through the pipeline, but if all else fails I might have a go at it during the holidays.

Bye
Frederik

--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
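Until library support is available, the new header fields can also be inspected by walking the protobuf wire format by hand. The Python sketch below assumes `data` already holds the decompressed HeaderBlock message (the BlobHeader/Blob framing and the zlib step are left out), and it assumes field numbers 32, 33 and 34 for osmosis_replication_timestamp, osmosis_replication_sequence_number and osmosis_replication_base_url; check those numbers against your copy of osmformat.proto before relying on them.

```python
# Assumed field numbers for the replication fields (verify against osmformat.proto).
REPLICATION_FIELDS = {
    32: "osmosis_replication_timestamp",        # int64, seconds since the epoch
    33: "osmosis_replication_sequence_number",  # int64
    34: "osmosis_replication_base_url",         # string
}


def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos).

    Negative int64 values would need two's-complement handling; the
    replication values are never negative, so this sketch skips that."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def replication_fields(data):
    """Return whichever replication fields are present in a raw HeaderBlock."""
    found = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_no, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: strings and submessages (e.g. bbox)
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_no in REPLICATION_FIELDS:
            name = REPLICATION_FIELDS[field_no]
            found[name] = value.decode("utf-8") if wire_type == 2 else value
    return found
```

Unknown fields (writingprogram, bbox, and so on) are skipped by wire type, which is also why older readers are not harmed by the new fields.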
Re: [OSM-dev] Timestamp in PBF files
On Tue, Dec 11, 2012 at 03:53:39PM +0100, Frederik Ramm wrote:
> stdout. If you plan to read/write replication information in your own programs I suggest that you wait until support is available in Jochen's Osmium version as the interface is likely to change slightly.

Actually, no, you should not wait for my Osmium version. Just go ahead and do whatever you have to do. I think the implementation and interface of all this hasn't been thought through properly, and until I or somebody else does that, I am not planning to add this to Osmium.

For starters, this makes XML and PBF files incompatible, which is not good. Next, it has to be figured out which changes to the input data lead to changes in the output of these fields. Obviously when you apply a diff those headers should change, shouldn't they? Those things all have to be figured out and implemented properly.

Jochen
--
Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
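Jochen's question of how the headers should change when a diff is applied has no settled answer in this thread. As one possible rule, sketched here purely as an assumption, the output file could take the replication state of the last change file applied, and refuse to write a header when the change files are not consecutive or come from a different replication source. The container class and function name are hypothetical, not part of Osmium or Osmosis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationState:
    """The three replication header values (illustrative container)."""
    timestamp: int          # seconds since the epoch
    sequence_number: int
    base_url: str


def header_after_applying(header, applied):
    """Hypothetical rule: after applying change files, the output header takes
    the state of the last change applied. `applied` lists each change file's
    state in application order; gaps or a different source raise an error
    instead of silently producing a misleading header."""
    state = header
    for change in applied:
        if change.base_url != state.base_url:
            raise ValueError("change file comes from a different replication source")
        if change.sequence_number != state.sequence_number + 1:
            raise ValueError(
                f"expected sequence {state.sequence_number + 1}, got {change.sequence_number}"
            )
        state = change
    return state
```

Requiring strictly consecutive sequence numbers is a conservative choice; a tool that merges several change files first (as osmupdate does) would need a rule that accepts a contiguous range instead.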
Re: [OSM-dev] Timestamp in PBF files
Hi,

On 11.12.2012 15:53, Frederik Ramm wrote:
> I've built preliminary support for these new fields into Osmium and you can get my version here: https://github.com/woodpeck/osmium/tree/timestamp - if you build the osmium_debug tool in examples it will dump the new headers to stdout.

I've also made my timestamp branch of Peter's osm-history-splitter (which I use to create these files) available on github, here: https://github.com/woodpeck/osm-history-splitter/tree/timestamp - it takes command-line switches that allow you to set the headers to whatever you want. This requires the timestamp-supporting version of Osmium linked above.

Bye
Frederik

--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] Timestamp in PBF files
On 12 December 2012 01:53, Frederik Ramm frede...@remote.org wrote:
> Hi,
> <snip>
> It would be great if someone were to add support to Osmosis which is likely to be a bit tricky as you have to shove replication information through the pipeline, but if all else fails I might have a go at it during the holidays.

I've done something similar with the streaming replication tasks (i.e. --receive-replication, --replicate-apidb, --send-replication-data, --write-replication). They exchange state information from source to sink via the new task initialize method, which accepts a map of arguments. Typically the source task at the start of the pipeline passes a ReplicationState object through the pipeline in a map key called replication.state (I think ... I'm not looking at the source code). The sink task then updates the state object with the current persisted state during the initialize call, and by the time the initialize call returns, the source task can use it to determine what replication point to start from.

As part of that change I updated tasks such as --buffer to propagate the initialize information properly across threads. I believe other tasks such as --merge will still need to be updated.

I doubt I'll be able to provide much assistance in implementing this. I have another child due early in the New Year so I'll probably be off the radar for a while :-)
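The initialize-time handshake Brett describes can be sketched in a few lines. This is a single-threaded Python illustration under stated assumptions: the classes and methods are hypothetical stand-ins for Osmosis' Java tasks, the key name is the one Brett recalls without checking the source, and "resume at persisted + 1" is an assumed convention.

```python
STATE_KEY = "replication.state"  # key name as recalled in the mail; unverified


class ReplicationState:
    """Shared object passed down the pipeline and filled in by the sink."""
    def __init__(self):
        self.sequence_number = None


class Sink:
    def __init__(self, persisted_sequence_number):
        self.persisted = persisted_sequence_number

    def initialize(self, meta_data):
        # The sink writes its current persisted position into the shared state.
        meta_data[STATE_KEY].sequence_number = self.persisted


class Buffer:
    """Pass-through stage; like --buffer, it must forward initialize()."""
    def __init__(self, downstream):
        self.downstream = downstream

    def initialize(self, meta_data):
        self.downstream.initialize(meta_data)


class Source:
    def __init__(self, downstream):
        self.downstream = downstream

    def start(self):
        state = ReplicationState()
        self.downstream.initialize({STATE_KEY: state})
        # When initialize() returns, the sink has filled in its position,
        # so the source knows where to resume (assumed: the next sequence).
        return state.sequence_number + 1
```

In real Osmosis the buffer hands data across threads, which is exactly why it needed changes to forward this information; a stage that forgets to forward initialize() would leave the state empty and break the handshake.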
Re: [OSM-dev] Timestamp in PBF files
Hi,

On 12/04/12 07:50, Jochen Topf wrote:
> That still isn't specific to Osmosis. Somebody else could implement this algorithm. Markus seems to have done so, albeit a bit differently. The algorithm should be documented somewhere and if you think there can be other algorithms, maybe this one should get a name.

I'd call it the Osmosis algorithm, and therefore name the header fields osmosis_..., to make clear that they are intended for this algorithm.

I agree that it would be nice for the algorithm to be documented somewhere, but I'm loath to make this a prerequisite for the proposed changes to OSM-Binary because it will unnecessarily delay the process.

> But it should not be named after one of the programs that happen to implement it. That would then be pure coincidence. This is a similar issue to the one with the main OSM map, which was named Mapnik after the rendering program, which led to no end of confusion.

Frankly, I don't care what it is called, I just want to get on with the show. Making up a new name for it now and telling everyone that this new name is what they've been using all the time is just as confusing, but if anyone thinks this is important enough to spend the time to come up with a new name (or bug Brett to come up with one) then they're welcome to do so. Preferably within a couple of days.

Bye
Frederik

--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] Timestamp in PBF files
On Tue, Dec 4, 2012 at 9:12 AM, Frederik Ramm frede...@remote.org wrote:
> Frankly, I don't care what it is called,

Call it: à la Osmosis ;-)

Pieren
Re: [OSM-dev] Timestamp in PBF files
On Tue, Dec 4, 2012 at 2:12 AM, Frederik Ramm frede...@remote.org wrote:
> Hi,
>
> On 12/04/12 07:50, Jochen Topf wrote:
>> That still isn't specific to Osmosis. Somebody else could implement this algorithm. Markus seems to have done so, albeit a bit differently. The algorithm should be documented somewhere and if you think there can be other algorithms, maybe this one should get a name.
>
> I'd call it the Osmosis algorithm, and therefore name the header fields osmosis_..., to make clear that they are intended for this algorithm.
>
> I agree that it would be nice for the algorithm to be documented somewhere, but I'm loath to make this a prerequisite for the proposed changes to OSM-Binary because it will unnecessarily delay the process.

I think I'm willing to call it a consensus. Thank you everyone for the discussion. And Frederik, thank you for sending me a pull request. I've committed it as-is.

I'm happy to go with osmosis_* for the field names. It's the osmosis algorithm, and we can always add other metadata fields in the future.

I've merged the pull request and I think I've added the right changelog, maven, and debian glue for the new version and pushed the v1.3.0 tag to github. If I've missed anything, please send me a follow-up email.

Scott
Re: [OSM-dev] Timestamp in PBF files
Hi,

On 22.11.2012 00:18, Scott Crosby wrote:
> I think for Frederik's immediate needs, we should add a field called osmosis_replication_timestamp or osmosis_replication_state = 32, which contains a submessage containing a replication timestamp and other replication data that he feels is appropriate.
>
> As for the timestamp = 18 field, Dennis, what was your intended use of this field? Marqqs, what is the intended use of your timestamp optional_features field?

Since nobody has come forward with further requests, may I humbly suggest that we add three new fields:

- a 64-bit integer osmosis_replication_timestamp for the replication timestamp, expressed in seconds since the epoch, otherwise the same value as in state.txt's timestamp=... field;

- a 64-bit integer osmosis_replication_sequence_number for the replication sequence number (sequenceNumber=... in the state.txt file), which is, in practice, not required, as Marqqs has explained, but makes things easier for Osmosis, as Brett has explained;

- a variable-length string osmosis_replication_base_url that points to the directory from which replication files are loaded (baseUrl=... in configuration.txt).

It may make sense to have a start timestamp and start replication number in there as well, but I don't have an immediate use case so I'm happy to defer that until there is one.

I've sent you a pull request on GitHub for this change, but I'd like to stress again that I wouldn't mind if it were done differently, with other fields, other types, other IDs; the main thing for me is that you give it the nod and add it to your OSM-Binary repo, which I consider to be the official one. Once the stuff is in there I can go on and make patches for programs that use PBF files in some way. (Not sure if I'll get as far as Osmosis, but we'll see.)

Bye
Frederik

--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
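The mapping Frederik proposes, from state.txt and configuration.txt to the three header fields, could look roughly like this in Python. It assumes both files are Java-properties style (key=value lines, # comments, and colons possibly written escaped as `\:`); the parser handles only that small subset, and the sample values in the test are made up.

```python
from datetime import datetime, timezone


def parse_properties(text):
    """Tiny subset of Java .properties parsing: key=value lines, '#'/'!'
    comments, and backslash escapes such as '\\:'. Not a full implementation."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        unescaped = []
        chars = iter(value.strip())
        for ch in chars:
            # A backslash means "take the next character literally".
            unescaped.append(next(chars, "") if ch == "\\" else ch)
        props[key.strip()] = "".join(unescaped)
    return props


def header_fields(state_txt, configuration_txt):
    """Build the three proposed header values from the two text files."""
    state = parse_properties(state_txt)
    config = parse_properties(configuration_txt)
    ts = datetime.strptime(state["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "osmosis_replication_timestamp": int(ts.replace(tzinfo=timezone.utc).timestamp()),
        "osmosis_replication_sequence_number": int(state["sequenceNumber"]),
        "osmosis_replication_base_url": config["baseUrl"],
    }
```

Storing the timestamp as an integer count of seconds, rather than as the ISO string from state.txt, means readers never have to agree on a string format.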
Re: [OSM-dev] Timestamp in PBF files
On Mon, Dec 03, 2012 at 09:54:59PM +0100, Frederik Ramm wrote:
> On 22.11.2012 00:18, Scott Crosby wrote:
>> I think for Frederik's immediate needs, we should add a field called osmosis_replication_timestamp or osmosis_replication_state = 32, which contains a submessage containing a replication timestamp and other replication data that he feels is appropriate.
>>
>> As for the timestamp = 18 field, Dennis, what was your intended use of this field? Marqqs, what is the intended use of your timestamp optional_features field?
>
> Since nobody has come forward with further requests, may I humbly suggest that we add three new fields:
>
> - a 64-bit integer osmosis_replication_timestamp for the replication timestamp, expressed in seconds since the epoch, otherwise the same value as in state.txt's timestamp=... field;

Why the "osmosis" in there? That seems rather strange to me. Either it is some general thing that works with all programs, in which case it shouldn't be named after a specific program. Or it is not, in which case it shouldn't be in a general file standard.

Jochen
--
Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Timestamp in PBF files
Hi,

On 03.12.2012 22:27, Jochen Topf wrote:
> Why the "osmosis" in there? That seems rather strange to me. Either it is some general thing that works with all programs, then it shouldn't be named after a specific program. Or it is not, then it shouldn't be in a general file standard.

It is the replication technology used by Osmosis on the server side. It works with all programs that use the Osmosis algorithm. It doesn't work with every conceivable replication mechanism because those might require other data. Trying to invent something future-proof seldom works.

For example, the way the directories are structured below the replication URL (http://planet.openstreetmap.org/replication/minute/000/118/578.osc.gz) is specific to the way Osmosis handles its replication; a program that consumes these files needs knowledge about that.

If you wanted to encode some kind of generic replication information, then you'd probably boil it down to a simple string field called replication_information, which would then contain something like "replication_type=osmosis sequence_number=1234 url=http://something/replication/minute" or so. That would be possible, but it would force every single writer/consumer of these files to serialize/deserialize the replication information string (tabs or spaces? spaces allowed after the equals sign or not? order significant? type=osmosis or type=Osmosis? ...). Making them top-level fields saves us from that.

Bye
Frederik

--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
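The directory layout in Frederik's example URL can be made concrete: the sequence number is zero-padded to nine digits and split into three groups of three. A minimal Python sketch, assuming the same layout is used for the neighbouring .state.txt files (the function name is illustrative):

```python
def replication_path(base_url, sequence_number, suffix="osc.gz"):
    """Map a sequence number to the AAA/BBB/CCC layout below the base URL,
    e.g. 118578 -> 000/118/578.osc.gz."""
    if not 0 <= sequence_number <= 999_999_999:
        raise ValueError("sequence number must fit in nine digits")
    digits = f"{sequence_number:09d}"
    return f"{base_url.rstrip('/')}/{digits[0:3]}/{digits[3:6]}/{digits[6:9]}.{suffix}"
```

A consumer that only has the base URL and a sequence number can derive every file name from these two values, which is why both end up as header fields.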
Re: [OSM-dev] Timestamp in PBF files
On Tue, Dec 04, 2012 at 12:09:16AM +0100, Frederik Ramm wrote:
> On 03.12.2012 22:27, Jochen Topf wrote:
>> Why the "osmosis" in there? That seems rather strange to me. Either it is some general thing that works with all programs, then it shouldn't be named after a specific program. Or it is not, then it shouldn't be in a general file standard.
>
> It is the replication technology used by Osmosis on the server side. It works with all programs that use the Osmosis algorithm. It doesn't work with every conceivable replication mechanism because those might require other data. Trying to invent something future-proof seldom works.
>
> For example, the way the directories are structured below the replication URL (http://planet.openstreetmap.org/replication/minute/000/118/578.osc.gz) is specific to the way Osmosis handles its replication; a program that consumes these files needs knowledge about that.
>
> If you wanted to encode some kind of generic replication information, then you'd probably boil it down to a simple string field called replication_information, which would then contain something like "replication_type=osmosis sequence_number=1234 url=http://something/replication/minute" or so. That would be possible, but it would force every single writer/consumer of these files to serialize/deserialize the replication information string (tabs or spaces? spaces allowed after the equals sign or not? order significant? type=osmosis or type=Osmosis? ...). Making them top-level fields saves us from that.

That still isn't specific to Osmosis. Somebody else could implement this algorithm. Markus seems to have done so, albeit a bit differently. The algorithm should be documented somewhere and if you think there can be other algorithms, maybe this one should get a name. But it should not be named after one of the programs that happen to implement it. This is a similar issue to the one with the main OSM map, which was named Mapnik after the rendering program, which led to no end of confusion.

Jochen
--
Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Timestamp in PBF files
Hi Markus,

On 24 November 2012 00:04, mar...@gmx.eu wrote:
> Hi Brett,
>
>> *If* this information is intended to be used as an input into replication processes then the sequence number is essential. Osmosis writes a timestamp in the state.txt file, but it is only for identifying the right sequence number to begin replication with. All replication processing requires the sequence number. Attempting to use a timestamp is theoretically possible but it's much less efficient and not how it was supposed to work.
>
> I think this is true for database-based updates; however, the sequence number is not really needed for the file-based updates we're presently talking about: For example, osmupdate downloads all change files, starting with the newest, going back in time until the change file has been downloaded which is newer than the planet file's timestamp. Then all these change files are merged into one big change file, which is then applied to the planet file.

Yep, that will work for patching planet files. The replication tasks in Osmosis can't operate that way though. The existing --read-replication-interval allows limits to be specified to restrict the number of changesets downloaded at a time. This allows a local database to catch up in smaller steps if it is a long way behind. Catching up in smaller steps is preferable in this case because it deals better with the odd failure in processing (it's very frustrating to download weeks of changes only to fail near the end and have to start again), and because it prevents transaction sizes from growing unbounded. Having to wait several days for one huge catch-up transaction to be processed is far from ideal; it's preferable to catch up in smaller steps.
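Catching up in bounded steps is easy to express once the local and remote sequence numbers are known: split the pending range into batches and apply and commit each one on its own. The Python sketch below limits batches by a simple file count chosen for illustration; that is not how Osmosis expresses its own limit, and the function name is hypothetical.

```python
def catch_up_batches(local_seq, remote_seq, max_files):
    """Split the pending range (local_seq, remote_seq] into batches of at most
    max_files sequence numbers, so each batch can be applied and committed
    separately and a failure only costs the current batch."""
    if max_files < 1:
        raise ValueError("max_files must be positive")
    batches = []
    start = local_seq + 1
    while start <= remote_seq:
        end = min(start + max_files - 1, remote_seq)
        batches.append(range(start, end + 1))
        start = end + 1
    return batches
```

Because the batch boundaries are sequence numbers, the last committed batch is also the exact point to resume from after a failure.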
For patching planet files it's less of an issue because you'll almost always want all available changes to be applied, and because the number of files being downloaded will be much smaller (you'll typically be using daily or hourly files, not minute files). Therefore you'll be less likely to run into an intermittent network connectivity problem, and patching a file is extremely unlikely to throw errors unless you run out of disk space or have a system crash.

One other thing worth mentioning is that timestamps are not guaranteed to increase for every change file. In practice, for anything down to minute files you're unlikely to see any issues, but if the database server clock skews for any reason there's nothing to prevent time running backwards. This could cause consumers that rely on timestamps to miss data. Sequence numbers, on the other hand, are guaranteed to always increase per change file.

This is all a bit academic for patching planet files, but Osmosis doesn't make any assumptions about how short the changeset intervals are, or what is consuming changes at the other end of the pipeline. I could create a new task optimised for patching planet files, and perhaps that's what I (or somebody else if they wish to step in) will need to do if we embed replication information into PBF files, but it will have to remain separate from --read-replication-interval, so there'll be more code to maintain. I'm not opposed to it if it makes users' lives easier though.

In summary, I'd prefer to keep using sequence numbers if possible because it allows me to re-use more existing replication code, but it wouldn't be impossible to do without them.

> Osmosis may work differently, and it may need the sequence number to start this kind of file update - I really don't know. But if so, I totally agree, we should make it possible to store sequence numbers in PBF files. Could also be done with the key-val format I suggested...
Cool, I don't have any strong opinions on how the information should be stored. I'm happy to leave that in the hands of those more familiar with the PBF format.

Brett
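Brett's clock-skew point can be shown with a tiny made-up example: four change files where the server clock stepped back between sequence 2 and 3. A consumer that remembers "last applied timestamp" skips file 3, while one that remembers "last applied sequence number" does not. The function names are illustrative only.

```python
def pending_by_timestamp(files, last_applied_ts):
    """Naive selection: every file whose timestamp is newer than ours."""
    return [seq for seq, ts in files if ts > last_applied_ts]


def pending_by_sequence(files, last_applied_seq):
    """Selection by sequence number, which only ever increases."""
    return [seq for seq, ts in files if seq > last_applied_seq]


# (sequence_number, timestamp) pairs; the clock ran backwards at sequence 3.
files = [(1, 1000), (2, 1060), (3, 1050), (4, 1110)]
```

Having applied up to sequence 2 (timestamp 1060), the timestamp-based selection returns only [4] and silently loses the changes in file 3, whereas the sequence-based selection returns [3, 4].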
Re: [OSM-dev] Timestamp in PBF files
On Fri, Nov 23, 2012 at 5:03 AM, mar...@gmx.eu wrote:
> Hi Scott,
> in brief to the 1-degrees granularity:
> 1. Do whole processing in 64 bit: This would mean needing much more RAM when processing ways' coordinates. We should not do this unless this granularity is really required.

If you want your program to do all processing with 100 nanodegree granularity instead of 1 nanodegree granularity, then you can use ints throughout. Your software will have the limitation that if a PBF file contains data with 1 nanodegree granularity, there will be data loss, which is probably not a limitation in practice. AFAIK, there are no PBF files with a granularity that is not a multiple of 100, or with lat_offset and lon_offset != 0.

> 2. Your formula: latitude_int = ((lat_offset + granularity*lat)/50+1)/2
> Good idea, but again, this would mean one more multiplication, one more division (and two additions, one shift). These operations can usually be done in no time; however, that's different if you need to do them a billion times.

I'm curious, have you benchmarked the difference?

> There are still people out there who have 32-bit machines; I presume they do not have 64-bit hardware multiplication units, hence the processing time will increase.

In any case, if the file has a granularity that is a multiple of 100, you can use this specialized formula instead:

  latitude_int = (lat_offset/50+1)/2 + (granularity/100)*lat  // This calculation can be done using 32-bit ints.

This can be further specialized for when the granularity is 100 to:

  latitude_int = (lat_offset/50+1)/2 + lat  // This calculation can be done using 32-bit ints.

> 3. Process sequence: Using the granularity factor, lon/lat of every node in an OSMData fileblock must be read, stored temporarily and transformed later. Thus you have to access all the data twice: first to read it, and a second time when you transform its granularity. This might be a flaw in the PBF data model... Could we at least change this so that the granularity information comes _before_ the real data? The same applies to lon/lat offset and date granularity.

No can do. Google's protobuf format doesn't specify the order in which the components of a message are serialized (this is to support concatenation of messages without decoding them). Their implementation serializes in tag order, and I chose larger numbers for the granularity tags than for the primitive block tags.

> In the end, there will always be a lot of programs which do not need this quasi-optional feature "granularity" and simply will not support it.
>
> Metadata... We had the same discussion a year ago. Do you remember?
> https://wiki.openstreetmap.org/wiki/Talk:PBF_Format#File_Timestamp.3F
> I'm curious whether, and I hope that, we manage to extend the PBF data format this time. :-)
>
> The file timestamp I added was meant as an interim solution: I took the already defined "optional feature" and stored a key-val pair in it, for example "timestamp=2011-10-16T15:45:00Z". I think this example shows what we really need: a flexible format for file-related metadata. With key-val pairs, everyone could add optional data whenever it is needed in a toolchain. This is the flexibility we are used to having from the OSM XML format.

I understand the desire for this, but I want to put some thought into it to avoid the situation that created this thread, where the same metadata is stored in different locations, and in different formats.

How about two types of metadata storage? One type is standardized in the OSMHeader object directly:

  message HeaderBlock {
    optional HeaderBBox bbox = 1;
    /* Additional tags to aid in parsing this dataset */
    repeated string required_features = 4;
    repeated string optional_features = 5;
    /* Other ad-hoc metadata */
    repeated AdHocMetadata adhoc_metadata = 6; // See below.
    optional string writingprogram = 16;
    optional string source = 17; // From the bbox field.
    optional string timestamp = 18; // from OSM planet header.
    optional int64 replication_timestamp = 19; // In microseconds since 1970 UTC.
    optional string copyright = 20;
    optional string contributors = 21;
    optional string license = 22;
  }

(new fields taken from the new planet header).

Question, since I haven't reviewed OSM replication options: do we want one timestamp or two timestamps, and should they be int64 or string?

> To combine this flexibility with the advantages of the Protobuf format (compressed storage of different data types) we need to allow meta formatted objects, or something like this:
>
>   message HeaderBlock {
>     ...
>     repeated HeaderMeta = 20;
>   }
>
>   message HeaderMeta {
>     required string HeaderKey = 1;
>     optional HeaderMetaVarint = 10;
>     optional HeaderMetaString = 12;
>     // see type definitions there: https://wiki.openstreetmap.org/wiki/PBF#Format_example
>     // Only _one_ of the three optional objects should be used; did not know how to define this in Protobuf without
Re: [OSM-dev] Timestamp in PBF files
From: Jochen Topf [mailto:joc...@remote.org]
Sent: Thursday, November 22, 2012 8:19 AM
Subject: Re: [OSM-dev] Timestamp in PBF files

> I don't know why there are no redacted nodes; Matt mentioned that he hasn't implemented that yet. But that would mean we have non-ODbL-clean data in the full history dump. Frankly, this all gets a bit too confusing for me. I hope the people who have implemented these things will at some point document them and/or fix those cases.

It's also possible that the redacted nodes aren't included in the dump at all. Could you check for version 1 of node 551550983? It's a random redacted node. If it's present with positional information then the file isn't ODbL clean, but if it's completely missing then it's a documentation issue.
Re: [OSM-dev] Timestamp in PBF files
Hi Scott,

in brief to the 1-degrees granularity:

1. Do whole processing in 64 bit: This would mean needing much more RAM when processing ways' coordinates. We should not do this unless this granularity is really required.

2. Your formula: latitude_int = ((lat_offset + granularity*lat)/50+1)/2
Good idea, but again, this would mean one more multiplication, one more division (and two additions, one shift). These operations can usually be done in no time; however, that's different if you need to do them a billion times. There are still people out there who have 32-bit machines; I presume they do not have 64-bit hardware multiplication units, hence the processing time will increase.

3. Process sequence: Using the granularity factor, lon/lat of every node in an OSMData fileblock must be read, stored temporarily and transformed later. Thus you have to access all the data twice: first to read it, and a second time when you transform its granularity. This might be a flaw in the PBF data model... Could we at least change this so that the granularity information comes _before_ the real data? The same applies to lon/lat offset and date granularity.

In the end, there will always be a lot of programs which do not need this quasi-optional feature "granularity" and simply will not support it.

Metadata... We had the same discussion a year ago. Do you remember?
https://wiki.openstreetmap.org/wiki/Talk:PBF_Format#File_Timestamp.3F
I'm curious whether, and I hope that, we manage to extend the PBF data format this time. :-)

The file timestamp I added was meant as an interim solution: I took the already defined "optional feature" and stored a key-val pair in it, for example "timestamp=2011-10-16T15:45:00Z". I think this example shows what we really need: a flexible format for file-related metadata. With key-val pairs, everyone could add optional data whenever it is needed in a toolchain. This is the flexibility we are used to having from the OSM XML format.
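The rounding formula under discussion converts a raw PBF value to units of 100 nanodegrees, rounding to the nearest unit. The Python sketch below checks Scott's cheaper variant (from his reply earlier in this archive, for granularities that are a multiple of 100) against the general form. It relies on Python's floor division; with C or Java truncating division, the general expression rounds negative coordinates differently (for example -149 nanodegrees becomes 0 instead of -1), so a C port would need extra care there. Python integers also cannot overflow, so the 32-bit versus 64-bit question does not show up here.

```python
def to_100_nanodegrees(raw, granularity=100, offset=0):
    """General form ((offset + granularity*raw)/50 + 1)/2: nanodegrees rounded
    to the nearest multiple of 100, with halves rounding up (floor division)."""
    nanodegrees = offset + granularity * raw
    return (nanodegrees // 50 + 1) // 2


def to_100_nanodegrees_fast(raw, granularity=100, offset=0):
    """Specialized form for granularities that are a multiple of 100:
    the offset term is loop-invariant, leaving one multiply and one add."""
    if granularity % 100 != 0:
        raise ValueError("granularity must be a multiple of 100")
    return (offset // 50 + 1) // 2 + (granularity // 100) * raw
```

The two agree because `granularity*raw` is then a multiple of 100 and shifts the floored quotient by an exact even amount; with the default granularity of 100 and offset 0 both reduce to returning `raw` unchanged.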
To combine this flexibility with the advantages of the Protobuf format (compressed storage of different data types) we need to allow meta formatted objects, or something like this:

  message HeaderBlock {
    ...
    repeated HeaderMeta = 20;
  }

  message HeaderMeta {
    required string HeaderKey = 1;
    optional HeaderMetaVarint = 10;
    optional HeaderMetaString = 12;
    // see type definitions there: https://wiki.openstreetmap.org/wiki/PBF#Format_example
    // Only _one_ of the three optional objects should be used; did not know how to define this in Protobuf without an additional hierarchy layer.
  }

What do you think about this suggestion?

Markus
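On the reading side, Markus's key-value idea amounts to a key plus exactly one typed value. The Python sketch below models that rule in application code, together with reading the interim "timestamp=..." entry from optional_features. The class and function names are hypothetical, and the exactly-one rule is enforced at construction time here rather than by the schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Union


@dataclass(frozen=True)
class HeaderMeta:
    """Stand-in for the proposed HeaderMeta message: a key plus exactly
    one of a varint value or a string value."""
    key: str
    varint_value: Optional[int] = None
    string_value: Optional[str] = None

    def __post_init__(self):
        if (self.varint_value is None) == (self.string_value is None):
            raise ValueError(f"{self.key!r}: set exactly one of varint_value / string_value")

    @property
    def value(self) -> Union[int, str]:
        return self.varint_value if self.varint_value is not None else self.string_value


def timestamp_from_optional_features(features):
    """Read an interim 'timestamp=YYYY-MM-DDTHH:MM:SSZ' optional feature,
    if present, as seconds since the epoch; otherwise return None."""
    for feature in features:
        key, sep, value = feature.partition("=")
        if sep and key == "timestamp":
            dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return int(dt.replace(tzinfo=timezone.utc).timestamp())
    return None
```

Every reader needs code like `timestamp_from_optional_features` for each string-encoded value, which is the cost Frederik later cites in favour of typed top-level fields.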
Re: [OSM-dev] Timestamp in PBF files
> message HeaderMeta {
>   required string HeaderKey = 1;
>   optional HeaderMetaVarint = 10;
>   optional HeaderMetaString = 12;
>   // see type definitions there: https://wiki.openstreetmap.org/wiki/PBF#Format_example
>   // Only _one_ of the three optional objects should be used; did not know how to define this in Protobuf without an additional hierarchy layer.
> }

Sorry, I meant "Only _one_ of the two", not three.
Re: [OSM-dev] Timestamp in PBF files
On 21 November 2012 19:43, Frederik Ramm frede...@remote.org wrote:
> Hi,
> <snip>
> To be self-contained, it should be sufficient to include the baseURL from configuration.txt, no? So maybe:
>
>   optional string writingprogram = 16;
>   optional string source = 17;
>   optional sint64 timestamp = 18;
>   optional sint64 replication_timestamp = 19;
>   optional string replication_url = 20;
>
> I don't know if the sequenceNumber from state.txt adds any value; if it does then one could throw that in as well.

I've been explicitly cc'd on the original message so I should put in an appearance ;-)

*If* this information is intended to be used as an input into replication processes then the sequence number is essential. Osmosis writes a timestamp in the state.txt file, but it is only for identifying the right sequence number to begin replication with. All replication processing requires the sequence number. Attempting to use a timestamp is theoretically possible but it's much less efficient and not how it was supposed to work.

However, utilising this new sequence number in Osmosis will require some significant changes. The current task that figures out what changes to download (i.e. --read-replication-interval) is totally independent of the task that applies changes to a snapshot (i.e. --apply-change). The simplest solution would be to write an uber task that is specifically aimed at patching planet files, but it will be an all-in-one task that can't be combined with others. It *may* be possible to modify pipeline initialisation to allow all tasks to synchronise replication numbers before beginning processing, but that will be a lot more complicated.

Updating the timestamp and sequence number after processing will also require some changes because it impacts a number of tasks. All tasks will have to propagate the field (shouldn't be too difficult), but tasks such as --apply-change will need to be smart about which input source they use as the source of truth for the sequence number.
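Brett's remark that the state.txt timestamp only serves to find the right starting sequence number can be illustrated as a binary search over sequence numbers. In this Python sketch, `timestamp_of(seq)` is a hypothetical stand-in for fetching and parsing that sequence's state file, so every probe is a network round trip; that cost is avoided entirely when the starting sequence number is stored directly. The search assumes timestamps never decrease, which, as Brett notes elsewhere in the thread, is not strictly guaranteed.

```python
def last_sequence_at_or_before(timestamp_of, lowest, highest, target_ts):
    """Return the newest sequence number in [lowest, highest] whose state
    timestamp is <= target_ts, or None if even `lowest` is newer.

    Replication would then continue with the following sequence number."""
    if timestamp_of(lowest) > target_ts:
        return None
    lo, hi = lowest, highest
    # Invariant: timestamp_of(lo) <= target_ts, and the answer lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if timestamp_of(mid) <= target_ts:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With minutely diffs spanning years, this still takes around twenty state-file fetches before the first change can be downloaded.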
It's all possible, but not a trivial change. Perhaps this is a non-issue if everybody uses osmupdate these days anyway :-)

As for the PBF format itself, I don't have any opinions. I'm more than happy for those who are more familiar with it to come up with a solution. I'll do my best to accommodate it.

Brett
Re: [OSM-dev] Timestamp in PBF files
On Wed, Nov 21, 2012 at 05:16:12PM -0600, Scott Crosby wrote: On Wed, Nov 21, 2012 at 3:46 AM, Jochen Topf joc...@remote.org wrote: On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote: Not quite. The granularity of timestamps can go down to the milliseconds. https://github.com/DennisOSRM/OSM-binary/blob/master/src/osmformat.proto#L96 Ugh. Yes. That was always somewhat of a problem in the protocol IMHO. Nobody needs more granularity than seconds because the main database doesn't have it. Similar for the latitude/longitude granularity. Nobody uses that. And it just makes all the code reading PBF files a bit more complex and a bit slower. Today the database lacks those features, but the future can be different. The trivial complexity of that feature in readers allows many possible future features, without a breaking format change. The ones I had in mind were: Lower granularity makes it easy to create lower-precision excerpts that are smaller to send and easier to store. Allow OSM tooling to handle contour lines, or other grid-specified data, where making the granularity size matching the grid size can lead to vastly improved compression. Support future higher-precision data, e.g., generated from GPS block 3 satellites. Millisecond timestamps are much easier to use as unique changeset ID's than second-granularity timestamps. On the other hand it is rather unlikely that OSM will make those changes to its database anytime soon or that PBF is used for non-OSM data like contour lines (because there are better formats and tools for that). Having functionality that nobody actually uses means it is probably not implemented universally and properly (Markus already mentioned he doesn't implement them). 
In the best case, software that doesn't implement it at least checks for it and complains; in the worst case, there is some buggy code that never gets checked because nobody ever uses it, so that if and when we actually use those features we can't rely on the software anyway. And we have changed the PBF format before and are in the process of changing it again, so it is not such a big deal to add support for these things later if they are actually needed. Oh well, this is rather academic, because I am not proposing we change the format now. I'd only do that if we have a larger overhaul of the format. The runtime cost of this is a couple of multiplications that loop-invariant code motion can remove; about 30 nanoseconds for each 8000-entity block, which is much, much cheaper than the branch prediction failures of VarInt decoding. I use ints internally in Osmium for the lon/lat, as does PBF. But there is this conversion in there and depending on the granularity factor I am not sure I can actually do that using just integers. I don't want to use doubles though. So this might break on some granularity factors; I don't know and I never tested it. I actually use an int-to-double conversion before the factor is applied and later convert back to int. And in the usual case for OSM I don't do this double conversion at all; I just use the int as is because it has the right granularity factor anyway. This extra check (one "if" that can be perfectly branch-predicted because it never changes) makes the reading of the whole PBF file about 1% faster! double/int conversions are slow. So even this seemingly small thing means I spent too much time thinking about it and writing code I am not sure is perfectly right. :-( Jochen -- Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Timestamp in PBF files
As for the timestamp = 18 field, Dennis, what was your intended use of this field? Marqqs, what is the intended use of your timestamp optional_features field? By this, I mean, what semantics are you attaching to these timestamps? I think it's perfectly reasonable to have several timestamp fields, perhaps: The timestamp the file was generated. The state needed to resume replication of an extract/planet (which contains an internal timestamp)? The timestamp of when the file was extracted/excerpted? The main purpose was to store the replication state. --Dennis If you two could give me a better idea of what your timestamps are used for, I could advise on how we can try to integrate them into one or more standard timestamp fields. And after that, we can then figure out how we might want to assign timestamps to field names/ids --- keeping in mind prior uses of those field names and numbers.
Re: [OSM-dev] Timestamp in PBF files
Hello Scott, Thanks for your reply! I think what we need is a replacement for the timestamp which has been provided by .osm.bz2 files for years now. For example:
$ wget -q planet.openstreetmap.org/planet/planet-latest.osm.bz2 -O - | bunzip2 | head -4
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="OpenStreetMap planet.c" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright/" license="http://opendatacommons.org/licenses/odbl/1.0/" timestamp="2012-11-14T01:10:07Z">
<bound box="-90,-180,90,180" origin="http://www.openstreetmap.org/api/0.6"/>
<node id="3" lat="50.1240327" lon="14.4524155" timestamp="2012-07-24T12:48:39Z" version="7" changeset="12465837" user="OSMF Redaction Account" uid="722137"/>
As you can see, there is a timestamp="2012-11-14T01:10:07Z" which states the replication time - as far as I know. The PBF-formatted planet file from the same day lacks this information. Thus, people who need this timestamp cannot use the PBF planet but are forced to download the old bzipped XML planet file (or to look for a suitable state.txt). My goal - and presumably Frederik's as well - is to eliminate this disadvantage of PBF-formatted files. I have no objections to coding this timestamp as a signed varint with id 32. This should result in two bytes (0x80 0x02) when PBF-coded. May I add this to the OSM Wiki page? Markus
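The "two bytes (0x80 0x02)" are the protobuf field key for field number 32 with the varint wire type (0); the timestamp value itself follows the key. A minimal Python sketch, assuming the standard protobuf encoding rules for an `optional sint64` field (ZigZag, then varint); the helper names are illustrative and not taken from osmconvert or any other OSM tool:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("expects a non-negative value; ZigZag-encode sint64 first")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def field_key(field_number: int, wire_type: int = 0) -> bytes:
    """Protobuf field key: (field_number << 3) | wire_type, as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def zigzag64(n: int) -> int:
    """ZigZag mapping used by sint64: 0->0, -1->1, 1->2, -2->3, ..."""
    return (n << 1) ^ (n >> 63)


def sint64_field(field_number: int, value: int) -> bytes:
    """Key plus ZigZag varint payload for an optional sint64 field."""
    return field_key(field_number) + encode_varint(zigzag64(value))


print(field_key(32).hex())  # 8002
# The planet timestamp above, 2012-11-14T01:10:07Z, as Unix epoch seconds:
print(len(sint64_field(32, 1352855407)))  # 7
```

So the complete field (key plus value) for a present-day timestamp takes 7 bytes in the header.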
Re: [OSM-dev] Timestamp in PBF files
On Wed, Nov 21, 2012 at 07:00:32PM +0100, Frederik Ramm wrote: On 11/21/12 18:46, Jochen Topf wrote: On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote: How many nodes in the planet lack a latitude or longitude? Using a MAXINT encoding will cost about 8 bytes for each missing latitude or longitude. It's possible to reduce this to 2-3 bytes, but the format gets uglier/hackier. IMHO, probably not worth that cost. I just counted those cases. In the history dump from October 2012 there are 2344 nodes without coordinates. Hardly worth thinking about... That sounds implausibly low. Given that 1. every deleted node should be in that file without coordinates 2. we're currently at node id 2.03 billion, 3. there are 1.66 billion visible nodes in the database we should have something like 370 million deleted nodes. Hm, we probably have to remove from that number those nodes that were deleted in ancient times where we've meanwhile dropped the history, and maybe some from the first TIGER import where we manually removed them from the database, but still - at least every node deleted in the past couple of years *should* show up with visible=false in the full history dump, and any node with visible=false *should* not have coordinates. Either there's an error in my thinking, or in your count, or in the script that does the history export ;) I checked this in some more detail. The cases I found were cases from years ago (the last is from May 2008). Apparently the OSM server did not check coordinates for validity back then. So all these nodes were in the database and lat and/or lon happened to have the MAXINT value I use to signify undefined coordinates. Of course they should never have had those values, but they did. So these cases are not the redacted node coordinates. I don't know why there are no redacted nodes; Matt mentioned something about not having implemented that yet. But that would mean we have non-ODbL-clean data in the full history dump. 
Frankly, this is all getting a bit too confusing for me. I hope the people who have implemented these things will at some point document them and/or fix those cases. Jochen -- Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Timestamp in PBF files
And we have changed the PBF format before and are in the process of changing it again, so it is not such a big deal to add support for these things later if they are actually needed. One of my goals was to reduce breaking changes, or making files that a program thinks it can read, but can't actually read (e.g., history dumps). I use ints internally in Osmium for the lon/lat, as does PBF. But there is this conversion in there and depending on the granularity factor I am not sure I can actually do that using just integers. I don't want to use doubles though. All units in PBF are in nanodegrees, so you can always use longs to do your calculation, as long as you do the right casts so that the arithmetic is done in longs instead of possibly overflowing ints. So this might break on some granularity factors; I don't know and I never tested it. I actually use an int-to-double conversion before the factor is applied and later convert back to int. And in the usual case for OSM I don't do this double conversion at all; I just use the int as is because it has the right granularity factor anyway. This extra check (one "if" that can be perfectly branch-predicted because it never changes) makes the reading of the whole PBF file about 1% faster! double/int conversions are slow. So even this seemingly small thing means I spent too much time thinking about it and writing code I am not sure is perfectly right. :-( Reading a PBF file into code that uses 32-bit integers to represent latitudes and longitudes is probably safe on all current PBF files, but is a potentially lossy operation; a latitude in a 32-bit integer is only precise to 100 nanodegrees. If the PBF file happens to have measurements precise to 1 nanodegree, you must lose 2 digits of precision. Here is an alternate formula that only requires integer arithmetic, goes from a PBF file to a 32-bit integer, and is correct for any granularity. long lat = // Latitude encoded in the PBF; the type must be a 64-bit int to avoid overflow in the calculation. latitude_int = ((lat_offset + granularity*lat)/50+1)/2 // This calculation must be done with 64-bit longs. This formula will be correct for any granularity and lat_offset. The reason for dividing by 50, adding 1 and halving, instead of simply dividing by 100, is to get better round-off behavior; it'll round to nearest instead of rounding toward zero. http://en.wikipedia.org/wiki/Rounding If the granularity is 100, or any multiple of 100 (e.g., 200, 1000, 700), you can simplify the above formula into: int lat = // This can be a 32-bit int without overflow. latitude_int = (lat_offset/50+1)/2 + (granularity/100)*lat // This calculation can be done using 32-bit ints. I don't want to put these formulas into the spec, as they are the least-lossy approximations of the lossless formulas in the specification. Scott
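Scott's rounding formula can be checked with a small Python sketch (the function names are illustrative). One caveat worth testing in any C or Java port: Python's `//` floors, whereas integer `/` in C and Java truncates toward zero, so with truncating division the formula as written rounds negative coordinates wrongly (for example, -99 nanodegrees gives 0 instead of -1). The sketch relies on floor division and on Python's arbitrary-precision integers, which sidesteps the 64-bit overflow concern:

```python
def to_e7(lat: int, granularity: int = 100, lat_offset: int = 0) -> int:
    """Raw PBF latitude -> integer units of 100 nanodegrees (1e-7 degrees).

    nanodegrees = lat_offset + granularity * lat (as in the PBF spec); the
    result is rounded to nearest, with exact halves rounded toward +infinity.
    // is floor division, so negative coordinates round correctly too.
    """
    nano = lat_offset + granularity * lat
    return (nano // 50 + 1) // 2


def to_e7_fast(lat: int, granularity: int = 100, lat_offset: int = 0) -> int:
    """Shortcut that is only exact when granularity is a multiple of 100."""
    if granularity % 100:
        raise ValueError("granularity must be a multiple of 100")
    return (lat_offset // 50 + 1) // 2 + (granularity // 100) * lat


# Standard OSM files (granularity 100, offset 0) map 1:1.
print(to_e7(501240327))  # 501240327
# 1-nanodegree granularity: 149 nd -> 1, 150 nd -> 2, -149 nd -> -1
print(to_e7(149, granularity=1), to_e7(150, granularity=1), to_e7(-149, granularity=1))  # 1 2 -1
```

A C version would need `int64_t` intermediates and an explicit floor division for negative values.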
Re: [OSM-dev] Timestamp in PBF files
Hi, On 11/21/2012 04:27 AM, Scott Crosby wrote: Idea, why not put the entire state.txt file into the OSMHeader block? I tend to view the structure of state.txt as an Osmosis implementation detail and I'm not sure it would be a good idea to require that PBF parsers not only decipher the PBF, but also have knowledge about how Osmosis builds its state.txt files. One thing I don't like about it is that the state.txt file is not self-contained:
#Tue Nov 20 19:02:18 UTC 2012
sequenceNumber=1668
timestamp=2012-11-20T19\:00\:00Z
It should have a planet URI (or a planet URI and a list of mirrors) of what planet it corresponds to. That way a user merely needs to say 'update planet' and everything else can be automated. To be self-contained, it should be sufficient to include the baseURL from configuration.txt, no? So maybe:
optional string writingprogram = 16;
optional string source = 17;
optional sint64 timestamp = 18;
optional sint64 replication_timestamp = 19;
optional string replication_url = 20;
I don't know if the sequenceNumber from state.txt adds any value; if it does, then one could throw that in as well. Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
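To fill fields like the proposed `replication_timestamp`, a writer has to turn state.txt into plain values. A minimal sketch, assuming state.txt contains only `#` comment lines and `key=value` lines with `\:`-escaped colons as in the example above; it is not a full Java .properties parser, and the function names are hypothetical:

```python
from datetime import datetime, timezone


def parse_state_txt(text: str) -> dict:
    """Parse Osmosis-style state.txt content into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the date comment
        key, _, value = line.partition("=")
        # Undo the .properties escaping of colons, e.g. 19\:00\:00 -> 19:00:00
        state[key.strip()] = value.strip().replace("\\:", ":")
    return state


def replication_timestamp(state: dict) -> int:
    """Seconds since the Unix epoch for the state's UTC timestamp."""
    parsed = datetime.strptime(state["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


sample = r"""#Tue Nov 20 19:02:18 UTC 2012
sequenceNumber=1668
timestamp=2012-11-20T19\:00\:00Z
"""
state = parse_state_txt(sample)
print(state["sequenceNumber"], replication_timestamp(state))  # 1668 1353438000
```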
Re: [OSM-dev] Timestamp in PBF files
-Original Message- From: Frederik Ramm [mailto:frede...@remote.org] Sent: 21 November 2012 08:44 To: Scott Crosby Cc: dev@openstreetmap.org Subject: Re: [OSM-dev] Timestamp in PBF files [Snip] To be self-contained, it should be sufficient to include the baseURL from configuration.txt, no? So maybe:
optional string writingprogram = 16;
optional string source = 17;
optional sint64 timestamp = 18;
optional sint64 replication_timestamp = 19;
optional string replication_url = 20;
I don't know if the sequenceNumber from state.txt adds any value; if it does, then one could throw that in as well. I think including the sequenceNumber will be useful for making it easy to determine where to continue replication from once the PBF file is processed. Just to clarify: the replication_url will need to include the minute/hour/day part as appropriate, so that the sequenceNumber applies to the right sequence, i.e. taken from configuration.txt as you say. Gregory
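With `replication_url` pointing at the minute, hour or day directory, a tool could derive the next file to fetch from `sequenceNumber`. A sketch assuming the nine-digit AAA/BBB/CCC directory layout of the planet.openstreetmap.org replication directories (e.g. `.../replication/hour/000/001/668.state.txt` for sequence 1668), and assuming, as in Osmosis replication, that state N means diff N has already been applied; the helper names are hypothetical:

```python
def sequence_path(sequence_number: int) -> str:
    """1668 -> '000/001/668' (nine digits split into three directories)."""
    if not 0 <= sequence_number <= 999_999_999:
        raise ValueError("sequence number does not fit the AAA/BBB/CCC layout")
    digits = f"{sequence_number:09d}"
    return f"{digits[:3]}/{digits[3:6]}/{digits[6:]}"


def replication_file_url(replication_url: str, sequence_number: int, suffix: str) -> str:
    """Join the base URL (which already includes minute/hour/day) and the path."""
    return f"{replication_url.rstrip('/')}/{sequence_path(sequence_number)}{suffix}"


def next_diff_url(replication_url: str, sequence_number: int) -> str:
    """The first diff still to apply after the state recorded in the file."""
    return replication_file_url(replication_url, sequence_number + 1, ".osc.gz")


base = "http://planet.openstreetmap.org/replication/hour/"
print(replication_file_url(base, 1668, ".state.txt"))
print(next_diff_url(base, 1668))
```

Without the base URL in the file, the first part of these URLs has to come from out-of-band knowledge.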
Re: [OSM-dev] Timestamp in PBF files
On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote: Not quite. The granularity of timestamps can go down to the milliseconds. https://github.com/DennisOSRM/OSM-binary/blob/master/src/osmformat.proto#L96 Ugh. Yes. That was always somewhat of a problem in the protocol IMHO. Nobody needs more granularity than seconds because the main database doesn't have it. The same goes for the latitude/longitude granularity. Nobody uses that. And it just makes all the code reading PBF files a bit more complex and a bit slower. Jochen -- Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Timestamp in PBF files
On Tue, Nov 20, 2012 at 08:40:39PM +0100, Frederik Ramm wrote: On 20.11.2012 20:12, Jochen Topf wrote: I guess the timestamp is somehow supposed to say which state of the OSM database this file represents. Yes. Basically whatever was in Osmosis' state.txt file at the time this file was created. That's not a definition. I create PBF files all the time without a state.txt file around. How is it supposed to work in history files? I think it would make sense to have a comparable timestamp in history files. Currently there's no software that would be able to patch history files with freshly downloaded diffs, so the discussion is rather academic though. Sure. Osmium can do that. Do we need two timestamps to define a range for history files? I'd suggest waiting until someone has an application that needs this. I'm a bit wary of throwing this discussion wide open because before too long we'll have all sorts of people suggesting helpful optional enhancements to the PBF format (while we're at it, can we maybe do X) and then nothing gets done again. All *I* want is one extra timestamp, and I would start using it tomorrow; it's not academic, there's software that would process it, there's a clear benefit to users. I'd prefer to use a standard but in the absence of an existing standard I'll just make something up and use that. (But we've seen how well that works - I did make something up for testing and Marqqs expected something else.) And what I am saying is that we should think this through so that we don't have the same problem again tomorrow. Jochen -- Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Timestamp in PBF files
Hi, On 11/21/2012 10:42 AM, Jochen Topf wrote: Yes. Basically whatever was in Osmosis' state.txt file at the time this file was created. That's not a definition. I create PBF files all the time without a state.txt file around. Then copy the timestamp from the input PBF; if you don't have an input PBF, or it doesn't have a timestamp, leave it out. The timestamp I'm after is not some generic timestamp that you can make up; it must always refer to a replication process. No replication process - no timestamp. Therefore it is probably a good idea to make that clear in the field name - not timestamp but replication_timestamp or so. And what I am saying is that we should think this through so that we don't have the same problem again tomorrow. Then please think it through quickly and post the results ;) Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] Timestamp in PBF files
Hello, How many nodes in the planet lack a latitude or longitude? Using a MAXINT encoding will cost about 8 bytes for each missing latitude or longitude. It's possible to reduce this to 2-3 bytes, but the format gets uglier/hackier. IMHO, probably not worth that cost. As far as I understand, only nodes with the attribute action=delete do not have (or do not need) lon/lat. On the other hand, it does not hurt to give them false lon/lat values. This is what osmconvert does when you apply the --fake-lonlat option. In PBF, lon/lat are delta coded, aren't they? Thus it would be best to write a delta of 0, i.e., to take the logical value of the previous node. A few steps later in the toolchain the lon/lat values of action=delete objects will be deleted anyway (together with their objects). It should have a planet URI (or a planet URI and a list of mirrors) of what planet it corresponds to. That way a user merely needs to say 'update planet' and everything else can be automated. Please don't. These data aren't necessary. The same applies to sequence numbers. For about a year now, it has been possible to update planet files with a single update command. This command first determines the age of the old file, then it downloads all needed planet change files, starting with the newest and ending with the change file that was published right after the file timestamp of the old planet file. Syntax: https://wiki.openstreetmap.org/wiki/Osmupdate#Updating_OSM_Files Since the state.txt files from the OSM planet server have to be parsed in the process anyway, there is no need to include them in PBF. No status, but if anyone wants my opinion, when authoring the format, I expected us to add metadata to planets, and expected it to be put into OSMHeader as in the OSRM clone you linked to above. I would vote to deprecate the use of the ISO timestring encoded into the optional_features array, but continue to write to it to avoid breaking old installs of Marqqs's tools. 
OK, this seems to be consensual: PBF id 18 in the header block for a signed int UNIX timestamp value. I will implement the appropriate read function in osmconvert at once. For reasons of compatibility osmconvert will _write_ both file timestamp representations, the UNIX-based _and_ the string-based one. There may be some tools which depend on the format we have used for a year now. Ugh. Yes. That was always somewhat of a problem in the protocol IMHO. Nobody needs more granularity than seconds because the main database doesn't have it. The same goes for the latitude/longitude granularity. Nobody uses that. And it just makes all the code reading PBF files a bit more complex and a bit slower. I totally agree. osmconvert cannot even read PBF files which do not use the standard granularity. It rejects these files with an error message. No one has ever complained! Thus I guess nobody really needs this option. Besides, the format definition we have is somewhat unfortunate: the granularity values may come _after_ the lon/lat values they refer to. This makes it necessary to process all data in a data block twice: first parse it and then, in a second pass, apply the granularity factor. And what I am saying is that we should think this through so that we don't have the same problem again tomorrow. Then please think it through quickly and post the results ;) Done. Any objections? ;-) Markus
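The ordering problem Markus describes follows from the field numbers in osmformat.proto: in `PrimitiveBlock`, `primitivegroup` is field 2 while `granularity` is 17 and `lat_offset` is 19, and encoders usually write fields in field-number order, so the scale factors tend to arrive after the data (and a reader cannot rely on any order anyway). A sketch of the resulting two-pass approach, using a hypothetical pre-decoded list of `(field_number, value)` pairs instead of a real protobuf parser:

```python
from typing import Any, Iterable, List, Tuple

# PrimitiveBlock field numbers from osmformat.proto
PRIMITIVEGROUP = 2
GRANULARITY = 17
LAT_OFFSET = 19


def block_latitudes(fields: Iterable[Tuple[int, Any]]) -> List[float]:
    """Latitudes in degrees for one block, whatever order the fields arrive in.

    For PRIMITIVEGROUP the value is assumed to be a list of raw, already
    delta-decoded latitudes (a simplification for this sketch).
    """
    granularity, lat_offset = 100, 0  # proto defaults
    raw: List[int] = []
    # Pass 1: collect raw values and remember the scaling parameters.
    for number, value in fields:
        if number == PRIMITIVEGROUP:
            raw.extend(value)
        elif number == GRANULARITY:
            granularity = value
        elif number == LAT_OFFSET:
            lat_offset = value
    # Pass 2: apply latitude = 1e-9 * (lat_offset + granularity * lat).
    return [1e-9 * (lat_offset + granularity * lat) for lat in raw]


# The group is seen before the non-default granularity of 1000.
print(block_latitudes([(PRIMITIVEGROUP, [50124033]), (GRANULARITY, 1000)]))
```

Applying the default granularity of 100 during the first pass would have produced roughly 5.01 instead of 50.12 degrees here.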
Re: [OSM-dev] Timestamp in PBF files
Hi, On 11/21/2012 11:50 AM, mar...@gmx.eu wrote: It should have a planet URI (or a planet URI and a list of mirrors) of what planet it corresponds to. That way a user merely needs to say 'update planet' and everything else can be automated. Please don't. These data aren't necessary. The same applies to sequence numbers. For about a year now, it has been possible to update planet files with a single update command. This command first determines the age of the old file, then it downloads all needed planet change files, ... by making hard-coded assumptions about where to get them from, and that's precisely what Scott meant. After all, there might be other projects using the OSM toolchain (e.g. fosm.org) that publish their own diffs and might publish their own PBF files, and if you use their PBF file and try to update it from some openstreetmap.org URL, that won't work. So if you really want to be able to update the file without relying on some out-of-band knowledge (I downloaded this file from Geofabrik and I happen to know that they use openstreetmap.org as their data source), then you would need the URI in the file. The same applies if someone were to operate an OSM mirror and publish their own diffs, and you might choose to synchronize with them rather than with openstreetmap.org - even here, what the mirror publishes in a diff with the timestamp X is not necessarily identical to what OSM publishes in a diff with the same timestamp, and knowledge about where to update the file from would be essential. Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] Timestamp in PBF files
Hi, On 11/21/2012 11:50 AM, mar...@gmx.eu wrote: OK, this seems to be consensual: PBF id 18 in the header block for a signed int UNIX timestamp value. In both his messages, Scott had suggested PBF id 18 for a signed int epoch value of the file creation, not for a signed int epoch value of the replication state. It would probably be premature to call this a consensus for a replication state timestamp at PBF id 18. Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] Timestamp in PBF files
On Wed, Nov 21, 2012 at 11:50:38AM +0100, mar...@gmx.eu wrote: How many nodes in the planet lack a latitude or longitude? Using a MAXINT encoding will cost about 8 bytes for each missing latitude or longitude. It's possible to reduce this to 2-3 bytes, but the format gets uglier/hackier. IMHO, probably not worth that cost. As far as I understand, only nodes with the attribute action=delete do not have (or do not need) lon/lat. On the other hand, it does not hurt to give them false lon/lat values. This is what osmconvert does when you apply the --fake-lonlat option. In PBF, lon/lat are delta coded, aren't they? Thus it would be best to write a delta of 0, i.e., to take the logical value of the previous node. A few steps later in the toolchain the lon/lat values of action=delete objects will be deleted anyway (together with their objects). You only have missing lon/lat in OSM files with history. And presumably you use them because you want to know when which objects were created and deleted and so on. So you cannot just ignore deleted objects. And you want to know whether an object had no lon/lat, as opposed to the lon/lat of the object that happened to be right before it in the file. So your solution doesn't work. Jochen -- Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
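On the delta-coding question: in the DenseNodes format the id, lat and lon arrays are delta coded (packed `sint64`). The sketch below shows why a fake delta of 0 is cheap (it simply repeats the previous node's value) and also why, as Jochen says, a reader can no longer tell such a placeholder from a real coordinate. The function names are illustrative:

```python
from itertools import accumulate
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Absolute values -> differences to the previous value (first vs. 0)."""
    deltas, previous = [], 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Running sum restores the absolute values."""
    return list(accumulate(deltas))


# Third node has no real coordinate; writing delta 0 repeats node 2's latitude.
lats = [501240327, 501240400, 501240400]
print(delta_encode(lats))  # [501240327, 73, 0]
```

After decoding, the third latitude is indistinguishable from a node that really sits at the same latitude as its predecessor.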
Re: [OSM-dev] Timestamp in PBF files
Frederik, Jochen, sorry, you are both right; I really was too hasty. But now? Please, let's risk one small step and standardize the file timestamp (replication time), whatever the protobuf ID will be. If not 18, then 19 or something else. The protobuf format is flexible enough to be extended again at any time. After this, we can continue working on other file-related metadata. Furthermore, we can think about introducing a new (or extended) dense node format for history files. Step by step... Markus
Re: [OSM-dev] Timestamp in PBF files
On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote: How many nodes in the planet lack a latitude or longitude? Using a MAXINT encoding will cost about 8 bytes for each missing latitude or longitude. It's possible to reduce this to 2-3 bytes, but the format gets uglier/hackier. IMHO, probably not worth that cost. I just counted those cases. In the history dump from October 2012 there are 2344 nodes without coordinates. Hardly worth thinking about... Maybe we should just remove them altogether? Jochen -- Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
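Where the "about 8 bytes" comes from can be checked with a short sketch: with delta coding, a MAXINT placeholder turns two small deltas (into and out of the missing node) into two huge ones, each needing a 5-byte ZigZag varint. The coordinates are made up for illustration, and the exact overhead depends on the neighbouring values:

```python
from typing import List

MAXINT = 2**31 - 1  # placeholder for "no coordinate"


def zigzag64(n: int) -> int:
    """sint64 ZigZag mapping: small magnitudes -> small unsigned values."""
    return (n << 1) ^ (n >> 63)


def varint_len(value: int) -> int:
    """Bytes needed to store a non-negative value as a protobuf varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def delta_coded_size(lats: List[int]) -> int:
    """Bytes for a packed, delta-coded sint64 array (payload only)."""
    total, previous = 0, 0
    for lat in lats:
        total += varint_len(zigzag64(lat - previous))
        previous = lat
    return total


with_coords = [501240327, 501240400, 501240450]
with_placeholder = [501240327, MAXINT, 501240450]
print(delta_coded_size(with_placeholder) - delta_coded_size(with_coords))  # 7
```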
Re: [OSM-dev] Timestamp in PBF files
Hi, On 11/21/12 18:46, Jochen Topf wrote: On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote: How many nodes in the planet lack a latitude or longitude? Using a MAXINT encoding will cost about 8 bytes for each missing latitude or longitude. It's possible to reduce this to 2-3 bytes, but the format gets uglier/hackier. IMHO, probably not worth that cost. I just counted those cases. In the history dump from October 2012 there are 2344 nodes without coordinates. Hardly worth thinking about... That sounds implausibly low. Given that 1. every deleted node should be in that file without coordinates 2. we're currently at node id 2.03 billion, 3. there are 1.66 billion visible nodes in the database we should have something like 370 million deleted nodes. Hm, we probably have to remove from that number those nodes that were deleted in ancient times where we've meanwhile dropped the history, and maybe some from the first TIGER import where we manually removed them from the database, but still - at least every node deleted in the past couple of years *should* show up with visible=false in the full history dump, and any node with visible=false *should* not have coordinates. Either there's an error in my thinking, or in your count, or in the script that does the history export ;) Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] Timestamp in PBF files
On Wed, Nov 21, 2012 at 3:46 AM, Jochen Topf joc...@remote.org wrote: On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote: Not quite. The granularity of timestamps can go down to the milliseconds. https://github.com/DennisOSRM/OSM-binary/blob/master/src/osmformat.proto#L96 Ugh. Yes. That was always somewhat of a problem in the protocol IMHO. Nobody needs more granularity than seconds because the main database doesn't have it. The same goes for the latitude/longitude granularity. Nobody uses that. And it just makes all the code reading PBF files a bit more complex and a bit slower. Today the database lacks those features, but the future can be different. The trivial complexity of that feature in readers allows many possible future features without a breaking format change. The ones I had in mind were: Lower granularity makes it easy to create lower-precision excerpts that are smaller to send and easier to store. Allow OSM tooling to handle contour lines, or other grid-specified data, where making the granularity match the grid size can lead to vastly improved compression. Support future higher-precision data, e.g., generated from GPS block 3 satellites. Millisecond timestamps are much easier to use as unique changeset IDs than second-granularity timestamps. The runtime cost of this is a couple of multiplications that loop-invariant code motion can remove; about 30 nanoseconds for each 8000-entity block, which is much, much cheaper than the branch prediction failures of VarInt decoding. Scott
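Scott's cost argument can be illustrated for timestamps: in osmformat.proto, `date_granularity` (default 1000, i.e. milliseconds per stored unit) is a per-block constant, and DenseInfo timestamps are delta coded, so decoding costs one addition and one multiplication per entity. A sketch (the function name is hypothetical):

```python
from datetime import datetime, timezone
from typing import Iterable, List


def dense_timestamps_ms(deltas: Iterable[int], date_granularity: int = 1000) -> List[int]:
    """Delta-coded DenseInfo timestamps -> milliseconds since the Unix epoch."""
    result, current = [], 0
    for delta in deltas:
        current += delta                           # undo delta coding
        result.append(current * date_granularity)  # scale to milliseconds
    return result


ms = dense_timestamps_ms([1343134119, 60])
print(ms)  # [1343134119000, 1343134179000]
print(datetime.fromtimestamp(ms[0] / 1000, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
# 2012-07-24T12:48:39Z
```

With `date_granularity=1` the same loop would carry true millisecond timestamps without any change to the reader.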
Re: [OSM-dev] Timestamp in PBF files
On Wed, Nov 21, 2012 at 5:26 AM, Frederik Ramm frede...@remote.org wrote: Hi, On 11/21/2012 11:50 AM, mar...@gmx.eu wrote: OK, this seems to be consensual: PBF id 18 in the header block for a signed int UNIX timestamp value. In both his messages, Scott had suggested PBF id 18 for a signed int epoch value of the file creation, not for a signed int epoch value of the replication state. It would probably be premature to call this a consensus for a replication state timestamp at PBF id 18. I think for Frederik's immediate needs, we should add a field called osmosis_replication_timestamp or osmosis_replication_state = 32, which contains a submessage containing a replication timestamp and other replication data that he feels is appropriate. As for the timestamp = 18 field, Dennis, what was your intended use of this field? Marqqs, what is the intended use of your timestamp optional_features field? By this, I mean, what semantics are you attaching to these timestamps? I think it's perfectly reasonable to have several timestamp fields, perhaps: The timestamp the file was generated. The state needed to resume replication of an extract/planet (which contains an internal timestamp)? The timestamp of when the file was extracted/excerpted? If you two could give me a better idea of what your timestamps are used for, I could advise on how we can try to integrate them into one or more standard timestamp fields. And after that, we can then figure out how we might want to assign timestamps to field names/ids --- keeping in mind prior uses of those field names and numbers. Thoughts, Scott
[OSM-dev] Timestamp in PBF files
Hi, (message to dev list but explicitly Cc'ing Brett and Scott because I don't know if they follow dev) about a year ago, Marqqs tried to have a discussion on how to add timestamps to PBF files and hardly anyone was interested. I've had a couple of people ask me whether I could somehow add timestamp information to the PBF files that I produce for download.geofabrik.de, so I'd be interested in solving this somehow. I've found an experimental fork of OSM-binary by Dennis, here https://github.com/DennisOSRM/OSM-binary and used that to patch Osmium (timestamp branch on https://github.com/woodpeck/osmium) and Peter's osm history splitter (timestamp branch on https://github.com/woodpeck/osm-history-splitter) accordingly; this now allows me to produce OSM extracts with a timestamp - the proto definition is here https://github.com/DennisOSRM/OSM-binary/blob/master/src/osmformat.proto#L65 I don't have Osmosis support yet - I simply parse the state.txt file that Osmosis generates and use a command line switch of osm-history-splitter to inject that timestamp into the files that are created. The files thus generated work OK with standard PBF processing tools; they simply ignore the timestamp. Marqqs' tools, however, expect the timestamp to be an ISO timestamp string, not a Unix epoch integer (see http://wiki.openstreetmap.org/wiki/Talk:PBF_Format#File_Timestamp.3F), so they are unhappy with it. I really don't mind *how* it's done but I would really love to have one agreed way to place a timestamp in a PBF instead of everyone rolling their own. What's the current status of this discussion? Is there already an approved way to deal with this? Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
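The incompatibility described here is two encodings of the same instant: an ISO 8601 UTC string on one side, an integer count of seconds since the Unix epoch on the other. A minimal conversion sketch, assuming UTC and whole seconds (the helper names are illustrative):

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2012-11-14T01:10:07Z


def iso_to_epoch(iso: str) -> int:
    """ISO 8601 UTC string -> seconds since the Unix epoch."""
    return int(datetime.strptime(iso, ISO_FORMAT).replace(tzinfo=timezone.utc).timestamp())


def epoch_to_iso(epoch: int) -> str:
    """Seconds since the Unix epoch -> ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(ISO_FORMAT)


print(iso_to_epoch("2012-11-14T01:10:07Z"))  # 1352855407
print(epoch_to_iso(1352855407))  # 2012-11-14T01:10:07Z
```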
Re: [OSM-dev] Timestamp in PBF files
Frederik Ramm writes: I really don't mind *how* it's done but I would really love to have one agreed way to place a timestamp in a PBF instead of everyone rolling their own. I would prefer epoch timestamps. That's a widely accepted way of storing time information without the need to worry about time zones and such. While we change the header: Could we also include a field to indicate a full history planet? After the redaction period, lat/lon is only a required field for non-redacted elements. Is it possible to express this in protobuf? If not, it would be fine to at least have a defined value for "undefined" that we could document. If I remember correctly, Jochen suggested using MAXINT for this. Stephan
Re: [OSM-dev] Timestamp in PBF files
On Tue, Nov 20, 2012 at 01:51:02PM +0100, Frederik Ramm wrote: about a year ago, Marqqs tried to have a discussion on how to add timestamps to PBF files and hardly anyone was interested. Before we get into the details of how this timestamp is implemented in the PBF format, maybe somebody can define what this timestamp is actually timestamping? Is it the time the file was created? The last changed object in the file? The time the database extract was created? Something else? I guess the timestamp is somehow supposed to say which state of the OSM database this file represents. How is it supposed to work in history files? Do we need two timestamps to define a range for history files? Jochen -- Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Timestamp in PBF files
Hello Jochen, very good question. From my point of view, the file timestamp should be the in-file representation of the externally maintained state.txt timestamp, as in http://planet.openstreetmap.org/replication/hour/000/001/668.state.txt for example. This would make it very easy to update .osm.pbf files on a per-file basis. You would not need to care about externally maintained timestamp files. You would just say "update this file" and the update process could be done automatically. Regards Markus
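Markus's example URL suggests how the sequence number in a state.txt maps onto the replication directory layout. The following Python sketch assumes the nine-digit, three-level layout visible in that URL (sequence number 1668 becomes 000/001/668); the helper names are hypothetical, and sequence numbers above 999,999,999 are not handled.

```python
def replication_path(sequence_number: int) -> str:
    """Map a replication sequence number to an 'AAA/BBB/CCC' path (nine digits, split in threes)."""
    digits = f"{sequence_number:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def state_url(base_url: str, sequence_number: int) -> str:
    """Build the URL of the state.txt belonging to a given sequence number."""
    return f"{base_url.rstrip('/')}/{replication_path(sequence_number)}.state.txt"
```

With an in-file timestamp and sequence number, an updater could derive URLs like this on its own instead of reading a separately kept state.txt.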
Re: [OSM-dev] Timestamp in PBF files
Hi, On 20.11.2012 20:12, Jochen Topf wrote: I guess the timestamp is somehow supposed to say which state of the OSM database this file represents. Yes. Basically whatever was in Osmosis' state.txt file at the time this file was created. How is it supposed to work in history files? I think it would make sense to have a comparable timestamp in history files. Currently there's no software that would be able to patch history files with freshly downloaded diffs, so the discussion is rather academic though. Do we need two timestamps to define a range for history files? I'd suggest waiting until someone has an application that needs this. I'm a bit wary of throwing this discussion wide open, because before too long we'll have all sorts of people suggesting helpful optional enhancements to the PBF format (while we're at it, can we maybe do X) and then nothing gets done again. All *I* want is one extra timestamp, and I would start using it tomorrow; it's not academic, there's software that would process it, and there's a clear benefit to users. I'd prefer to use a standard, but in the absence of an existing standard I'll just make something up and use that. (But we've seen how well that works: I did make something up for testing and Marqqs expected something else.) Bye Frederik -- Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] Timestamp in PBF files
Hello, Yes. Basically whatever was in Osmosis' state.txt file at the time this file was created. This is the use case I had in mind when experimenting with timestamps in PBF. Updating self-contained PBF files through Osmosis is a major advantage over using state.txt files. I, for one, plan to support such a timestamp in OSRM from day one (or two). --Dennis
Re: [OSM-dev] Timestamp in PBF files
Hello, did you know that the PBF file timestamp has its anniversary these days? :-) https://wiki.openstreetmap.org/w/index.php?title=Talk:PBF_Format&diff=708490&oldid=705430 After some thought... would it hurt if there were _two_ file timestamps in a PBF file? One string-formatted according to the definition from a year ago (see the OSM Wiki on PBF), and a second one in UNIX time format. osmconvert would then write _both_ of these timestamps, for reasons of compatibility. Thus, Frederik's new PBF file timestamp could be processed starting now. As soon as the decision has been made, one of the two file timestamp procedures could be removed from the code. Markus -------- Original Message -------- Date: Tue, 20 Nov 2012 21:50:26 +0100 From: Dennis Luxen dennis.lu...@gmail.com To: Frederik Ramm frede...@remote.org CC: dev@openstreetmap.org dev@openstreetmap.org, Scott Crosby sc...@sacrosby.com Subject: Re: [OSM-dev] Timestamp in PBF files [...]
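Markus's dual-timestamp idea could look roughly like this on the writer and reader side. This is a sketch under stated assumptions: the string form is assumed to be an optional_features entry shaped like timestamp=YYYY-MM-DDTHH:MM:SSZ, the integer is assumed to live in a separate header field, and both function names are hypothetical. The reader prefers the integer and falls back to the string.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence, Tuple

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
PREFIX = "timestamp="  # assumed prefix of the string-form timestamp in optional_features


def timestamp_fields(epoch: int) -> Tuple[int, str]:
    """Writer side: produce the integer field value and the optional_features string."""
    iso = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(ISO_FORMAT)
    return epoch, PREFIX + iso


def read_timestamp(int_field: Optional[int], optional_features: Sequence[str]) -> Optional[int]:
    """Reader side: prefer the integer field, fall back to the string form, else None."""
    if int_field is not None:
        return int_field
    for feature in optional_features:
        if feature.startswith(PREFIX):
            parsed = datetime.strptime(feature[len(PREFIX):], ISO_FORMAT)
            # strptime yields a naive datetime; the trailing 'Z' means UTC
            return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return None
```

Once one representation is agreed on, the other branch can simply be deleted, which matches the transition Markus describes.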
Re: [OSM-dev] Timestamp in PBF files
On 20.11.2012 20:28, mar...@gmx.eu wrote: Stephan, what did you mean we would need this undefined for? I should not have mixed topics. When you process a full-history PBF, nodes that were redacted are stored with MAXINT for lat/lon, because lat/lon are required fields. Jochen mentioned this a few mails back. Software reading a history PBF might want to handle these elements in a special way... Stephan
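A reader that wants to treat redacted nodes specially, as Stephan suggests, first has to undo the DenseNodes delta coding and then compare against the sentinel. The sketch below assumes that "MAXINT" means the signed 32-bit maximum (2^31 - 1) and works on plain Python lists rather than on a real protobuf decoder; split_redacted is a hypothetical helper.

```python
from itertools import accumulate
from typing import List, Tuple

MAXINT = 2**31 - 1  # assumption: the sentinel is the signed 32-bit maximum


def decode_deltas(deltas: List[int]) -> List[int]:
    """Undo DenseNodes-style delta coding: each value is the running sum of the deltas."""
    return list(accumulate(deltas))


def split_redacted(
    id_deltas: List[int], lat_deltas: List[int], lon_deltas: List[int]
) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """Return (nodes with real coordinates, ids of nodes carrying the sentinel)."""
    located: List[Tuple[int, int, int]] = []
    redacted: List[int] = []
    ids = decode_deltas(id_deltas)
    lats = decode_deltas(lat_deltas)
    lons = decode_deltas(lon_deltas)
    for node_id, lat, lon in zip(ids, lats, lons):
        if lat == MAXINT or lon == MAXINT:
            redacted.append(node_id)
        else:
            located.append((node_id, lat, lon))
    return located, redacted
```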
Re: [OSM-dev] Timestamp in PBF files
On Tue, Nov 20, 2012 at 1:09 PM, Jochen Topf joc...@remote.org wrote: On Tue, Nov 20, 2012 at 06:57:50PM +0100, Stephan Knauss wrote: Frederik Ramm writes: I really don't mind *how* it's done but I would really love to have one agreed way to place a timestamp in a PBF instead of everyone rolling their own. I would prefer epoch timestamps. That's a widely accepted way of storing time information without the need to worry about time zones and such. The other timestamps in PBF files (on all the objects) use 64-bit integers with seconds since epoch. So it would make sense to use the same format. Not quite. The granularity of timestamps can go down to milliseconds. https://github.com/DennisOSRM/OSM-binary/blob/master/src/osmformat.proto#L96 You can have optional fields in protobuf, but unfortunately this doesn't help us in this case. There are two ways nodes can be stored in PBF files: as a series of Node objects or as DenseNodes objects. Node objects have required fields lat and lon. We could make them optional; there would then be has_lat() and has_lon() calls to check for this. Unfortunately, in most cases the more space-efficient DenseNodes objects are used. In this case the latitude and longitude of all nodes of a block are stored in a special delta encoding, which doesn't allow for optional fields. As far as I can see, we could either add a boolean for each node in a block that defines whether the coordinate field is valid, or use a special value for an invalid coordinate. Correct. There is no way in the current DenseNodes format to encode 'no value' for a latitude or longitude. Changing the message to include, say, a boolean array for hasLatitude()/hasLongitude() would be a breaking format change, and would add about 18-40 bytes to each block of 8000 nodes. How many nodes in the planet lack a latitude or longitude? Using a MAXINT encoding will cost about 8 bytes for each missing latitude or longitude.
It's possible to reduce this to 2-3 bytes, but the format gets uglier/hackier. IMHO, probably not worth that cost.
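Scott's cost estimate can be reproduced with a little arithmetic. DenseNodes stores each coordinate as the difference from the previous node, zigzag-mapped and written as a protobuf varint, so a sentinel far away from real coordinates makes both the jump to it and the jump back expensive. The Python sketch below counts payload bytes only (no field tags or length prefixes); the coordinate values and MAXINT = 2^31 - 1 are assumptions chosen for illustration.

```python
from typing import List

MAXINT = 2**31 - 1  # assumed sentinel value


def zigzag(n: int) -> int:
    """Protobuf sint64 zigzag mapping: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return n * 2 if n >= 0 else -n * 2 - 1


def varint_len(u: int) -> int:
    """Number of bytes a non-negative integer occupies as a protobuf varint (7 bits per byte)."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


def dense_cost(values: List[int]) -> int:
    """Payload bytes for values stored delta-coded as zigzag varints, as in DenseNodes."""
    total, previous = 0, 0
    for value in values:
        total += varint_len(zigzag(value - previous))
        previous = value
    return total


normal = dense_cost([515000000, 515000010, 515000030])
with_sentinel = dense_cost([515000000, MAXINT, 515000030])
```

With these made-up coordinates the three values take 7 bytes normally and 15 bytes when the middle node carries the sentinel, an 8-byte overhead in line with Scott's estimate; the exact figure depends on the neighbouring coordinates.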
Re: [OSM-dev] Timestamp in PBF files
Idea: why not put the entire state.txt file into the OSMHeader block?

/* Contains the file header. */
message HeaderBlock {
  optional HeaderBBox bbox = 1;
  /* Additional tags to aid in parsing this dataset */
  repeated string required_features = 4;
  repeated string optional_features = 5;
  optional string writingprogram = 16;
  optional string source = 17; // From the bbox field.
  optional sint64 timestamp = 18; // Unix-Time encoded into varint.
  optional string osmosis_update_state = 19; // Encoding of the state.txt file.
}

One thing I don't like about it is that the state.txt file is not self-contained:

#Tue Nov 20 19:02:18 UTC 2012
sequenceNumber=1668
timestamp=2012-11-20T19\:00\:00Z

It should contain the URI of the planet it corresponds to (or that URI plus a list of mirrors). That way a user merely needs to say 'update planet' and everything else can be automated.

Scott

On Tue, Nov 20, 2012 at 2:50 PM, Dennis Luxen dennis.lu...@gmail.com wrote: [...]
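If the entire state.txt were embedded verbatim in a string field like the proposed osmosis_update_state, readers would need to parse it. Osmosis writes the file in Java properties style, so colons in values are escaped as \:. The Python sketch below handles only what appears in the sample in Scott's message (a comment line and simple key=value pairs with \: escapes); it is not a full properties parser, and parse_state is a hypothetical name.

```python
from typing import Dict


def parse_state(text: str) -> Dict[str, str]:
    """Parse Osmosis state.txt content into a dict of key -> value strings."""
    state: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and the '#<date>' comment line carry no keys
        key, _, value = line.partition("=")
        # Java properties escape ':' as '\:'; only this escape is undone here
        state[key.strip()] = value.strip().replace("\\:", ":")
    return state
```

Note that the parsed state still has no notion of which planet or replication stream it belongs to, which is exactly the gap Scott points out.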
Re: [OSM-dev] Timestamp in PBF files
On Tue, Nov 20, 2012 at 6:51 AM, Frederik Ramm frede...@remote.org wrote: [...] What's the current status of this discussion? Is there already an approved way to deal with this?

No status, but if anyone wants my opinion: when authoring the format, I expected us to add metadata to planets, and expected it to be put into OSMHeader as in the OSRM clone you linked to above. I would vote to deprecate the use of the ISO timestring encoded into the optional_features array, but continue to write it to avoid breaking old installs of Marqqs's tools. I also think that we have more than one notion of timestamp. How does this sound:

message HeaderBlock {
  optional HeaderBBox bbox = 1;
  /* Additional tags to aid in parsing this dataset */
  repeated string required_features = 4;
  repeated string optional_features = 5;
  optional string writingprogram = 16;
  optional string source = 17; // From the bbox field
  optional sint64 timestamp = 18; // Unix-Time encoded into varint; when the file was generated.
  optional sint64 mirror_timestamp = 64; // Unix-Time timestamp of the last update of the source (used for mirroring).
}

What about adding other metadata or adding in a nanosecond timestamp while we're at it?

Scott
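Scott's proposal separates two notions of time. A small Python sketch of why both can be useful: the values below are borrowed from the state.txt sample quoted earlier in the thread (written at 19:02:18 UTC, data as of 19:00:00 UTC) and merely stand in for the generation time and the source state. The HeaderTimes class is hypothetical; only the field names come from the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def utc_epoch(year: int, month: int, day: int, hour: int, minute: int, second: int) -> int:
    """Unix seconds for a UTC wall-clock time."""
    return int(datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc).timestamp())


@dataclass
class HeaderTimes:
    timestamp: int         # when the file was generated
    mirror_timestamp: int  # last update of the source data the file reflects

    def lag_seconds(self) -> int:
        """How far the file's generation trails the data state it represents."""
        return self.timestamp - self.mirror_timestamp


times = HeaderTimes(
    timestamp=utc_epoch(2012, 11, 20, 19, 2, 18),
    mirror_timestamp=utc_epoch(2012, 11, 20, 19, 0, 0),
)
```

An updater would presumably key off mirror_timestamp, since the generation time alone says nothing about which replication diffs are still missing.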