Re: [OSM-dev] Incomplete diffs?
On 21 November 2011 06:06, Peter Körner wrote:
> On 05.11.2011 22:33, mar...@gmx.eu wrote:
> > Hi Erik,
> >
> > thanks for your help. The missing node seems to be available via
> > minutely and hourly diff files but NOT via the daily file.
> >
> > Meanwhile I found an explanation in the Wiki:
> > http://wiki.openstreetmap.org/wiki/Planet.osm/diffs

I've made a number of updates to the table on that page to (hopefully) better explain the various extracts available.

I've also just created a day-replicate job to complement the existing minute and hour jobs. Now is probably a good time to consider the daily diffs deprecated in favour of the day-replicate diffs.

Brett

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Incomplete diffs?
On 05.11.2011 22:33, mar...@gmx.eu wrote:
> Hi Erik,
>
> thanks for your help. The missing node seems to be available via
> minutely and hourly diff files but NOT via the daily file.
>
> Meanwhile I found an explanation in the Wiki:
> http://wiki.openstreetmap.org/wiki/Planet.osm/diffs
>
> ...
>
> If the problem is already known, why is the file not created with
> overlapping time borders, let's say from 00:00 to 24:01? This would be
> better than losing data.

This is what the replication diffs are for. They are based on database transaction numbers and don't miss changes:

http://planet.osm.org/hour-replicate/
http://planet.osm.org/minute-replicate/

Peter
Re: [OSM-dev] Incomplete diffs?
Hello Frederik,

ok, it really must have been late. :-)
Thank you for the explanation, sounds perfect.

I wouldn't call it a bug at all, because it may be necessary to keep such delete requests:

Let's say you found an out-of-date .osm file and want to update it. You guess the file is from last Saturday 12:00, but you're not sure. Therefore you cumulate replication diffs for the time range between Saturday 10:00 (2 hours earlier) and today. Let's further assume that a node was created at 10:15 and deleted at 11:45. This node would be excluded from an "ideal" simplified diff. If the old .osm file in question in fact has the state of Saturday 11:00, it would know about the created node but never become aware of its deletion.

In the end: I'm happy about this "bug". :-)

However, this doesn't make it easier to determine how much data you lose by taking the normal diffs instead of the replication ones. But eventually I will get the answer... somehow.

Markus

-------- Original Message --------
Date: Mon, 07 Nov 2011 09:06:32 +0100
From: Frederik Ramm
To: mar...@gmx.eu
CC: dev@openstreetmap.org
Subject: Re: [OSM-dev] Incomplete diffs?

> [...]
> Now if such a replication diff is simplified with Osmosis, in my opinion
> it should drop the node altogether, but what it does is it always keeps
> the highest version even if that corresponds to a deletion that
> counteracts a previous creation
> [...]
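Markus's stale-file scenario can be played through with a small sketch. The record layout, the node id and the helper name `apply_changes` are made up for illustration and are not how Osmosis or osmupdate represent changes; the point is only that a trailing delete is harmless for a file that never saw the node, and necessary for a file that did.

```python
# Hypothetical change records: (time "HH:MM", action, node_id).
def apply_changes(existing_ids, changes):
    """Apply create/delete changes in order; deleting an unknown id is a no-op."""
    ids = set(existing_ids)
    for _time, action, node_id in changes:
        if action == "create":
            ids.add(node_id)
        elif action == "delete":
            ids.discard(node_id)
    return ids

# State of the old file at Saturday 11:00: the node created at 10:15 is in it.
stale_file = {1}

# An "ideal" simplification would drop the create/delete pair completely.
ideal_simplified = []
# Keeping the highest version leaves the delete from 11:45 in the diff.
keeps_delete = [("11:45", "delete", 1)]

after_ideal = apply_changes(stale_file, ideal_simplified)
after_keep = apply_changes(stale_file, keeps_delete)
# A file from before 10:15 never had the node; the extra delete does no harm.
after_keep_older_file = apply_changes(set(), keeps_delete)

print(after_ideal)            # the node survives although it was deleted
print(after_keep)             # the node is gone, as it should be
print(after_keep_older_file)  # still empty
```

With the "ideal" diff the stale file keeps a node that no longer exists; with the retained delete both starting states end up consistent.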
Re: [OSM-dev] Incomplete diffs?
Hi,

On 11/07/2011 02:24 AM, mar...@gmx.eu wrote:
> # normal diff
> $ zcat 2003-2004.osc.gz |grep -c "timestamp=\"2011-11-03T12:"
> 58968
>
> # replication diff
> $ cat 1103-1104.osc |grep -c "timestamp=\"2011-11-03T12:"
> 59068
>
> And yes, I thought of cumulating the versions in the second file before I
> started counting with grep.

I think you may have found a bug in Osmosis' --simplify-change algorithm. (Or, if you created the above 1103-1104.osc file yourself, you have re-implemented a bug already present in Osmosis.)

Both the normal diff and the daily diff are correct as far as I can see, but the simplified version that you created - the one with 59068 elements - is not.

An object created earlier on that particular day and deleted between 12:00 and 13:00 will not show up in the normal daily diff:

$ zgrep -A1 -B1 '
$

It will show up twice in the replication diff, once for creation and once for deletion:

$ zgrep -A1 -B1 '
uid="419929" user="hoti" changeset="9728137" lat="47.4399545" lon="16.4376938"/>
uid="547666" user="Igor Kurvanor" changeset="9728123" lat="45.7510611" lon="6.2813975"/>
uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611" lon="6.2813975"/>
$

Now if such a replication diff is simplified with Osmosis, in my opinion it should drop the node altogether, but what it does is it always keeps the highest version even if that corresponds to a deletion that counteracts a previous creation:

$ osmosis -q --read-xml-change 1103-1104.osc.gz --simc --write-xml-change - | grep -A1 -B1 '
uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611" lon="6.2813975"/>
$

Now this is a minor bug because I don't know any consumer that will trip on a deletion request for a non-existing object, but still it is a behaviour that I would not have expected. Anyway, it should explain the discrepancy you are seeing.

Bye
Frederik
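The two simplification behaviours discussed here can be compared in a few lines. This is only a sketch of the idea, not Osmosis code: the `Change` record, the function names and the sample ids are assumptions, and real change files carry far more attributes.

```python
from collections import namedtuple

# Hypothetical, stripped-down change record.
Change = namedtuple("Change", "action obj_id version")

def simplify_keep_highest(changes):
    """Keep only the highest version of each object (the behaviour described above)."""
    latest = {}
    for c in changes:
        if c.obj_id not in latest or c.version > latest[c.obj_id].version:
            latest[c.obj_id] = c
    return sorted(latest.values(), key=lambda c: c.obj_id)

def simplify_cancel_created(changes):
    """Variant: also drop objects that were created and deleted within the same diff."""
    created = {c.obj_id for c in changes if c.action == "create"}
    return [c for c in simplify_keep_highest(changes)
            if not (c.action == "delete" and c.obj_id in created)]

changes = [
    Change("create", 10, 1), Change("delete", 10, 2),   # created, then deleted
    Change("modify", 20, 3), Change("modify", 20, 4),   # modified twice
    Change("create", 30, 1),                            # created once
]
kept = simplify_keep_highest(changes)
cancelled = simplify_cancel_created(changes)
print(len(kept), len(cancelled))
```

Each object that is created and deleted inside the covered period contributes one extra element to the first result, which is the kind of surplus the counts above show.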
Re: [OSM-dev] Incomplete diffs?
Hello,

meanwhile osmupdate has been changed to download replication diffs only. But there is still an issue I cannot explain...

I thought that the "normal" daily diffs would only lack some data that was submitted around midnight, but data seems to vanish from the middle of the day too:

# normal diff
$ zcat 2003-2004.osc.gz |grep -c "timestamp=\"2011-11-03T12:"
58968

# replication diff
$ cat 1103-1104.osc |grep -c "timestamp=\"2011-11-03T12:"
59068

And yes, I thought of cumulating the versions in the second file before I started counting with grep.

Of course, it could easily be that I overlooked something - it's late in Germany. :-)

Does anyone have an idea?

Markus

-------- Original Message --------
Date: Sun, 06 Nov 2011 00:32:49 +0100
From: mar...@gmx.eu
To: Frederik Ramm, dev@openstreetmap.org
Subject: Re: [OSM-dev] Incomplete diffs?

> [...]
> Now I will attend to osmupdate and try to change from daily normal diffs
> to daily replication diffs. Seems to be better than losing objects once in
> a while.
> [...]
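For anyone who wants to repeat the count without grep, a rough Python equivalent might look like the sketch below. The helper names and the three sample lines are made up for illustration; like `grep -c`, it counts matching lines rather than individual occurrences, so it relies on one element per line.

```python
import gzip
import io

def open_osc(path):
    """Open a plain or gzip-compressed change file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")

def count_matching_lines(lines, needle):
    """Count lines that contain needle, like `grep -c`."""
    return sum(1 for line in lines if needle in line)

# Made-up sample standing in for the content of an .osc file.
sample_lines = [
    '<node id="1" version="1" timestamp="2011-11-03T12:00:01Z"/>\n',
    '<node id="2" version="3" timestamp="2011-11-03T13:10:00Z"/>\n',
    '<node id="3" version="2" timestamp="2011-11-03T12:59:59Z"/>\n',
]
sample_count = count_matching_lines(io.StringIO("".join(sample_lines)),
                                    'timestamp="2011-11-03T12:')
print(sample_count)
```

On a real file the call would be along the lines of `count_matching_lines(open_osc("1103-1104.osc"), 'timestamp="2011-11-03T12:')`.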
Re: [OSM-dev] Incomplete diffs?
Hi Frederik,

thanks for the explanation!

The _replication_ diffs are the right choice if you want to update a full history file.

Most people who update their OSM files on a regular basis do not need replication diffs; they are satisfied with the newest version of each object that has changed.

> So, if you want to use daily diffs but avoid the danger of missing
> edits, use the replication diff.

Very good advice.

Until today I chose NOT to use the diffs in the planet.openstreetmap.org/history/ directory because they are outdated. They usually come with a delay of 25 hours. Do you know if the creation process could be accelerated somehow?

Now I will attend to osmupdate and try to change from daily normal diffs to daily replication diffs. Seems to be better than losing objects once in a while.

Meanwhile people can use Osmosis, or run osmupdate with the --hourly option, which will restrict the program to replication diffs.

Markus

-------- Original Message --------
Date: Sat, 05 Nov 2011 22:56:01 +0100
From: Frederik Ramm
To: dev@openstreetmap.org
Subject: Re: [OSM-dev] Incomplete diffs?

> [...]
> For minutely and hourly diffs, we only offer replication diffs these
> days. For daily diffs, we have the normal ones under
> planet.openstreetmap.org/daily, as well as the replication diffs under
> planet.openstreetmap.org/history.
> [...]
Re: [OSM-dev] Incomplete diffs?
Hi,

On 11/05/2011 06:58 PM, mar...@gmx.eu wrote:
> Meanwhile I found out that this node simply did not appear in the daily diffs:
> http://www.openstreetmap.org/browse/node/1470178889
>
> It was created at 2011-10-16T23:58Z by a large changeset along with 23,000 other nodes.
> Neither the 16/17 nor the 17/18 daily diff contain this node whereas the
> hourly diff from October 17 01:00 does.

There are two types of diffs: "replication diffs" and normal diffs. A replication diff contains everything that happened between two timestamps, including multiple changes of the same object, whereas a normal diff only contains the information required to get from state 1 to state 2.

Also, replication diffs are created in a relatively fail-safe process with Osmosis, whereas the normal diffs can miss changes in some cases when a long-running database transaction that was created before 0:00 extends past the time when the diff is created. (There was a time when we had only "normal" diffs, and it was near impossible to make sure the minutely/hourly ones did not miss anything.)

For minutely and hourly diffs, we only offer replication diffs these days. For daily diffs, we have the normal ones under planet.openstreetmap.org/daily, as well as the replication diffs under planet.openstreetmap.org/history.

The normal diff indeed lacks the node in question, but the daily replication diff under history/2011/1016-1017.osc.gz has it.

So, if you want to use daily diffs but avoid the danger of missing edits, use the replication diff.

Frankly I don't know why the normal daily diffs are still created at all; if one really wanted to offer a reduced-traffic version of the replication diffs then it would indeed make sense to simply deflate the replication diff using Osmosis' --simplify-change task.

Bye
Frederik

--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09" E008°23'33"
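The failure mode described above can be reproduced with a toy model. Everything in it is an assumption made for illustration: times are minutes since 00:00 of day 1, `committed_at` stands in for the moment a long-running transaction becomes visible, and selection by commit order is only a simplified stand-in for what the replication process really tracks.

```python
from dataclasses import dataclass

DAY = 24 * 60  # minutes per day

@dataclass(frozen=True)
class Edit:
    node_id: int
    timestamp: int     # when the edit was stamped
    committed_at: int  # when the transaction became visible to readers

def timestamp_window_diff(edits, start, end, run_at):
    """'Normal' diff: select by timestamp among rows visible when the job runs."""
    return [e.node_id for e in edits
            if start <= e.timestamp < end and e.committed_at <= run_at]

def commit_order_diff(edits, last_seen_commit, run_at):
    """Replication-style diff: everything that became visible since the last run."""
    return [e.node_id for e in edits
            if last_seen_commit < e.committed_at <= run_at]

edits = [
    Edit(node_id=1, timestamp=23 * 60 + 58, committed_at=DAY + 20),  # stamped 23:58, visible 00:20
    Edit(node_id=2, timestamp=1000, committed_at=1001),
]

# Daily job for day 1 runs at 00:10 on day 2: node 1 is not visible yet.
day1 = timestamp_window_diff(edits, 0, DAY, run_at=DAY + 10)
# Daily job for day 2: node 1 is visible now, but its timestamp belongs to day 1.
day2 = timestamp_window_diff(edits, DAY, 2 * DAY, run_at=2 * DAY + 10)

# Replication-style runs at 00:10 and 01:10 on day 2.
rep1 = commit_order_diff(edits, 0, DAY + 10)
rep2 = commit_order_diff(edits, DAY + 10, DAY + 70)

print(day1, day2, rep1, rep2)
```

Node 1 falls between the two daily windows and never appears in a timestamp-based diff, while the second replication-style run picks it up.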
Re: [OSM-dev] Incomplete diffs?
Hi Erik,

thanks for your help. The missing node seems to be available via minutely and hourly diff files but NOT via the daily file.

Meanwhile I found an explanation in the Wiki:
http://wiki.openstreetmap.org/wiki/Planet.osm/diffs

"The data in the file is the change in data between midnight on the first and second days, as identified by the timestamps on the current data. Because of the delay in creating the file it is very unlikely, but possible, that some data may be missing."

If the problem is already known, why is the file not created with overlapping time borders, let's say from 00:00 to 24:01? This would be better than losing data.

Alternatively, the daily diff could easily be created by merging the latest 24 hourly diffs. (Just tried this, takes 10 seconds on my medium server.)

Maybe we should propose to change the toolchain...?

Markus

-------- Original Message --------
Date: Sat, 5 Nov 2011 21:06:30 +0100
From: Erik Johansson
To: mar...@gmx.eu
Subject: Re: [OSM-dev] Incomplete diffs?

> On Sat, Nov 5, 2011 at 18:58, wrote:
> > Hi all,
> >
> > this week, a user of osmupdate and osmfilter asked me for help. The
> > filter program's output contained a certain way but not all of its nodes.
> > This must not happen, of course, and I thought this would be a program
> > error at first.
> >
> > Meanwhile I found out that this node simply did not appear in the daily diffs:
> > http://www.openstreetmap.org/browse/node/1470178889
> >
> > It was created at 2011-10-16T23:58Z by a large changeset along with
> > 23,000 other nodes.
> > Neither the 16/17 nor the 17/18 daily diff contain this node whereas the
> > hourly diff from October 17 01:00 does.
> >
> > How often does this happen? Is there a statistic about such gaps?
> >
> > Should I better switch to hourly diffs instead of daily ones? This would
> > increase the data traffic a bit as hourly diffs do not cumulate multiple
> > versions for each object.
>
> Just for reference I can't find that node in these dailies either:
> http://planet.openstreetmap.org/daily/20111015-20111016.osc.gz
> http://planet.openstreetmap.org/daily/20111016-20111017.osc.gz
> http://planet.openstreetmap.org/daily/20111017-20111018.osc.gz
>
> But it's available in the minutely:
> http://planet.openstreetmap.org/minute-replicate/001/028/610.gz
>
> Someone on IRC just said this the other day:
> > Weekly I think are still done with the planetdiff program
> > everything else is done with osmosis [by merge from minutely].
>
> I don't know if this is correct but try to merge two or three hours of
> minutely with Osmosis, and see what you get.
>
> /Erik
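The idea of building a daily diff from hourly ones can be sketched as follows. This is not what Osmosis or osmupdate actually do internally; the tuple layout `(obj_type, obj_id, version, action)`, the function name and the sample data are assumptions chosen to keep the example short.

```python
def merge_hourly(hourly_diffs):
    """Merge per-hour change lists, keeping the newest version of each object.

    hourly_diffs: lists of (obj_type, obj_id, version, action) tuples,
    ordered from the oldest hour to the newest.
    """
    newest = {}
    for diff in hourly_diffs:
        for obj_type, obj_id, version, action in diff:
            key = (obj_type, obj_id)
            # Later hours win on equal or higher version numbers.
            if key not in newest or version >= newest[key][0]:
                newest[key] = (version, action)
    return [(t, i, v, a) for (t, i), (v, a) in sorted(newest.items())]

# Made-up hourly content; the last list stands for the hour that carried
# the node missing from the daily file.
hours = [
    [("node", 5, 1, "create")],
    [("node", 5, 2, "modify"), ("way", 9, 4, "modify")],
    [("node", 1470178889, 1, "create")],
]
daily = merge_hourly(hours)
for entry in daily:
    print(entry)
```

Because every hourly file is read completely, an object cannot drop out just because its timestamp lies on the other side of midnight from the hour in which it was published.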