Re: [OSM-dev] Incomplete diffs?

2011-11-30 Thread Brett Henderson
On 21 November 2011 06:06, Peter Körner  wrote:

> On 05.11.2011 22:33, mar...@gmx.eu wrote:
>
> > Hi Erik,
> >
> > thanks for your help. The missing node seems to be available via
> > minutely and hourly diff files but NOT via the daily file.
> >
> > Meanwhile I found an explanation in the Wiki:
> > http://wiki.openstreetmap.org/wiki/Planet.osm/diffs
>

I've made a number of updates to the table on that page to (hopefully)
better explain the various extracts available.

I've also just created a day-replicate job to complement the existing
minute and hour jobs.

Now is probably a good time to consider the daily diffs deprecated in
favour of the day-replicate diffs.

Brett
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Incomplete diffs?

2011-11-20 Thread Peter Körner

On 05.11.2011 22:33, mar...@gmx.eu wrote:
> Hi Erik,
>
> thanks for your help. The missing node seems to be available via
> minutely and hourly diff files but NOT via the daily file.
>
> Meanwhile I found an explanation in the Wiki:
> http://wiki.openstreetmap.org/wiki/Planet.osm/diffs
>
> ...
>
> If the problem is already known, why is the file not created with
> overlapping time borders, let's say from 00:00 to 24:01? This would be
> better than losing data.


This is what the replication diffs are used for. They are based on
database transaction numbers and don't miss changes:


http://planet.osm.org/hour-replicate/
http://planet.osm.org/minute-replicate/
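Because replication diffs are numbered by sequence rather than by calendar day, a client fetches them by sequence number. Below is a minimal Python sketch, assuming the nine-digit, three-level directory layout visible in the minute-replicate URL cited in this thread (minute-replicate/001/028/610.gz for sequence 1028610). The helper names and the suffix argument are hypothetical, and the exact file suffix should be checked against the server.

```python
def sequence_to_stem(sequence: int) -> str:
    """Split a replication sequence number into AAA/BBB/CCC directory parts.

    Assumption: numbers are zero-padded to nine digits, as in 001/028/610.
    """
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence number must fit into nine digits")
    digits = f"{sequence:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def replication_url(base: str, sequence: int, suffix: str) -> str:
    """Join a replication base URL, the sequence path and a file suffix."""
    return f"{base.rstrip('/')}/{sequence_to_stem(sequence)}{suffix}"


# Reproduces the minute-replicate URL quoted in this thread.
print(replication_url("http://planet.openstreetmap.org/minute-replicate/", 1028610, ".gz"))
```

Keeping the suffix separate leaves room for the companion files a replication directory may hold next to each diff.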

Peter



Re: [OSM-dev] Incomplete diffs?

2011-11-07 Thread marqqs
Hello Frederik,

ok, it really must have been late. :-)
Thank you for the explanation, sounds perfect.

I wouldn't call it a bug at all because it may be necessary to keep such delete 
requests:

Let's say you found an out-of-date .osm file and want to update it. You guess
the file is from last Saturday 12:00, but you're not sure. Therefore you
cumulate replication diffs for the time range between Saturday 10:00 (2 hours
earlier) and today.

Let's further assume that a node had been created at 10:15 and was deleted at 
11:45. This node would be excluded from an "ideal" simplified diff.

If the old .osm file in question in fact has the state of Saturday 11:00, it 
would know about the created node but never become aware of its deletion.
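This scenario can be replayed on a toy model. Below is a minimal sketch, assuming a snapshot is just the set of node ids present and a diff is an ordered list of (action, node_id) pairs. Node 42 and the other ids are made up. Deleting a node the snapshot does not know is treated as a no-op.

```python
def apply_changes(snapshot: set, changes: list) -> set:
    """Apply (action, node_id) pairs in order to a set of node ids."""
    result = set(snapshot)
    for action, node_id in changes:
        if action in ("create", "modify"):
            result.add(node_id)
        elif action == "delete":
            result.discard(node_id)  # unknown ids are ignored, not an error
        else:
            raise ValueError(f"unknown action: {action!r}")
    return result


# The stale file really has the state of Saturday 11:00, so it already
# contains node 42 (created 10:15) but not its deletion (11:45).
stale = {1, 2, 42}

full_replication = [("create", 42), ("delete", 42)]  # both events kept
keep_latest_only = [("delete", 42)]                  # highest version only
cancelled_pair = []                                  # "ideal" simplification

print(sorted(apply_changes(stale, full_replication)))  # [1, 2]
print(sorted(apply_changes(stale, keep_latest_only)))  # [1, 2]
print(sorted(apply_changes(stale, cancelled_pair)))    # [1, 2, 42]
```

Only the variant that drops the create/delete pair leaves the stale node behind.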

In the end: I'm happy about this "bug". :-)

However, this doesn't make it easier to determine how much data you lose by
using the normal diffs instead of the replication ones. But I will get the
answer eventually... somehow.

Markus


-------- Original Message --------
> Date: Mon, 07 Nov 2011 09:06:32 +0100
> From: Frederik Ramm
> To: mar...@gmx.eu
> CC: dev@openstreetmap.org
> Subject: Re: [OSM-dev] Incomplete diffs?



Re: [OSM-dev] Incomplete diffs?

2011-11-07 Thread Frederik Ramm

Hi,

On 11/07/2011 02:24 AM, mar...@gmx.eu wrote:

# normal diff
$ zcat 2003-2004.osc.gz |grep -c "timestamp=\"2011-11-03T12:"
58968

# replication diff
$ cat 1103-1104.osc |grep -c "timestamp=\"2011-11-03T12:"
59068

And yes, I did remember to cumulate the versions in the second file before I
started counting with grep.


I think you may have found a bug in Osmosis' --simplify-change 
algorithm. (Or, if you created the above 1103-1104.osc file yourself, 
you have re-implemented a bug already present in Osmosis.)


Both the normal diff and the daily diff are correct as far as I can see, 
but the simplified version that you created - the one with 59068 
elements - is not.


An object created earlier on that particular day and deleted between 
12:00 and 13:00 will not show up in the normal daily diff:


$ zgrep -A1 -B1 '
$

It will show up twice in the replication diff, once for creation and
once for deletion:


$ zgrep -A1 -B1 '
uid="419929" user="hoti" changeset="9728137" lat="47.4399545" lon="16.4376938"/>
uid="547666" user="Igor Kurvanor" changeset="9728123" lat="45.7510611" lon="6.2813975"/>

uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611" lon="6.2813975"/>
$

Now if such a replication diff is simplified with Osmosis, in my opinion
it should drop the node altogether; instead, it always keeps the
highest version, even if that version is a deletion that
counteracts a previous creation:


$ osmosis -q --read-xml-change 1103-1104.osc.gz --simc \
    --write-xml-change - | grep -A1 -B1 '
uid="547666" user="Igor Kurvanor" changeset="9730094" lat="45.7510611" lon="6.2813975"/>
$

Now this is a minor bug because I don't know any consumer that will trip
on a deletion request for a non-existing object, but it is still a
behaviour that I would not have expected. Anyway, it should explain the 
discrepancy you are seeing.
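The two simplification policies discussed here can be compared side by side. This is not Osmosis's source code. It is a hypothetical Python model in which each change carries an action, an object type, an id and a version. The first function keeps the highest version of each object, as described above for --simplify-change. The second also drops objects that were both created and deleted inside the diff, which is the behaviour expected above.

```python
from typing import NamedTuple


class Change(NamedTuple):
    action: str   # "create", "modify" or "delete"
    kind: str     # "node", "way" or "relation"
    id: int
    version: int


def simplify_keep_latest(changes):
    """Keep only the highest version of each (kind, id)."""
    latest = {}
    for ch in changes:
        key = (ch.kind, ch.id)
        if key not in latest or ch.version > latest[key].version:
            latest[key] = ch
    return sorted(latest.values(), key=lambda c: (c.kind, c.id))


def simplify_cancel_pairs(changes):
    """Like simplify_keep_latest, but drop objects created and deleted here."""
    created = {(c.kind, c.id) for c in changes if c.action == "create"}
    return [
        c
        for c in simplify_keep_latest(changes)
        if not (c.action == "delete" and (c.kind, c.id) in created)
    ]


changes = [
    Change("create", "node", 7, 1),
    Change("delete", "node", 7, 2),
    Change("modify", "node", 8, 3),
    Change("modify", "node", 8, 4),
]
print(simplify_keep_latest(changes))   # delete of node 7 v2, modify of node 8 v4
print(simplify_cancel_pairs(changes))  # only the modify of node 8 v4
```

The extra deletion produced by the first policy is harmless when the consumer ignores deletes of unknown objects. It also keeps a file whose starting state is uncertain consistent.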


Bye
Frederik



Re: [OSM-dev] Incomplete diffs?

2011-11-06 Thread marqqs
Hello,

meanwhile osmupdate has been changed to download replication diffs only.

But there is still an issue I cannot explain...

I thought that the "normal" daily diffs would only lack some data submitted
around midnight, but data seem to vanish from the middle of the
day too:

# normal diff
$ zcat 2003-2004.osc.gz |grep -c "timestamp=\"2011-11-03T12:"
58968

# replication diff
$ cat 1103-1104.osc |grep -c "timestamp=\"2011-11-03T12:"
59068

And yes, I did remember to cumulate the versions in the second file before I
started counting with grep.
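grep -c counts matching lines, so the comparison above assumes one element per line and says nothing about how many distinct objects are involved. Below is a minimal sketch that parses the change file instead. It reports both the number of matching elements and the number of distinct objects. The sample osmChange snippet is invented for illustration. For a real compressed file one would pass the text read via gzip.open(path, "rt") instead of a literal string.

```python
import xml.etree.ElementTree as ET

OSM_TYPES = ("node", "way", "relation")


def count_matching(osc_xml: str, prefix: str):
    """Return (elements, distinct objects) whose timestamp starts with prefix."""
    root = ET.fromstring(osc_xml)
    elements = 0
    objects = set()
    for elem in root.iter():
        if elem.tag in OSM_TYPES and elem.get("timestamp", "").startswith(prefix):
            elements += 1
            objects.add((elem.tag, elem.get("id")))
    return elements, len(objects)


sample = """<osmChange version="0.6">
  <create>
    <node id="1" version="1" timestamp="2011-11-03T12:05:00Z" lat="0" lon="0"/>
  </create>
  <delete>
    <node id="1" version="2" timestamp="2011-11-03T12:40:00Z" lat="0" lon="0"/>
  </delete>
  <modify>
    <way id="5" version="3" timestamp="2011-11-03T13:10:00Z"/>
  </modify>
</osmChange>"""

print(count_matching(sample, "2011-11-03T12:"))  # (2, 1)
```

A node created and deleted within the same hour counts twice as elements but once as an object. This is exactly the kind of entry that separates a replication diff from a normal one.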

Of course, it could easily be that I overlooked something - it's late in 
Germany. :-)
Does anyone have an idea?

Markus


-------- Original Message --------
> Date: Sun, 06 Nov 2011 00:32:49 +0100
> From: mar...@gmx.eu
> To: Frederik Ramm, dev@openstreetmap.org
> Subject: Re: [OSM-dev] Incomplete diffs?



Re: [OSM-dev] Incomplete diffs?

2011-11-05 Thread marqqs
Hi Frederik,

thanks for the explanation!

The _replication_ diffs are the right choice if you want to update a full 
history file.

Most people who update their OSM files on a regular basis do not need
replication diffs; they are satisfied with the newest version of each object
that has been changed.

> So, if you want to use daily diffs but avoid the danger of missing 
> edits, use the replication diff.

Very good advice.

Until today I chose NOT to use the diffs in the
planet.openstreetmap.org/history/ directory because they are not up to date:
they usually come with a delay of 25 hours. Do you know if the creation process
could be accelerated somehow?

Now I will work on osmupdate and try to switch from daily normal diffs to
daily replication diffs. That seems better than losing objects once in a
while.

Meanwhile people can use Osmosis, or run osmupdate with the --hourly option 
which will restrict the program to replication diffs.

Markus

-------- Original Message --------
> Date: Sat, 05 Nov 2011 22:56:01 +0100
> From: Frederik Ramm
> To: dev@openstreetmap.org
> Subject: Re: [OSM-dev] Incomplete diffs?



Re: [OSM-dev] Incomplete diffs?

2011-11-05 Thread Frederik Ramm

Hi,

On 11/05/2011 06:58 PM, mar...@gmx.eu wrote:

Meanwhile I found out that this node simply did not appear in the daily diffs:
http://www.openstreetmap.org/browse/node/1470178889

It was created at 2011-10-16T23:58Z by a large changeset along with 23,000 other
nodes.
Neither the 16/17 nor the 17/18 daily diff contain this node whereas the hourly 
diff from October 17 01:00 does.


There are two types of diffs: "replication diffs" and normal diffs. A 
replication diff contains everything that happened between two 
timestamps, including multiple changes of the same object, whereas a 
normal diff only contains the information required to get from state 1 
to state 2.
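The difference between the two kinds of diff can be shown with a toy model in which the database state is a mapping from node id to version. Below is a minimal sketch under that assumption (real diffs carry full object data, and the ids here are invented). The replication diff is simply the ordered list of events. The normal diff is derived by comparing the two states, so intermediate versions disappear.

```python
def normal_diff(state1: dict, state2: dict) -> list:
    """Changes needed to turn state1 into state2 (node id -> version)."""
    changes = []
    for node_id in sorted(state1.keys() | state2.keys()):
        before, after = state1.get(node_id), state2.get(node_id)
        if before is None:
            changes.append(("create", node_id, after))
        elif after is None:
            changes.append(("delete", node_id, before))  # last known version
        elif before != after:
            changes.append(("modify", node_id, after))
    return changes


# Replication diff: every event in the window, in order.
events = [("modify", 10, 2), ("modify", 10, 3), ("create", 11, 1)]

state1 = {10: 1}
state2 = {10: 3, 11: 1}

print(len(events))                  # 3 entries in the replication diff
print(normal_diff(state1, state2))  # [('modify', 10, 3), ('create', 11, 1)]
```

An object that is created and deleted inside the window appears in neither state, so it leaves no trace at all in the normal diff.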


Also, replication diffs are created in a relatively fail-safe process 
with Osmosis whereas the normal diffs can miss changes in some cases 
when a long-running database transaction that was created before 0:00 
extends past the time when the diff is created. (There was a time when 
we had only "normal" diffs, and it was near impossible to make sure the 
minutely/hourly ones did not miss anything.)
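It is easier to see how a long-running transaction falls through the cracks with explicit times. Below is a simplified Python model. It assumes the daily normal diff selects objects by their timestamp and can only see data committed by the time the diff is generated. Replication is approximated here by commit time, although the real process tracks database transaction numbers. Under that approximation, each change is picked up in whichever window it was committed. The node id and its 23:58 timestamp come from this thread. The commit time and both generation times are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class Edit(NamedTuple):
    node_id: int
    timestamp: datetime  # timestamp stored on the object
    committed: datetime  # hypothetical: when the transaction became visible


def daily_by_timestamp(edits, day_start, day_end, generated_at):
    """Normal-diff style: select by object timestamp among committed edits."""
    return [e.node_id for e in edits
            if day_start <= e.timestamp < day_end and e.committed <= generated_at]


def window_by_commit(edits, after, up_to):
    """Replication style (approximation): select by commit time."""
    return [e.node_id for e in edits if after < e.committed <= up_to]


edit = Edit(1470178889,
            timestamp=datetime(2011, 10, 16, 23, 58),
            committed=datetime(2011, 10, 17, 0, 45))  # long transaction
d16, d17, d18 = (datetime(2011, 10, d) for d in (16, 17, 18))
run1 = datetime(2011, 10, 17, 0, 30)  # hypothetical generation time of 16/17
run2 = datetime(2011, 10, 18, 0, 30)  # hypothetical generation time of 17/18

print(daily_by_timestamp([edit], d16, d17, run1))  # [] - not committed yet
print(daily_by_timestamp([edit], d17, d18, run2))  # [] - timestamp is on the 16th
print(window_by_commit([edit], run1, run2))        # [1470178889]
```

Consecutive commit-based windows cover every commit exactly once. Timestamp-based days can skip an edit whose commit lands after the first run.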


For minutely and hourly diffs, we only offer replication diffs these 
days. For daily diffs, we have the normal ones under 
planet.openstreetmap.org/daily, as well as the replication diffs under 
planet.openstreetmap.org/history.


The normal diff indeed lacks the node in question, but the daily 
replication diff under history/2011/1016-1017.osc.gz has it.


So, if you want to use daily diffs but avoid the danger of missing 
edits, use the replication diff.


Frankly I don't know why the normal daily diffs are still created at 
all; if one really wanted to offer a reduced-traffic version of the 
replication diffs then it would indeed make sense to simply deflate the 
replication diff using Osmosis' --simplify-change task.


Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"



Re: [OSM-dev] Incomplete diffs?

2011-11-05 Thread marqqs
Hi Erik,

thanks for your help. The missing node seems to be available via minutely and 
hourly diff files but NOT via the daily file.

Meanwhile I found an explanation in the Wiki:
http://wiki.openstreetmap.org/wiki/Planet.osm/diffs

"The data in the file is the change in data between midnight on the first and 
second days, as identified by the timestamps on the current data. Because of 
the delay in creating the file it is very unlikely, but possible, that some 
data may be missing."

If the problem is already known, why is the file not created with overlapping
time borders, let's say from 00:00 to 24:01? This would be better than losing
data.

Alternatively, the daily diff could easily be created by merging the latest 24 
hourly diffs.
(Just tried this, takes 10 seconds on my medium server.)
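Merging hourly replication diffs into a daily one is only lossless if no file in the run is skipped. Below is a minimal sketch, assuming each hourly diff has already been parsed into a (sequence_number, changes) pair; this representation is hypothetical. The function refuses to merge when the sequence numbers are not consecutive, and otherwise concatenates the changes in sequence order. The result still contains every version and could then be simplified.

```python
def merge_consecutive(diffs):
    """Concatenate (sequence_number, changes) pairs, refusing to skip any."""
    ordered = sorted(diffs, key=lambda d: d[0])
    for (prev_seq, _), (seq, _) in zip(ordered, ordered[1:]):
        if seq != prev_seq + 1:
            raise ValueError(f"sequence numbers not consecutive: {prev_seq} -> {seq}")
    merged = []
    for _, changes in ordered:
        merged.extend(changes)
    return merged


# 24 hypothetical hourly diffs, each holding one placeholder change.
hourly = [(1000 + h, [f"change from hour {h}"]) for h in range(24)]
daily = merge_consecutive(hourly)
print(len(daily))  # 24
```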

Maybe we should propose to change the toolchain...?

Markus

-------- Original Message --------
> Date: Sat, 5 Nov 2011 21:06:30 +0100
> From: Erik Johansson
> To: mar...@gmx.eu
> Subject: Re: [OSM-dev] Incomplete diffs?

> On Sat, Nov 5, 2011 at 18:58,   wrote:
> > Hi all,
> >
> > this week, a user of osmupdate and osmfilter asked me for help. The
> > filter program's output contained a certain way but not all of its
> > nodes. This must not happen, of course, and I thought this would be a
> > program error at first.
> >
> > Meanwhile I found out that this node simply did not appear in the daily
> > diffs:
> > http://www.openstreetmap.org/browse/node/1470178889
> >
> > It was created at 2011-10-16T23:58Z by a large changeset along with
> > 23,000 other nodes.
> > Neither the 16/17 nor the 17/18 daily diff contains this node, whereas
> > the hourly diff from October 17 01:00 does.
> >
> > How often does this happen? Is there a statistic about such gaps?
> >
> > Would it be better to switch to hourly diffs instead of daily ones? This
> > would increase the data traffic a bit, as hourly diffs do not cumulate
> > multiple versions for each object.
> >
> 
> Just for reference I can't find that node in these dailies either:
> http://planet.openstreetmap.org/daily/20111015-20111016.osc.gz
> http://planet.openstreetmap.org/daily/20111016-20111017.osc.gz
> http://planet.openstreetmap.org/daily/20111017-20111018.osc.gz
> 
> But it's available in the minutely:
> http://planet.openstreetmap.org/minute-replicate/001/028/610.gz
> 
> 
> Someone on IRC just said this the other day:
> > Weekly I think are still done with the planetdiff program
> > everything else is done with osmosis [by merge from minutely].
> 
> I don't know if this is correct but try to merge two or three hours of
> minutely with Osmosis, and see what you get.
> 
> /Erik
