Re: [OSM-dev] [overpass] OSM API lookups to complement minutely diffs?

2016-09-18 Thread mmd


Am 18.09.2016 um 15:27 schrieb Stefan Keller:

> 
> 0. The typical delay (augmented diff id) compared to current time is 1
> to 4 minutes.

Right, you can see the current delay in munin [1], and the current
timestamp via a dedicated API call [2].

[1]
http://overpass-api.de/munin/localdomain/localhost.localdomain/osm_db_lag.html
[2] http://overpass-api.de/api/timestamp
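For illustration, here is a minimal sketch of how a client could turn the value from [2] into a lag figure comparable to the munin graph. It assumes that /api/timestamp returns a single ISO 8601 UTC value such as "2016-09-18T13:25:00Z". The helper name `database_lag_seconds` is hypothetical.

```python
from datetime import datetime, timezone


def database_lag_seconds(timestamp_text: str, now: datetime) -> float:
    """Return how far the database timestamp lags behind `now`, in seconds.

    ASSUMPTION: `timestamp_text` is the body returned by /api/timestamp,
    formatted like "2016-09-18T13:25:00Z" (possibly with a trailing newline).
    """
    ts = datetime.strptime(timestamp_text.strip(), "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)  # the API value is UTC ("Z")
    return (now - ts).total_seconds()
```

With `now = datetime(2016, 9, 18, 13, 27, tzinfo=timezone.utc)` and a body of "2016-09-18T13:25:00Z", the function reports a lag of 120 seconds. That falls within the typical 1 to 4 minutes mentioned above.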

> 1. The id is always increasing, but sometimes one or more ids are
> left out between two consecutive deliveries.

At first sight, IDs may appear to be left out. But as the ID corresponds
exactly to the database timestamp, there may be some non-consecutive
increases.

However, that's not really an issue. The reason for the larger gaps is
the way minutely diffs are processed: if there's a backlog of a few
minutes, several diffs will be processed as one package to speed up
processing, and the timestamp increases by more than a minute.

From the outside, augmented_diff_status appears to have jumped by
several numbers, but you still have to query every number in between,
even if it wasn't announced by augmented_diff_status.
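The following minimal sketch shows the "query every number in between" rule. The function name `ids_to_fetch` and the notion of a locally stored "last processed" id are illustrative assumptions, not part of the Overpass API.

```python
from typing import List


def ids_to_fetch(last_processed: int, status: int) -> List[int]:
    """Every augmented diff id after `last_processed`, up to and including
    the value currently returned by augmented_diff_status."""
    # A jump in augmented_diff_status (e.g. 1000 -> 1003) does not mean that
    # 1001 and 1002 are missing; they were simply never announced.
    return list(range(last_processed + 1, status + 1))
```

If the status value has not moved past `last_processed`, the list is empty and the client just waits for the next poll.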

> 2. The time interval for a new diff can be more than a minute
> (sometimes hours in blackouts).

Please remember that augmented diffs are calculated on the fly. Based on
the sequence id you provide, data from an interval of exactly one minute
is returned.

It also doesn't matter how the minutely diffs were originally posted to
the database. Augmented diffs just follow a fixed 60 second grid, with a
1:1 relationship between the id and the respective timeframe.

For reference, here's the respective code snippet. I think a few lines
of code tell a thousand words ;)

https://github.com/drolbr/Overpass-API/blob/master/src/cgi-bin/augmented_diff#L40-L48
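As a rough illustration of the fixed 60-second grid, the mapping between id and timeframe can be sketched as below. The epoch constant is a hypothetical placeholder: this thread only says that ids count minutes since the license change. The boundary convention (half-open window) is also an assumption. Check the augmented_diff script linked above for the real constant and offset before relying on this.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# ASSUMPTION: hypothetical epoch for id 0 (around the 2012 license change).
# The real constant and any +/-1 offset live in the augmented_diff script.
EPOCH = datetime(2012, 9, 12, 6, 55, tzinfo=timezone.utc)


def id_to_interval(diff_id: int, epoch: datetime = EPOCH) -> Tuple[datetime, datetime]:
    """Map an augmented diff id to its 60-second window [start, end)."""
    start = epoch + timedelta(minutes=diff_id)
    return start, start + timedelta(seconds=60)


def interval_to_id(moment: datetime, epoch: datetime = EPOCH) -> int:
    """Inverse mapping: the id whose 60-second window contains `moment`."""
    return int((moment - epoch).total_seconds() // 60)
```

Because the grid is fixed, the window for a given id never depends on how the underlying minutely diffs were applied to the database.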

> So there may still be missing diff files when catching up (which means
> that an internal sequence id is not enough; there is also a need for a
> list of missing ids)?

There are really no missing ids. As mentioned, augmented_diff_status
just returns the maximum available number to be queried. You still need
to query every single id up to and including that number.

> 
> And the client can poll every minute (sometimes longer if the client is
> busy or diff downloading takes time) without hitting load limits?
> 

Well, http://overpass-api.de/api/status should contain a line with
"slots available now"; otherwise, all slots are currently busy
and you will hit an HTTP 429 error. IIRC, Roland explained that idea
somewhere on the Overpass dev list before...
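A client could check the status page before sending a request, as in the following minimal sketch. It assumes that a free slot shows up as a line such as "2 slots available now." and that the absence of such a line means all slots are busy. The exact wording of the status page is an assumption here, and `free_slots` is a hypothetical helper.

```python
import re


def free_slots(status_text: str) -> int:
    """Return the number of free query slots reported by /api/status.

    ASSUMPTION: free slots appear as a line like "2 slots available now.";
    if no such line exists, all slots are busy and a request would likely
    be answered with HTTP 429.
    """
    match = re.search(r"^(\d+) slots? available now", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0
```

A polling client would only request the next augmented diff when `free_slots(...)` is greater than zero, and otherwise wait before asking again.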

-- 



___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] [overpass] OSM API lookups to complement minutely diffs?

2016-09-18 Thread Stefan Keller
Thanks mmd: That was exactly what I tried to summarize after some observations:

0. The typical delay (augmented diff id) compared to current time is 1
to 4 minutes.
1. The id is always increasing, but sometimes one or more ids are
left out between two consecutive deliveries.
2. The time interval for a new diff can be more than a minute
(sometimes hours in blackouts).
3. The interval between new diffs can be less than a minute (especially
when previously left-out ids are handed in later).

So a client polls augmented_diff_status and
a. waits to download diffs if there's no new id number from
augmented_diff_status,
b. or it fetches diffs up to and including the augmented_diff_status id

So there may still be missing diff files when catching up (which means
that an internal sequence id is not enough; there is also a need for a
list of missing ids)?

And the client can poll every minute (sometimes longer if the client is
busy or diff downloading takes time) without hitting load limits?

:Stefan

P.S. I'm at SoTM too this week!


2016-09-18 14:58 GMT+02:00 mmd :
> Hi,
>
> Am 18.09.2016 um 14:41 schrieb Michael Larsen:
>> Hi,
>>
>> Last time I tried consuming augmented diffs on a minutely basis, I hit the
>> load limits, which meant that I could not consume augmented diffs for some
>> time afterwards, i.e. this will lead to black holes in your history.
>
> the value returned by augmented_diff_status corresponds to the current
> database timestamp (as the number of minutes since the license change). It
> does not need to increase one-by-one(!), e.g. the database may process
> several minutely diffs in one go, due to some backlog. If you always
> download only the number returned by augmented_diff_status, you will indeed
> get some holes!
>
> That situation can be avoided if you keep your own internal sequence
> id and fetch augmented diffs up to and including the value returned
> by augmented_diff_status.
>
> If augmented_diff_status does not return any value (due to overload),
> just wait some time and try again. The same applies to downloading
> augmented_diffs: you may get HTTP 429 or HTTP 504 in case of overload,
> or if you exceed your quota (see /api/status for details). In that case,
> don't increase your internal sequence id yet, but try downloading the
> same augmented diff again.
>
>>
>> Also, using timestamp start/end to fetch diffs for a given timestamp (like
>> avachi) is problematic with some changesets that stay open for more than an
>> hour (this happens quite often). The live service running on osm.expandable.dk
>> uses the API as described previously to get the augmented diff for a changeset.
>> If there were a better way, I'm all ears!
>>
>
> I hope the situation will improve a bit once the database has moved to
> version 0.7.53 and a different compression. If you're at SotM next week,
> you could maybe ask Roland for a current status.
>
> --
>
>
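The client procedure described in this thread can be sketched as follows: keep an internal sequence id, fetch every id up to and including augmented_diff_status, and on HTTP 429/504 retry the same id without advancing. This is a minimal sketch under stated assumptions, not the Overpass implementation. `catch_up`, `Overloaded`, `fetch_status`, `fetch_diff` and `handle_diff` are hypothetical names. The HTTP transport is left to the caller, which is assumed to raise `Overloaded` on HTTP 429 or 504 and to return `None` when augmented_diff_status yields no value. The 30-second retry delay is arbitrary.

```python
import time
from typing import Callable, Optional


class Overloaded(Exception):
    """Hypothetical error a fetch callable raises on HTTP 429 or HTTP 504."""


def catch_up(
    last_processed: int,
    fetch_status: Callable[[], Optional[int]],
    fetch_diff: Callable[[int], bytes],
    handle_diff: Callable[[int, bytes], None],
    retry_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process every augmented diff after `last_processed`, up to and including
    the value of augmented_diff_status, and return the new internal sequence id."""
    status = fetch_status()
    while status is None:  # no value returned (e.g. overload): wait, then ask again
        sleep(retry_delay)
        status = fetch_status()

    while last_processed < status:
        next_id = last_processed + 1
        try:
            data = fetch_diff(next_id)
        except Overloaded:
            sleep(retry_delay)  # do not advance; retry the same id after a pause
            continue
        handle_diff(next_id, data)
        last_processed = next_id  # advance only after the diff was handled
    return last_processed
```

A caller would persist the returned id and call `catch_up` again roughly once a minute. Injecting the fetch and sleep functions keeps the retry logic testable without network access.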
