Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-04 Thread Michael Reichert
Hi,

On 04/02/2020 at 15:25, Frederik Ramm wrote:
> Hm, the wording is a bit unfortunate really. Of course this "internal
> use only" applies to the personal data in the file, which, according to
> (the LWG's interpretation of) the GDPR, is OK to use for OSM's own
> purposes but not for blasting out into the world. The
> non-personal-data history is free for everyone to use, and Geofabrik
> *could* actually make two different history files available, one with
> and one without user data; it's just that these files are a niche
> interest anyway, so we thought one version is sufficient.
> 
> This means that if you derive anything like an animated map from the
> data, that's totally fine; only if you were to publish something that
> involves user data should you think twice about the data protection
> regulations that apply to you.

You can strip the personal metadata (user names, UIDs and changeset
IDs) from the history files on osm-internal.download.geofabrik.de using
the following Osmium command, to get OSM data with fewer legal risks in
terms of personal data:

osmium cat --output-format pbf,add_metadata=version+timestamp \
  -o output.osh.pbf input.osh.pbf
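
If you want to double-check that user names, UIDs and changeset IDs are
really gone, osmium can report which metadata attributes the result
still carries (the exact output layout varies between osmium versions):

osmium fileinfo -e output.osh.pbf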

Best regards

Michael



___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-04 Thread Frederik Ramm
Hi,

On 04.02.20 14:10, Colin Smale wrote:
>> The Geofabrik download server has full history files for every region it
>> offers. Unlike the non-history extracts, these files are only available
>> for users who log in with their OSM user name.
 
> Aah, thanks Frederik, I didn't know about this. But the text on the site
> seems to imply that it is only for "internal use" and I cannot use this
> data for a public service, e.g. a website to animate changes in admin
> boundaries. Can I get round that by cleaning out certain data?

Hm, the wording is a bit unfortunate really. Of course this "internal
use only" applies to the personal data in the file, which, according to
(the LWG's interpretation of) the GDPR, is OK to use for OSM's own
purposes but not for blasting out into the world. The
non-personal-data history is free for everyone to use, and Geofabrik
*could* actually make two different history files available, one with
and one without user data; it's just that these files are a niche
interest anyway, so we thought one version is sufficient.

This means that if you derive anything like an animated map from the
data, that's totally fine; only if you were to publish something that
involves user data should you think twice about the data protection
regulations that apply to you.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"

___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-04 Thread Colin Smale
On 2020-02-04 13:36, Frederik Ramm wrote:

> Hi,
> 
> On 04.02.20 13:22, Colin Smale wrote: 
> 
>> Correct me if I am wrong, but I don't remember ever seeing
>> regional full history files.
> 
> The Geofabrik download server has full history files for every region it
> offers. Unlike the non-history extracts, these files are only available
> for users who log in with their OSM user name.

Aah, thanks Frederik, I didn't know about this. But the text on the site
seems to imply that it is only for "internal use" and I cannot use this
data for a public service, e.g. a website to animate changes in admin
boundaries. Can I get round that by cleaning out certain data? 

>> that will take millions of API calls to get the full history
>> of every node, way and relation involved. If it has to be, then it has
>> to be.
> 
> Famous last words before being blocked on the API ;)

At least I would only have to do it once (throttle the rate and leave it
running for a few days), and then keep it updated with the periodic
diffs.
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-04 Thread Frederik Ramm
Hi,

On 04.02.20 13:22, Colin Smale wrote:
> Correct me if I am wrong, but I don't remember ever seeing
> regional full history files.

The Geofabrik download server has full history files for every region it
offers. Unlike the non-history extracts, these files are only available
for users who log in with their OSM user name.

> that will take millions of API calls to get the full history
> of every node, way and relation involved. If it has to be, then it has
> to be.

Famous last words before being blocked on the API ;)

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"

___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-04 Thread Colin Smale
I wonder how many users actually need a planet-wide file. Surely there
are loads of cases where a regional extract would suffice for the use
case in hand. How about encouraging people to consider using a regional
download?

Something else, only slightly off-topic: I have often had ideas in my
head about looking at the dynamics of particular bits of data - i.e. OSM
histories. Correct me if I am wrong, but I don't remember ever seeing
regional full history files. I think there is a download available for
the entire planet with full history, but that is going to be a monster
and a real challenge to keep up-to-date. But it's the only way to get
historical data, except through the API. If I want to trace, for
example, the history of changes to county boundaries in the UK, or the
alignment of motorways, that will take millions of API calls to get the
full history of every node, way and relation involved. If it has to be,
then it has to be.
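
(For reference, that means one request per element to the API's history
endpoint, along the lines of the following - the way ID here is just a
placeholder:

  https://api.openstreetmap.org/api/0.6/way/123456/history

repeated for every node, way and relation of interest.)

___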
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-04 Thread Simon Poole
We are talking literally about a one-command "pipeline" that already
does everything right and consumes 1% of the volume of a weekly download
of the planet.

Not to mention that you get a daily (or however often you want) updated
planet out of it that contains a defined set of diffs (unlike the planet
dump).
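
For example, one command that fits this description (osmupdate from
osmctools is one option; pyosmium-up-to-date is another) fetches and
applies all pending diffs in a single invocation:

  osmupdate planet-old.osm.pbf planet-new.osm.pbf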

Simon 
-- 
This message was sent from my Android mobile phone with Kaiten Mail.
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-04 Thread Jorge Sanz
A similar discussion happened recently when BitTorrent downloads were
announced as an experimental feature (see the wiki discussion and the
twitter, reddit and yc threads), and the reception seems to be pretty
good. My understanding is that there is a use case for projects that
don't need a permanently updated planet but want to do a full refresh at
a low frequency (say, twice a year or so). Helping those projects do
that with a parallel download across several mirrors (or BitTorrent)
makes sense to me. The growth in bandwidth use, and eventually the need
for more mirrors/peers, could be read as correlating with growing
adoption of OSM data, right?

This tool is also meant to help automate downloading data from other
extract sources (Geofabrik, OpenStreetMap France, BBBike) for smaller
areas, which is useful in itself for projects with a smaller
geographical scope.

On Tue, 4 Feb 2020 at 01:42, Yuri Astrakhan  wrote:

> [...]


-- 
Jorge Sanz
http://twitter.com/xurxosanz
http://jorgesanz.net
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-03 Thread Yuri Astrakhan
Andy, I agree that being frugal with bandwidth is important.  Yet, there is
a significant operations cost involved here that I suspect very few will
actually be willing to pay unless it is made trivial -- the cost of
setting up an independent planet file update pipeline, i.e. a Docker
image that can be pointed at a planet file and has an easy way to suspend
and resume updates when the file needs to be static during the loading
process.

I think having an optimized download tool that can both download and
validate the planet or areas, and could provide other services like diff
updating, would solve both goals.  If the current process is to manually
pick a mirror, manually validate, and re-download on failure, versus
using a dedicated tool that optimizes the download and crash recovery,
all the better. Especially if it offers us a way to add more
functionality later, like patch updating.

Also, let's not fall into the premature optimization trap, as that only
achieves local rather than global maxima.  As a theoretical example:
which is better -- wider OSM adoption with a higher number of planet
downloads (i.e. some wasted bandwidth), or lower adoption and a more
optimized download process?  I would think the OSM project would be
better off with wider adoption at the cost of 10-50% extra bandwidth,
right?  If the startup cost (human hours) is lower, more people will
participate.  My numbers could be totally off, but keeping our eyes on
the bigger picture is very important when deciding such matters.
Convenience, i.e. a low manual labor cost, is a big contributor to wider
adoption.

P.S. Am I correct to assume that every data consumer would need to
download each diff file twice -- once to update the planet file, and
once to update the PostgreSQL database?


On Sun, Feb 2, 2020 at 4:17 PM Andy Townsend  wrote:

> [...]
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-02 Thread Andy Townsend

On 02/02/2020 16:39, Yuri Astrakhan wrote:


> * Anyone working on an evolving project like OpenMapTiles would attest
> that the import schema constantly changes.


Indeed, but ...


> Every time the schema changes, one needs to download the newest planet,
> import it based on the new schema, and run diffs from that point.


... that's why I said "... and there are ways of keeping a *.pbf* up to
date" in my message - the idea is to avoid large downloads wherever
possible (emphasis new in this message; it won't make it to the plain
text archive).


For more info see:

https://docs.osmcode.org/pyosmium/latest/tools_uptodate.html

or if that's somehow not an option:

https://wiki.openstreetmap.org/wiki/User:EdLoach#Osmosis_to_keep_a_local_copy_of_a_.osm_file_updated

(from a few years back)
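
For reference, the pyosmium route boils down to a single command that
downloads and applies the pending diffs in place (see the page above for
the available options):

  pyosmium-up-to-date planet.osm.pbf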

> * Automation / easy adaptation.  Providing an out-of-the-box way to
> set up your own server is much easier if you have a tool that
> automatically downloads and validates the planet file or a portion of it,


Sure - an automated process is much easier to follow than a manual
do-it-yourself one, but the fact that a process is automated doesn't
mean that it has to be wasteful.  Ultimately, someone is paying for the
bandwidth used by downloads from each of the mirrors at
https://wiki.openstreetmap.org/wiki/Planet.osm#Planet.osm_mirrors, so
it seems only fair not to be profligate with it.


Best Regards,

Andy



___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-02 Thread Yves
While keeping a planet file up to date is really easy, it is probably
not the first idea that comes to mind when you first plan to do
something with the data.
This service looks like a good idea, but filtering out abusers must be
kept in mind anyway.
Yves
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-02 Thread Yuri Astrakhan
Andy, two major reasons:

* Anyone working on an evolving project like OpenMapTiles will attest that
the import schema constantly changes. Every time the schema changes, one
needs to download the newest planet, import it based on the new schema,
and run diffs from that point.

* Automation / easy adaptation.  Providing an out-of-the-box way to set up
your own server is much easier if you have a tool that automatically
downloads and validates the planet file or a portion of it, rather than
forcing each user to find the proper mirror and wait an hour for the
download, only to find out that the mirror somehow has invalid data (my
tool detects when a mirror has a slightly wrong file length and ignores
it - this has already happened several times - and it also auto-validates
the file with md5).
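
For comparison, the manual equivalent of that last validation step is
roughly the following, run next to the downloaded planet file:

  wget https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf.md5
  md5sum -c planet-latest.osm.pbf.md5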

So yes, not having a tool is worse than having it, for the above reasons.
Also, considering that others have already tweeted about it, and many in
the OSM/geo community have liked it, the tool seems to have value to at
least some people, and thus warrants inclusion in a newsletter.

On Sun, Feb 2, 2020 at 11:02 AM Andy Townsend  wrote:

> [...]
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-02 Thread Andy Townsend
> For those who download OSM data regularly, there is now a simple way
> to reduce the load on the primary OSM servers, while also making
> downloads much faster and ensuring the data is correct.

Apologies if this has been done to death already, but surely if you are
downloading the entire planet regularly you are quite simply "doing it
wrong"?

Minutely (and other frequency) diffs exist, and there are ways of
keeping a .pbf up to date (both osmium-based and osmosis-based, if for
some reason the former doesn't work for you).

A "tool" that downloads from all mirrors in parallel just leads to a
"tragedy of the commons" situation, and isn't a solution to the
underlying problem of scarce resources.

Best Regards,
Andy

___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] weeklyOSM #497 2020-01-21-2020-01-27

2020-02-02 Thread Yuri Astrakhan
The news mentions that downloads from Planet OSM are currently rate
limited to 400 kB/s and suggests using mirrors, but does not mention the
related announcement of a new tool that simplifies such downloads. I
think it will help anyone downloading, and it might be worth including
in the next weekly. Original announcement:



For those who download OSM data regularly, there is now a simple way to
reduce the load on the primary OSM servers, while also making downloads
much faster and ensuring the data is correct.

OpenMapTiles' new tool downloads the planet from all mirrors in parallel.
It usually takes just a few minutes, and it automatically verifies the
md5 checksum.  The tool will not use the primary planet source by
default.  The tool can also download and validate regional extracts from
geofabrik, bbbike, and osmfr.  Internally the tool uses aria2c.

The easiest way is to use it with Docker -- share the current dir with
the container and save the file there. Anything after the "--" is passed
to aria2c. Here's a Linux/Mac command, but it should be runnable from
Windows with minor adjustments.

  docker run --rm -it -v $PWD:/download openmaptiles/openmaptiles-tools \
    download-osm planet -- -d /download
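
Since everything after the "--" goes straight to aria2c, you can also
pass any of its other options, e.g. to cap the total bandwidth (one
example; see the aria2c man page):

  docker run --rm -it -v $PWD:/download openmaptiles/openmaptiles-tools \
    download-osm planet -- -d /download --max-overall-download-limit=10M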

Use --dry-run (-n) to run it without the actual download (i.e. to see
which file it would download and from which mirrors). You may also add
--verbose (-v):

  docker run --rm -it openmaptiles/openmaptiles-tools download-osm planet \
    --dry-run

Use --help for all arguments.  See source and documentation here:
https://github.com/openmaptiles/openmaptiles-tools#multi-streamed-osm-data-downloader

P.S. Conceptually, the script is doing for OSM data what torrents were
designed to do, but sadly there is no well-established web of torrents
that would offer similar functionality.

On Sun, Feb 2, 2020 at 9:29 AM weeklyteam  wrote:

> The weekly round-up of OSM news, issue #497,
> is now available online in English, giving as always a summary of a lot of
> things happening in the openstreetmap world:
>
>  http://www.weeklyosm.eu/en/archives/12814/
>
> Enjoy!
>
> Did you know that you can also submit messages for the weeklyOSM? Just log
> in to https://osmbc.openstreetmap.de/login with your OSM account. Read
> more about how to write a post here:
> http://www.weeklyosm.eu/this-news-should-be-in-weeklyosm
>
> weeklyOSM?
> who: https://wiki.openstreetmap.org/wiki/WeeklyOSM#Available_Languages
> where?:
> https://umap.openstreetmap.fr/en/map/weeklyosm-is-currently-produced-in_56718#2/8.6/108.3
> ___
> talk mailing list
> talk@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk
>
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk