Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Martin Koppenhoefer
2018-02-14 17:49 GMT+01:00 Simon Poole :

> Generally I would prefer if we could simply have two versions of
> everything, one with metadata for authenticated users/consumers one
> without.
>


+1. This sounds like a sane measure; the metadata is really an important
part of how the community works with the map.

I believe it is an overreaction to speculate about privacy issues with OSM
metadata, which is pseudonymous data. You cannot deanonymize it without
other, additional data (e.g. real name, address, ideally combined with the
same nickname elsewhere, habits, interests, etc.). Yes, you can find the
center of activity of an active mapper, in some cases even their interests,
but that doesn't mean you can tell their residence or identity (save maybe
a very few cases of people living in very low-density areas). There also
isn't a very direct correlation between your edit and you being at a place
(IP addresses shouldn't be released, of course): you can (and many do) add
something weeks, months or even years after you have observed it, or you
might have used aerial imagery, internet research or Mapillary, or edited
for a friend...

If it is required nonetheless, I'm with Roland, we should ask for explicit
permission.

Cheers,
Martin
___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Simon Poole
The LWG won't be making any "decisions"; we will be making
recommendations to the board, which may or may not take action on them.

Generally I would prefer if we could simply have two versions of
everything, one with metadata for authenticated users/consumers and one
without. This likely will not be feasible for everything, but at least
for the important stuff.


On 14.02.2018 at 17:17, Roland Olbricht wrote:
> Hi,
>
>> - timestamps, however, can not only potentially be used in lieu of
>> changeset ids to group contributions; the information itself is
>> problematic because it allows profiling contributions over time
>
> Timestamps are necessary to correctly figure out which nodes have
> belonged to a certain version of a way, and similarly for ways and
> nodes belonging to relations.
>
> More generally:
>
> - What is planned with regard to minute diffs? Stripping extra
> information will inevitably break tools like Achavi
>
> - Tools will need substantial time (I would estimate 3-6 months for
> Overpass API) to adapt in a meaningful way. What is the schedule of
> the LWG to take decisions?
>
The deadline is more or less clear for things that we consider really
touchy: they need to be fixed by the end of May.

Wrt Overpass API, there is no reason why you couldn't consume diffs as
up to now, as long as the output is sanitized (regardless of what the
OSMF says and does, the GDPR doesn't go away for you, so you need to
consider your options in any case).

> - How about simply asking the users for consent? We could then
> -- make a clear-cut last complete history dump before the date
> -- start with a planet dump without history before that date
> afterwards that then accumulates history only from users that have
> given consent
>

The problem is that this doesn't solve anything because, as recently
confirmed by the EU, consent is only considered freely given and valid if
it can be withdrawn. From a practical point of view, that essentially
forces two distribution streams on different terms (one that can be used
without any privacy-related restrictions and one with all the trouble).

> Personally, I would prefer a solution as easy as dropping usernames
> and uids but retaining changeset ids, timestamps and the geometry/tag
> data.
> That way we display goodwill, but do not cripple the tools that have
> proven useful or crucial to run the project.

Unfortunately, what I would prefer is not the question :-/.

Simon

>
> Please note that in the context of an API without user interface, it
> is a substantial challenge in itself to have any form of (OAuth or so)
> authentication.
>
> Cheers,
> Roland
>


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Roland Olbricht

Hi,

> - timestamps, however, can not only potentially be used in lieu of
> changeset ids to group contributions; the information itself is
> problematic because it allows profiling contributions over time


Timestamps are necessary to correctly figure out which nodes have 
belonged to a certain version of a way, and similarly for ways and nodes 
belonging to relations.
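
To illustrate why timestamps matter here, a minimal sketch in Python. It is not how Overpass API does it; the dictionary layout and the names `parse_ts` and `node_version_at` are made up for this example, and each node history is assumed to be sorted by version. It also only looks at the moment a way version was created, while a real reconstruction has to consider node changes up to the next way version.

```python
from bisect import bisect_right
from datetime import datetime


def parse_ts(value):
    # OSM timestamps look like 2018-02-14T10:30:00Z
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def node_version_at(node_history, when):
    """Return the latest node version created at or before `when`, or None."""
    stamps = [parse_ts(v["timestamp"]) for v in node_history]
    index = bisect_right(stamps, when)
    return node_history[index - 1] if index else None


# Hypothetical history of one node that was moved after the way was edited.
node_1 = [
    {"version": 1, "timestamp": "2017-01-01T00:00:00Z", "lat": 49.0, "lon": 8.4},
    {"version": 2, "timestamp": "2018-01-01T00:00:00Z", "lat": 49.1, "lon": 8.5},
]
way_version_time = parse_ts("2017-06-01T12:00:00Z")
resolved = node_version_at(node_1, way_version_time)
print(resolved["version"], resolved["lat"], resolved["lon"])
```

Without the timestamps there is nothing to compare against, so the lookup above has no way to pick between version 1 and version 2 of the node.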


More generally:

- What is planned with regard to minute diffs? Stripping extra 
information will inevitably break tools like Achavi


- Tools will need substantial time (I would estimate 3-6 months for 
Overpass API) to adapt in a meaningful way. What is the schedule of the 
LWG to take decisions?


- How about simply asking the users for consent? We could then
-- make a clear-cut last complete history dump before the date
-- start with a planet dump without history before that date afterwards 
that then accumulates history only from users that have given consent


Personally, I would prefer a solution as easy as dropping usernames and 
uids but retaining changeset ids, timestamps and the geometry/tag data.
That way we display goodwill, but do not cripple the tools that have 
proven useful or crucial to run the project.


Please note that in the context of an API without user interface, it is
a substantial challenge in itself to have any form of (OAuth or so)
authentication.


Cheers,
Roland



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Simon Poole
General comments:

- we are just considering removing metadata from what is publicly
available outside of the OSM community; the current thinking is that it
can remain available to authenticated users

- while there might be a tiny bit of leakage from providing version
numbers, we haven't considered them to be a large concern, and a good
argument can be made why they need to be public (see below)

- timestamps, however, can not only potentially be used in lieu of
changeset ids to group contributions; the information itself is
problematic because it allows profiling contributions over time

Neither the uid/display name nor the timestamp of an existing object
version is required to create a modified version for upload to the API;
the version number, however, is.
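
As a rough illustration, a sketch in Python that builds an osmChange document for modifying one node. The function name `build_modify` and the dictionary layout are invented for this example, and the changeset id is the one of the editor's own open changeset, not that of the previous version. The element carries id, version, changeset, coordinates and tags, but no user, uid or timestamp.

```python
import xml.etree.ElementTree as ET


def build_modify(node, changeset_id):
    """Build an osmChange document that modifies a single node.

    `node` only needs id, version, lat, lon and tags.
    """
    root = ET.Element("osmChange", version="0.6")
    modify = ET.SubElement(root, "modify")
    elem = ET.SubElement(modify, "node", {
        "id": str(node["id"]),
        # Version of the object being replaced; without it the edit
        # cannot be tied to the version it was based on.
        "version": str(node["version"]),
        "changeset": str(changeset_id),
        "lat": str(node["lat"]),
        "lon": str(node["lon"]),
    })
    for key, value in node["tags"].items():
        ET.SubElement(elem, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


bench = {"id": 1, "version": 3, "lat": 49.0, "lon": 8.4,
         "tags": {"amenity": "bench"}}
print(build_modify(bench, changeset_id=42))
```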

Simon


On 14.02.2018 at 10:30, Michael Reichert wrote:
> Hi,
>
> people are talking about potential changes to the amount of (personal)
> data distributed by OSM, in the light of new data protection laws
> becoming effective in the EU this May. There haven't been any official
> statements by the OSMF but discussions are going on in the LWG [1].
>
> Even though it is still unclear what the concrete steps will be, I have
> done some experiments. How well do our existing tools behave if you feed
> them with OSM data that has less metadata than usual, or no metadata at
> all? I have set up a test suite which tests Osmium-Tool (which uses the
> Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6.
>
> The test suite is available at
> https://github.com/geofabrik/metadata-test/
> and consists of a Bash script. You need to have osmium, osmosis and
> osmconvert in your path (or you have to modify the script a bit). The
> test suite comes with its own hand crafted test data which will be first
> converted to PBF by Osmium. Afterwards all three tools will prove
> themselves in the following challenges:
>
> - converting XML to PBF
> - converting PBF to XML
> - converting XML to XML
> - applying a diff
> - deriving changes between two OSM files
>
> All challenges are run four times, one iteration with full metadata, one
> with timestamp and version fields, one with version field only and one
> without any metadata. Some PBF challenges will also have two variants –
> one with DenseNodes and one without.
>
> The results are files located in the output/ directory. You have to
> inspect them manually, I have not written a tool to parse them and
> output how many tests failed.
>
> *Results*
> I compiled the results into a spreadsheet. You can download it at
> https://github.com/geofabrik/metadata-test/raw/master/table.ods
>
> To sum them up:
> - Osmium is the only programme which passes all format conversion tests.
>
> - Osmosis cannot read any XML (OSM and OSC) files without timestamp and
> version fields.
>
> - Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo
> message of the PBF format as mandatory. However, the format
> specification doesn't declare these fields as mandatory. Therefore, they
> write default values into PBF files if the input lacks these fields:
> version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" (Osmosis [3]),
> timestamp="1970-01-01T00:00:01Z" changeset="1" version="1" (Osmconvert)
> This partially applies to the XML output of Osmosis, too.
>
> - Deriving a diff file of the changes between two OSM files only works
> if both files have the same amount of metadata. If one file contains
> less or more metadata, all objects will appear in the diff file with
> their new metadata and bloat it up. The question is whether this is the
> desired behaviour (i.e. the ability to clean a file from metadata using
> large diffs) or if this behaviour is not desired and the tools
> generating diffs should compare the tags, location and members of
> objects which have the same ID but different metadata.
>
> - Some tools have bugs which lead to wrong diffs (e.g. missing
> modifications) if some metadata fields are missing.
>
> Best regards
>
> Michael
>
>
> [1]
> https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
> [2] Osmium also had this bug. But it was fixed on the master branch a
> few days ago.
> [3] Osmium cannot parse negative version numbers and throws an exception.
>
>
>
>


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Christoph Hormann
On Wednesday 14 February 2018, Darafei "Komяpa" Praliaskouski wrote:
> >
> > While this seems a useful test to do, I wonder how the timestamp and
> > version fields are relevant regarding privacy and personal data
> > protection?
>
> If OSM API were fast enough, it would allow to rather easily group
> changes back to changesets. Number of changesets gets you back number
> of mappers. Classifying a returning mapper by edit pattern would
> allow to get back the geometric median of their edits, which brings
> you to knowing where they live.

No, even if the API were infinitely fast, limited bandwidth and the
possibility to append new edits to existing open changesets would make
this impossible.

And even if you could identify the changeset a certain feature was last
modified in, that would still not allow you to conclude which changesets
were created by the same user.

If you practically want to reverse engineer user identities from a
planet file with user info stripped, looking at the data itself and at
mapper-specific data characteristics (tag combinations, the way
geometries are drawn) would likely be more useful than versions and
timestamps.

-- 
Christoph Hormann
http://www.imagico.de/



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Komяpa
Wed, 14 Feb 2018 at 17:47, Christoph Hormann :

> On Wednesday 14 February 2018, Michael Reichert wrote:
> >
> > All challenges are run four times, one iteration with full metadata,
> > one with timestamp and version fields, one with version field only
> > and one without any metadata. [...]
>
> While this seems a useful test to do, I wonder how the timestamp and
> version fields are relevant regarding privacy and personal data
> protection?
>

If OSM API were fast enough, it would allow to rather easily group changes
back to changesets. Number of changesets gets you back number of mappers.
Classifying a returning mapper by edit pattern would allow to get back the
geometric median of their edits, which brings you to knowing where they
live.

(To make OSM API upload faster don't forget to join the efforts in
https://github.com/zerebubuth/openstreetmap-cgimap/issues/140)
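
The grouping step could look roughly like this. It is only a sketch in Python: the function name `group_by_time_gap` and the one-second threshold are assumptions for illustration, not measured properties of the API. The heuristic is crude, since concurrent uploads by different mappers would be merged into one group and a changeset that stays open and receives later edits would be split.

```python
from datetime import datetime, timedelta


def group_by_time_gap(timestamps, max_gap=timedelta(seconds=1)):
    """Cluster ISO timestamps whose neighbours are at most `max_gap` apart."""
    stamps = sorted(
        datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps
    )
    groups = []
    for stamp in stamps:
        if groups and stamp - groups[-1][-1] <= max_gap:
            # Close enough to the previous object: same upload, presumably.
            groups[-1].append(stamp)
        else:
            groups.append([stamp])
    return groups


sample = [
    "2018-02-14T10:00:00Z",
    "2018-02-14T10:00:01Z",
    "2018-02-14T12:00:00Z",
]
for group in group_by_time_gap(sample):
    print(len(group), "object(s) starting at", group[0].isoformat())
```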



> I know a possible answer could be that in combination with personal data
> (like user names or ids) it would provide additional information on
> people.  But this argument applies to *any data* including the geometry
> (i.e. coordinates) and tags.
>
> So what is the special thing from a legal standpoint about versions and
> timestamps compared to geometries and tags?
>


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Christoph Hormann
On Wednesday 14 February 2018, Michael Reichert wrote:
>
> All challenges are run four times, one iteration with full metadata,
> one with timestamp and version fields, one with version field only
> and one without any metadata. [...]

While this seems a useful test to do, I wonder how the timestamp and
version fields are relevant regarding privacy and personal data
protection?

I know a possible answer could be that in combination with personal data 
(like user names or ids) it would provide additional information on 
people.  But this argument applies to *any data* including the geometry 
(i.e. coordinates) and tags.

So what is the special thing from a legal standpoint about versions and 
timestamps compared to geometries and tags?

-- 
Christoph Hormann
http://www.imagico.de/



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Frederik Ramm
Hi,

On 14.02.2018 15:23, Martin Koppenhoefer wrote:
> it seems Brexit could become effective March next year. Maybe we just wait?

We would still have to apply EU regulations to processing the data of EU
citizens.

> I really hope we will not obfuscate or remove meta data because of some
> EU privacy regulation, please do not overreact.

The LWG is, or has been, discussing this with lawyers so I hope they
will come up with sensible recommendations. I don't think the new
regulations will be without consequences though.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Martin Koppenhoefer
2018-02-14 10:30 GMT+01:00 Michael Reichert :

> Hi,
>
> people are talking about potential changes to the amount of (personal)
> data distributed by OSM, in the light of new data protection laws
> becoming effective in the EU this May.



It seems Brexit could become effective in March next year. Maybe we just wait?
What is the UK position regarding the planned EU data protection amendments?



> All challenges are run four times, one iteration with full metadata, one
> with timestamp and version fields, one with version field only and one
> without any metadata.



If you consider timestamps and version fields private data, you could also
consider object ids private data (they are assigned consecutively, so you
could frequently create and delete "test" objects to correlate object ids
and timestamps ;-) ).
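
To make the id argument concrete, a small Python sketch. The function name `estimate_creation_time` and the probe values are invented, and it assumes ids really grow monotonically with time for one object type. Given a few (id, creation time) probes, the creation time of any id in between can be interpolated:

```python
from bisect import bisect_left


def estimate_creation_time(object_id, probes):
    """Linearly interpolate a creation time (Unix seconds) from probes.

    `probes` is a list of (id, time) pairs sorted by id. Ids outside the
    probed range return None.
    """
    ids = [p[0] for p in probes]
    pos = bisect_left(ids, object_id)
    if pos < len(ids) and ids[pos] == object_id:
        return float(probes[pos][1])
    if pos == 0 or pos == len(ids):
        return None
    (id_lo, t_lo), (id_hi, t_hi) = probes[pos - 1], probes[pos]
    return t_lo + (t_hi - t_lo) * (object_id - id_lo) / (id_hi - id_lo)


# Hypothetical probes: "test" objects whose creation time is known.
probes = [(100, 1000), (200, 2000)]
print(estimate_creation_time(150, probes))
```

The more often probes are created, the tighter the estimate gets, which is why stripping timestamps alone would not hide this information completely.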

I really hope we will not obfuscate or remove meta data because of some EU
privacy regulation, please do not overreact.

Cheers,
Martin


[OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Michael Reichert
Hi,

people are talking about potential changes to the amount of (personal)
data distributed by OSM, in the light of new data protection laws
becoming effective in the EU this May. There haven't been any official
statements by the OSMF but discussions are going on in the LWG [1].

Even though it is still unclear what the concrete steps will be, I have
done some experiments. How well do our existing tools behave if you feed
them with OSM data that has less metadata than usual, or no metadata at
all? I have set up a test suite which tests Osmium-Tool (which uses the
Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6.

The test suite is available at
https://github.com/geofabrik/metadata-test/
and consists of a Bash script. You need to have osmium, osmosis and
osmconvert in your path (or you have to modify the script a bit). The
test suite comes with its own hand crafted test data which will be first
converted to PBF by Osmium. Afterwards all three tools will prove
themselves in the following challenges:

- converting XML to PBF
- converting PBF to XML
- converting XML to XML
- applying a diff
- deriving changes between two OSM files

All challenges are run four times, one iteration with full metadata, one
with timestamp and version fields, one with version field only and one
without any metadata. Some PBF challenges will also have two variants –
one with DenseNodes and one without.

The results are files located in the output/ directory. You have to
inspect them manually, I have not written a tool to parse them and
output how many tests failed.
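
For readers who want to produce similar input files from their own data, here is a minimal Python sketch of the four metadata levels. This is not part of the test suite (which ships hand-crafted files); the names `VARIANTS` and `strip_metadata` and the sample data are made up for illustration, and the attribute list is assumed to be version, timestamp, changeset, uid and user.

```python
import xml.etree.ElementTree as ET

ALL_METADATA = ("version", "timestamp", "changeset", "uid", "user")

# Which metadata attributes survive in each of the four iterations.
VARIANTS = {
    "full": ALL_METADATA,
    "version+timestamp": ("version", "timestamp"),
    "version": ("version",),
    "none": (),
}


def strip_metadata(osm_xml, variant):
    """Return the OSM XML with only the metadata allowed by `variant`."""
    keep = VARIANTS[variant]
    root = ET.fromstring(osm_xml)
    # iter() also reaches objects nested in <create>/<modify>/<delete>;
    # the tag filter leaves the root element's own version="0.6" alone.
    for elem in root.iter():
        if elem.tag in ("node", "way", "relation"):
            for attr in ALL_METADATA:
                if attr not in keep:
                    elem.attrib.pop(attr, None)
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    '<osm version="0.6"><node id="1" lat="49.0" lon="8.4" version="2" '
    'timestamp="2018-02-14T10:30:00Z" changeset="5" uid="7" user="x"/></osm>'
)
for name in VARIANTS:
    print(name, strip_metadata(SAMPLE, name))
```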

*Results*
I compiled the results into a spreadsheet. You can download it at
https://github.com/geofabrik/metadata-test/raw/master/table.ods

To sum them up:
- Osmium is the only programme which passes all format conversion tests.

- Osmosis cannot read any XML (OSM and OSC) files without timestamp and
version fields.

- Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo
message of the PBF format as mandatory. However, the format
specification doesn't declare these fields as mandatory. Therefore, they
write default values into PBF files if the input lacks these fields:
version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" (Osmosis [3]),
timestamp="1970-01-01T00:00:01Z" changeset="1" version="1" (Osmconvert)
This partially applies to the XML output of Osmosis, too.

- Deriving a diff file of the changes between two OSM files only works
if both files have the same amount of metadata. If one file contains
less or more metadata, all objects will appear in the diff file with
their new metadata and bloat it up. The question is whether this is the
desired behaviour (i.e. the ability to clean a file from metadata using
large diffs) or if this behaviour is not desired and the tools
generating diffs should compare the tags, location and members of
objects which have the same ID but different metadata.

- Some tools have bugs which lead to wrong diffs (e.g. missing
modifications) if some metadata fields are missing.
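
The second option for deriving diffs (comparing tags, location and members only) could be sketched like this in Python. None of the three tools works this way as far as I have tested; the names `content_key` and `derive_changes` and the dictionary layout are assumptions for this example.

```python
def content_key(obj):
    """Reduce an OSM object to the parts that describe the map itself."""
    return (
        obj["type"],
        obj["id"],
        obj.get("lat"),
        obj.get("lon"),
        tuple(sorted(obj.get("tags", {}).items())),
        tuple(obj.get("nodes", ())),
        tuple(tuple(sorted(m.items())) for m in obj.get("members", ())),
    )


def derive_changes(old_objects, new_objects):
    """Return (created, modified, deleted), ignoring all metadata fields."""
    old = {(o["type"], o["id"]): o for o in old_objects}
    new = {(o["type"], o["id"]): o for o in new_objects}
    created = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]
    modified = [
        new[k] for k in new
        if k in old and content_key(new[k]) != content_key(old[k])
    ]
    return created, modified, deleted


old_file = [{"type": "node", "id": 1, "lat": 49.0, "lon": 8.4,
             "tags": {"amenity": "bench"}, "version": 2,
             "timestamp": "2018-01-01T00:00:00Z"}]
new_file = [{"type": "node", "id": 1, "lat": 49.0, "lon": 8.4,
             "tags": {"amenity": "bench"}},
            {"type": "node", "id": 2, "lat": 1.0, "lon": 2.0, "tags": {}}]
created, modified, deleted = derive_changes(old_file, new_file)
print(len(created), len(modified), len(deleted))
```

With this comparison, node 1 does not show up in the diff even though one file has version and timestamp and the other does not. The price is that such a diff can no longer be used to strip metadata from a file.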

Best regards

Michael


[1]
https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
[2] Osmium also had this bug. But it was fixed on the master branch a
few days ago.
[3] Osmium cannot parse negative version numbers and throws an exception.


-- 
Michael Reichert        www.geofabrik.de
Geofabrik GmbH          Handelsregister: HRB Mannheim 703657
Amalienstr. 44          Geschaeftsfuehrung: C. Karch, F. Ramm
76133 Karlsruhe         Tel: 0721-1803560-3
reich...@geofabrik.de   Fax: 0721-1803560-9


