Re: [OSM-dev] Working with OSM data with less or no metadata
2018-02-14 17:49 GMT+01:00 Simon Poole:

> Generally I would prefer if we could simply have two versions of
> everything, one with metadata for authenticated users/consumers one
> without.

+1. These sound like sane measures; the metadata is really an important part of how the community works with the map.

I believe it is an overreaction to speculate about privacy issues with OSM metadata, which is pseudonymous data. You cannot deanonymize it without other, additional data (e.g. real name, address, ideally combined with the same nickname elsewhere, habits, interests, etc.). Yes, you can find the center of activity of an active mapper, in some cases even their interests, but that doesn't mean you can tell their residence or identity (save maybe in very few situations of people living in very low density areas). There also isn't a very direct correlation between your edit and you being at a place (IP addresses shouldn't be released, of course): you can (and many do) add something weeks, months or even years after you have observed it, or you might have used aerial imagery, internet research or Mapillary, or edited for a friend...

If it is required nonetheless, I'm with Roland: we should ask for explicit permission.

Cheers,
Martin

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Working with OSM data with less or no metadata
The LWG won't be making any "decisions"; we will be making recommendations to the board, which may or may not take action on them.

Generally I would prefer if we could simply have two versions of everything, one with metadata for authenticated users/consumers, one without. This likely will not be feasible for everything, but at least for the important stuff.

On 14.02.2018 at 17:17, Roland Olbricht wrote:
> Hi,
>
>> - timestamps however cannot only potentially be used in lieu of
>> changeset ids to group contributions, the information itself is
>> problematic because it allows to profile contributions over time
>
> Timestamps are necessary to correctly figure out which nodes have
> belonged to a certain version of a way, and similarly for ways and
> nodes belonging to relations.
>
> More generally:
>
> - What is planned with regard to minute diffs? Stripping extra
> information will inevitably break tools like Achavi
>
> - Tools will need substantial time (I would estimate 3-6 months for
> Overpass API) to adapt in a meaningful way. What is the schedule of
> the LWG to take decisions?
>

The deadline is more or less clear for things that we consider really touchy: they need to be fixed by the end of May. Wrt Overpass API, there is no reason why you couldn't consume diffs as up to now, as long as the output is sanitized (regardless of what the OSMF says and does, the GDPR doesn't go away for you, so you need to consider your options in any case).

> - How about simply asking the users for consent? We could then
> -- make a clear-cut last complete history dump before the date
> -- start with a planet dump without history before that date
> afterwards that then accumulates history only from users that have
> given consent
>

The problem is that this doesn't solve anything because, as recently confirmed by the EU, consent is only considered freely given and valid if it can be withdrawn, and from a practical point of view that essentially forces two distribution streams on different terms (one that can be used without any privacy-related restrictions and one with all the trouble).

> Personally, I would prefer a solution as easy as dropping usernames
> and uids but retaining changeset ids, timestamps and the geometry/tag
> data.
> That way we display goodwill, but do not cripple the tools that have
> proven useful or crucial to run the project.

Unluckily, what I would prefer is not the question :-/.

Simon

>
> Please note that in the context of an API without user interface, it
> is a substantial challenge in itself to have any form of (OAuth or so)
> authentication.
>
> Cheers,
> Roland
Re: [OSM-dev] Working with OSM data with less or no metadata
Hi,

> - timestamps however cannot only potentially be used in lieu of
> changeset ids to group contributions, the information itself is
> problematic because it allows to profile contributions over time

Timestamps are necessary to correctly figure out which nodes have belonged to a certain version of a way, and similarly for ways and nodes belonging to relations.

More generally:

- What is planned with regard to minute diffs? Stripping extra information will inevitably break tools like Achavi.

- Tools will need substantial time (I would estimate 3-6 months for Overpass API) to adapt in a meaningful way. What is the LWG's schedule for taking decisions?

- How about simply asking the users for consent? We could then
  -- make a clear-cut last complete history dump before the date
  -- afterwards, start with a planet dump that contains no history from before that date and then accumulates history only from users who have given consent

Personally, I would prefer a solution as easy as dropping usernames and uids but retaining changeset ids, timestamps and the geometry/tag data. That way we display goodwill, but do not cripple the tools that have proven useful or crucial to run the project.

Please note that in the context of an API without user interface, it is a substantial challenge in itself to have any form of (OAuth or so) authentication.

Cheers,
Roland
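
Roland's point about timestamps can be illustrated with a small sketch. It assumes a simplified in-memory history (the dictionaries and the helper names `ts` and `node_version_at` are hypothetical, not taken from Overpass API or any other tool) and picks, for each node of a way version, the node version that was current at the way's timestamp. Without timestamps on the node versions this lookup has nothing to work with.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def ts(text):
    """Parse an OSM-style UTC timestamp such as 2018-02-14T16:17:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def node_version_at(history, when):
    """Return the node version that was current at `when`.

    `history` is a list of (timestamp, version) tuples sorted by timestamp.
    Returns None if the node did not exist yet at that time.
    Deleted (invisible) versions are ignored in this simplified sketch.
    """
    stamps = [stamp for stamp, _ in history]
    idx = bisect_right(stamps, when)
    return history[idx - 1][1] if idx else None


# Hypothetical history: node 1 has two versions, node 2 has one.
node_history = {
    1: [(ts("2017-01-01T00:00:00Z"), 1), (ts("2018-01-01T00:00:00Z"), 2)],
    2: [(ts("2017-06-01T00:00:00Z"), 1)],
}
way_v1 = {"timestamp": ts("2017-07-01T00:00:00Z"), "nodes": [1, 2]}

# Which node versions made up version 1 of the way?
resolved = {
    ref: node_version_at(node_history[ref], way_v1["timestamp"])
    for ref in way_v1["nodes"]
}
print(resolved)
```

If only version numbers were published, a tool could still order the versions of a single object, but it could not line up versions across different objects as done here.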
Re: [OSM-dev] Working with OSM data with less or no metadata
General comments:

- we are just considering removing metadata from what is publicly available outside of the OSM community; the current thinking is that it can remain available to authenticated users

- while there might be a tiny bit of leakage from providing version numbers, we haven't considered them to be a large concern, and a good argument can be made why they need to be public (see below)

- timestamps, however, can not only potentially be used in lieu of changeset ids to group contributions; the information itself is problematic because it allows profiling contributions over time

Neither the uid/display name nor the timestamp of an existing object version is required to create a modified version for upload to the API; the version number, however, is.

Simon

On 14.02.2018 at 10:30, Michael Reichert wrote:
> Hi,
>
> people are talking about potential changes to the amount of (personal)
> data distributed by OSM, in the light of new data protection laws
> becoming effective in the EU this May. There haven't been any official
> statements by the OSMF but discussions are going on in the LWG [1].
>
> Even though it is still unclear what the concrete steps will be, I have
> done some experiments. How well do our existing tools behave if you feed
> them with OSM data that has less metadata than usual, or no metadata at
> all? I have set up a test suite which tests Osmium-Tool (which uses the
> Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6.
>
> The test suite is available at
> https://github.com/geofabrik/metadata-test/
> and consists of a Bash script. You need to have osmium, osmosis and
> osmconvert in your path (or you have to modify the script a bit). The
> test suite comes with its own hand crafted test data which will be first
> converted to PBF by Osmium. Afterwards all three tools will prove
> themselves in the following challenges:
>
> - converting XML to PBF
> - converting PBF to XML
> - converting XML to XML
> - applying a diff
> - deriving changes between two OSM files
>
> All challenges are run four times, one iteration with full metadata, one
> with timestamp and version fields, one with version field only and one
> without any metadata. Some PBF challenges will also have two variants –
> one with DenseNodes and one without.
>
> The results are files located in the output/ directory. You have to
> inspect them manually, I have not written a tool to parse them and
> output how many tests failed.
>
> *Results*
> I compiled the results into a spreadsheet. You can download it at
> https://github.com/geofabrik/metadata-test/raw/master/table.ods
>
> To sum them up:
> - Osmium is the only programme which passes all format conversion tests.
>
> - Osmosis cannot read any XML (OSM and OSC) files without timestamp and
> version fields.
>
> - Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo
> message of the PBF format as mandatory. However, the format
> specification doesn't declare these fields as mandatory. Therefore, they
> write default values into PBF files if the input lacks these fields:
> version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" (Osmosis [3]),
> timestamp="1970-01-01T00:00:01Z" changeset="1" version="1" (Osmconvert)
> This partially applies to the XML output of Osmosis, too.
>
> - Deriving a diff file of the changes between two OSM files only works
> if both files have the same amount of metadata. If one file contains
> less or more metadata, all objects will appear in the diff file with
> their new metadata and bloat it up. The question is whether this is the
> desired behaviour (i.e. the ability to clean a file from metadata using
> large diffs) or if this behaviour is not desired and the tools
> generating diffs should compare the tags, location and members of
> objects which have the same ID but different metadata.
>
> - Some tools have bugs which lead to wrong diffs (e.g. missing
> modifications) if some metadata fields are missing.
>
> Best regards
>
> Michael
>
>
> [1]
> https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
> [2] Osmium also had this bug. But it was fixed on the master branch a
> few days ago.
> [3] Osmium cannot parse negative version numbers and throws an exception.
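
As an illustration of the metadata levels discussed in this message and in the quoted test description (full metadata, timestamp and version, version only, none), here is a minimal sketch that strips metadata attributes from OSM XML using only the Python standard library. The function name `strip_metadata`, the `keep` parameter and the sample element are assumptions made for this example; this is not how Osmium, Osmosis or Osmconvert do it.

```python
import xml.etree.ElementTree as ET

# Metadata attributes carried by node, way and relation elements in OSM XML.
METADATA_ATTRS = ("version", "timestamp", "changeset", "uid", "user")


def strip_metadata(xml_text, keep=("version",)):
    """Remove all metadata attributes except those listed in `keep`.

    The default keeps only the version number, which is the one field
    needed to upload a modified object back to the API.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Only touch OSM objects; the root <osm version="0.6"> stays as it is.
        if elem.tag in ("node", "way", "relation"):
            for attr in METADATA_ATTRS:
                if attr not in keep:
                    elem.attrib.pop(attr, None)
    return ET.tostring(root, encoding="unicode")


# Hand-made sample object, not real OSM data.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="49.0" lon="8.4" version="3"
        timestamp="2018-02-14T09:30:00Z" changeset="42" uid="7" user="example"/>
</osm>"""

version_only = strip_metadata(SAMPLE)
no_metadata = strip_metadata(SAMPLE, keep=())
print(version_only)
```

Calling it with `keep=("version", "timestamp")` would produce the second of the four levels from the test suite description.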
Re: [OSM-dev] Working with OSM data with less or no metadata
On Wednesday 14 February 2018, Darafei "Komяpa" Praliaskouski wrote:
> >
> > While this seems a useful test to do i wonder how the timestamp and
> > version fields are relevant regarding privacy and personal data
> > protection?
>
> If OSM API were fast enough, it would allow to rather easily group
> changes back to changesets. Number of changesets gets you back number
> of mappers. Classifying a returning mapper by edit pattern would
> allow to get back the geometric median of their edits, which brings
> you to knowing where they live.

No. Even if the API were infinitely fast, limited bandwidth and the possibility of appending new edits to existing open changesets would make this impossible. And even if you could identify the changeset a certain feature was last modified in, that would still not allow you to conclude which changesets were created by the same user.

If you practically want to reverse engineer user identities from a planet file with user info stripped, looking at the data itself and mapper-specific data characteristics (tag combinations, the way geometries are drawn) would likely be more useful than versions and timestamps.

-- 
Christoph Hormann
http://www.imagico.de/
Re: [OSM-dev] Working with OSM data with less or no metadata
On Wed, 14 Feb 2018 at 17:47, Christoph Hormann wrote:

> On Wednesday 14 February 2018, Michael Reichert wrote:
> >
> > All challenges are run four times, one iteration with full metadata,
> > one with timestamp and version fields, one with version field only
> > and one without any metadata. [...]
>
> While this seems a useful test to do i wonder how the timestamp and
> version fields are relevant regarding privacy and personal data
> protection?
>

If the OSM API were fast enough, it would allow one to rather easily group changes back into changesets. The number of changesets gets you back the number of mappers. Classifying a returning mapper by edit pattern would allow you to get back the geometric median of their edits, which brings you to knowing where they live.

(To make OSM API upload faster, don't forget to join the efforts in https://github.com/zerebubuth/openstreetmap-cgimap/issues/140)

> I know a possible answer could be that in combination with personal data
> (like user names or ids) it would provide additional information on
> people. But this argument applies to *any data* including the geometry
> (i.e. coordinates) and tags.
>
> So what is the special thing from a legal standpoint about versions and
> timestamps compared to geometries and tags?
>
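
For readers unfamiliar with the term: the geometric median mentioned above is the point that minimises the sum of distances to a set of points, and unlike the mean it is barely moved by a few far-away edits. The sketch below approximates it with Weiszfeld's algorithm. It is only an illustration under simplifying assumptions: coordinates are treated as planar (lon, lat) pairs, the sample points are made up, and the function name `geometric_median` is hypothetical. Whether edits can actually be grouped per mapper is exactly what is disputed in this thread.

```python
import math


def geometric_median(points, iterations=100, eps=1e-9):
    """Approximate the geometric median of 2D points (Weiszfeld's algorithm).

    Coordinates are treated as planar, which is a simplification for
    geographic data. Distances are clamped to `eps` to avoid division by
    zero when the estimate lands on an input point.
    """
    # Start from the arithmetic mean.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            weight = 1.0 / max(math.hypot(px - x, py - y), eps)
            num_x += weight * px
            num_y += weight * py
            denom += weight
        new_x, new_y = num_x / denom, num_y / denom
        shift = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if shift < eps:
            break
    return x, y


# Made-up (lon, lat) positions: four edits close together, one far away.
edits = [(8.40, 49.00), (8.41, 49.01), (8.39, 49.01), (8.40, 49.02), (2.35, 48.85)]
mean_lon = sum(lon for lon, _ in edits) / len(edits)
median_lon, median_lat = geometric_median(edits)
print(mean_lon, median_lon)
```

In this made-up example the mean is dragged towards the single remote edit, while the median stays inside the cluster of nearby edits.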
Re: [OSM-dev] Working with OSM data with less or no metadata
On Wednesday 14 February 2018, Michael Reichert wrote:
>
> All challenges are run four times, one iteration with full metadata,
> one with timestamp and version fields, one with version field only
> and one without any metadata. [...]

While this seems a useful test to do, I wonder how the timestamp and version fields are relevant regarding privacy and personal data protection?

I know a possible answer could be that in combination with personal data (like user names or ids) it would provide additional information on people. But this argument applies to *any data* including the geometry (i.e. coordinates) and tags.

So what is the special thing from a legal standpoint about versions and timestamps compared to geometries and tags?

-- 
Christoph Hormann
http://www.imagico.de/
Re: [OSM-dev] Working with OSM data with less or no metadata
Hi,

On 14.02.2018 15:23, Martin Koppenhoefer wrote:
> it seems Brexit could become effective March next year. Maybe we just wait?

We would still have to apply EU regulations to processing the data of EU citizens.

> I really hope we will not obfuscate or remove meta data because of some
> EU privacy regulation, please do not overreact.

The LWG is, or has been, discussing this with lawyers so I hope they will come up with sensible recommendations. I don't think the new regulations will be without consequences though.

Bye
Frederik

-- 
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09" E008°23'33"
Re: [OSM-dev] Working with OSM data with less or no metadata
2018-02-14 10:30 GMT+01:00 Michael Reichert:

> Hi,
>
> people are talking about potential changes to the amount of (personal)
> data distributed by OSM, in the light of new data protection laws
> becoming effective in the EU this May.

It seems Brexit could become effective in March next year. Maybe we just wait? What is the UK position regarding the planned EU data protection amendments?

> All challenges are run four times, one iteration with full metadata, one
> with timestamp and version fields, one with version field only and one
> without any metadata.

If you consider timestamps and version fields private data, you could also consider object ids private data (they are assigned consecutively, so you could frequently create and delete "test" objects to correlate object ids and timestamps ;-) ).

I really hope we will not obfuscate or remove metadata because of some EU privacy regulation; please do not overreact.

Cheers,
Martin
[OSM-dev] Working with OSM data with less or no metadata
Hi,

people are talking about potential changes to the amount of (personal) data distributed by OSM, in the light of new data protection laws becoming effective in the EU this May. There haven't been any official statements by the OSMF but discussions are going on in the LWG [1].

Even though it is still unclear what the concrete steps will be, I have done some experiments. How well do our existing tools behave if you feed them with OSM data that has less metadata than usual, or no metadata at all? I have set up a test suite which tests Osmium-Tool (which uses the Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6.

The test suite is available at
https://github.com/geofabrik/metadata-test/
and consists of a Bash script. You need to have osmium, osmosis and osmconvert in your path (or you have to modify the script a bit). The test suite comes with its own hand-crafted test data, which is first converted to PBF by Osmium. Afterwards all three tools have to prove themselves in the following challenges:

- converting XML to PBF
- converting PBF to XML
- converting XML to XML
- applying a diff
- deriving changes between two OSM files

All challenges are run four times: one iteration with full metadata, one with timestamp and version fields, one with version field only and one without any metadata. Some PBF challenges also have two variants – one with DenseNodes and one without.

The results are files located in the output/ directory. You have to inspect them manually; I have not written a tool to parse them and output how many tests failed.

*Results*
I compiled the results into a spreadsheet. You can download it at
https://github.com/geofabrik/metadata-test/raw/master/table.ods

To sum them up:

- Osmium is the only programme which passes all format conversion tests.

- Osmosis cannot read any XML (OSM and OSC) files without timestamp and version fields.

- Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo message of the PBF format as mandatory, although the format specification doesn't declare them as such. As a result, they write default values into PBF files if the input lacks these fields:
  version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" (Osmosis [3]),
  timestamp="1970-01-01T00:00:01Z" changeset="1" version="1" (Osmconvert)
  This partially applies to the XML output of Osmosis, too.

- Deriving a diff file of the changes between two OSM files only works if both files have the same amount of metadata. If one file contains less or more metadata, all objects will appear in the diff file with their new metadata and bloat it. The question is whether this is the desired behaviour (i.e. the ability to clean a file of metadata using large diffs) or whether the tools generating diffs should instead compare the tags, location and members of objects which have the same ID but different metadata.

- Some tools have bugs which lead to wrong diffs (e.g. missing modifications) if some metadata fields are missing.

Best regards

Michael

[1] https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
[2] Osmium also had this bug. But it was fixed on the master branch a few days ago.
[3] Osmium cannot parse negative version numbers and throws an exception.

-- 
Michael Reichert
Geofabrik GmbH
Amalienstr. 44
76133 Karlsruhe
reich...@geofabrik.de
www.geofabrik.de
Commercial register: HRB Mannheim 703657
Managing directors: C. Karch, F. Ramm
Tel: 0721-1803560-3
Fax: 0721-1803560-9
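
The alternative diff behaviour raised in the results above (comparing tags, location and members instead of metadata) could look roughly like the following sketch. It works on plain dictionaries rather than real OSM files, and the names `content_key` and `content_diff` as well as the dictionary layout are assumptions for illustration; none of the tested tools is claimed to work this way.

```python
def content_key(obj):
    """Reduce an OSM object to the parts that describe the map itself.

    Metadata such as version, timestamp, changeset, uid and user is ignored.
    """
    return (
        tuple(sorted(obj.get("tags", {}).items())),
        obj.get("location"),              # nodes: (lon, lat)
        tuple(obj.get("nodes", ())),      # ways: ordered node references
        tuple(obj.get("members", ())),    # relations: (type, ref, role) tuples
    )


def content_diff(old_objects, new_objects):
    """Return created, modified and deleted object ids, ignoring metadata."""
    old = {(o["type"], o["id"]): content_key(o) for o in old_objects}
    new = {(o["type"], o["id"]): content_key(o) for o in new_objects}
    return {
        "create": sorted(k for k in new if k not in old),
        "modify": sorted(k for k in new if k in old and new[k] != old[k]),
        "delete": sorted(k for k in old if k not in new),
    }


# Made-up data: the old file has metadata, the new one has none.
old_file = [
    {"type": "node", "id": 1, "location": (8.4, 49.0), "tags": {"amenity": "cafe"},
     "version": 3, "timestamp": "2018-02-14T09:30:00Z", "uid": 7},
    {"type": "node", "id": 2, "location": (8.5, 49.1), "tags": {}, "version": 1},
]
new_file = [
    {"type": "node", "id": 1, "location": (8.4, 49.0), "tags": {"amenity": "cafe"}},
    {"type": "node", "id": 2, "location": (8.5, 49.2), "tags": {}},
]

diff = content_diff(old_file, new_file)
print(diff)
```

With this comparison, node 1 does not show up in the diff even though its metadata was stripped, while node 2 is reported as modified because its location changed. The trade-off is that such a diff can no longer be used to clean metadata out of an existing file.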