Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-17 Thread Jochen Topf
On Wed, Feb 14, 2018 at 05:17:09PM +0100, Roland Olbricht wrote:
> > - timestamps, however, can not only potentially be used in lieu of
> > changeset ids to group contributions; the information itself is also
> > problematic because it allows contributions to be profiled over time
> 
> Timestamps are necessary to correctly figure out which nodes have belonged
> to a certain version of a way, and similarly for ways and nodes belonging to
> relations.

Just want to highlight this. Timestamps are not optional when working with
history data: without them it isn't possible to figure out which specific
versions of other objects a given object version refers to. This is a
result of objects referring to other objects by id only and not by an
(id, version) pair.

So when working with non-history data (i.e. only current data), you don't
need any metadata at all to correctly interpret the geodata. But when
working with history data, you need both version and timestamp.
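Here is a minimal Python sketch of the lookup described above. It assumes each node's history is already in memory as a timestamp-sorted list of `(timestamp, version)` pairs. The `history` layout and the `node_versions_at` helper are hypothetical and not part of any OSM tool. A way lists only node ids, so the node version belonging to it at a given time is the latest node version whose timestamp is not after that time:

```python
from bisect import bisect_right
from datetime import datetime, timezone


def ts(iso: str) -> datetime:
    """Parse an OSM-style timestamp such as '2017-06-01T00:00:00Z' as UTC."""
    return datetime.strptime(iso, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def node_versions_at(when, node_refs, history):
    """Map each referenced node id to the version that was current at `when`.

    `history` (hypothetical layout): node id -> list of (timestamp, version)
    tuples sorted by timestamp. Without timestamps this lookup is impossible,
    because a way references nodes by id only, not by (id, version).
    """
    result = {}
    for ref in node_refs:
        versions = history[ref]
        timestamps = [t for t, _ in versions]
        # Index of the first version that is strictly newer than `when`.
        idx = bisect_right(timestamps, when)
        if idx == 0:
            raise ValueError(f"node {ref} did not exist yet at {when}")
        result[ref] = versions[idx - 1][1]
    return result


history = {
    1: [(ts("2017-01-01T00:00:00Z"), 1), (ts("2017-06-01T00:00:00Z"), 2)],
    2: [(ts("2017-01-01T00:00:00Z"), 1)],
}
print(node_versions_at(ts("2017-03-01T00:00:00Z"), [1, 2], history))  # {1: 1, 2: 1}
print(node_versions_at(ts("2017-07-01T00:00:00Z"), [1, 2], history))  # {1: 2, 2: 1}
```

The version number alone cannot answer this. It orders the versions of one node, but it does not say which of them existed when a particular way version was current.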

Jochen
-- 
Jochen Topf  joc...@remote.org  https://www.jochentopf.com/  +49-351-31778688

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-16 Thread Simon Poole


On 16.02.2018 at 14:24, Martin Koppenhoefer wrote:
>
>
> I don't share the interpretation that OSMF processes personal data
> (besides the e-mail addresses and maybe IP addresses used by its
> contributors, which are neither distributed nor public), because I
> don't think that our mappers can be identified with the data and
> metadata of their contributions, i.e. they are not identifiable
> natural persons because they cannot be identified, directly or
> indirectly.

Naturally we have the case of the licence change which proved the exact
opposite.

But that doesn't matter in any case as the GDPR does not require that
what qualifies as personal data be directly associated with an
individual by personal name (which seems to be what you are thinking
of). To quote Article 4(1): "an identifiable natural person is one who can be
identified, directly or indirectly, in particular by reference to an
identifier such as a name, an identification number, location data, an
online identifier or to one or more factors specific to the physical,
physiological, genetic, mental, economic, cultural or social identity of
that natural person;"
> Yes, if you know who they are you can see what they did, but you
> cannot see from what they did who they are. At best you can guess, but
> it only works if you have additional information that the person (or
> someone else) would have to provide you with. What we have according
> to these definitions is "pseudonymisation" (because OSMF has the
> sign-up e-mail address associated with the user number, and is
> therefore in a position to make personal data from the contributions).
>
> If someone tries to reverse the pseudonymisation of our contributors'
> data and metadata, it is that person who would be in breach of the law.

Pseudonymisation is one of the data protection safeguards proposed by
the GDPR; using it does not make the data itself any less "personal
data". See Recital 26: "Personal data which have undergone
pseudonymisation, which could be attributed to a natural person by
the use of additional information should be considered to be information
on an identifiable natural person." It just may make possible some
processing of such data that otherwise would not be permissible.

>
> An exception might occur in very rare cases in areas where the
> contributor is the only person present for a great distance, i.e.
> extremely remote areas, and probably not in the European Union.

Again, see above: we know first hand how many of our contributors can be
identified just from their display name, the location of their initial
edits, other hints and so on. I'm not quite sure why you are in denial
about this, as you were present when that took place.

Simon

>
> For reference,
>
> General Data Protection Regulation
> https://ec.europa.eu/info/law/law-topic/data-protection/data-protection-eu_en
>
>
> Cheers,
> Martin





Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-16 Thread Martin Koppenhoefer
2018-02-16 13:37 GMT+01:00 Simon Poole :

> The intellectual property rights (I re-quote: "that is restricted by
> copyright, database right or any related right") have nothing to do with
> the subject at hand, the data privacy rights of the individual data
> subject. As a consequence the contributor terms have no bearing, in any
> form, at all, even in an alternative universe, on the matter.
>


I really have no idea what "related right" means, not even whether it
relates to "copyright and database right" or to "Contents".




>
> If you look at our recommendation document you will note that we believe
> that we currently do not have consent as defined by the GDPR for the
> processing we do. As a consequence, we will likely recommend asking for
> explicit consent somewhere in the sign-up process (in terms of content this
> already exists in the privacy policy, but it needs to be re-jigged to work
> as part of the terms of use that will have to be explicitly agreed to for
> account creation).
>
> However, having valid consent for current processing does not remove the
> issue that Paul has pointed out (again): consent can be withdrawn, and
> such a withdrawal applies retroactively. That is the main reason why, one
> way or another, we should change what data we distribute to the general public.
>


By asking explicitly, we would be confirming that we believe privacy rights
are relevant, and it could indeed become more of a problem as people revoke
their consent.

You are referring to this document:
https://docs.google.com/document/d/1EjccQNm3awl7eQlk1jGYyoGJVavJG_bEfX8iCMEuC9U/edit#

The relevant paragraph is "Does the OSMF process ‘personal data’?"

I don't share the interpretation that OSMF processes personal data (besides
the e-mail addresses and maybe IP addresses used by its contributors, which
are neither distributed nor public), because I don't think that our mappers
can be identified with the data and metadata of their contributions, i.e.
they are not identifiable natural persons because they cannot be
identified, directly or indirectly. Yes, if you know who they are you can
see what they did, but you cannot see from what they did who they are. At
best you can guess, but that only works if you have additional information
that the person (or someone else) would have to provide you with. What we
have according to these definitions is "pseudonymisation" (because OSMF has
the sign-up e-mail address associated with the user number, and is therefore
in a position to make personal data from the contributions).

If someone tries to reverse the pseudonymisation of our contributors' data
and metadata, it is that person who would be in breach of the law.

An exception might occur in very rare cases in areas where the contributor
is the only person present for a great distance, i.e. extremely remote
areas, and probably not in the European Union.

For reference,

General Data Protection Regulation
https://ec.europa.eu/info/law/law-topic/data-protection/data-protection-eu_en


Cheers,
Martin


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-16 Thread Simon Poole


On 16.02.2018 at 13:09, Martin Koppenhoefer wrote:
> 2018-02-16 0:04 GMT+01:00 Paul Norman  >:
>
> On 2/14/2018 8:17 AM, Roland Olbricht wrote:
>
>
> - How about simply asking the users for consent? We could then
> -- make a clear-cut last complete history dump before the date
> -- start with a planet dump without history before that date
> afterwards that then accumulates history only from users that
> have given consent
>
>
> Consent is revocable. If we didn't have to deal with people
> revoking consent and account deletion requests, it would all be
> much easier.
>
>
>
>
> We are asking for "a worldwide, royalty-free, non-exclusive,
> perpetual, irrevocable licence to do any act that is restricted by
> copyright, database right or any related right over anything within
> the Contents, whether in the original medium or any other." Do you
> have reason to believe the "irrevocable" part is invalid?
No, because you can give an irrevocable licence in intellectual property
matters (that is a rough generalisation, I know, as certain
jurisdictions actually limit that).
>
> "Contents" means "data and/or any other content (collectively,
> “Contents”)" [which the user contributes] "to the geo-database of the
> OpenStreetMap project"
> https://wiki.osmfoundation.org/wiki/Licence/Contributor_Terms
>
> Account deletions are another issue, but don't seem complicated:
> remove the human-readable account alias and e-mail forwarding and
> prevent it from editing.
>
The intellectual property rights (I re-quote: "that is restricted by
copyright, database right or any related right") have nothing to do with
the subject at hand, the data privacy rights of the individual data
subject. As a consequence the contributor terms have no bearing, in any
form, at all, even in an alternative universe, on the matter.

If you look at our recommendation document you will note that we believe
that we currently do not have consent as defined by the GDPR for the
processing we do. As a consequence, we will likely recommend asking for
explicit consent somewhere in the sign-up process (in terms of content
this already exists in the privacy policy, but it needs to be re-jigged
to work as part of the terms of use that will have to be explicitly
agreed to for account creation).

However, having valid consent for current processing does not remove the
issue that Paul has pointed out (again): consent can be withdrawn, and
such a withdrawal applies retroactively. That is the main reason why, one
way or another, we should change what data we distribute to the general
public.

Simon

> Cheers,
> Martin





Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-16 Thread Martin Koppenhoefer
2018-02-16 0:04 GMT+01:00 Paul Norman :

> On 2/14/2018 8:17 AM, Roland Olbricht wrote:
>
>>
>> - How about simply asking the users for consent? We could then
>> -- make a clear-cut last complete history dump before the date
>> -- start with a planet dump without history before that date afterwards
>> that then accumulates history only from users that have given consent
>>
>
> Consent is revocable. If we didn't have to deal with people revoking
> consent and account deletion requests, it would all be much easier.




We are asking for "a worldwide, royalty-free, non-exclusive, perpetual,
irrevocable licence to do any act that is restricted by copyright, database
right or any related right over anything within the Contents, whether in
the original medium or any other." Do you have reason to believe the
"irrevocable" part is invalid?

"Contents" means "data and/or any other content (collectively, “Contents”)"
[which the user contributes] "to the geo-database of the OpenStreetMap
project"
https://wiki.osmfoundation.org/wiki/Licence/Contributor_Terms

Account deletions are another issue, but don't seem complicated: remove the
human-readable account alias and e-mail forwarding and prevent it from
editing.

Cheers,
Martin


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-15 Thread Paul Norman

On 2/14/2018 8:17 AM, Roland Olbricht wrote:


> - How about simply asking the users for consent? We could then
> -- make a clear-cut last complete history dump before the date
> -- start with a planet dump without history before that date
> afterwards that then accumulates history only from users that have
> given consent


Consent is revocable. If we didn't have to deal with people revoking 
consent and account deletion requests, it would all be much easier.




Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Martin Koppenhoefer
2018-02-14 17:49 GMT+01:00 Simon Poole :

> Generally I would prefer if we could simply have two versions of
> everything, one with metadata for authenticated users/consumers and one
> without.
>


+1. This sounds like a sane measure; the metadata is a really important
part of how the community works with the map.

I believe it is an overreaction to speculate about privacy issues with OSM
metadata, which is pseudonymous data. You cannot de-anonymize it without
other, additional data (e.g. real name, address, ideally combined with the
same nickname elsewhere, habits, interests, etc.). Yes, you can find the
center of activity of an active mapper, in some cases even their interests,
but that doesn't mean you can tell their residence or identity (except
perhaps in a very few situations of people living in very low density
areas). There also isn't a very direct correlation between your edit and
your being at a place (IP addresses shouldn't be released, of course): you
can (and many do) add something weeks, months or even years after you have
observed it, you might have used aerial imagery, internet research or
Mapillary, or edited for a friend...

If it is required nonetheless, I'm with Roland, we should ask for explicit
permission.

Cheers,
Martin


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Simon Poole
The LWG won't be making any "decisions"; we will be making
recommendations to the board, which may or may not take action on them.

Generally I would prefer if we could simply have two versions of
everything, one with metadata for authenticated users/consumers and one
without. This likely will not be feasible for everything, but at least
for the important stuff.


On 14.02.2018 at 17:17, Roland Olbricht wrote:
> Hi,
>
>> - timestamps, however, can not only potentially be used in lieu of
>> changeset ids to group contributions; the information itself is also
>> problematic because it allows contributions to be profiled over time
>
> Timestamps are necessary to correctly figure out which nodes have
> belonged to a certain version of a way, and similarly for ways and
> nodes belonging to relations.
>
> More generally:
>
> - What is planned with regard to minute diffs? Stripping extra
> information will inevitably break tools like Achavi
>
> - Tools will need substantial time (I would estimate 3-6 months for
> Overpass API) to adapt in a meaningful way. What is the schedule of
> the LWG to take decisions?
>
The deadline is more or less clear for things that we consider really
touchy, they need to be fixed by the end of May.

With regard to the Overpass API, there is no reason why you couldn't
consume diffs as up to now, as long as the output is sanitized (regardless
of what the OSMF says and does, the GDPR doesn't go away for you, so you
need to consider your options in any case).

> - How about simply asking the users for consent? We could then
> -- make a clear-cut last complete history dump before the date
> -- start with a planet dump without history before that date
> afterwards that then accumulates history only from users that have
> given consent
>

The problem is that this doesn't solve anything: as recently confirmed
by the EU, consent is only considered freely given and valid if it can
be withdrawn, and from a practical point of view that essentially forces
two distribution streams on different terms (one that can be used
without any privacy-related restrictions and one with all the trouble).

> Personally, I would prefer a solution as easy as dropping usernames
> and uids but retaining changeset ids, timestamps and the geometry/tag
> data.
> That way we display goodwill, but do not cripple the tools that have
> proven useful or crucial to run the project.

Unfortunately, what I would prefer is not the question :-/.

Simon

>
> Please note that in the context of an API without a user interface, it
> is a substantial challenge in itself to have any form of authentication
> (OAuth or similar).
>
> Cheers,
> Roland






Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Roland Olbricht

Hi,

> - timestamps, however, can not only potentially be used in lieu of
> changeset ids to group contributions; the information itself is also
> problematic because it allows contributions to be profiled over time


Timestamps are necessary to correctly figure out which nodes have 
belonged to a certain version of a way, and similarly for ways and nodes 
belonging to relations.


More generally:

- What is planned with regard to minute diffs? Stripping extra 
information will inevitably break tools like Achavi


- Tools will need substantial time (I would estimate 3-6 months for 
Overpass API) to adapt in a meaningful way. What is the schedule of the 
LWG to take decisions?


- How about simply asking the users for consent? We could then
-- make a clear-cut last complete history dump before the date
-- start with a planet dump without history before that date afterwards 
that then accumulates history only from users that have given consent


Personally, I would prefer a solution as easy as dropping usernames and 
uids but retaining changeset ids, timestamps and the geometry/tag data.
That way we display goodwill, but do not cripple the tools that have 
proven useful or crucial to run the project.


Please note that in the context of an API without a user interface, it is
a substantial challenge in itself to have any form of authentication
(OAuth or similar).


Cheers,
Roland



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Simon Poole
General comments:

- we are just considering removing metadata from what is publicly
available outside of the OSM community, the current thinking is that it
can remain available to authenticated users

- while there might be a tiny bit of leakage from providing version
numbers, we haven't considered them to be a large concern, and a good
argument can be made for why they need to be public (see below)

- timestamps, however, can not only potentially be used in lieu of
changeset ids to group contributions; the information itself is also
problematic because it allows contributions to be profiled over time
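To illustrate the concern in the last point, here is a minimal Python sketch. It assumes a hypothetical list of edit timestamps with no changeset ids, and it clusters edits whose timestamps lie within a chosen gap of each other. The five-minute threshold is an arbitrary assumption. As Christoph points out elsewhere in the thread, interleaved uploads from many mappers would make such grouping far less reliable on real data:

```python
from datetime import datetime, timedelta


def group_by_time_gap(timestamps, max_gap=timedelta(minutes=5)):
    """Split edit timestamps into groups in which consecutive edits are at
    most `max_gap` apart; each group approximates one editing session."""
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)  # close enough: continue the current group
        else:
            groups.append([t])    # gap too large: start a new group
    return groups


edits = [
    datetime(2018, 2, 14, 10, 0),
    datetime(2018, 2, 14, 10, 1),
    datetime(2018, 2, 14, 10, 30),
]
print([len(g) for g in group_by_time_gap(edits)])  # [2, 1]
```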

Neither the uid/display name nor the timestamp of an existing object
version is required to create a modified version for upload to the API;
the version number, however, is.
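The point about the version number shows up in the shape of an API 0.6 upload. Below is a minimal Python sketch that builds an osmChange document modifying a single node. The element carries the node id, the version being modified (the API rejects the upload with a conflict if that is not the current version) and the id of the open changeset, but no `user`, `uid` or `timestamp`. The function name and the example values are hypothetical:

```python
import xml.etree.ElementTree as ET


def build_node_modify(node_id, version, changeset_id, lat, lon, tags):
    """Return an osmChange XML string that modifies one node.

    From the existing object's metadata only `version` is needed (plus the
    id of the open changeset the edit is uploaded into); user, uid and
    timestamp are deliberately absent.
    """
    osm_change = ET.Element("osmChange", version="0.6")
    modify = ET.SubElement(osm_change, "modify")
    node = ET.SubElement(
        modify,
        "node",
        id=str(node_id),
        version=str(version),
        changeset=str(changeset_id),
        lat=str(lat),
        lon=str(lon),
    )
    for key, value in sorted(tags.items()):
        ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(osm_change, encoding="unicode")


print(build_node_modify(123, 3, 456, 49.0, 8.4, {"amenity": "bench"}))
```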

Simon


On 14.02.2018 at 10:30, Michael Reichert wrote:
> Hi,
>
> people are talking about potential changes to the amount of (personal)
> data distributed by OSM, in the light of new data protection laws
> becoming effective in the EU this May. There haven't been any official
> statements by the OSMF but discussions are going on in the LWG [1].
>
> Even though it is still unclear what the concrete steps will be, I have
> done some experiments. How well do our existing tools behave if you feed
> them with OSM data that has less metadata than usual, or no metadata at
> all? I have set up a test suite which tests Osmium-Tool (which uses the
> Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6.
>
> The test suite is available at
> https://github.com/geofabrik/metadata-test/
> and consists of a Bash script. You need to have osmium, osmosis and
> osmconvert in your path (or you have to modify the script a bit). The
> test suite comes with its own hand-crafted test data, which will first be
> converted to PBF by Osmium. Afterwards all three tools will prove
> themselves in the following challenges:
>
> - converting XML to PBF
> - converting PBF to XML
> - converting XML to XML
> - applying a diff
> - deriving changes between two OSM files
>
> All challenges are run four times, one iteration with full metadata, one
> with timestamp and version fields, one with version field only and one
> without any metadata. Some PBF challenges will also have two variants –
> one with DenseNodes and one without.
>
> The results are files located in the output/ directory. You have to
> inspect them manually; I have not written a tool to parse them and
> report how many tests failed.
>
> *Results*
> I compiled the results into a spreadsheet. You can download it at
> https://github.com/geofabrik/metadata-test/raw/master/table.ods
>
> To sum them up:
> - Osmium is the only programme which passes all format conversion tests.
>
> - Osmosis cannot read any XML (OSM and OSC) files without timestamp and
> version fields.
>
> - Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo
> message of the PBF format as mandatory. However, the format
> specification doesn't declare these fields as mandatory. Therefore, they
> write default values into PBF files if the input lacks these fields:
> version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" (Osmosis [3]),
> timestamp="1970-01-01T00:00:01Z" changeset="1" version="1" (Osmconvert)
> This partially applies to the XML output of Osmosis, too.
>
> - Deriving a diff file of the changes between two OSM files only works
> if both files have the same amount of metadata. If one file contains
> less or more metadata, all objects will appear in the diff file with
> their new metadata and bloat it up. The question is whether this is the
> desired behaviour (i.e. the ability to clean a file from metadata using
> large diffs) or if this behaviour is not desired and the tools
> generating diffs should compare the tags, location and members of
> objects which have the same ID but different metadata.
>
> - Some tools have bugs which lead to wrong diffs (e.g. missing
> modifications) if some metadata fields are missing.
>
> Best regards
>
> Michael
>
>
> [1]
> https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
> [2] Osmium also had this bug. But it was fixed on the master branch a
> few days ago.
> [3] Osmium cannot parse negative version numbers and throws an exception.





Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Christoph Hormann
On Wednesday 14 February 2018, Darafei "Komяpa" Praliaskouski wrote:
> >
> > While this seems a useful test to do, I wonder how the timestamp and
> > version fields are relevant regarding privacy and personal data
> > protection?
>
> If the OSM API were fast enough, it would be rather easy to group
> changes back into changesets. The number of changesets gets you back the
> number of mappers. Classifying a returning mapper by edit pattern would
> let you recover the geometric median of their edits, which brings you
> to knowing where they live.

No: even if the API were infinitely fast, limited bandwidth and the
possibility of appending new edits to existing open changesets would make
this impossible.

And even if you could identify the changeset a certain feature was last 
modified in that would still not allow you to conclude which changesets 
are created by the same user.

If you actually wanted to reverse-engineer user identities from a planet
file with user info stripped, looking at the data itself and at
mapper-specific data characteristics (tag combinations, the way
geometries are drawn) would likely be more useful than versions and
timestamps.

-- 
Christoph Hormann
http://www.imagico.de/



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Komяpa
Wed, 14 Feb 2018 at 17:47, Christoph Hormann :

> On Wednesday 14 February 2018, Michael Reichert wrote:
> >
> > All challenges are run four times, one iteration with full metadata,
> > one with timestamp and version fields, one with version field only
> > and one without any metadata. [...]
>
> While this seems a useful test to do, I wonder how the timestamp and
> version fields are relevant regarding privacy and personal data
> protection?
>

If the OSM API were fast enough, it would be rather easy to group changes
back into changesets. The number of changesets gets you back the number of
mappers. Classifying a returning mapper by edit pattern would let you
recover the geometric median of their edits, which brings you to knowing
where they live.
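The "geometric median" mentioned above is the point that minimises the sum of distances to a set of points. Unlike the mean, a few outliers cannot drag it far. Below is a minimal Python sketch using Weiszfeld's algorithm. This is a standard iterative method, not something any OSM tool is known to do. The sketch assumes the points are already in planar x/y coordinates such as projected metres; raw latitude/longitude only approximates this over small areas:

```python
import math


def geometric_median(points, max_iterations=1000, tolerance=1e-9):
    """Approximate the geometric median of 2-D points with Weiszfeld's algorithm."""
    # Start from the centroid (arithmetic mean).
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tolerance:
                # The estimate sits on a data point, where the update is
                # undefined; this sketch simply stops there.
                return (x, y)
            # Weight each point by the inverse of its distance.
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tolerance:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)


# Three points near the origin and one outlier: the mean is (25.25, 25.25),
# while the geometric median stays with the cluster.
print(geometric_median([(0, 0), (1, 0), (0, 1), (100, 100)]))  # ≈ (0.5, 0.5)
```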

(To make OSM API upload faster don't forget to join the efforts in
https://github.com/zerebubuth/openstreetmap-cgimap/issues/140)



> I know a possible answer could be that in combination with personal data
> (like user names or ids) it would provide additional information on
> people.  But this argument applies to *any data* including the geometry
> (i.e. coordinates) and tags.
>
> So what is the special thing from a legal standpoint about versions and
> timestamps compared to geometries and tags?
>


Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Christoph Hormann
On Wednesday 14 February 2018, Michael Reichert wrote:
>
> All challenges are run four times, one iteration with full metadata,
> one with timestamp and version fields, one with version field only
> and one without any metadata. [...]

While this seems a useful test to do, I wonder how the timestamp and
version fields are relevant regarding privacy and personal data
protection?

I know a possible answer could be that in combination with personal data 
(like user names or ids) it would provide additional information on 
people.  But this argument applies to *any data* including the geometry 
(i.e. coordinates) and tags.

So what is the special thing from a legal standpoint about versions and 
timestamps compared to geometries and tags?

-- 
Christoph Hormann
http://www.imagico.de/



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Frederik Ramm
Hi,

On 14.02.2018 15:23, Martin Koppenhoefer wrote:
> It seems Brexit could become effective in March next year. Maybe we just wait?

We would still have to apply EU regulations to processing the data of EU
citizens.

> I really hope we will not obfuscate or remove metadata because of some
> EU privacy regulation. Please do not overreact.

The LWG is, or has been, discussing this with lawyers so I hope they
will come up with sensible recommendations. I don't think the new
regulations will be without consequences though.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"



Re: [OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Martin Koppenhoefer
2018-02-14 10:30 GMT+01:00 Michael Reichert :

> Hi,
>
> people are talking about potential changes to the amount of (personal)
> data distributed by OSM, in the light of new data protection laws
> becoming effective in the EU this May.



It seems Brexit could become effective in March next year. Maybe we just wait?
What is the UK position regarding the planned EU data protection amendments?



> All challenges are run four times, one iteration with full metadata, one
> with timestamp and version fields, one with version field only and one
> without any metadata.



If you consider timestamps and version fields to be private data, you could
also consider object ids private data (they are assigned consecutively, so
you could frequently create and delete "test" objects to correlate object
ids with timestamps ;-) ).

I really hope we will not obfuscate or remove metadata because of some EU
privacy regulation. Please do not overreact.

Cheers,
Martin


[OSM-dev] Working with OSM data with less or no metadata

2018-02-14 Thread Michael Reichert
Hi,

people are talking about potential changes to the amount of (personal)
data distributed by OSM, in the light of new data protection laws
becoming effective in the EU this May. There haven't been any official
statements by the OSMF but discussions are going on in the LWG [1].

Even though it is still unclear what the concrete steps will be, I have
done some experiments. How well do our existing tools behave if you feed
them with OSM data that has less metadata than usual, or no metadata at
all? I have set up a test suite which tests Osmium-Tool (which uses the
Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6.

The test suite is available at
https://github.com/geofabrik/metadata-test/
and consists of a Bash script. You need to have osmium, osmosis and
osmconvert in your path (or you have to modify the script a bit). The
test suite comes with its own hand-crafted test data, which will first be
converted to PBF by Osmium. Afterwards all three tools will prove
themselves in the following challenges:

- converting XML to PBF
- converting PBF to XML
- converting XML to XML
- applying a diff
- deriving changes between two OSM files

All challenges are run four times, one iteration with full metadata, one
with timestamp and version fields, one with version field only and one
without any metadata. Some PBF challenges will also have two variants –
one with DenseNodes and one without.
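The four metadata levels can be produced from a full OSM XML file. Below is a minimal Python sketch. It assumes each level is defined by which of the `version`, `timestamp`, `changeset`, `user` and `uid` attributes survive on `node`, `way` and `relation` elements. This mirrors the description above, not the actual code of the metadata-test suite, and the level names are hypothetical:

```python
import xml.etree.ElementTree as ET

METADATA_ATTRIBUTES = ("version", "timestamp", "changeset", "user", "uid")

# Attributes kept at each level (assumed from the description above).
LEVELS = {
    "full": METADATA_ATTRIBUTES,
    "timestamp_and_version": ("version", "timestamp"),
    "version_only": ("version",),
    "none": (),
}


def reduce_metadata(osm_xml: str, level: str) -> str:
    """Return a copy of `osm_xml` with object metadata reduced to `level`."""
    keep = set(LEVELS[level])
    root = ET.fromstring(osm_xml)
    # iter() also reaches objects nested in osmChange create/modify/delete
    # blocks; the root's own version="0.6" (format version) is left alone.
    for element in root.iter():
        if element.tag in ("node", "way", "relation"):
            for attribute in METADATA_ATTRIBUTES:
                if attribute not in keep:
                    element.attrib.pop(attribute, None)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<osm version="0.6">'
    '<node id="1" version="2" timestamp="2018-02-14T10:30:00Z" changeset="42"'
    ' user="example" uid="7" lat="49.0" lon="8.4"/>'
    "</osm>"
)
for level in LEVELS:
    print(level, reduce_metadata(sample, level))
```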

The results are files located in the output/ directory. You have to
inspect them manually; I have not written a tool to parse them and
report how many tests failed.

*Results*
I compiled the results into a spreadsheet. You can download it at
https://github.com/geofabrik/metadata-test/raw/master/table.ods

To sum them up:
- Osmium is the only programme which passes all format conversion tests.

- Osmosis cannot read any XML (OSM and OSC) files without timestamp and
version fields.

- Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo
message of the PBF format as mandatory. However, the format
specification doesn't declare these fields as mandatory. Therefore, they
write default values into PBF files if the input lacks these fields:
version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" (Osmosis [3]),
timestamp="1970-01-01T00:00:01Z" changeset="1" version="1" (Osmconvert)
This partially applies to the XML output of Osmosis, too.

- Deriving a diff file of the changes between two OSM files only works
if both files have the same amount of metadata. If one file contains
less or more metadata, all objects will appear in the diff file with
their new metadata and bloat it up. The question is whether this is the
desired behaviour (i.e. the ability to clean a file from metadata using
large diffs) or if this behaviour is not desired and the tools
generating diffs should compare the tags, location and members of
objects which have the same ID but different metadata.

- Some tools have bugs which lead to wrong diffs (e.g. missing
modifications) if some metadata fields are missing.
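The alternative raised in the diff point above is to compare content rather than metadata. Below is a minimal Python sketch of that idea. The `OsmObject` type and the dict-of-objects input are hypothetical simplifications; real tools stream sorted files. Two versions of an object count as modified only if their tags, location or members differ, so a file that merely lost its metadata produces an empty diff:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class OsmObject:
    tags: Tuple[Tuple[str, str], ...] = ()          # sorted (key, value) pairs
    location: Optional[Tuple[float, float]] = None  # (lat, lon), nodes only
    members: Tuple = ()                             # node ids or relation members
    # Metadata: carried along but ignored by the comparison below.
    version: Optional[int] = None
    timestamp: Optional[str] = None


Key = Tuple[str, int]  # (object type, id), e.g. ("node", 1)


def content(obj: OsmObject):
    """The parts of an object that describe the map, not its editing history."""
    return (obj.tags, obj.location, obj.members)


def derive_changes(old: Dict[Key, OsmObject], new: Dict[Key, OsmObject]):
    return {
        "create": sorted(k for k in new if k not in old),
        "modify": sorted(k for k in new if k in old and content(new[k]) != content(old[k])),
        "delete": sorted(k for k in old if k not in new),
    }


old = {
    ("node", 1): OsmObject(location=(49.0, 8.4), version=2, timestamp="2018-02-01T00:00:00Z"),
    ("node", 2): OsmObject(location=(49.1, 8.4), version=1, timestamp="2018-02-01T00:00:00Z"),
    ("node", 4): OsmObject(location=(49.3, 8.4), version=1),
}
new = {
    ("node", 1): OsmObject(location=(49.0, 8.4)),  # metadata stripped only
    ("node", 2): OsmObject(location=(49.2, 8.4)),  # moved
    ("node", 3): OsmObject(location=(49.3, 8.5)),  # created
}
print(derive_changes(old, new))
# {'create': [('node', 3)], 'modify': [('node', 2)], 'delete': [('node', 4)]}
```

There is a trade-off. Edits that leave the net content unchanged, such as a change followed by a revert between the two files, no longer appear in the diff.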

Best regards

Michael


[1]
https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
[2] Osmium also had this bug. But it was fixed on the master branch a
few days ago.
[3] Osmium cannot parse negative version numbers and throws an exception.


-- 
Michael Reichert  www.geofabrik.de
Geofabrik GmbH    Commercial register: HRB Mannheim 703657
Amalienstr. 44    Management: C. Karch, F. Ramm
76133 Karlsruhe   Tel: 0721-1803560-3
reich...@geofabrik.de Fax: 0721-1803560-9


