Re: [OSM-dev] Harmless edits (was: Change in wtfe.gryph.de "Quick History Service" API)

2011-12-03 Thread Sarah Hoffmann
Hi,

On Sat, Dec 03, 2011 at 01:13:14AM +0100, Frederik Ramm wrote:
> I have finalized a script that can analyze an object's history
> and determine if certain edits are "non-edits" (i.e. nothing of note
> was changed at all), or "harmless" (i.e. the object was changed and
> might have to be rolled back if the contributor does not agree to
> the license change, but the rollback will likely not affect the
> quality much).
> 
> [...]
> 
> You can try out my script here, by adding a way/node/relation id to
> the URL like so:
> 
> http://wtfe.gryph.de/harmless/way/40103577
> 
> The output is a break-down of what my script thinks has happened to
> the object, and which edits are zero-edits ("severity: 0") or
> harmless ("severity: 1"). After the version analysis, it summarizes
> the user contributions - each user is afforded the highest severity
> of all his changes.

There is a small oddity: when a non-agreeing user deletes an object,
the script records that as a zero-edit and ignores the fact that the
object is gone. Example:

http://wtfe.gryph.de/harmless/node/1300187843

see history: http://www.openstreetmap.org/browse/node/1300187843/history

This might only happen when the object has no tags.

I'm also slightly confused by the user summary. What do 'relevant change'
and 'no change' mean? I would have expected 'relevant change' to cover only
non-zero edits by non-agreers, but that does not seem to be the case.


Sarah

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Harmless edits (was: Change in wtfe.gryph.de "Quick History Service" API)

2011-12-03 Thread Sarah Hoffmann
On Sat, Dec 03, 2011 at 01:13:14AM +0100, Frederik Ramm wrote:
> Everyone is invited to play with this script and see what happens. I
> plan to make this the basis of the v2 WTFE service, meaning that in
> the future editors will likely *not* highlight stuff that my script
> deems harmless.
> 
> Here's the - hacky, perly - source code: http://wtfe.gryph.de/harmless.pl
> 
> Please don't do mass evaluations with this web service, as it runs a
> "history" query against the API in the backend and this is quite
> costly. If you want to run this on a large area, download the .pl
> file and make yourself a full history extract with Peter Koerner's
> history splitter, then run the perl script on the XML. It can
> process anything up to the complete planet if you have the patience.

Just a reminder on the side: Simon Poole provides history extracts for 
some countries here: http://odbl.poole.ch/extracts/ They are softcut
using the polygons from Geofabrik. Might come in handy here.

Sarah



[OSM-dev] Harmless edits (was: Change in wtfe.gryph.de "Quick History Service" API)

2011-12-02 Thread Frederik Ramm

Hi,

   I have finalized a script that can analyze an object's history and 
determine if certain edits are "non-edits" (i.e. nothing of note was 
changed at all), or "harmless" (i.e. the object was changed and might 
have to be rolled back if the contributor does not agree to the license 
change, but the rollback will likely not affect the quality much).


The idea behind this is to provide some help in prioritizing the 
re-mapping effort. If someone who doesn't agree to the contributor terms 
has made an important contribution then we want to re-map that soon; in 
places where the same guy has just removed a few created_by tags we can 
ignore that for now.


My analysis does not mean that something I classify as "harmless" will 
not be reverted when the license change comes; it might well be. But if 
it gets reverted, the consequences will be negligible.


What I'm basically doing is looking at the object history, identifying 
each contributor, and finding out:


* have they made at least one "normal" contribution to the object - 
added a node to a way, added or changed a tag, moved a node by more than 
one metre?


* if not, have they made at least one "harmless" contribution - removed 
a tag, a node, or a member; moved a node by less than one metre?


* if not, then they are a "zero contributor" to that object.
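
The three rules above can be sketched roughly as follows. This is only an illustration, not the logic of harmless.pl: the function names, the dict layout for a version, the numeric value 2 for a "normal" edit, treating object creation as "normal", and treating a move of exactly one metre as "harmless" are all assumptions. Reordering of nodes or members is not covered by the rules above and is ignored here.

```python
import math

ZERO, HARMLESS, NORMAL = 0, 1, 2  # 2 for "normal" is an assumed value


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def classify_edit(prev, curr):
    """Severity of the step from version `prev` to version `curr`.

    Each version is a dict with 'tags' (dict), 'refs' (list of way nodes
    or relation members) and, for nodes, 'pos' as a (lat, lon) tuple.
    """
    if prev is None:
        return NORMAL  # assumption: creating the object is a normal edit

    severity = ZERO
    old_tags, new_tags = prev.get("tags", {}), curr.get("tags", {})
    # An added or changed tag is a normal contribution.
    if any(old_tags.get(k) != v for k, v in new_tags.items()):
        return NORMAL
    # A removed tag is harmless.
    if any(k not in new_tags for k in old_tags):
        severity = HARMLESS

    old_refs, new_refs = prev.get("refs", []), curr.get("refs", [])
    # An added node or member is normal, a removed one is harmless.
    if any(r not in old_refs for r in new_refs):
        return NORMAL
    if any(r not in new_refs for r in old_refs):
        severity = HARMLESS

    # Node moves: more than one metre is normal, anything less is harmless.
    if "pos" in prev and "pos" in curr and prev["pos"] != curr["pos"]:
        if distance_m(*prev["pos"], *curr["pos"]) > 1.0:
            return NORMAL
        severity = HARMLESS

    return severity
```

For example, an edit that only removes a created_by tag comes out as HARMLESS, while the reverse step (adding the tag) comes out as NORMAL.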

We do indeed have a number of "zero contributors", from times when 
different editors had different malfunctions - e.g. for a while, if you 
did a "select all" in JOSM and then removed a tag, all objects would be 
marked as changed even if they did not contain the tag, and you would 
appear in the object's edit history even though you never changed it. Or 
Potlatch at some time used to mark a way's member nodes as changed when 
you changed the way.


(If an object is reverted to an earlier state, then all intermediate 
edits count as "zero contributions" as well - they might have been 
valuable but they are not part of the visible object any more.)
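
One possible way to find such intermediate edits is to compare content snapshots of all versions, as in the sketch below. The function name and the idea of representing each version as a hashable snapshot (e.g. a tuple of tags and node refs) are assumptions for illustration. How the reverting edit itself should be rated is not stated above, so the sketch leaves it alone.

```python
def reverted_versions(states):
    """Return the indices of versions that a later revert has undone.

    `states` is a list of content snapshots, oldest first. If version k
    has the same content as an earlier version j, every version strictly
    between j and k is no longer part of the visible object.
    """
    zeroed = set()
    for k, state in enumerate(states):
        # Only look at j < k - 1, so at least one version lies in between.
        for j in range(k - 1):
            if states[j] == state:
                zeroed.update(range(j + 1, k))
                break  # the earliest match covers the widest span
    return zeroed
```

With snapshots `["a", "b", "c", "a", "d"]`, the fourth version restores the first one, so the versions at index 1 and 2 would be treated as zero contributions.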


You can try out my script here, by adding a way/node/relation id to the 
URL like so:


http://wtfe.gryph.de/harmless/way/40103577

The output is a break-down of what my script thinks has happened to the 
object, and which edits are zero-edits ("severity: 0") or harmless 
("severity: 1"). After the version analysis, it summarizes the user 
contributions - each user is afforded the highest severity of all his 
changes.
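
The per-user summary amounts to taking a maximum over each user's edits. A minimal sketch, assuming the edits are available as (user id, severity) pairs; the function name and input layout are made up for illustration:

```python
def summarize_users(edits):
    """Map each user id to the highest severity among that user's edits.

    `edits` is an iterable of (user_id, severity) pairs, one per version.
    """
    summary = {}
    for user_id, severity in edits:
        summary[user_id] = max(severity, summary.get(user_id, 0))
    return summary
```

A user with one zero-edit and one harmless edit on the same object therefore ends up with severity 1.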


The most important result my script can produce is the finding that an 
object which currently looks "tainted", because someone who does not 
agree to the license change has touched it, is not really problematic at 
all because the change in question was harmless.


This is the case in the above "way 40103577" example. The version 
history contains an edit by non-agreeing user 263596, therefore the 
whole object looks problematic. My script finds out that this edit is 
simply a tag deletion, and because all other edits are by people who 
have agreed to the license change, the object does not have to be a top 
priority for remapping.
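
Expressed as a check, this reasoning might look like the sketch below. It assumes a per-user severity mapping and a set of user ids who have agreed; both inputs, the function name, and the use of 2 as the severity of a "normal" edit are assumptions, and user 1001 in the usage note is hypothetical.

```python
def needs_priority_remap(user_severity, agreed_users):
    """True if any non-agreeing user made more than a harmless edit.

    `user_severity` maps user id -> highest severity on this object;
    `agreed_users` is the set of user ids who accepted the new terms.
    Severity 2 or above is assumed to mean a "normal" contribution.
    """
    return any(
        severity >= 2
        for user_id, severity in user_severity.items()
        if user_id not in agreed_users
    )
```

For an object where non-agreeing user 263596 only has severity 1 and an agreeing user 1001 has severity 2, `needs_priority_remap({263596: 1, 1001: 2}, {1001})` is False.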


Everyone is invited to play with this script and see what happens. I 
plan to make this the basis of the v2 WTFE service, meaning that in the 
future editors will likely *not* highlight stuff that my script deems 
harmless.


Here's the - hacky, perly - source code: http://wtfe.gryph.de/harmless.pl

Please don't do mass evaluations with this web service, as it runs a 
"history" query against the API in the backend and this is quite costly. 
If you want to run this on a large area, download the .pl file and make 
yourself a full history extract with Peter Koerner's history splitter, 
then run the perl script on the XML. It can process anything up to the 
complete planet if you have the patience.
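
For anyone who would rather experiment in another language than run the Perl script, reading such an extract could look roughly like the Python sketch below. It is not how harmless.pl works internally. It assumes the usual OSM XML layout (node/way/relation elements with id, version and uid attributes, and tag/nd children) and that all versions of an object sit next to each other in the file; the function names are made up, and relation members are left out for brevity.

```python
import io
import xml.etree.ElementTree as ET
from itertools import groupby


def iter_versions(source):
    """Yield one dict per object version from an OSM history XML source."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            yield {
                "type": elem.tag,
                "id": int(elem.get("id")),
                "version": int(elem.get("version")),
                "uid": elem.get("uid"),
                "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
                "refs": [n.get("ref") for n in elem.findall("nd")],
            }
            elem.clear()  # drop children to keep memory use down


def iter_histories(source):
    """Group adjacent versions into ((type, id), [versions]) pairs."""
    by_object = groupby(iter_versions(source), key=lambda v: (v["type"], v["id"]))
    for key, versions in by_object:
        yield key, sorted(versions, key=lambda v: v["version"])
```

`source` can be a file name or an open binary file, e.g. `iter_histories("extract.osh")` for a hypothetical uncompressed extract, or an `io.BytesIO` object for testing.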


Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"
