The DWG periodically receives requests to redact content from the database. As the person generally responsible for running those redactions, I thought I'd share some detail on what information I need to redact something. This is not going to cover the legal reasons to redact something; those are handled not by the DWG but by the LWG, and most redaction cases are legally pretty simple. This is also not going to cover DMCA requests, which require "Identification of the material that is claimed to be infringing or to be the subject of infringing activity and that is to be removed or access to which is to be disabled, and information reasonably sufficient to permit the service provider to locate the material."
I am using "bad data" as shorthand for data that needs to be redacted. I'm also not going to cover verifying that something needs to be redacted.

The changeset redaction bot is a Ruby program that uses the same logic as the redaction bot that was run on the database during the ODbL changeover. As it doesn't have DB access, it uses API calls and does not run as quickly, so if a massive redaction ever needed to be done it might not be suitable.

Aside from assorted configuration options, it can take a list of what to redact in three ways:

1. A changeset. This is the most common way to call it, and it is generally easiest to deal with a redaction request if a list of changesets has been provided. I have scripts that will take a list of changesets and call the bot for each changeset, verifying the results. The changeset is downloaded as a .osc file and processed.
2. A .osc file. This is mainly used for changesets that are too large to download through the API. It is parsed to get a list of objects and versions.
3. A list of objects where bad data was added. An example entry from a list would be "w1234v5", which indicates that version 5 of way 1234 introduced bad data; n is used for nodes and r for relations. Typically it will be version 1 of an object where bad data was introduced.

There is also a fourth, rarely used option: an object and a version range. A special script can redact specific versions of specific objects from the database without applying the normal logic to determine which versions to redact. It is not normally used and has only been used about 5 times in total.

Once the bot has a list of objects, it downloads them from the API and uses its logic to determine what needs to be removed. It then does two things:

1. Deletes data as required, using changesets. In many cases this is essentially the same as reverting the changeset, so if someone has already reverted it, nothing is done here.
2. Uses the redaction API call to hide old versions of the objects that cannot be shown.
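To make the list format from item 3 concrete, here is a minimal Python sketch of reading such a list into (type, id, version) records. It is not the bot's code (the bot is written in Ruby); the one-entry-per-line layout and the names `parse_entry`, `parse_list` and `BadVersion` are assumptions for illustration only.

```python
import re
from typing import NamedTuple

# One-letter prefixes used in the list, mapped to OSM element types.
TYPE_PREFIXES = {"n": "node", "w": "way", "r": "relation"}

# e.g. "w1234v5": prefix, element id, "v", version
ENTRY_RE = re.compile(r"^([nwr])(\d+)v(\d+)$")


class BadVersion(NamedTuple):
    """Hypothetical record: the version that first introduced bad data."""
    element_type: str  # "node", "way" or "relation"
    element_id: int
    version: int


def parse_entry(entry: str) -> BadVersion:
    """Parse a single entry such as 'w1234v5' (version 5 of way 1234)."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    prefix, element_id, version = match.groups()
    return BadVersion(TYPE_PREFIXES[prefix], int(element_id), int(version))


def parse_list(text: str) -> list[BadVersion]:
    """Parse one entry per line (an assumed layout), skipping blank lines."""
    return [parse_entry(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for item in parse_list("w1234v5\nn42v1\n\nr7v3\n"):
        print(item)
```

Rejecting anything that does not match the pattern, rather than guessing, seems the safer choice for input that drives redactions.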
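Items 1 and 2 both come down to reading an osmChange (.osc) document, which is also the format the API's changeset download call (`/api/0.6/changeset/<id>/download`) returns. Below is a rough Python sketch of pulling the objects and versions out of one. It assumes the standard osmChange layout of `<create>`, `<modify>` and `<delete>` blocks holding `node`, `way` and `relation` elements; the function name `objects_in_osc` is hypothetical, and this is not necessarily how the bot parses the file.

```python
import io
import xml.etree.ElementTree as ET

ELEMENT_TYPES = {"node", "way", "relation"}


def objects_in_osc(source):
    """Yield (type, id, version) for each node/way/relation in an osmChange file.

    `source` is a filename or binary file object. iterparse streams the
    document instead of building the whole tree up front.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Child elements such as <nd>, <member> and <tag> are skipped.
        if elem.tag in ELEMENT_TYPES:
            yield elem.tag, int(elem.get("id")), int(elem.get("version"))
            # Drop the element's children and attributes once read.
            elem.clear()


if __name__ == "__main__":
    sample = (
        b'<osmChange version="0.6">'
        b'<create><node id="1" version="1" lat="0" lon="0"/></create>'
        b'<modify><way id="2" version="3"><nd ref="1"/></way></modify>'
        b'<delete><relation id="3" version="4"/></delete>'
        b"</osmChange>"
    )
    for obj in objects_in_osc(io.BytesIO(sample)):
        print(obj)
```

Streaming seems a reasonable default here, since item 2 exists precisely for changesets that are too large for the API.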
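The redaction API call mentioned above is, as far as I know, `POST /api/0.6/[node|way|relation]/#id/#version/redact?redaction=#redaction_id`, and it needs moderator rights. The sketch below only builds such a request with Python's standard library and does not send it. Authentication is left out, and the function name `redact_request` and the redaction ID used in the example are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.openstreetmap.org/api/0.6"
ELEMENT_TYPES = {"node", "way", "relation"}


def redact_request(element_type: str, element_id: int, version: int,
                   redaction_id: int, api_base: str = API_BASE) -> Request:
    """Build (but do not send) a request hiding one old version of an object.

    Hypothetical helper: the caller must add moderator authentication
    before passing the request to urllib.request.urlopen.
    """
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"unknown element type: {element_type!r}")
    query = urlencode({"redaction": redaction_id})
    url = f"{api_base}/{element_type}/{element_id}/{version}/redact?{query}"
    return Request(url, method="POST")


if __name__ == "__main__":
    # Version 5 of way 1234, under a made-up redaction ID of 1.
    req = redact_request("way", 1234, 5, redaction_id=1)
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to review the full list of calls before anything is actually hidden.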
What does this mean for you as a mapper requesting a redaction? The preferred way to request a redaction is generally to give a list of changesets that need to be processed. Any other request generally has to be turned into a list of changesets. Sometimes only a small part of a changeset needs to be redacted; in those cases a list of objects may cause less damage by avoiding the redaction of unnecessary content.

I have assorted tools, including a changeset database and a pgsnapshot database, so I can turn information like "all changesets by this user with '$foo' in the comment" into a list of changesets fairly easily. The DWG also has experience identifying exactly what to redact, but it is preferable to receive requests in one of the above formats. Requests that need investigation to determine what to do will take much longer.

_______________________________________________
talk mailing list
http://lists.openstreetmap.org/listinfo/talk

