Re: [osmosis-dev] Un-Redacting Stuff

Brett Henderson Thu, 07 Feb 2013 01:36:26 -0800

Hi Frederik,

On 28 January 2013 20:49, Frederik Ramm <[email protected]> wrote:


> Hi,
>
>    with the license change we introduced the concept of "redacted"
> objects. Since "redacting" an old version touches that version in the
> database, initially such redactions made Osmosis issue diffs that contained
> that old version; we then introduced a quick fix to stop that:
>
> https://github.com/openstreetmap/osmosis/blob/master/apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/EntityDao.java#L450
>
> We're now also using "redaction" to suppress objects where a copyright
> violation has occurred - but mistakes are possible, so we need to have a
> way to un-redact things if necessary, i.e. remove the "redaction_id" from a
> historic version again.
>
> Simply setting the column to NULL will, again, make Osmosis issue a diff
> that contains the old version; this is unwanted.
>
> How could we proceed?
>
> Ideas:
>
> 1. Introduce special value "0" (not NULL) to denote an un-redacted object;
> leave Osmosis unchanged (so it treats NULL and 0 differently, will only
> issue .osc for objects with redaction_id=NULL), and modify other API code
> to treat 0 and NULL the same (so historic versions can be accessed through
> the API if redaction_id=NULL or 0). Cheap, easy, but a bit ugly.
>
> 2. Introduce an additional column "suppress_diff" to nodes/ways/relations
> tables; on un-redaction, set redaction_id=NULL and suppress_diff=TRUE;
> modify Osmosis by adding an "and not suppress_diff" to the SQL query. Would
> increase database size by something like 4 GB for the extra column.
>
> 3. Introduce an additional table "un-redacted objects", store object type,
> version, and id; when an object is un-redacted, add it to that table and
> clear the object's redaction_id, then modify the Osmosis query to only
> output objects that are not found in that table. Uses little space but
> makes diff creation slower.
>

4. Modify the API (or possibly use a trigger) to populate a column called
create_xid with the current transaction id instead of relying on the
implicit xmin column.  It will only be set during initial row insert and
won't get changed if redaction_id is modified.  Modify Osmosis to base
replication off this new create_xid column instead of xmin.  Drop the
existing xmin index and add one on the new create_xid column.  Osmosis
should ignore redacted objects in case historical replication jobs are run
(i.e. honour the redaction_id column), although the existing query probably
already does this.  This approach adds an unsigned 32-bit integer column to
all entity history tables (i.e. nodes, ways and relations) which will grow
the db somewhat (4 bytes x 2.5 billion rows ~= 100GB??).
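
To make the difference between xmin and create_xid concrete, here is a
minimal sketch in Python using an in-memory SQLite database.  It is only an
illustration under stated assumptions, not the Osmosis implementation: the
table layout, the helper names and the explicit "xmin" column are
hypothetical.  In PostgreSQL xmin is an implicit system column that changes
whenever a row is updated; the sketch imitates that by writing it by hand.

```python
import sqlite3


def make_db():
    """Create a toy history table (hypothetical layout, not the real schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE nodes (
            node_id      INTEGER,
            version      INTEGER,
            redaction_id INTEGER,  -- NULL means "not redacted"
            xmin         INTEGER,  -- stand-in for PostgreSQL's implicit xmin
            create_xid   INTEGER,  -- proposed column, written once at insert
            PRIMARY KEY (node_id, version)
        )
        """
    )
    return db


def insert_version(db, xid, node_id, version):
    """Insert a new object version; both xid columns start out equal."""
    db.execute(
        "INSERT INTO nodes VALUES (?, ?, NULL, ?, ?)",
        (node_id, version, xid, xid),
    )


def set_redaction(db, xid, node_id, version, redaction_id):
    """Redact (or un-redact with None). Only xmin moves; create_xid is kept."""
    db.execute(
        "UPDATE nodes SET redaction_id = ?, xmin = ? "
        "WHERE node_id = ? AND version = ?",
        (redaction_id, xid, node_id, version),
    )


def diff_rows(db, column, lo, hi):
    """Return (node_id, version) pairs a diff for xids in [lo, hi) would emit."""
    if column not in ("xmin", "create_xid"):
        raise ValueError("unexpected column: %r" % (column,))
    query = (
        "SELECT node_id, version FROM nodes "
        f"WHERE {column} >= ? AND {column} < ? AND redaction_id IS NULL "
        "ORDER BY node_id, version"
    )
    return db.execute(query, (lo, hi)).fetchall()


if __name__ == "__main__":
    db = make_db()
    insert_version(db, 100, 1, 1)        # version 1 created in transaction 100
    insert_version(db, 101, 1, 2)        # version 2 created in transaction 101
    set_redaction(db, 200, 1, 1, 7)      # version 1 redacted in transaction 200
    set_redaction(db, 300, 1, 1, None)   # and un-redacted in transaction 300
    print(diff_rows(db, "xmin", 300, 301))        # old version reappears
    print(diff_rows(db, "create_xid", 300, 301))  # nothing to replicate
```

Keyed on xmin, the un-redaction in transaction 300 makes the old version 1
show up in that diff interval; keyed on create_xid it does not, because the
column still holds the original insert transaction.  The sketch leaves out
transaction id wraparound and the real replication bookkeeping.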

Option 4 feels the least like a workaround to me; it lets replication work
the way it was supposed to and allows the redaction_id column to be updated
without side effects.  I don't mind option 1 because it is very simple.  2
and 3 add overhead which might be okay if they were elegant, but they still
feel like workarounds for something broken.  Does option 2 only increase
the db size by 4 GB?  I would have thought that even if a boolean takes 1
byte (I should check ...) it would take approx 25GB but my maths may be
failing me :-)
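
A quick way to check the size estimates in this thread is to multiply them
out.  The sketch below assumes the 2.5 billion row figure used above and
counts only the raw column bytes; it ignores alignment padding, per-row
overhead and any index on the new column, so real growth would be somewhat
larger.  The function name is made up for illustration.

```python
ROWS = 2_500_000_000  # row count taken from the estimate in this thread


def extra_gb(bytes_per_row, rows=ROWS):
    """Raw size of one added column in GB (10**9 bytes), ignoring padding."""
    return bytes_per_row * rows / 1e9


if __name__ == "__main__":
    print("4-byte create_xid column:", extra_gb(4), "GB")  # 10.0
    print("1-byte boolean column:   ", extra_gb(1), "GB")  # 2.5
```

On those assumptions a 4-byte column comes to about 10 GB and a 1-byte
boolean to about 2.5 GB of raw column data, before padding and indexes.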

Having said all that, I don't care too strongly about which approach is
used.

Brett

_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
