Hi Frederik,

On 28 January 2013 20:49, Frederik Ramm <[email protected]> wrote:
> Hi,
>
> with the license change we introduced the concept of "redacted"
> objects. Since "redacting" an old version touches that version in the
> database, initially such redactions made Osmosis issue diffs that
> contained that old version; we then introduced a quick fix to stop that:
>
> https://github.com/openstreetmap/osmosis/blob/master/apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/EntityDao.java#L450
>
> We're now also using "redaction" to suppress objects where a copyright
> violation has occurred - but mistakes are possible, so we need to have a
> way to un-redact things if necessary, i.e. remove the "redaction_id"
> from a historic version again.
>
> Simply setting the column to NULL will, again, make Osmosis issue a diff
> that contains the old version; this is unwanted.
>
> How could we proceed?
>
> Ideas:
>
> 1. Introduce special value "0" (not NULL) to denote an un-redacted
> object; leave Osmosis unchanged (so it treats NULL and 0 differently,
> will only issue .osc for objects with redaction_id=NULL), and modify
> other API code to treat 0 and NULL the same (so historic versions can be
> accessed through the API if redaction_id=NULL or 0). Cheap, easy, but a
> bit ugly.
>
> 2. Introduce an additional column "suppress_diff" to
> nodes/ways/relations tables; on un-redaction, set redaction_id=NULL and
> suppress_diff=TRUE; modify Osmosis by adding an "and not suppress_diff"
> to the SQL query. Would increase database size by something like 4 GB
> for the extra column.
>
> 3. Introduce an additional table "un-redacted objects", store object
> type, version, and id; when an object is un-redacted, add it to that
> table and clear the object's redaction_id, then modify the Osmosis query
> to only output objects that are not found in that table.
> Uses little space but makes diff creation slower.
>

4. Modify the API (or possibly use a trigger) to populate a column called
create_xid with the current transaction id instead of relying on the
implicit xmin column. It will only be set during the initial row insert and
won't change if redaction_id is modified. Modify Osmosis to base
replication off this new create_xid column instead of xmin. Drop the
existing xmin index and add one on the new create_xid column. Osmosis
should ignore redacted objects in case historical replication jobs are run
(i.e. honour the redaction_id column), although the existing query probably
already does this. This approach adds an unsigned 32-bit integer column to
all entity history tables (i.e. nodes, ways and relations), which will grow
the db somewhat (4 bytes x 2.5 billion rows ~= 10 GB, not counting the new
index).

Option 4 feels the least like a workaround to me: it lets replication work
the way it was supposed to and allows the redaction_id column to be updated
without side effects.

I don't mind option 1 because it is very simple.

Options 2 and 3 add overhead, which might be okay if they were elegant, but
they still feel like workarounds for something broken.

Does option 2 really increase the db size by 4 GB? I would have thought
that even if a boolean takes 1 byte (I should check ...) it would take
approx 2.5 GB, but my maths may be failing me :-)

Having said all that, I don't care too strongly about which approach is
used.

Brett
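
To make the difference between xmin and the proposed create_xid concrete, here is a small toy model in Python. It is only a sketch under stated assumptions: it does not touch a real database or Osmosis, the class `ToyHistoryTable` and its methods are hypothetical names invented for illustration, transaction ids are plain increasing integers (no wraparound), and `changed_since` is a simplified stand-in for the replication query rather than the query Osmosis actually runs. The one PostgreSQL behaviour it relies on is that an UPDATE writes a new row version, so the row's xmin becomes the updating transaction's id.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (node id, version)


@dataclass
class Row:
    redaction_id: Optional[int]
    xmin: int        # stands in for PostgreSQL's implicit xmin system column
    create_xid: int  # hypothetical column from option 4, set once on insert


class ToyHistoryTable:
    """Toy stand-in for an entity history table such as nodes."""

    def __init__(self) -> None:
        self._next_xid = 100  # arbitrary starting transaction id
        self.rows: Dict[Key, Row] = {}

    def _new_xid(self) -> int:
        xid = self._next_xid
        self._next_xid += 1
        return xid

    def insert(self, node_id: int, version: int) -> int:
        xid = self._new_xid()
        # On insert both columns carry the inserting transaction's id.
        self.rows[(node_id, version)] = Row(None, xmin=xid, create_xid=xid)
        return xid

    def set_redaction(self, node_id: int, version: int,
                      redaction_id: Optional[int]) -> int:
        xid = self._new_xid()
        row = self.rows[(node_id, version)]
        row.redaction_id = redaction_id
        row.xmin = xid  # an UPDATE rewrites the row, so xmin changes
        # create_xid is deliberately left untouched.
        return xid

    def changed_since(self, last_xid: int, column: str) -> List[Key]:
        """Simplified replication query: non-redacted rows whose chosen
        transaction-id column is newer than the last replicated id."""
        return sorted(
            key for key, row in self.rows.items()
            if row.redaction_id is None and getattr(row, column) > last_xid
        )


table = ToyHistoryTable()
table.insert(1, 1)
table.insert(1, 2)
checkpoint = table.set_redaction(1, 1, 7)  # redact version 1
table.set_redaction(1, 1, None)            # later: un-redact it again

print(table.changed_since(checkpoint, "xmin"))        # [(1, 1)]
print(table.changed_since(checkpoint, "create_xid"))  # []
```

In this model, replicating off xmin picks the un-redacted old version up again, while replicating off create_xid does not, which is the side effect option 4 is meant to avoid.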
_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
