https://bugzilla.wikimedia.org/show_bug.cgi?id=68931

--- Comment #3 from [email protected] ---
On-list [1] Kevin said
> Christian: before I prioritize it, can you scope out how much work
> would be required?

The items that immediately come to mind are:

* Clarify which schemas are meant to get purged.

* Clarify how to handle future data (We're still seeing those events
  getting logged). We have no machinery in place to guard against data
  entering raw-logs.

* Clarify whether or not purging EventLogging's “raw-logs” is sufficient
  (since the relevant part of the data flow starts at the caches, the
  data goes through both the udp2log and kafka pipelines).

* Clarify if the event data got sent to universities (through udp2log
  forwards).

* If the event data got sent to universities (see above item), clarify
  how to proceed there.

* Get data removed from the database
  (Either we get access, or we need to discuss with Sean or Ops)

* Get data removed from all relevant files in
     vanadium:/var/log/eventlogging/...

* Make sure the cleansed files from vanadium get rsynced over to
  stats1002 and stats1003.

* If necessary (see 3rd item), remove the data from kafka consumers
  (Might be easier to just nuke current data, as we repaved Hadoop
  some days ago anyway)

* If necessary (see 3rd item), remove the data from udp2log consumers
  (Not sure. Might turn out that effectively no udp2log filter is
  actually selecting this data)
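
For the item about cleaning the files on vanadium, a rough sketch of
what the filtering could look like is below. It is only an illustration
and rests on assumptions: it assumes each log line is one JSON object
with a top-level "schema" field, and the function name `drop_schemas`
and the schema name "HypotheticalSchema" are made up for this example.
Which schemas actually get purged is still the first open item above.

```python
import json
from typing import Iterable, Iterator, Set


def drop_schemas(lines: Iterable[str], schemas_to_purge: Set[str]) -> Iterator[str]:
    """Yield only the lines whose event does not belong to a purged schema.

    Assumption: one JSON object per line with a top-level "schema" key.
    """
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            # Not parseable as JSON: keep the line as-is so that it can be
            # reviewed by hand instead of being dropped silently.
            yield line
            continue
        schema = event.get("schema") if isinstance(event, dict) else None
        if isinstance(schema, str) and schema in schemas_to_purge:
            # Event belongs to a schema that should be purged: skip it.
            continue
        yield line


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real logs.
    sample = [
        '{"schema": "HypotheticalSchema", "event": {"x": 1}}',
        '{"schema": "SomethingElse", "event": {}}',
    ]
    for kept in drop_schemas(sample, {"HypotheticalSchema"}):
        print(kept)
```

Lines that cannot be parsed are kept on purpose, so they would need a
manual check before the cleansed files get rsynced anywhere.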

Taking a quick look, it seems data-collection might have started in
April 2014.

The 2nd and 3rd items probably need more discussion with Steven
(and probably also with legal, as some items are costly).

As our team lacks the required access for most of those parts, we
either need to get access [2], or consume more Ops time (which
requires more preparation on our end).

As the above list has some “Clarify” and “If” items, it's hard to give
an estimate. If those items do not turn into much extra work: maybe
1-2 weeks of total wall-clock time. Most of that would be waiting time,
though, so maybe one or two man-days of actual work.




[1] http://lists.wikimedia.org/pipermail/analytics/2014-August/002367.html

[2] I already applied when receiving Steven's first email, and Toby
approved. But such requests require three days of waiting.

