https://bugzilla.wikimedia.org/show_bug.cgi?id=68931
--- Comment #3 from [email protected] ---

On-list [1] Kevin said

> Christian: before I prioritize it, can you scope out how much work
> would be required?

The items that immediately come to mind are:

* Clarify which schemas are meant to get purged.
* Clarify how to handle future data (we are still seeing those events
  getting logged). We have no machinery in place to guard against such
  data entering raw-logs.
* Clarify whether or not purging EventLogging's “raw-logs” is sufficient
  (since the relevant part of the data flow starts at the caches, it goes
  through both the udp2log and Kafka pipelines).
* Clarify whether the event data got sent to universities (through
  udp2log forwards).
* If the event data got sent to universities (see above item), clarify
  how to proceed there.
* Get the data removed from the database (either we get access, or we
  need to discuss with Sean or Ops).
* Get the data removed from all relevant files in
  vanadium:/var/log/eventlogging/...
* Make sure the cleansed files from vanadium get rsynced over to
  stats1002 and stats1003.
* If necessary (see 3rd item), remove the data from Kafka consumers
  (it might be easier to just nuke the current data, as we repaved Hadoop
  some days ago anyway).
* If necessary (see 3rd item), remove the data from udp2log consumers
  (not sure; it might turn out that no udp2log filter actually selects
  this data).

From a quick look, it seems data collection might have started in
April 2014.

The 2nd and 3rd items probably need more discussion with Steven (probably
also with Legal, as some of the items are costly).

As our team lacks the required access for most of those parts, we either
need to get access [2] or consume more Ops time (which requires more
preparation on our end).

As the above list contains some “Clarify” and “If” items, it is hard to
give an estimate. If those items do not turn out to require much extra
work: maybe 1-2 weeks of total wall-clock time. But most of that time
will be waiting time, so maybe one or two man-days of actual work.
[1] http://lists.wikimedia.org/pipermail/analytics/2014-August/002367.html
[2] I already applied for access when I received Steven's first email, and
    Toby approved. But such requests simply require a three-day waiting
    period.

--
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
