https://bugzilla.wikimedia.org/show_bug.cgi?id=67128

--- Comment #9 from [email protected] ---
After discussing the solution to this item a bit more with ottomata
last week, it turned out that it might be better to incorporate the
duplicate checking into partition adding, and to turn the aggregated
statistics into a means of setting a “done” flag for data sets that do
not suffer from obvious holes/duplicates.

That would help the general pipeline, as it allows triggering further
parts of the pipeline based on the done flag, instead of encoding
the same timing heuristic again and again into pipelines.

However, partition adding is currently not working (as it is still
centered around the precursor of refinery). So we need to fix
partition adding first. But that is needed anyway to get webrequest
ingestion working in refinery, so it's not a wasted effort.

So the new requirements are:
  * Fixing the partition adding jobs
  * Integrating the duplicate monitoring there
  * Tagging data sets as done (depending on the outcome of the
    statistics computations)
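
To illustrate how the three items could fit together, here is a rough
sketch, not the actual refinery code. It assumes that each record
carries a per-host sequence number, which this comment does not state.
All names (PartitionStats, compute_stats, add_partition, the "flags"
set standing in for a done-flag store) are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PartitionStats:
    """Aggregated statistics for one host within one partition (hypothetical shape)."""
    expected: int    # size of the sequence number range: max - min + 1
    actual: int      # number of records seen
    duplicates: int  # records whose sequence number was already seen
    holes: int       # sequence numbers in the range that never appeared


def compute_stats(sequence_numbers):
    """Compute hole/duplicate statistics from one host's sequence numbers."""
    counts = Counter(sequence_numbers)
    if not counts:
        return PartitionStats(0, 0, 0, 0)
    distinct = len(counts)
    expected = max(counts) - min(counts) + 1
    actual = sum(counts.values())
    return PartitionStats(
        expected=expected,
        actual=actual,
        duplicates=actual - distinct,
        holes=expected - distinct,
    )


def is_done(stats_per_host):
    """A partition counts as done only if every host has data and is clean."""
    return bool(stats_per_host) and all(
        s.actual > 0 and s.duplicates == 0 and s.holes == 0
        for s in stats_per_host.values()
    )


def add_partition(partition, sequences_per_host, flags):
    """Stand-in for the partition adding job: check the data, then flag it.

    `flags` is a plain set used here in place of a real done-flag store.
    """
    stats = {host: compute_stats(seqs) for host, seqs in sequences_per_host.items()}
    done = is_done(stats)
    if done:
        flags.add(partition)
    return done
```

A downstream job would then only test whether the partition is in the
flag store, instead of waiting for a guessed amount of time.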

With those changed requirements, this bug has been reestimated.
