Alternatively, perhaps "hash" could be an optional part of an MCR chunk?
We could keep it for the wikitext, but drop the hash for the metadata, and
drop any support for a "combined" hash over wikitext + all-other-pieces.

...which raises the question of how reverts work in MCR.  Is it just the
wikitext which is reverted, or do categories and other metadata revert as
well?  And perhaps we can just mark reverts at revert time instead of trying
to reconstruct them after the fact?
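
For concreteness, here is a rough sketch of what "reconstructing after the
fact" could look like if only the wikitext hash is available.  Everything in
it is an assumption for illustration: find_reverts and sha1_hex are made-up
names, the input is a simple list of (rev_id, wikitext) pairs, and the digest
is a plain hex SHA-1, which may not match how MediaWiki actually stores it.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Hex SHA-1 of the wikitext (hypothetical stand-in for the stored hash)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverts(revisions):
    """Detect reverts from content hashes alone.

    revisions: list of (rev_id, wikitext) in chronological order.
    Returns a list of (rev_id, restored_rev_id) pairs, where restored_rev_id
    is the earliest revision with identical wikitext.
    """
    first_seen = {}   # hash -> earliest rev_id with that content
    reverts = []
    prev_hash = None
    for rev_id, text in revisions:
        h = sha1_hex(text)
        # Same hash as an older revision, but not a no-op edit on top of
        # the immediately preceding one: treat it as a revert.
        if h in first_seen and h != prev_hash:
            reverts.append((rev_id, first_seen[h]))
        first_seen.setdefault(h, rev_id)
        prev_hash = h
    return reverts


history = [(1, "a"), (2, "b"), (3, "a"), (4, "a")]
print(find_reverts(history))
```

Note that this only ever looks at the wikitext, so it cannot say whether
categories or other metadata went back to their old state too, which is the
open question above.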
 --scott

On Fri, Sep 15, 2017 at 4:13 PM, Stas Malyshev <smalys...@wikimedia.org>
wrote:

> Hi!
>
> On 9/15/17 1:06 PM, Andrew Otto wrote:
> >> As a random idea - would it be possible to calculate the hashes
> >> when data is transitioned from SQL to Hadoop storage?
> >
> > We take monthly snapshots of the entire history, so every month we’d
> > have to pull the content of every revision ever made :o
>
> Why? If you've already seen that revision in a previous snapshot, you'd
> already have its hash? Admittedly, I have no idea how the process works,
> so I am just talking out of general knowledge and may miss some things.
> Also of course you already have hashes from revs till this day and up to
> the day we decide to turn the hash off. Starting that day, it'd have to
> be generated, but I see no reason to generate one more than once?
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
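
A minimal sketch of the "hash each revision only once" idea from the quoted
message, under the assumption that revision content never changes once
written.  The names here (update_hash_cache, fetch_content, the dict used as
a cache) are hypothetical and do not describe the actual snapshot pipeline.

```python
import hashlib


def update_hash_cache(cache, snapshot_rev_ids, fetch_content):
    """Hash only the revisions that earlier snapshots have not covered.

    cache: dict mapping rev_id -> hex SHA-1, carried over between snapshots.
    snapshot_rev_ids: iterable of rev_ids present in the new snapshot.
    fetch_content: callable rev_id -> str; called only for unseen revisions.
    Returns the number of revisions that had to be fetched and hashed.
    """
    hashed = 0
    for rev_id in snapshot_rev_ids:
        if rev_id in cache:
            continue  # hash already known from a previous snapshot
        text = fetch_content(rev_id)
        cache[rev_id] = hashlib.sha1(text.encode("utf-8")).hexdigest()
        hashed += 1
    return hashed


# Toy usage: the second snapshot only needs the one new revision.
store = {1: "a", 2: "b", 3: "c"}
cache = {}
print(update_hash_cache(cache, [1, 2], store.__getitem__))
print(update_hash_cache(cache, [1, 2, 3], store.__getitem__))
```

Whether this helps in practice depends on whether the snapshot job can carry
state from one month to the next, which the thread does not settle.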



-- 
(http://cscott.net)
