Hi!

> We should hear from Joseph, Dan, Marcel, and Aaron H on this I think, but
> from the little I know:
> 
> Most analytical computations (for things like reverts, as you say) don’t
> have easy access to content, so computing SHAs on the fly is pretty hard.
> MediaWiki history reconstruction relies on the SHA to figure out what
> revisions revert other revisions, as there is no reliable way to know if
> something is a revert other than by comparing SHAs.

As a random idea: would it be possible to calculate the hashes when the
data is transferred from SQL to Hadoop storage? I imagine that would
slow the transfer down, though I'm not sure whether the slowdown would
be substantial. If we're using the hash only to compare revisions, we
could also use a different hash (maybe a non-cryptographic one?), which
may be faster.
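To illustrate what I mean: a minimal sketch of revert detection by content
hash, using a non-cryptographic checksum from the Python standard library.
This is just a toy, not how the reconstruction job actually works, and the
function names are made up:

```python
import zlib


def content_key(text: str) -> int:
    # CRC32 is a non-cryptographic checksum: much cheaper than SHA-1,
    # adequate for equality checks where adversarial collisions are not
    # a concern (though at 32 bits, accidental collisions are likelier
    # than with SHA-1; a 64- or 128-bit hash would shrink that risk).
    return zlib.crc32(text.encode("utf-8"))


def find_reverts(revisions):
    # revisions: iterable of (rev_id, content) in chronological order.
    # A revision counts as a revert if its content hash matches an
    # earlier revision's hash, i.e. it restores a previous state.
    seen = {}
    reverts = []
    for rev_id, content in revisions:
        key = content_key(content)
        if key in seen:
            reverts.append((rev_id, seen[key]))  # (reverting rev, restored rev)
        else:
            seen[key] = rev_id
    return reverts


revs = [(1, "foo"), (2, "bar"), (3, "foo")]
print(find_reverts(revs))  # rev 3 restores rev 1's content -> [(3, 1)]
```

The point being: if the comparison logic only ever tests hashes for
equality, the cryptographic strength of SHA-1 buys nothing, and a cheaper
hash computed during the SQL-to-Hadoop transfer might do the job.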

-- 
Stas Malyshev
smalys...@wikimedia.org

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l