On Thu, Sep 21, 2017 at 6:10 AM, Daniel Kinzler <daniel.kinz...@wikimedia.de
> wrote:

> Yes, we could put it into a separate table. But that table would be
> exactly as
> tall as the content table, and would be keyed to it. I see no advantage.


The advantage is that MediaWiki would almost never need to use the hash
table. It would need to add the hash for a new revision there, but table
size is not much of an issue on INSERT; other than that, only slow
operations like export and API requests that explicitly ask for the hash
would need to join on that table.
Or is this primarily a disk space concern?
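To sketch the idea with a toy schema (hypothetical table and column names, not MediaWiki's actual schema; SQLite standing in for MySQL):

```python
import sqlite3

# Hypothetical simplified schema: hashes split into their own table,
# keyed 1:1 to the content table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE content (
        content_id INTEGER PRIMARY KEY,
        content_address TEXT NOT NULL
    );
    CREATE TABLE content_hash (
        content_id INTEGER PRIMARY KEY REFERENCES content(content_id),
        content_sha1 TEXT NOT NULL
    );
""")

# Saving a revision INSERTs into both tables (cheap even for a tall table)...
db.execute("INSERT INTO content VALUES (1, 'tt:12345')")
db.execute("INSERT INTO content_hash VALUES (1, 'aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d')")  # placeholder hash

# ...but common read paths never touch content_hash; only slow operations
# (export, API requests that ask for the hash) join on it:
row = db.execute("""
    SELECT c.content_address, h.content_sha1
    FROM content c JOIN content_hash h USING (content_id)
    WHERE c.content_id = 1
""").fetchone()
print(row)  # ('tt:12345', 'aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d')
```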

> > Also, since content is supposed to be deduplicated (so two revisions with
> > the exact same content will have the same content_address), cannot that
> > replace content_sha1 for revert detection purposes?
>
> Only if we could detect and track "manual" reverts. And the only reliable
> way to
> do this right now is by looking at the sha1.


The content table points to a blob store which is content-addressable and
has its own deduplication mechanism, right? So you just send it the content
to store and get an address back, and in the case of a manual revert, that
address will be one that has already been used in other content rows. Or do
you need to detect the revert before saving it?
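The dedup argument can be sketched minimally (a hypothetical toy store, not MediaWiki's actual blob store API):

```python
import hashlib


class BlobStore:
    """Minimal sketch of a content-addressable blob store with
    built-in deduplication (hypothetical API)."""

    def __init__(self):
        self._blobs = {}

    def store(self, content: bytes) -> str:
        # The address is derived from the content itself, so storing
        # identical content always yields the same address.
        address = "sha1:" + hashlib.sha1(content).hexdigest()
        self._blobs.setdefault(address, content)
        return address


store = BlobStore()
original = store.store(b"The original page text.")
vandalism = store.store(b"Vandalized text!")
revert = store.store(b"The original page text.")  # manual revert

# The revert's address matches the original's, so the revert is
# detectable after saving by comparing content addresses alone.
print(revert == original)   # True
print(revert == vandalism)  # False
```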

> SHA1 is not that slow.

For the API/Special:Export, definitely not. Maybe for generating the
official dump files it might be significant? Hashing a single block on a
modern CPU should not take more than a microsecond: a decently implemented
SHA-1 takes a few hundred operations per block, and processors are in the
GHz range. PHP benchmarks [1] also give similar values. With the 64-byte
block size, that's something like 5 hours/TB; not sure how that compares to
the dump process itself (also, it probably runs on lots of cores in
parallel).
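The back-of-the-envelope figure can be checked with a quick benchmark (Python's hashlib standing in for PHP's sha1(); actual throughput depends on the CPU and the implementation):

```python
import hashlib
import time

data = b"x" * (1 << 20)  # 1 MiB of input
rounds = 100

start = time.perf_counter()
for _ in range(rounds):
    hashlib.sha1(data).digest()
elapsed = time.perf_counter() - start

# Extrapolate single-core throughput to a 1 TB dump (1 TB ~ 2**20 MiB).
mib_per_s = rounds * len(data) / (1 << 20) / elapsed
hours_per_tb = (1 << 20) / mib_per_s / 3600
print(f"{mib_per_s:.0f} MiB/s, ~{hours_per_tb:.1f} hours/TB on one core")
```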


[1] http://www.spudsdesign.com/benchmark/index.php?t=hash1
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l