What is the current state? Will some kind of digest be retained?

On Thu, Sep 21, 2017 at 9:56 PM, Gergo Tisza <[email protected]> wrote:

> On Thu, Sep 21, 2017 at 6:10 AM, Daniel Kinzler <[email protected]> wrote:
>
> > Yes, we could put it into a separate table. But that table would be
> > exactly as
> > tall as the content table, and would be keyed to it. I see no advantage.
>
>
> The advantage is that MediaWiki would almost never need to use the hash
> table. It would need to add the hash for a new revision there, but table
> size is not much of an issue on INSERT; other than that, only slow
> operations like export and API requests that explicitly ask for the hash
> would need to join on that table.
> Or is this primarily a disk space concern?
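
The split being proposed can be sketched like this (a minimal sqlite sketch; the table and column names are illustrative only, not the actual MediaWiki schema):

```python
import sqlite3

# Hypothetical schema: the hash lives in a side table keyed to content,
# so ordinary revision reads never touch it.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE content (
        content_id INTEGER PRIMARY KEY,
        content_address TEXT NOT NULL
    );
    CREATE TABLE content_sha1 (
        content_id INTEGER PRIMARY KEY REFERENCES content(content_id),
        sha1 TEXT NOT NULL
    );
""")

# Saving a revision costs one extra INSERT, cheap even on a tall table.
db.execute("INSERT INTO content VALUES (1, 'tt:12345')")
db.execute("INSERT INTO content_sha1 VALUES (1, 'phoiac9h4m842xq45sp7s6u21eteeq1')")

# Only slow paths (export, API requests asking for the hash) pay for the join.
row = db.execute("""
    SELECT c.content_address, h.sha1
    FROM content c JOIN content_sha1 h USING (content_id)
    WHERE c.content_id = 1
""").fetchone()
print(row)
```

The point is that the join cost is confined to the rare code paths that actually need the digest.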
>
> > > Also, since content is supposed to be deduplicated (so two revisions
> > > with the exact same content will have the same content_address),
> > > cannot that replace content_sha1 for revert detection purposes?
> >
> > Only if we could detect and track "manual" reverts. And the only reliable
> > way to
> > do this right now is by looking at the sha1.
>
>
> The content table points to a blob store which is content-addressable and
> has its own deduplication mechanism, right? So you just send it the content
> to store, and get an address back, and in the case of a manual revert, that
> address will be one that has already been used in other content rows. Or do
> you need to detect the revert before saving it?
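
The dedup-based revert detection described above can be sketched like this (a toy in-memory store; `BlobStore` and its `put` method are hypothetical stand-ins for whatever the real blob store exposes):

```python
import hashlib

# Minimal content-addressable store: the address is derived from the
# content itself, so re-saving old text (a manual revert) hands back an
# address that has already been seen.
class BlobStore:
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha1(data).hexdigest()
        self._blobs[address] = data
        return address

store = BlobStore()
seen_addresses = set()

a1 = store.put(b"original text")
seen_addresses.add(a1)
a2 = store.put(b"vandalism")
seen_addresses.add(a2)
a3 = store.put(b"original text")  # manual revert: same content

print(a3 == a1)          # same content, same address
print(a3 in seen_addresses)  # detectable as a revert at save time
```

Whether this works in practice depends on the open question in the thread: the address comes back only when the content is sent to the store, so the revert is detected at (or after) save, not before.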
>
> > SHA1 is not that slow.
>
> For the API/Special:Export, definitely not. For generating the official
> dump files it might be significant. A single sha1 operation on a modern
> CPU should not take more than a microsecond: a decently implemented sha1
> takes a few hundred operations, and processors are in the GHz range. PHP
> benchmarks [1] also give similar values. With the 64-byte block size,
> that's something like 5 hours/TB - not sure how that compares to the dump
> process itself (and it's probably running on many cores in parallel).
>
>
> [1] http://www.spudsdesign.com/benchmark/index.php?t=hash1
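
The arithmetic behind the 5 hours/TB figure checks out, and it is easy to compare against real hashing throughput (a quick sketch; the 16 MiB payload size is arbitrary):

```python
import hashlib
import time

# Back-of-envelope: ~1 microsecond per 64-byte SHA-1 block implies
# (1e12 / 64) microseconds per terabyte.
blocks_per_tb = 1e12 / 64
hours_per_tb = blocks_per_tb * 1e-6 / 3600
print(f"{hours_per_tb:.1f} hours/TB")  # ≈ 4.3 hours/TB

# Rough empirical throughput on this machine (hashlib uses a native
# implementation, typically much faster than 1 microsecond per block).
payload = b"x" * (16 * 1024 * 1024)
start = time.perf_counter()
hashlib.sha1(payload).hexdigest()
elapsed = time.perf_counter() - start
print(f"{len(payload) / elapsed / 1e6:.0f} MB/s")
```

So the microsecond-per-block estimate is conservative; modern native SHA-1 implementations run at hundreds of MB/s per core.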
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>