What is the current state? Will some kind of digest be retained?

On Thu, Sep 21, 2017 at 9:56 PM, Gergo Tisza <[email protected]> wrote:
> On Thu, Sep 21, 2017 at 6:10 AM, Daniel Kinzler <[email protected]>
> wrote:
>
> > Yes, we could put it into a separate table. But that table would be
> > exactly as tall as the content table, and would be keyed to it. I see
> > no advantage.
>
> The advantage is that MediaWiki would almost never need to use the hash
> table. It would need to add the hash for a new revision there, but table
> size is not much of an issue on INSERT; other than that, only slow
> operations like export and API requests which explicitly ask for the
> hash would need to join on that table.
>
> Or is this primarily a disk space concern?
>
> > > Also, since content is supposed to be deduplicated (so two revisions
> > > with the exact same content will have the same content_address),
> > > cannot that replace content_sha1 for revert detection purposes?
> >
> > Only if we could detect and track "manual" reverts. And the only
> > reliable way to do this right now is by looking at the sha1.
>
> The content table points to a blob store which is content-addressable
> and has its own deduplication mechanism, right? So you just send it the
> content to store, and get an address back, and in the case of a manual
> revert, that address will be one that has already been used in other
> content rows. Or do you need to detect the revert before saving it?
>
> > SHA1 is not that slow.
>
> For the API/Special:Export, definitely not. Maybe for generating the
> official dump files it might be significant? A single sha1 operation on
> a modern CPU should not take more than a microsecond: there are a few
> hundred operations in a decently implemented sha1, and processors are in
> the GHz range. PHP benchmarks [1] also give similar values. With the
> 64-byte block size, that's something like 5 hours/TB - not sure how that
> compares to the dump process itself (also it's probably running on lots
> of cores in parallel).
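For illustration, the revert-detection idea above could be sketched like this: a toy content-addressable store (hypothetical; not MediaWiki's actual blob store API) derives the address from the content itself, so a manual revert necessarily comes back with an address that has been seen before.

```python
import hashlib

class BlobStore:
    """Toy content-addressable blob store (hypothetical names, for illustration)."""
    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The address is derived from the content itself, so identical
        # content always maps to the same address (deduplication).
        address = "sha1:" + hashlib.sha1(content).hexdigest()
        self._blobs[address] = content
        return address

store = BlobStore()
seen_addresses = set()

def save_revision(content: bytes):
    """Store content; report whether it matches previously seen content."""
    address = store.put(content)
    is_revert = address in seen_addresses
    seen_addresses.add(address)
    return address, is_revert

a1, r1 = save_revision(b"original text")  # new content
a2, r2 = save_revision(b"vandalised!!")   # new content
a3, r3 = save_revision(b"original text")  # manual revert: same address as a1
```

Note this only detects the revert after the content has been sent to the store, which is why the question above ("do you need to detect the revert before saving it?") matters.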
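The "~5 hours/TB" figure can be checked with a quick back-of-the-envelope calculation, assuming (as estimated above) roughly one microsecond per 64-byte sha1 input block:

```python
# Back-of-the-envelope check of the "~5 hours per terabyte" sha1 estimate.
block_size = 64           # bytes per sha1 input block
time_per_block = 1e-6     # seconds per block (assumed, per the estimate above)

throughput = block_size / time_per_block   # bytes per second (64 MB/s)
seconds_per_tb = 1e12 / throughput
hours_per_tb = seconds_per_tb / 3600

print(round(hours_per_tb, 1))              # ~4.3 hours/TB
```

So about 4.3 hours per terabyte on a single core under these assumptions, consistent with the ballpark quoted in the thread (real hashing is faster, since CPUs retire more than one instruction per cycle).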
> [1] http://www.spudsdesign.com/benchmark/index.php?t=hash1

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
