Thanks for the explanation. I guess I see what you're getting at now. Sorry I didn't see it sooner.
On Tue, Sep 20, 2011 at 8:50 PM, Brion Vibber <[email protected]> wrote:
> On Tue, Sep 20, 2011 at 5:36 PM, Anthony <[email protected]> wrote:
>
>> On Tue, Sep 20, 2011 at 3:37 PM, Happy Melon <[email protected]> wrote:
>> > It may or may not be an architecturally-better design to have it as a
>> > separate table, although considering how rapidly MW's 'architecture'
>> > changes, I'd say keeping things as simple as possible is probably a
>> > virtue. But that is the basis on which we should be deciding it.
>>
>> It's an intentional denormalization of the database, done apparently
>> for performance reasons (although I still can't figure out exactly
>> *why* it's being done, as it still seems to be useful only for the dump
>> system, and therefore should be part of the dump system, not part of
>> MediaWiki proper). It doesn't even seem to apply to "normal", i.e.
>> non-Wikimedia, installations.
>
> 1) Those dumps are generated by MediaWiki from MediaWiki's database -- try
> Special:Export on the web UI, some API methods, and the dumpBackup.php
> maint script family.
>
> 2) Checksums would be of fairly obvious benefit for verifying text storage
> integrity within MediaWiki's own databases (though perhaps best sitting on,
> or keyed to, the text table...?). Default installs tend to use simple
> plain-text or gzipped storage, but big installs like Wikimedia's sites (and
> not necessarily just us!) optimize storage space by batch-compressing
> multiple text nodes into a local or remote blobs table.
>
>> On Tue, Sep 20, 2011 at 4:45 PM, Happy Melon <[email protected]> wrote:
>> > This is a big project which still retains enthusiasm because we
>> > recognise that it has equally big potential to provide interesting new
>> > features far beyond the immediate use cases we can construct now (dump
>> > validation and 'something to do with reversions').
>>
>> Can you explain how it's going to help with dump validation? It seems
>> to me that further denormalizing the database is only going to
>> *increase* these sorts of problems.
>
> You'd be able to confirm that the text in an XML dump, or accessible
> through the wiki directly, matches what the database thinks it contains --
> and that a given revision hasn't been corrupted by some funky series of
> accidents in XML dump recycling or External Storage recompression.
>
> IMO that's about the only thing it's really useful for; detecting
> non-obviously-performed reversions seems like an edge case that's not
> worth optimizing for, since it would fail to handle lots of cases like
> reverting partial edits (say an "undo" of a section edit where there are
> other intermediary edits -- since the other parts of the page text are
> not identical, you won't get a match on the checksum).
>
> -- brion
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
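For what it's worth, Brion's two points above -- verifying dump text against the database, and why checksum-based revert detection misses partial undos -- can be sketched in a few lines. This is a minimal Python illustration, not MediaWiki code: the `text_sha1` helper and all the sample strings are invented for the example (MediaWiki itself is PHP, and the exact checksum encoding it would store is a design choice, not assumed here).

```python
import hashlib

def text_sha1(text):
    """Hex SHA-1 of a revision's text (UTF-8) -- a stand-in checksum."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# --- Dump validation: compare checksum of dump text vs. database text ---
db_text = "Hello, wiki world.\n"          # what the database holds
dump_text = "Hello, wiki world.\n"        # same revision from an XML dump
corrupted = "Hello, w#ki world.\n"        # e.g. damaged during recompression

assert text_sha1(dump_text) == text_sha1(db_text)   # intact revision
assert text_sha1(corrupted) != text_sha1(db_text)   # corruption detected

# --- Why partial-edit reverts don't match on checksums ---
old_page = "== A ==\nalpha\n== B ==\nbeta\n"
# someone vandalizes section A...
vandalized = "== A ==\nALPHA?!\n== B ==\nbeta\n"
# ...then another editor changes section B in the meantime...
after_other_edit = "== A ==\nALPHA?!\n== B ==\nbetter\n"
# ...then an "undo" restores section A only:
undo_section_a = "== A ==\nalpha\n== B ==\nbetter\n"

# The undo is a revert of section A, but the full-page checksum does not
# match any earlier revision, so checksum comparison cannot detect it.
assert text_sha1(undo_section_a) != text_sha1(old_page)
```

The second half is exactly the "undo of a section edit with intermediary edits" case: the checksum covers the whole page text, so any difference elsewhere on the page defeats the match.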
