Thanks for the explanation. I guess I see what you're getting at now. Sorry I didn't see it sooner.
On Tue, Sep 20, 2011 at 8:50 PM, Brion Vibber <[email protected]> wrote:
> On Tue, Sep 20, 2011 at 5:36 PM, Anthony <[email protected]> wrote:
>
>> On Tue, Sep 20, 2011 at 3:37 PM, Happy Melon <[email protected]> wrote:
>> > It may or may not be an architecturally-better design to have it as a
>> > separate table, although considering how rapidly MW's 'architecture'
>> > changes, I'd say keeping things as simple as possible is probably a
>> > virtue. But that is the basis on which we should be deciding it.
>>
>> It's an intentional denormalization of the database, done apparently
>> for performance reasons (although I still can't figure out exactly
>> *why* it's being done, as it still seems to be useful only for the dump
>> system, and therefore should be part of the dump system, not part of
>> MediaWiki proper). It doesn't even seem to apply to "normal", i.e.
>> non-Wikimedia, installations.
>
> 1) Those dumps are generated by MediaWiki from MediaWiki's database -- try
> Special:Export on the web UI, some API methods, and the dumpBackup.php
> maint script family.
>
> 2) Checksums would be of fairly obvious benefit for verifying text storage
> integrity within MediaWiki's own databases (though perhaps best sitting on,
> or keyed to, the text table...?). Default installs tend to use simple
> plain-text or gzipped storage, but big installs like Wikimedia's sites (and
> not necessarily just us!) optimize storage space by batch-compressing
> multiple text nodes into a local or remote blobs table.
>
>> On Tue, Sep 20, 2011 at 4:45 PM, Happy Melon <[email protected]> wrote:
>> > This is a big project which still retains enthusiasm because we
>> > recognise that it has equally big potential to provide interesting new
>> > features far beyond the immediate use cases we can construct now (dump
>> > validation and 'something to do with reversions').
>>
>> Can you explain how it's going to help with dump validation? It seems
>> to me that further denormalizing the database is only going to
>> *increase* these sorts of problems.
>
> You'd be able to confirm that the text in an XML dump, or accessible
> through the wiki directly, matches what the database thinks it contains --
> and that a given revision hasn't been corrupted by some funky series of
> accidents in XML dump recycling or External Storage recompression.
>
> IMO that's about the only thing it's really useful for; detecting
> non-obviously-performed reversions seems like an edge case that's not
> worth optimizing for, since it would fail to handle lots of cases like
> reverting partial edits (say an "undo" of a section edit where there are
> other intermediary edits -- since the other parts of the page text are
> not identical, you won't get a match on the checksum).
>
> -- brion
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
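For what it's worth, Brion's two points above -- verifying dump text against the database, and why checksum-based revert detection misses partial undos -- can be sketched in a few lines. This is a minimal Python illustration, not MediaWiki code: the `text_sha1` helper and all the sample strings are invented for the example (MediaWiki itself is PHP, and the exact checksum encoding it would store is a design choice, not assumed here).

```python
import hashlib

def text_sha1(text):
    """Hex SHA-1 of a revision's text (UTF-8) -- a stand-in checksum."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# --- Dump validation: compare checksum of dump text vs. database text ---
db_text = "Hello, wiki world.\n"          # what the database holds
dump_text = "Hello, wiki world.\n"        # same revision from an XML dump
corrupted = "Hello, w#ki world.\n"        # e.g. damaged during recompression

assert text_sha1(dump_text) == text_sha1(db_text)   # intact revision
assert text_sha1(corrupted) != text_sha1(db_text)   # corruption detected

# --- Why partial-edit reverts don't match on checksums ---
old_page = "== A ==\nalpha\n== B ==\nbeta\n"
# someone vandalizes section A...
vandalized = "== A ==\nALPHA?!\n== B ==\nbeta\n"
# ...then another editor changes section B in the meantime...
after_other_edit = "== A ==\nALPHA?!\n== B ==\nbetter\n"
# ...then an "undo" restores section A only:
undo_section_a = "== A ==\nalpha\n== B ==\nbetter\n"

# The undo is a revert of section A, but the full-page checksum does not
# match any earlier revision, so checksum comparison cannot detect it.
assert text_sha1(undo_section_a) != text_sha1(old_page)
```

The second half is exactly the "undo of a section edit with intermediary edits" case: the checksum covers the whole page text, so any difference elsewhere on the page defeats the match.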
