> On Mon, Sep 19, 2011 at 12:53 PM, Asher Feldman <afeldman [at] wikimedia> wrote:
>
>> Since the primary use case here seems to be offline analysis and it may not
>> be of much interest to mediawiki users outside of wmf, can we store the
>> checksums in new tables (i.e. revision_sha1) instead of running large
>> alters, and implement the code to generate checksums on new edits via an
>> extension?
>>
>> Checksums for most old revs can be generated offline and populated before
>> the extension goes live. Since nothing will be using the new table yet,
>> there'd be no issues with things like gap lock contention on the revision
>> table from mass populating it.
>>
>
> That's probably the simplest solution; adding a new empty table will be very
> quick. It may make it slower to use the field though, depending on what all
> uses/exposes it.
>
> During stub dump generation for instance this would need to add a left outer
> join on the other table, and add things to the dump output (and also needs
> an update to the XML schema for the dump format). This would then need to be
> preserved through subsequent dump passes as well.
>
> -- brion

Can we resist the temptation to implement schema changes as new tables
purely to make life easier for Wikimedia? Core schema changes are certainly
enough of a hurdle to warrant serious discussion, but they are not the
totally intractable mess that they used to be. 1.19 already includes index
changes to the user and logging tables; it will already require the full
game of musical chairs with the db slaves. Implementing this as a new column
does not actually make things any more complicated; it would just mean that
an operation that would have taken three hours before might now take five.
It may or may not be an architecturally better design to have it as a
separate table, but that is the basis on which we should be deciding it.

This is a big project which still retains enthusiasm because we recognise
that it has equally big potential to provide interesting new features far
beyond the immediate use cases we can construct now (dump validation and
'something to do with reversions'). Let's not hamstring it at birth based on
the operational pressures of the one MediaWiki end user who is best placed
to overcome said issues.

--HM

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
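
For anyone who wants to see what the separate-table option costs at read time, here is a rough sketch of the left outer join brion mentions, using Python's sqlite3 with an in-memory database. Everything in it is an assumption for illustration only: the toy `revision` table with an inline `rev_text` column, the `rs_rev_id` and `rs_sha1` column names on the proposed `revision_sha1` table, and the hex encoding of the checksum. None of it is the real MediaWiki schema or dump code.

```python
import hashlib
import sqlite3


def sha1_hex(text: str) -> str:
    """Hex SHA-1 of the UTF-8 encoded text (the encoding is an assumption)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_demo_db() -> sqlite3.Connection:
    """Toy database: a revision table plus a hypothetical side table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE revision (
            rev_id   INTEGER PRIMARY KEY,
            rev_text TEXT NOT NULL        -- toy column, not the real schema
        );
        CREATE TABLE revision_sha1 (      -- hypothetical side table
            rs_rev_id INTEGER PRIMARY KEY,
            rs_sha1   TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?)",
        [(1, "first"), (2, "second"), (3, "third")],
    )
    # Simulate an offline backfill that has only reached revisions 1 and 2.
    old_revs = conn.execute(
        "SELECT rev_id, rev_text FROM revision WHERE rev_id <= 2"
    ).fetchall()
    conn.executemany(
        "INSERT INTO revision_sha1 VALUES (?, ?)",
        [(rev_id, sha1_hex(text)) for rev_id, text in old_revs],
    )
    return conn


def stub_rows(conn: sqlite3.Connection) -> list:
    """Every revision with its checksum, or None where none is stored yet."""
    return conn.execute(
        """
        SELECT r.rev_id, s.rs_sha1
        FROM revision AS r
        LEFT OUTER JOIN revision_sha1 AS s ON s.rs_rev_id = r.rev_id
        ORDER BY r.rev_id
        """
    ).fetchall()


if __name__ == "__main__":
    for rev_id, checksum in stub_rows(build_demo_db()):
        print(rev_id, checksum)
```

The outer join is what keeps revision 3 in the output even though it has no checksum row yet; an inner join would silently drop it. With a column on the revision table itself, the join goes away and the value is just another field in the existing query.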
