On Mon, Apr 6, 2009 at 3:57 PM, <[email protected]> wrote:
> I'm curious what does
> SELECT COUNT(DISTINCT old_text), COUNT(*) FROM text;
> show on Wikipedia's database? On mine I get
> COUNT(DISTINCT old_text): 2913
> COUNT(*): 3560
> I.e., 1/7 of the rows are redundant.
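Here is a minimal sketch of what the quoted query measures. It uses Python's built-in sqlite3 module as a stand-in for the wiki's MySQL database. The two-column `text` table and its sample rows are hypothetical, not the full MediaWiki schema. The reverts in the sample show how duplicate `old_text` values arise, and `redundancy()` is an illustrative helper name, not part of MediaWiki.

```python
import sqlite3


def redundancy(distinct: int, total: int) -> float:
    """Fraction of rows whose text duplicates some other row's text."""
    if total == 0:
        return 0.0
    return (total - distinct) / total


def count_texts(conn: sqlite3.Connection) -> tuple[int, int]:
    # Same query as in the quoted message.
    row = conn.execute(
        "SELECT COUNT(DISTINCT old_text), COUNT(*) FROM text"
    ).fetchone()
    return row[0], row[1]


# Hypothetical toy table: only old_id and old_text, not the real MediaWiki schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT)")
revisions = [
    "Hello world",         # original revision
    "Hello world!",        # an edit
    "Hello world",         # revert: identical to the first revision
    "Hello there, world",  # another edit
    "Hello world!",        # revert: identical to the second revision
]
conn.executemany("INSERT INTO text (old_text) VALUES (?)", [(r,) for r in revisions])

distinct, total = count_texts(conn)
print(distinct, total, f"{redundancy(distinct, total):.0%}")  # prints "3 5 40%"
print(f"{redundancy(2913, 3560):.1%}")  # quoted counts; prints "18.2%"
```

Applied to the counts quoted above, 3560 - 2913 = 647 rows share their text with another row, which is about 18% of the table.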
As others have noted, Wikimedia compresses everything and doesn't really store lots of redundant text. That said, past analysis of edit summaries suggests that about 1 edit in 10 on enwiki is a revert.

-Robert Rohde

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
