On Mon, Apr 6, 2009 at 3:57 PM,  <[email protected]> wrote:
> I'm curious what does
>  SELECT COUNT(DISTINCT old_text), COUNT(*) FROM text;
> show on Wikipedia's database? On mine I get
>  COUNT(DISTINCT old_text): 2913
>                  COUNT(*): 3560
> I.e., 1/7 of the rows are redundant.

As others have noted, Wikimedia compresses everything and doesn't
really store lots of redundant text.

That said, past analysis of edit summaries suggests that about 1 edit
in 10 is a revert on enwiki.
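
For anyone who wants to try the quoted query on a small sample, here is a
minimal sketch using Python's built-in sqlite3 module. The table layout
(old_id, old_text), the helper name redundancy() and the sample rows are
assumptions for illustration only; this is not how Wikimedia stores or
analyzes its text. For the counts quoted above, (3560 - 2913) / 3560 comes
out to roughly 18% of rows repeating text that is already stored.

```python
import sqlite3


def redundancy(rows):
    """Return (distinct, total, redundant_fraction) for a list of text rows.

    Hypothetical helper: loads the rows into an in-memory table named
    `text` and runs the same COUNT query quoted in the message above.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT)"
        )
        conn.executemany(
            "INSERT INTO text (old_text) VALUES (?)",
            [(r,) for r in rows],
        )
        distinct, total = conn.execute(
            "SELECT COUNT(DISTINCT old_text), COUNT(*) FROM text"
        ).fetchone()
    finally:
        conn.close()

    # Fraction of rows whose text duplicates an already-stored row.
    fraction = (total - distinct) / total if total else 0.0
    return distinct, total, fraction


if __name__ == "__main__":
    # Made-up sample: "a" appears three times, as a revert would produce.
    sample = ["a", "b", "a", "c", "a"]
    print(redundancy(sample))
```

A revert that restores an earlier revision verbatim shows up here as a
duplicate old_text value, so the redundant fraction gives a rough upper
bound on exact reverts in an uncompressed table.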

-Robert Rohde

