On Mon, May 11, 2009 at 9:20 AM, Matthias Apitz <[email protected]> wrote:
> Thanks for your explanation. Do you have an idea how much diskspace the
> searchindex will have, compared with the database size or the text size
> of the imported file 'text.txt'? The actual size of the DB is:

If you imported only the latest revision of each article, the search
index should be comparable in size to the text table.  If you imported
the full history, it should be vastly smaller.  For comparison, here are
my sizes from http://www.twcenter.net/wiki/, which has homegrown content:

mysql> SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH,
DATA_LENGTH+INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES WHERE
TABLE_SCHEMA='wikidb' ORDER BY DATA_LENGTH+INDEX_LENGTH DESC LIMIT 5;
+-------------+-------------+--------------+--------------------------+
| TABLE_NAME  | DATA_LENGTH | INDEX_LENGTH | DATA_LENGTH+INDEX_LENGTH |
+-------------+-------------+--------------+--------------------------+
| text        |   134807552 |            0 |                134807552 |
| searchindex |     7662368 |      7068672 |                 14731040 |
| revision    |     4505600 |      6045696 |                 10551296 |
| logging     |     2637824 |      6324224 |                  8962048 |
| pagelinks   |     1327104 |      1523712 |                  2850816 |
+-------------+-------------+--------------+--------------------------+
5 rows in set (0.91 sec)

Note how the text table is roughly an order of magnitude larger than
searchindex (about 135 MB versus 15 MB, data plus index), because only
the most recent revision of each page is indexed.  For a full eswiki
dump I'd expect the difference to be much larger, because Spanish
Wikipedia pages tend to have many more revisions than those on my wiki.
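
To put a number on that ratio, here is a minimal Python sketch using the
byte counts copied from the query output above.  The variable names are
just for illustration; nothing here comes from MediaWiki itself, and the
figures only describe my wiki, not eswiki.

```python
# Byte counts copied from the INFORMATION_SCHEMA.TABLES output above
# (DATA_LENGTH, INDEX_LENGTH) for the two tables being compared.
sizes = {
    "text": (134807552, 0),
    "searchindex": (7662368, 7068672),
}

# Total on-disk footprint per table: data plus index.
totals = {name: data + index for name, (data, index) in sizes.items()}

# How many times larger the text table is than the search index.
ratio = totals["text"] / totals["searchindex"]

print(f"text total:        {totals['text']} bytes")
print(f"searchindex total: {totals['searchindex']} bytes")
print(f"text / searchindex = {ratio:.2f}")
```

On a wiki with many more revisions per page, the same calculation should
give a noticeably higher ratio, since text grows with every revision and
searchindex does not.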

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
