Natalia,

Thanks for offering some hints. To put some data on the table ...
Below are some size figures for two parts of the database:
(1) a collection with few and small documents
(2) a collection with many small documents

For each of these I give the size of
(A) the exported format
(B) the xindice 1.0 database
(C) the xindice 1.1 database

Sizes are computed by "du" (running Cygwin).

Summary table of sizes in KB (table more readable in fixed font ;-) ):

.    |  (A) |  (B) |   (C) |
(1)  |   20 |  250 | 84000 |
(2)  | 4000 |  450 | 12000 |

The "1.1" vs "1.0" expansion factor is much larger for collections with few and small files (a factor of roughly 340) than for a collection with many small files (a factor of roughly 27). Nevertheless, I wonder for what kind of databases 1.1 would beat 1.0 from a *size* point of view.

Maybe comparing the processing speed of "1.1" vs "1.0" would give a different picture, but at the moment I have no numbers to offer in that respect.

By the way, I do not have any very large documents in the database, so I have no idea how "1.1" compares to "1.0" for such beasts.

And the 64,000 $ question is, of course: what means are available to decrease the disk space footprint of the "1.1" database?

/O

==========================
(1) a set of collections with few and small documents

(A) This is a size listing of (part of) the exported 1.0 database.
Result: total of 20KB contents. Most of these files are small.
(Note: the "meta" collections/files are my own application-level meta info, not Xindice meta info!)
xxx$ du -ba publications
553      publications/rss-common/meta/meta.xml
1065     publications/rss-common/meta
1577     publications/rss-common
257      publications/rss_2_0/meta/meta.xml
769      publications/rss_2_0/meta
1281     publications/rss_2_0
260      publications/rss_0_92/meta/meta.xml
772      publications/rss_0_92/meta
1284     publications/rss_0_92
260      publications/rss_0_91/meta/meta.xml
772      publications/rss_0_91/meta
1284     publications/rss_0_91
438      publications/newsletter/meta/meta.xml
950      publications/newsletter/meta
1462     publications/newsletter
512      publications/archive/meta
1024     publications/archive
429      publications/newsletter-index/meta/meta.xml
941      publications/newsletter-index/meta
1453     publications/newsletter-index
427      publications/news-archive/meta/meta.xml
939      publications/news-archive/meta
1451     publications/news-archive
394      publications/home-page/meta/meta.xml
906      publications/home-page/meta
1418     publications/home-page
257      publications/rss_1_0/meta/meta.xml
769      publications/rss_1_0/meta
1281     publications/rss_1_0
315      publications/home-page-time-stamp/meta/meta.xml
827      publications/home-page-time-stamp/meta
1339     publications/home-page-time-stamp
350      publications/events/meta/meta.xml
862      publications/events/meta
1374     publications/events
1634     publications/meta/meta.xml
2146     publications/meta
18886    publications
xxx$

(B) The size of the corresponding 1.0 database files
Result: total of 250KB contents.
yyy$ du -ba publications/
12288    publications/archive/archive.tbl
12288    publications/archive/meta/meta.tbl
12288    publications/archive/meta
24576    publications/archive
12288    publications/home-page/home-page.tbl
12288    publications/home-page/meta/meta.tbl
12288    publications/home-page/meta
24576    publications/home-page
12288    publications/meta/meta.tbl
12288    publications/meta
12288    publications/news-archive/meta/meta.tbl
12288    publications/news-archive/meta
12288    publications/news-archive/news-archive.tbl
24576    publications/news-archive
12288    publications/newsletter/meta/meta.tbl
12288    publications/newsletter/meta
12288    publications/newsletter/newsletter.tbl
24576    publications/newsletter
12288    publications/publications.tbl
12288    publications/rss-common/meta/meta.tbl
12288    publications/rss-common/meta
12288    publications/rss-common/rss-common.tbl
24576    publications/rss-common
12288    publications/rss_0_91/meta/meta.tbl
12288    publications/rss_0_91/meta
12288    publications/rss_0_91/rss_0_91.tbl
24576    publications/rss_0_91
12288    publications/rss_0_92/meta/meta.tbl
12288    publications/rss_0_92/meta
12288    publications/rss_0_92/rss_0_92.tbl
24576    publications/rss_0_92
12288    publications/rss_1_0/meta/meta.tbl
12288    publications/rss_1_0/meta
12288    publications/rss_1_0/rss_1_0.tbl
24576    publications/rss_1_0
12288    publications/rss_2_0/meta/meta.tbl
12288    publications/rss_2_0/meta
12288    publications/rss_2_0/rss_2_0.tbl
24576    publications/rss_2_0
245760   publications/
yyy$

(C) ... and of the corresponding 1.1 database files
Result: total of 84000KB contents.
zzz$ du -ba publications/
4202496  publications/archive/archive.tbl
4202496  publications/archive/meta/meta.tbl
4202496  publications/archive/meta
8404992  publications/archive
4202496  publications/home-page/home-page.tbl
4202496  publications/home-page/meta/meta.tbl
4202496  publications/home-page/meta
8404992  publications/home-page
4202496  publications/meta/meta.tbl
4202496  publications/meta
4202496  publications/news-archive/meta/meta.tbl
4202496  publications/news-archive/meta
4202496  publications/news-archive/news-archive.tbl
8404992  publications/news-archive
4202496  publications/newsletter/meta/meta.tbl
4202496  publications/newsletter/meta
4202496  publications/newsletter/newsletter.tbl
8404992  publications/newsletter
4202496  publications/publications.tbl
4202496  publications/rss-common/meta/meta.tbl
4202496  publications/rss-common/meta
4202496  publications/rss-common/rss-common.tbl
8404992  publications/rss-common
4202496  publications/rss_0_91/meta/meta.tbl
4202496  publications/rss_0_91/meta
4202496  publications/rss_0_91/rss_0_91.tbl
8404992  publications/rss_0_91
4202496  publications/rss_0_92/meta/meta.tbl
4202496  publications/rss_0_92/meta
4202496  publications/rss_0_92/rss_0_92.tbl
8404992  publications/rss_0_92
4202496  publications/rss_1_0/meta/meta.tbl
4202496  publications/rss_1_0/meta
4202496  publications/rss_1_0/rss_1_0.tbl
8404992  publications/rss_1_0
4202496  publications/rss_2_0/meta/meta.tbl
4202496  publications/rss_2_0/meta
4202496  publications/rss_2_0/rss_2_0.tbl
8404992  publications/rss_2_0
84049920 publications/
zzz$

==========================
(2) a collection with a large set of small documents

(A) This is a size listing of (part of) the exported 1.0 database.
This one contains approx 4000 rather small documents.
Result: total of 4000KB contents (so approx 1KB per document)

xxx$ du -b content/news
3932206  content/news/db
665      content/news/meta
3933383  content/news
xxx$

(note: not displaying individual files/documents ...
there are too many of them)

(B) The size of the corresponding 1.0 database files
Result: total of 450KB contents.

yyy$ du -ba content/news/
430080   content/news/db/db.tbl
430080   content/news/db
12288    content/news/meta/meta.tbl
12288    content/news/meta
12288    content/news/news.tbl
454656   content/news/
yyy$

(C) ... and of the corresponding 1.1 database files
Result: total of 12MB contents.

zzz$ du -ba content/news/
4202496  content/news/db/db.tbl
4202496  content/news/db
4202496  content/news/meta/meta.tbl
4202496  content/news/meta
4202496  content/news/news.tbl
12607488 content/news/
zzz$

===end===

Natalia Shilenkova wrote:
>
> There was a change in Xindice v1.1 that could possibly be responsible
> for database size increase.
>
> Xindice v1.0 failed to correctly allocate initial file space for a
> collection according to its page size and page count parameters, so a
> collection with just a few documents would occupy several Kb on disk
> instead of reserving some space on disk for the collection to grow. It
> was fixed in v1.1 and a collection with default page size (4Kb) and
> page count (1024) parameters now occupies about 4Mb.
>
> If you had several small collections in Xindice v1.0 it is possible
> that after rebuilding them for v1.1 the database would take
> considerably more disk space. How many collections do you have in that
> database and how big are the collections? If this indeed is the reason
> for the database increase, it could be fixed by adjusting page count
> parameter for small collections.
>
> Also, I think Meta collections were introduced somewhere between 1.0
> and 1.1, I cannot remember now if they are created when rebuilding a
> database, but if they are, it would take some disk space, too. You can
> explore database directories to see if Meta collections are there (in
> system/Metas, I think). Meta collections can be turned off.
>
> Natalia
>
> On Thu, Apr 16, 2009 at 9:35 AM, OKO <ol...@sics.se> wrote:
>>
>> Ran xindice_rebuild on a smallish existing 1.0 database to get it into
>> 1.1 format.
>> The size was expanded enormously:
>> db 1.0: 2 M
>> db 1.1: 233 M
>>
>> That is a factor of 100 ;-(
>>
>> I have not dared do this on my real 1.0 database, which now occupies more
>> than 1G of disk space.
>>
>> *Question*:
>> - Is this a feature or a bug?
>>
>> Tool info as presented:
>> $ xindice -h
>> trying to register database
>>
>> Xindice Command Tools v1.1
>>
>> ...etc...
>>
>> /O
>> --
>> View this message in context:
>> http://www.nabble.com/xindice_rebuild---file-size-vastly-multiplied-tp23078111p23078111.html
>> Sent from the Xindice - Users mailing list archive at Nabble.com.
>>
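
As a rough cross-check of the du figures in this message against the page arithmetic Natalia describes, here is a small Python sketch. It assumes that a 1.1 collection file reserves page size x page count bytes (4096 x 1024 with the defaults she quotes). The remaining 8192 bytes per file are only the difference to the 4202496 bytes that du reports; treating them as a fixed per-file overhead is an assumption, not something confirmed for Xindice. The function names and the page count of 16 are invented for illustration.

```python
# Sizes in bytes, taken from the du listings in this message.
TBL_FILE_1_0 = 12288      # each small .tbl file in the 1.0 database
TBL_FILE_1_1 = 4202496    # every .tbl file in the 1.1 database

# Defaults quoted in Natalia's message: 4Kb pages, 1024 pages.
DEFAULT_PAGE_SIZE = 4096
DEFAULT_PAGE_COUNT = 1024


def preallocated_bytes(page_size, page_count):
    """Bytes reserved for the pages of one collection file."""
    return page_size * page_count


# Difference between the observed 1.1 file size and the reserved pages.
# ASSUMPTION: treated here as a fixed per-file overhead.
OBSERVED_OVERHEAD = TBL_FILE_1_1 - preallocated_bytes(
    DEFAULT_PAGE_SIZE, DEFAULT_PAGE_COUNT
)


def estimated_footprint(n_files,
                        page_size=DEFAULT_PAGE_SIZE,
                        page_count=DEFAULT_PAGE_COUNT,
                        overhead=OBSERVED_OVERHEAD):
    """Estimated disk usage in bytes of n_files freshly allocated .tbl files."""
    return n_files * (preallocated_bytes(page_size, page_count) + overhead)


def main():
    print("overhead per file (bytes):", OBSERVED_OVERHEAD)
    # Case (1) has 20 .tbl files, case (2) has 3.
    print("case (1), default page count:", estimated_footprint(20))
    print("case (2), default page count:", estimated_footprint(3))
    # Hypothetical smaller page count for the tiny collections of case (1).
    print("case (1), page count 16:", estimated_footprint(20, page_count=16))
    print("per-file growth 1.0 -> 1.1:", TBL_FILE_1_1 / TBL_FILE_1_0)


if __name__ == "__main__":
    main()
```

With the default values the estimate reproduces the du totals for the 1.1 database (84049920 bytes for the 20 files of case (1), 12607488 bytes for the 3 files of case (2)), which fits the explanation that the growth comes from per-file pre-allocation rather than from the document contents. Whether Xindice accepts a page count as low as 16, and how a collection behaves once it outgrows it, is not something this arithmetic can answer.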