On Thu, Apr 16, 2009 at 3:32 PM, OKO <ol...@sics.se> wrote:
[snip]
>
> The "1.1" vs "1.0" expansion factor is much larger for the case of
> collections with few and small files (factor 350) , than for many and small
> files (factor 25).
>
> But nevertheless, I wonder for what kind of databases 1.1 would beat 1.0
> from a *size* point of view. Maybe comparing processing speed of "1.1" vs
> "1.0" would give a different picture, but at the moment I have no numbers to
> offer in that respect.

Here is the deal: the database format hasn't changed between 1.0 and 1.1, except that some bugs in index files were fixed and the collection files are now created properly, which in your case, unfortunately, means they take a lot more space than before. The reason is that the default size is intended for bigger collections with many documents. The database you tried to rebuild is a corner case, however, with several collections that have only a single document each, so the default size settings are way too large. But this is only the default setting, and it can be changed to a more suitable number. This way, the 1.1 database should take about the same amount of space as the existing database.

> By the way, I do not have any very large documents in the database, so I
> have no idea how "1.1" compares to "1.0" for such beasts.
>
> And the 64,000 $ question is, of course, what means are available to
> decrease disk space footprint of the "1.1" database?

The following steps can help to reduce database size:

1. Meta collections can be turned off. The setting is in the config/system.xml file (in the Xindice 1.1 directory). In the following line:

       <root-collection dbroot="./db/" name="db" use-metadata="on">

   change the attribute to use-metadata="off". This will make all Meta collections go away.

2. Initial collection size can be adjusted (here I assume that all the collections use the default BTreeFiler to store data; HashFiler is a somewhat different beast). When creating a collection, it can be given a configuration that specifies the pagecount setting, which directly affects the initial collection size:

       <collection compressed="true" inline-metadata="true" name="test">
           <filer class="org.apache.xindice.core.filer.BTreeFiler" pagecount="16" />
       </collection>

   When creating a collection with the command-line tool, this setting can be specified with the --pagecount parameter:

       bin\xindice ac -c /db -n test --pagecount 16

   The "perfect" value for pagecount depends on the size and number of documents in a collection. For collections that have documents added and modified often, it should be picked based on how much data is expected to be stored in the collection. In any case it only sets the initial size, so the collection file will grow when it is too small to hold new data. In your case, when a collection has only one small document and this situation is not expected to change, pagecount may very well be set to 1.

Now back to data migration...

I made some changes to xindice_rebuild to add an optional pagecount parameter that overrides the original collection setting. If you have the source version of the Xindice download, please save the attached file to

    <xindice directory>\java\src\org\apache\xindice\tools\DatabaseRebuild.java

and run build.bat or build.sh, depending on your OS. After that, you can try to rebuild the database again using the optional parameter:

    bin\xindice_rebuild.bat rebuild db -p 1

If you have the binary version of the Xindice download instead, please let me know, and I'll see what can be done for that.

Alternatively, collections can be rebuilt manually by exporting the documents, creating new collections using the command-line tool with the option "--pagecount 1", and importing the documents into the new collections.

Let me know if something doesn't work for you.

Natalia

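For the manual route with many single-document collections, the collection configuration and the matching command line from the message above can be generated rather than typed by hand. What follows is only a minimal Python sketch under stated assumptions: the helper names collection_config and add_collection_command are hypothetical, the element and attribute values are copied from the example in the message, and the script only builds the XML text and the argument list. It does not call Xindice or touch any database files.

```python
import xml.etree.ElementTree as ET


def collection_config(name: str, pagecount: int) -> str:
    """Build the collection configuration XML quoted in the message.

    Hypothetical helper: attribute values mirror the example in the
    message, only the collection name and pagecount are parameters.
    """
    if pagecount < 1:
        raise ValueError("pagecount must be at least 1")
    collection = ET.Element(
        "collection",
        {"compressed": "true", "inline-metadata": "true", "name": name},
    )
    ET.SubElement(
        collection,
        "filer",
        {
            "class": "org.apache.xindice.core.filer.BTreeFiler",
            "pagecount": str(pagecount),
        },
    )
    return ET.tostring(collection, encoding="unicode")


def add_collection_command(parent: str, name: str, pagecount: int) -> list:
    """Return the argument list for the 'ac' invocation shown in the message.

    Hypothetical helper: it only assembles the arguments and never runs them.
    """
    return [
        "bin/xindice", "ac",
        "-c", parent,
        "-n", name,
        "--pagecount", str(pagecount),
    ]


if __name__ == "__main__":
    # One small document per collection, so pagecount 1 as suggested above.
    print(collection_config("test", 1))
    print(" ".join(add_collection_command("/db", "test", 1)))
```

Keeping the argument list separate from execution makes it easy to review the generated commands before running any of them against a real database.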
DatabaseRebuild.java
Description: Binary data