Natalia,

Regarding DatabaseRebuild.java ...
Well, I originally downloaded the binary 1.1 package (Windows), so I did not have a smooth setup for recompiling the whole thing. As I wanted a very quick test, I recompiled only DatabaseRebuild.java, by putting it in a separate directory branch and using the jars as the classpath. It compiled fine.

Then I used it to redo the rebuild, using "-p 1" as suggested, and tested the result with the xindice (1.1) command-line tool by listing and retrieving collections and documents. It works.

The result was, for all practical purposes, a 1.1 db representation equal in size to the original 1.0 db representation. The only size differences seem to be:

- system/SysConfig  ... 1.0: 24576 bytes vs 1.1: 32768 bytes
- system/SysSymbols ... 1.0: 659456 bytes vs 1.1: 651264 bytes

which is completely negligible.

Technically, the outcome of this initial test seems OK, so I will embark on a larger test in a week or two.

As I had these challenges with disk space, it would be useful to find some heuristics on the Xindice Wiki (http://wiki.apache.org/xindice/) about how the settings of these parameters influence the physical disk space needed. Some of the statements you made in your last mail would certainly be useful to have there.

regards,
/O

Natalia Shilenkova wrote:
>
> On Thu, Apr 16, 2009 at 3:32 PM, OKO <ol...@sics.se> wrote:
> [snip]
>>
>> The "1.1" vs "1.0" expansion factor is much larger for the case of
>> collections with few and small files (factor 350) than for many and
>> small files (factor 25).
>>
>> But nevertheless, I wonder for what kind of databases 1.1 would beat
>> 1.0 from a *size* point of view. Maybe comparing processing speed of
>> "1.1" vs "1.0" would give a different picture, but at the moment I
>> have no numbers to offer in that respect.
>
> Here is the deal: the database format between 1.0 and 1.1 hasn't
> changed, except that some bugs in index files were fixed and the
> collection files are now created properly, which in your case,
> unfortunately, means they take a lot more space than before. The
> reason is that the default size is intended for bigger collections
> with many documents.
>
> The database you tried to rebuild is the corner case, however, with
> several collections that only have a single document each, so the
> default size settings are way too large. But this is only the default
> setting and it can be changed to a more suitable number. This way, the
> 1.1 database should take about the same amount of space as the
> existing database.
>
>> By the way, I do not have any very large documents in the database,
>> so I have no idea how "1.1" compares to "1.0" for such beasts.
>>
>> And the 64,000 $ question is, of course, what means are available to
>> decrease disk space footprint of the "1.1" database?
>
> The following steps can help to reduce database size:
> 1. Meta collections can be turned off. The setting is in the
> config/system.xml file (in the Xindice 1.1 directory). In the
> following line:
>
> <root-collection dbroot="./db/" name="db" use-metadata="on">
>
> change use-metadata to "off". This will make all Meta collections go
> away.
>
> 2. Initial collection size can be adjusted (here I assume that all the
> collections use the default BTreeFiler to store data; HashFiler is a
> somewhat different beast).
> When creating a collection, it can be given a configuration that
> specifies the pagecount setting, which directly affects initial
> collection size:
>
> <collection compressed="true" inline-metadata="true" name="test">
>    <filer class="org.apache.xindice.core.filer.BTreeFiler" pagecount="16" />
> </collection>
>
> When creating a collection from the command-line tool, this setting
> can be specified with the --pagecount parameter:
>
> bin\xindice ac -c /db -n test --pagecount 16
>
> The "perfect" value for pagecount depends on the size and number of
> documents in a collection. For collections that have documents added
> and modified often, it should be picked based on how much data is
> expected to be stored in the collection, but in any case it only sets
> the initial size, so the collection file will grow when it is too
> small to hold new data.
>
> In your case, when a collection has only one small document and this
> situation is not expected to change, pagecount may very well be set to
> 1.
>
> Now back to data migration... I made some changes to xindice_rebuild
> to add an optional pagecount parameter that will override the
> original collection setting.
>
> If you have the source version of the Xindice download, please save
> the attached file to
> <xindice directory>\java\src\org\apache\xindice\tools\DatabaseRebuild.java
> and run build.bat or build.sh depending on your OS. After that, you
> can try to rebuild the database again using the optional parameter:
>
> bin\xindice_rebuild.bat rebuild db -p 1
>
> If you have the binary version of the Xindice download instead, please
> let me know, I'll see what can be done for that. Alternatively,
> collections can be rebuilt manually, by exporting documents, creating
> new collections using the command-line tool with the option
> "--pagecount 1" and importing documents into the new collections.
>
> Let me know if something doesn't work for you.
>
> Natalia
>
>

--
View this message in context: http://www.nabble.com/xindice_rebuild---file-size-vastly-multiplied-tp23078111p23155343.html
Sent from the Xindice - Users mailing list archive at Nabble.com.
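
A footnote on the two size differences reported in the reply above: they can be checked with a few lines of Java. This is only a minimal sketch. The class and method names (SizeDelta, delta) are hypothetical and chosen for illustration, and the byte counts are the ones quoted in the message for system/SysConfig and system/SysSymbols.

```java
// Illustrative only: SizeDelta and delta() are made-up names, not part of Xindice.
class SizeDelta {
    // File sizes in bytes, as reported in the message for the 1.0 and 1.1 databases.
    static final long SYSCONFIG_10 = 24576L;
    static final long SYSCONFIG_11 = 32768L;
    static final long SYSSYMBOLS_10 = 659456L;
    static final long SYSSYMBOLS_11 = 651264L;

    // Size change when going from the 1.0 file to the 1.1 file (positive means 1.1 is larger).
    static long delta(long size10, long size11) {
        return size11 - size10;
    }

    public static void main(String[] args) {
        long dConfig = delta(SYSCONFIG_10, SYSCONFIG_11);     // 8192
        long dSymbols = delta(SYSSYMBOLS_10, SYSSYMBOLS_11);  // -8192
        System.out.println("system/SysConfig  delta: " + dConfig + " bytes");
        System.out.println("system/SysSymbols delta: " + dSymbols + " bytes");
        System.out.println("net delta: " + (dConfig + dSymbols) + " bytes"); // 0
    }
}
```

For these two files the differences cancel out (+8192 and -8192 bytes), which fits the observation that the rebuilt 1.1 database is, for all practical purposes, the same size as the 1.0 one.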