Natalia,

Regarding DatabaseRebuild.java ...

Well, I originally downloaded the binary 1.1 package (Windows), so I had no
convenient setup for recompiling the whole thing.

Since I wanted a very quick test, I recompiled only DatabaseRebuild.java,
putting it in a separate directory tree and using the distribution jars as
the classpath. It compiled fine.
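
For anyone who wants to repeat this without the full source tree, here is a
minimal Java sketch of the same idea: it compiles a single class against
every jar in one directory, using the standard javax.tools compiler API.
The class name CompileAgainstJars and the assumption that all Xindice jars
sit in a single folder are mine. A plain "javac -cp ..." call does the same
job.

```java
import java.io.File;
import java.util.Arrays;
import java.util.stream.Collectors;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileAgainstJars {
    // Joins every *.jar in libDir into one classpath string, using the
    // platform separator (";" on Windows, ":" elsewhere).
    static String classpathFrom(File libDir) {
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IllegalArgumentException("Not a directory: " + libDir);
        }
        Arrays.sort(jars); // deterministic order
        return Arrays.stream(jars)
                .map(File::getPath)
                .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        // args[0]: directory holding the Xindice jars (assumed layout)
        // args[1]: path to the patched DatabaseRebuild.java
        // args[2]: output directory for the compiled .class files
        String cp = classpathFrom(new File(args[0]));
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            System.err.println("No system compiler found; run this on a JDK, not a JRE.");
            System.exit(2);
        }
        int rc = javac.run(null, null, null, "-cp", cp, "-d", args[2], args[1]);
        System.exit(rc);
    }
}
```

Usage would be something like
"java CompileAgainstJars <jar directory> DatabaseRebuild.java classes".
The compiled class then has to come before the original jar on the classpath
when the rebuild tool is started.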

Then I used it to redo the rebuild with "-p 1", as suggested, and tested the
result with the xindice (1.1) command-line tool by listing and retrieving
collections and documents. It works.

For all practical purposes, the resulting 1.1 database is the same size as
the original 1.0 database.

The only size differences seem to be:
 - system/SysConfig:   1.0:  24576 bytes   vs   1.1:  32768 bytes
 - system/SysSymbols:  1.0: 659456 bytes   vs   1.1: 651264 bytes
which are completely negligible.
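
A quick check of that arithmetic: the two differences are +8192 and -8192
bytes, so they cancel out exactly, and all four sizes are whole multiples of
4096 bytes. That would fit page-based allocation, but 4096 as the actual
page size of these files is my assumption; below it is only used as a
display unit.

```java
public class SizeDelta {
    // Display unit only; whether the files really use 4096-byte pages
    // is an assumption.
    static final long UNIT = 4096;

    // Change in bytes when going from the 1.0 file to the 1.1 file.
    static long delta(long size10, long size11) {
        return size11 - size10;
    }

    public static void main(String[] args) {
        long sysConfig = delta(24_576, 32_768);
        long sysSymbols = delta(659_456, 651_264);
        long net = sysConfig + sysSymbols;
        System.out.println("SysConfig:  " + sysConfig + " bytes (" + sysConfig / UNIT + " x 4096)");
        System.out.println("SysSymbols: " + sysSymbols + " bytes (" + sysSymbols / UNIT + " x 4096)");
        System.out.println("Net:        " + net + " bytes");
    }
}
```

Running it prints 8192 bytes (2 x 4096) for SysConfig, -8192 bytes
(-2 x 4096) for SysSymbols, and a net change of 0 bytes.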

Technically, the outcome of this initial test seems OK, so I will embark on
a larger test in a week or two. 


Given the disk-space problems I ran into, it would be useful to have some
heuristics on the Xindice Wiki
   http://wiki.apache.org/xindice/
about how these settings (use-metadata, pagecount) influence the physical
disk space needed. Some of the points you made in your last mail would
certainly fit well there.
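
As a possible starting point for such a heuristic: your advice in the quoted
mail boils down to sizing pagecount for the data a collection is expected to
hold, and using 1 for a collection with a single small document. The sketch
below turns that into a rule of thumb (divide the expected bytes by a page
size and round up, minimum 1). The rule itself and the 4096-byte page size
are my assumptions, not taken from the Xindice sources. Page headers and
B-tree overhead are ignored, so the result is a lower bound; since the file
grows on demand, that should be harmless.

```java
public class PagecountHeuristic {
    // ASSUMPTION: 4096-byte pages; check the real page size of your filer.
    static final int ASSUMED_PAGE_SIZE = 4096;

    // Smallest pagecount whose raw capacity covers the expected data,
    // never less than 1 (the value suggested for one-document collections).
    static long suggestPagecount(long expectedBytes, int pageSize) {
        if (expectedBytes <= 0) {
            return 1;
        }
        // Integer ceiling division: ceil(expectedBytes / pageSize).
        return Math.max(1, (expectedBytes + pageSize - 1) / pageSize);
    }

    public static void main(String[] args) {
        long[] examples = {2_000, 50_000, 5_000_000};
        for (long bytes : examples) {
            System.out.println(bytes + " bytes -> --pagecount "
                    + suggestPagecount(bytes, ASSUMED_PAGE_SIZE));
        }
    }
}
```

With these assumptions, 2000 bytes gives 1, 50000 bytes gives 13 and
5000000 bytes gives 1221.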


regards,

/O



Natalia Shilenkova wrote:
> 
> On Thu, Apr 16, 2009 at 3:32 PM, OKO <ol...@sics.se> wrote:
> [snip]
>>
>> The "1.1" vs "1.0" expansion factor is much larger for collections with
>> few, small files (factor 350) than for collections with many small files
>> (factor 25).
>>
>> Nevertheless, I wonder for what kind of database 1.1 would beat 1.0 from
>> a *size* point of view. Comparing the processing speed of "1.1" and "1.0"
>> might give a different picture, but at the moment I have no numbers to
>> offer in that respect.
> 
> Here is the deal: the database format hasn't changed between 1.0 and 1.1,
> except that some bugs in the index files were fixed and the collection
> files are now created properly. In your case, unfortunately, that means
> they take a lot more space than before, because the default size is
> intended for bigger collections with many documents.
> 
> The database you tried to rebuild is a corner case, however: it has
> several collections that each hold only a single document, so the default
> size settings are way too large. But this is only the default setting,
> and it can be changed to a more suitable number. That way, the 1.1
> database should take about the same amount of space as the existing one.
> 
>> By the way, I do not have any very large documents in the database, so I
>> have no idea how "1.1" compares to "1.0" for such beasts.
>>
>> And the $64,000 question is, of course: what means are available to
>> decrease the disk space footprint of the "1.1" database?
> 
> The following steps can help to reduce database size:
> 1. Meta collections can be turned off. The setting is in the
> config/system.xml file (in the Xindice 1.1 directory), on the following
> line:
> 
>     <root-collection dbroot="./db/" name="db" use-metadata="on">
> 
> change it to use-metadata="off". This will make all Meta collections go away.
> 
> 2. The initial collection size can be adjusted (here I assume that all
> the collections use the default BTreeFiler to store data; HashFiler is a
> somewhat different beast). When a collection is created, it can be given
> a configuration that specifies the pagecount setting, which directly
> affects the initial collection size:
> 
> <collection compressed="true" inline-metadata="true" name="test">
>   <filer class="org.apache.xindice.core.filer.BTreeFiler" pagecount="16" />
> </collection>
> 
> When creating a collection from the command-line tool, this setting can
> be specified with the --pagecount parameter:
> 
> bin\xindice ac -c /db -n test --pagecount 16
> 
> The "perfect" value for pagecount depends on the size and number of
> documents in a collection. For collections whose documents are added and
> modified often, it should be chosen based on how much data the collection
> is expected to hold. In any case it only sets the initial size, so the
> collection file will grow when it is too small to hold new data.
> 
> In your case, when a collection has only one small document and this
> situation is not expected to change, pagecount may very well be set to
> 1.
> 
> Now back to data migration... I made some changes to xindice_rebuild
> to add an optional pagecount parameter that overrides the original
> collection setting.
> 
> If you downloaded the source version of Xindice, please save the attached
> file to
> <xindice directory>\java\src\org\apache\xindice\tools\DatabaseRebuild.java
> and run build.bat or build.sh, depending on your OS. After that, you can
> try to rebuild the database again using the optional parameter:
> 
> bin\xindice_rebuild.bat rebuild db -p 1
> 
> If you downloaded the binary version of Xindice instead, please let me
> know and I'll see what can be done. Alternatively, collections can be
> rebuilt manually by exporting the documents, creating new collections
> with the command-line tool using the option "--pagecount 1", and
> importing the documents into the new collections.
> 
> Let me know if something doesn't work for you.
> 
> Natalia
> 
>  
> 
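
Regarding step 1 in the quoted mail: editing the attribute by hand is
simplest, but for several installations it could be scripted. Below is a
minimal sketch with the standard JAXP DOM API; the class name
DisableMetadata is mine, and the sample document is a simplified stand-in,
not the real contents of config/system.xml. It prints the modified XML
instead of overwriting the file, so back up the original and replace it
manually. Formatting of the re-serialized file may differ from the original.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DisableMetadata {
    // Simplified stand-in for config/system.xml (NOT its real contents).
    static final String SAMPLE =
            "<xindice><root-collection dbroot=\"./db/\" name=\"db\""
            + " use-metadata=\"on\"/></xindice>";

    // Parses the XML, sets use-metadata="off" on every <root-collection>
    // element and returns the re-serialized document.
    static String disableMetadata(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList roots = doc.getElementsByTagName("root-collection");
        for (int i = 0; i < roots.getLength(); i++) {
            ((Element) roots.item(i)).setAttribute("use-metadata", "off");
        }
        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buffer = new StringWriter();
        out.transform(new DOMSource(doc), new StreamResult(buffer));
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        // With one argument, read that file (e.g. a copy of config/system.xml);
        // otherwise use the sample. The result is written to stdout only.
        String xml = args.length == 1 ? Files.readString(Path.of(args[0])) : SAMPLE;
        System.out.println(disableMetadata(xml));
    }
}
```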

-- 
View this message in context: 
http://www.nabble.com/xindice_rebuild---file-size-vastly-multiplied-tp23078111p23155343.html
Sent from the Xindice - Users mailing list archive at Nabble.com.
