On 2/18/2013 4:57 AM, giovanni.bricc...@banzai.it wrote:
> I have some questions about tlog files and how they are managed.
> I'm using DIH to do incremental data loading, and once a day I do a
> full refresh.
> These are the request parameters:
> /dataimport?command=full-import&commit=true
> /dataimport?command=delta-import&commit=true&optimize=false
> I was expecting all the old tlog files to be removed when a delta or
> full import completes, but I see that these files remain. Actually,
> the older files do get removed.
> Am I using the wrong parameters? Is there a different parameter to
> trigger the hard commit?
> Are there configuration parameters that control the number of tlog
> files to keep? Unfortunately I have very little space on my disks and I
> need to double-check space consumption.
Your best option is to turn on autoCommit with openSearcher set to
false. I use a maxDocs of 25000 and a maxTime of 300000 (five minutes).
Every 25000 docs, or five minutes after the first uncommitted update,
whichever comes first, Solr does a hard commit, but because openSearcher
is false, it does not change the index at all from the perspective of a
client. You would need to choose values appropriate for your installation.
<!-- the default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog />
</updateHandler>
The hard commit does one important thing here - it closes the current
tlog and starts a new one. Solr does not keep very many tlogs around,
but if you do a full-import without any commits, the tlog will contain
every single document you have.
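For illustration, the same kind of hard commit can also be requested by
hand. This is only a sketch: it assumes Solr 4.0 or later and the stock
/update handler path, neither of which is confirmed in this thread, and
it needs the same core URL prefix as your dataimport requests:

    /update?commit=true&openSearcher=false

Like the autoCommit described above, this should close the current tlog
without changing what clients see.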
I actually do my index rebuilds in a separate build core and swap it to
live when the rebuild is fully complete, but I have double-checked the
docs visible to a client during a rebuild, and they do not change until
the full-import is done.
Another thing - I would use optimize=false on the full-import and the
delta-import. The only real reason to do an optimize in a modern Solr
version is to purge deleted documents. If you are doing a new
full-import every day, then you don't have to worry about that, because
the new index will not contain any deleted documents. It's true that an
optimized index does slightly outperform one with many segments of
varying sizes, but generally speaking the huge I/O overhead during the
optimize is very detrimental to performance.
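Applied to the two requests quoted at the top, that would look like the
following. The handler path and the other parameters are taken unchanged
from the original message; the only difference is optimize=false on the
full-import:

    /dataimport?command=full-import&commit=true&optimize=false
    /dataimport?command=delta-import&commit=true&optimize=false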
Thanks,
Shawn