On 2/18/2013 4:57 AM, giovanni.bricc...@banzai.it wrote:
I have some questions about tlog files and how they are managed.

I'm using DIH to do incremental data loading; once a day I do a full
refresh.

These are the request parameters:

/dataimport?command=full-import&commit=true
/dataimport?command=delta-import&commit=true&optimize=false

I was expecting all the old tlog files to be removed when a
delta/full import completes, but I see that these files remain. Actually,
only the older files get removed.

Am I using the wrong parameters? Is there a different parameter to
trigger the hard commit?
Are there configuration parameters to control the number of tlog
files to keep? Unfortunately I have very little space on my disks and I
need to double-check space consumption.

Your best option is to turn on autoCommit with openSearcher set to false. I use a maxDocs of 25000 and a maxTime of 300000 (five minutes). Every 25000 docs, Solr does a hard commit, but because openSearcher is false, it does not change the index at all from the perspective of a client. You would need to choose values appropriate for your installation.

<!-- the default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit after this many docs or this many milliseconds,
         whichever comes first -->
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <!-- don't open a new searcher, so clients never see a partial import -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog />
</updateHandler>

The hard commit does one important thing here - it closes the current tlog and starts a new one. Solr does not keep very many tlogs around, but if you do a full-import without any commits, the tlog will contain every single document you have.
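
If you ever want to close the tlog on demand instead of waiting for autoCommit, I believe you can send an explicit commit to the update handler with openSearcher=false, something like this (a sketch only - /update is the default handler path, adjust it if yours is different):

/update?commit=true&openSearcher=false

That issues a hard commit without opening a new searcher, so clients see nothing change, but the current tlog is closed and a new one is started.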

I actually do my index rebuilds in a separate build core and swap it to live when the rebuild is fully complete, but I have double-checked the docs available from a client, and they do not change until the full-import is done.
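
For reference, the swap itself is a single CoreAdmin call. This is just a sketch - the core names "live" and "build" are placeholders for whatever you call yours:

/admin/cores?action=SWAP&core=live&other=build

After the swap, requests that go to the "live" name hit the freshly rebuilt index, and the previous index is still reachable under the "build" name if something goes wrong.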

Another thing - I would use optimize=false on the full-import and the delta-import. The only real reason to do an optimize in a modern Solr version is to purge deleted documents. If you are doing a new full-import every day, then you don't have to worry about that, because the new index will not contain any deleted documents. It's true that an optimized index does slightly outperform one with many segments of varying sizes, but generally speaking the huge I/O overhead during the optimize is very detrimental to performance.
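
So for your setup, the daily requests would look something like this - the same parameters you already have, with optimize=false added to the full-import:

/dataimport?command=full-import&commit=true&optimize=false
/dataimport?command=delta-import&commit=true&optimize=false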

Thanks,
Shawn
