Hi Erick,

I have run my indexing code a few times, and this is the behaviour I have
observed:

When an indexing process starts, even if one or more tlog files exist, a
new tlog file is created and all the new documents are stored there.
When the indexing process ends and does a hard commit, the older tlog files
are removed but the new one (the latest) remains.

As far as I can see, since my indexing process loads a few million
documents every time, at the end of the process the latest tlog file
persists with all these documents in it.
So I end up with such big tlog files. Now the question is: why does the
latest tlog file persist even though the code has done a hard commit?
When a hard commit completes successfully, why should we keep the latest
tlog file?
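
For reference, here is a minimal sketch of the client-side alternative
suggested in the quoted reply below (option 2, a periodic hard commit with
openSearcher=false). It is only an illustration, not our actual indexing
code: the base URL, the collection name and the helper names
(hard_commit_url, hard_commit, index_in_batches) are hypothetical, and
send_batch stands in for whatever really sends the documents to Solr.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hard_commit_url(base_url, collection, open_searcher=False):
    """Build an update URL that asks Solr for a hard commit.

    base_url and collection are placeholders (assumptions), e.g.
    "http://localhost:8983/solr" and "mycollection".
    """
    params = urlencode({
        "commit": "true",
        "openSearcher": "true" if open_searcher else "false",
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


def hard_commit(base_url, collection, open_searcher=False, timeout=60):
    """Issue the hard commit over HTTP and return the HTTP status code."""
    url = hard_commit_url(base_url, collection, open_searcher)
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


def index_in_batches(batches, send_batch, commit, every=10):
    """Send each batch, calling commit() after every `every` batches.

    A last commit() is issued for any batches left over at the end.
    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for batch in batches:
        send_batch(batch)
        pending += 1
        if pending == every:
            commit()
            commits += 1
            pending = 0
    if pending:
        commit()
        commits += 1
    return commits
```

With openSearcher=false the intermediate commits should not make the new
documents searchable, so a final commit with open_searcher=True would still
be needed at the end of the run to expose them.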



On Mon, May 25, 2015 at 7:24 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> OK, assuming you're not doing any commits at all until the very end,
> then the tlog contains all the docs for the _entire_ run. The article
> really doesn't care whether the commits come from the solrconfig.xml
> or SolrJ client or curl. The tlog simply is not truncated until a hard
> commit happens, no matter where it comes from.
>
> So here's what I'd do:
> 1> set autoCommit in your solrconfig.xml with openSearcher=false for
> every minute. Then the problem will probably go away.
> or
> 2> periodically issue a hard commit (openSearcher=false) from the client.
>
> Of the two, I _strongly_ recommend <1> as it's more graceful when
> there are multiple clients.
>
> Best,
> Erick
>
> On Mon, May 25, 2015 at 4:45 AM, Vincenzo D'Amore <v.dam...@gmail.com>
> wrote:
> > Hi Erick, thanks for your support.
> >
> > Reading the post I realised that the autoCommit configuration does not
> > apply to my scenario; at the moment we don't have autoCommit in our
> > solrconfig.xml.
> >
> > We need the docs to be searchable only after the indexing process has
> > finished, so all the documents are committed only at the end of the
> > indexing process.
> >
> > Now I don't understand why the tlog files are so big, given that we do a
> > hard commit at the end of every indexing run.
> >
> >
> >
> >
> > On Sun, May 24, 2015 at 5:49 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Vincenzo:
> >>
> >> Here's perhaps more than you want to know about hard commits, soft
> >> commits and transaction logs:
> >>
> >> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, May 24, 2015 at 12:04 AM, Vincenzo D'Amore <v.dam...@gmail.com>
> >> wrote:
> >> > Thanks Shawn for your prompt support.
> >> >
> >> > Best regards,
> >> > Vincenzo
> >> >
> >> > On Sun, May 24, 2015 at 6:45 AM, Shawn Heisey <apa...@elyograg.org>
> >> > wrote:
> >> >
> >> >> On 5/23/2015 9:41 PM, Vincenzo D'Amore wrote:
> >> >> > Thanks Shawn,
> >> >> >
> >> >> > maybe this is a silly question, but I looked around and didn't
> >> >> > find an answer...
> >> >> > Well, could I update solrconfig.xml for the collection while the
> >> >> > instances are running, or should I restart the cluster/reload the
> >> >> > cores?
> >> >>
> >> >> You can upload a new config to zookeeper with the zkcli program while
> >> >> Solr is running, and nothing will change, at least not immediately.
> >> >> The new config will take effect when you reload the collection or
> >> >> restart all the Solr instances.
> >> >>
> >> >> Thanks,
> >> >> Shawn
> >> >>
> >> >>
> >>
> >
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
