When I set the soft commit interval to 300 ms and the hard commit interval to 3000 ms and indexed a set of data, the whole index came to 6GB once indexing completed.
When I changed the configuration to 60000 ms for both soft commit and hard commit and indexed the same data, the whole index came to 5GB once indexing completed. The number of documents was the same in both scenarios, though. How is that possible?

On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <nitinml...@gmail.com> wrote:

> Hi Erick,
> I am just saying. I want to be sure on commits difference..
> What if I do frequent commits or not? And why I am saying that I need to
> commit things so very quickly because I have to index 28GB of data which
> takes 7-8 hours(frequent commits).
> As you said, do commits after 60000 seconds then it will be more expensive.
> If I don't encounter with **"overlapping searchers" warning messages**
> then I feel it seems to be okay. Is it?
>
> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Don't do it. Really, why do you want to do this? This seems like
>> an "XY" problem, you haven't explained why you need to commit
>> things so very quickly.
>>
>> I suspect you haven't tried _searching_ while committing at such
>> a rate, and you might as well turn all your top-level caches off
>> in solrconfig.xml since they won't be useful at all.
>>
>> Best,
>> Erick
>>
>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <nitinml...@gmail.com>
>> wrote:
>> > Hi,
>> > If I do very very fast indexing(softcommit = 300 and hardcommit =
>> > 3000) v/s slow indexing (softcommit = 60000 and hardcommit = 60000) as you
>> > both said. Will fast indexing fail to index some data?
>> > Any suggestion on this ?
>> >
>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R.
>> > Aiyengar <andyetitmo...@gmail.com> wrote:
>> >
>> >> Yes, and doing so is painful and takes lots of people and hardware
>> >> resources to get there for large amounts of data and queries :)
>> >>
>> >> As Erick says, work backwards from 60s and first establish how high the
>> >> commit interval can be to satisfy your use case..
>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerick...@gmail.com> wrote:
>> >>
>> >> > First start by lengthening your soft and hard commit intervals
>> >> > substantially. Start with 60000 and work backwards I'd say.
>> >> >
>> >> > Ramkumar has tuned the heck out of his installation to get the commit
>> >> > intervals to be that short ;).
>> >> >
>> >> > I'm betting that you'll see your RAM usage go way down, but that's a
>> >> > guess until you test.
>> >> >
>> >> > Best,
>> >> > Erick
>> >> >
>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <nitinml...@gmail.com>
>> >> > wrote:
>> >> > > Hi Erick,
>> >> > > You are saying correct. Something, **"overlapping searchers"
>> >> > > warning messages** are coming in logs.
>> >> > > **numDocs numbers** are changing when documents are adding at the time of
>> >> > > indexing.
>> >> > > Any help?
>> >> > >
>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <erickerick...@gmail.com>
>> >> > > wrote:
>> >> > >
>> >> > >> First, the soft commit interval is very short. Very, very, very, very
>> >> > >> short. 300ms is just short of insane unless it's a typo ;).
>> >> > >>
>> >> > >> Here's a long background:
>> >> > >>
>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >> > >>
>> >> > >> But the short form is that you're opening searchers every 300 ms. The
>> >> > >> hard commit is better,
>> >> > >> but every 3 seconds is still far too short IMO.
>> >> > >> I'd start with soft commits of 60000 and hard
>> >> > >> commits of 60000 (60 seconds), meaning that you're going to have to
>> >> > >> wait 1 minute for docs to show up unless you explicitly commit.
>> >> > >>
>> >> > >> You're throwing away all the caches configured in solrconfig.xml more
>> >> > >> than 3 times a second, executing autowarming, etc, etc, etc....
>> >> > >>
>> >> > >> Changing these to longer intervals might cure the problem, but if not
>> >> > >> then, as Hoss would say, "details matter". I suspect you're also seeing
>> >> > >> "overlapping searchers" warning messages in your log, and it's _possible_
>> >> > >> that what's happening is that you're just exceeding the
>> >> > >> max warming searchers and never opening a new searcher with the
>> >> > >> newly-indexed documents.
>> >> > >> But that's a total shot in the dark.
>> >> > >>
>> >> > >> How are you looking for docs (and not finding them)? Does the numDocs
>> >> > >> number in the solr admin screen change?
>> >> > >>
>> >> > >> Best,
>> >> > >> Erick
>> >> > >>
>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <nitinml...@gmail.com>
>> >> > >> wrote:
>> >> > >> > Hi Alexandre,
>> >> > >> >
>> >> > >> > *Hard Commit* is :
>> >> > >> >
>> >> > >> > <autoCommit>
>> >> > >> >   <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
>> >> > >> >   <openSearcher>false</openSearcher>
>> >> > >> > </autoCommit>
>> >> > >> >
>> >> > >> > *Soft Commit* is :
>> >> > >> >
>> >> > >> > <autoSoftCommit>
>> >> > >> >   <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
>> >> > >> > </autoSoftCommit>
>> >> > >> >
>> >> > >> > And I am committing 20000 documents each time.
>> >> > >> > Is it good config for committing?
>> >> > >> > Or I am good something wrong ?
>> >> > >> >
>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <arafa...@gmail.com>
>> >> > >> > wrote:
>> >> > >> >
>> >> > >> >> What's your commit strategy? Explicit commits? Soft commits/hard
>> >> > >> >> commits (in solrconfig.xml)?
>> >> > >> >>
>> >> > >> >> Regards,
>> >> > >> >> Alex.
>> >> > >> >> ----
>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> >> > >> >> http://www.solr-start.com/
>> >> > >> >>
>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki <nitinml...@gmail.com> wrote:
>> >> > >> >> > Hello,
>> >> > >> >> > I have written a python script to do 20000 documents indexing
>> >> > >> >> > each time on Solr. I have 28 GB RAM with 8 CPU.
>> >> > >> >> > When I started indexing, at that time 15 GB RAM was freed. While indexing,
>> >> > >> >> > all RAM is consumed but **not** a single document is indexed. Why so?
>> >> > >> >> > And it through *HTTPError: HTTP Error 503: Service Unavailable* in python
>> >> > >> >> > script.
>> >> > >> >> > I think it is due to heavy load on Zookeeper by which all nodes went down.
>> >> > >> >> > I am not sure about that. Any help please..
>> >> > >> >> > Or anything else is happening..
>> >> > >> >> > And how to overcome this issue.
>> >> > >> >> > Please assist me towards right path.
>> >> > >> >> > Thanks..
>> >> > >> >> >
>> >> > >> >> > Warm Regards,
>> >> > >> >> > Nitin Solanki
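
For reference, the longer intervals discussed in this thread (60000 ms for both soft and hard commits) could be written roughly as follows. This is only a sketch: it reuses the element names and property placeholders from the config quoted above and changes nothing but the default values, and it assumes both blocks sit inside the <updateHandler> section of solrconfig.xml.

```xml
<!-- Hard commit: flush to disk every 60000 ms (60 seconds).
     openSearcher=false is kept from the quoted config, so hard
     commits alone do not make new documents visible. -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: open a new searcher every 60000 ms, so newly
     indexed documents show up within about a minute. -->
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>
```

Because the values are written as property defaults, setting solr.autoCommit.maxTime or solr.autoSoftCommit.maxTime as a system property would override them, so the numbers here only apply when those properties are not set.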