Hi Erick,

After reading the discussion you guys were having about renaming optimize
to forceMerge, I realized I was guilty of over-optimizing in exactly the
way you were worried about! We have about 15 million docs indexed now and
we push about 50-300 adds per second, 24/7, most of them updates to
existing documents whose data has changed since the last time it was
indexed (which we keep track of in a DB table). There are some new
documents being added in the mix, and some deletes as well.
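To make that concrete, each of our updater threads does roughly the
following for a changed document (a simplified SolrJ 3.6 sketch; the URL
and field names are made-up placeholders, not our real schema):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class UpdaterSketch {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://master:8080/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "sku-12345");  // same uniqueKey as the old version
        doc.addField("price", 19.99);     // fresh data pulled from the DB
        doc.addField("inStock", true);

        // Adding a doc whose uniqueKey already exists deletes the old
        // version and indexes the new one, so every "update" is really
        // a delete plus a re-add as far as the index is concerned.
        solr.add(doc);
        // commits happen separately (autoCommit on the master)
    }
}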
I understand now how the merge policy caps the number of segments; I used
to think they would grow unbounded and that optimize was therefore
required. But how does our large volume of updates to existing documents
affect the need to optimize, given that each update amounts to a delete
plus a re-add? I suppose that means the index size tends to grow with the
deleted docs hanging around in the background, as it were. So in our
situation, what frequency of optimize would you recommend? We're on 3.6.1
btw...

Thanks,
Robi

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, October 11, 2012 5:29 AM
To: solr-user@lucene.apache.org
Subject: Re: anyone have any clues about this exception

Well, you'll still be able to optimize; it's just called forceMerge now.
But the point is that optimize seems like something that _of course_ you
want to do, when in reality it's not something you usually should do at
all.

Optimize does two things:
1> merges all the segments into one (usually)
2> removes all of the info associated with deleted documents

Of the two, point <2> is the one that really counts, and that's done
whenever segment merging happens anyway. So unless you have a very large
number of deletes (or updates of the same document), optimize buys you
very little. You can tell how much deleted data you're carrying by the
difference between numDocs and maxDoc on the admin page.
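If you want to check that difference without eyeballing the admin page,
something like this untested SolrJ 3.6 sketch (the URL is a placeholder)
pulls both numbers from the Luke request handler:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.common.util.NamedList;

public class DeletedDocsCheck {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://master:8080/solr");

        // /admin/luke reports index-level stats, including doc counts
        LukeResponse rsp = new LukeRequest().process(solr);
        NamedList<Object> index = rsp.getIndexInfo();

        int numDocs = (Integer) index.get("numDocs"); // live (searchable) docs
        int maxDoc  = (Integer) index.get("maxDoc");  // live docs plus deletes not yet merged away
        System.out.println("deleted docs still in the index: " + (maxDoc - numDocs));
    }
}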
So what happens if you just don't bother to optimize? As an alternative,
take a look at the merge policy settings to help control how merging
happens.

Best,
Erick

On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert <rober...@buy.com> wrote:
> You could be right. Going back in the logs, I noticed it used to happen
> less frequently, and always towards the end of an optimize operation. It
> is probably my indexer timing out while waiting for updates to complete
> during optimizes. The errors grew recently because I upped the indexer
> thread count to 22 threads, so there are a lot more timeouts occurring
> now. Also, our index has grown to double its old size, so the optimize
> operation has started taking a lot longer, which also contributes to
> what I'm seeing. I have just changed my optimize frequency from three
> times a day to once a day after reading the following:
>
> Here they are talking about completely deprecating the optimize command
> in the next version of Solr...
> https://issues.apache.org/jira/browse/SOLR-3141
>
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Wednesday, October 10, 2012 11:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: anyone have any clues about this exception
>
> Something timed out and the other end closed the connection. Then this
> end tried to write to the closed pipe and died, and something tried to
> catch that exception, write its own error, and died even worse? Just
> making it up really, but it sounds plausible (plus a three-year Java
> tech-support hunch).
>
> If it happens often enough, see if you can run WireShark on that
> machine's network interface and catch the whole network conversation in
> action. Often there are enough clues just from looking at the TCP
> packets and/or the stuff transmitted. WireShark is a power tool, so it
> takes a little while the first time, but the learning will pay for
> itself over and over again.
>
> Regards,
>    Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>
>
> On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert <rober...@buy.com> wrote:
>> The Tomcat localhost log (not the catalina log) for my Solr 3.6.1
>> (master) instance contains lots of these exceptions, but Solr itself
>> seems to be doing fine... any ideas? I'm not seeing these exceptions
>> logged on my slave servers btw, just on the master, where we do all
>> our indexing.
>>
>> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve invoke
>> SEVERE: Servlet.service() for servlet default threw exception
>> java.lang.IllegalStateException
>>         at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
>>         at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
>>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>         at com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
>>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>         at java.lang.Thread.run(Unknown Source)