[SolrCloud] Too many open files - internal server error
Hi,

We're doing some tests with the latest trunk revision on a cluster of five high-end machines. There is one collection, five shards, and one replica per shard on some other node. We're filling the index from a MapReduce job; 18 processes run concurrently. This works fine when indexing to a single high-end node, but with SolrCloud things go down pretty soon. First we get a "Too many open files" error on all nodes at almost the same time. After shutting down the indexer, the nodes won't respond anymore, except with an Internal Server Error.

First, the too-many-open-files stack trace:

2012-02-29 15:22:51,067 ERROR [solr.core.SolrCore] - [http-80-6] - : java.io.FileNotFoundException: /opt/solr/openindex_b/data/index/_h5_0.tim (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
        at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:149)
        at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
        at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
        at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:320)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:389)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1533)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1505)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:56)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:53)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:354)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:451)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:258)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:118)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)

A similar exception sometimes begins with:

2012-02-29 15:25:36,137 ERROR [solr.update.CommitTracker] - [pool-5-thread-1] - : auto commit
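For readers hitting the same wall: besides raising the OS limit (discussed in the replies below), file-descriptor pressure can also be reduced on the Lucene side, since every live non-compound segment is roughly ten open files. A hedged sketch against a trunk/4.x-era solrconfig.xml; element names and placement vary across versions and the values are illustrative, so verify against your own config:

    <indexConfig>
      <!-- pack each segment into a single .cfs file instead of ~10 separate ones -->
      <useCompoundFile>true</useCompoundFile>
      <!-- lower than the default 10: fewer live segments at a time, at some
           cost in indexing speed -->
      <mergeFactor>5</mergeFactor>
    </indexConfig>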
Re: [SolrCloud] Too many open files - internal server error
Sami,

As superuser: $ lsof | wc -l

But, just now, I also checked the system handler and it told me:

  <str name="ulimit">(error executing: ulimit -n)</str>

This is rather strange, it seems. lsof | wc -l is not higher than 6k right now and ulimit -n is 32k. Is lsof not to be trusted in this case, or... something else?

Thanks

On Wednesday 29 February 2012 16:44:58 Sami Siren wrote:
> Hi Markus,
>
>> The Linux machines have proper settings for ulimit and friends, 32k open
>> files allowed, so I suspect there's another limit which I am unaware of.
>> I also listed the number of open files while the errors were coming in,
>> but it did not exceed 11k at any given time.
>
> How did you check the number of file descriptors used? Did you get this
> number from the system info handler
> (http://hostname:8983/solr/admin/system?indent=on&wt=json) or somehow
> differently?
>
> --
>  Sami Siren

--
Markus Jelsma - CTO - Openindex
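A note on the measurement itself: a system-wide lsof | wc -l also counts cwd/txt/mem rows and every process on the box, so it is not comparable to the per-process RLIMIT_NOFILE that "Too many open files" refers to. A sketch of a per-process count instead; the pgrep pattern is a hypothetical pid lookup, adjust it to the actual servlet container:

    # hypothetical pid lookup for the Solr JVM
    PID=$(pgrep -f catalina | head -n1)

    # descriptors actually held by that one process -- this is the number
    # that the 32k limit applies to
    ls /proc/$PID/fd | wc -l

    # per-process lsof is closer, but still includes non-descriptor rows
    sudo lsof -p $PID | wc -l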
Re: [SolrCloud] Too many open files - internal server error
On Wed, Feb 29, 2012 at 5:53 PM, Markus Jelsma markus.jel...@openindex.io wrote:
> Sami,
>
> As superuser: $ lsof | wc -l
>
> But, just now, I also checked the system handler and it told me:
>
>  <str name="ulimit">(error executing: ulimit -n)</str>

That's odd, you should see something like this there:

  openFileDescriptorCount: 131,
  maxFileDescriptorCount: 4096,

Which jvm do you have?

> This is rather strange, it seems. lsof | wc -l is not higher than 6k right
> now and ulimit -n is 32k. Is lsof not to be trusted in this case, or...
> something else?

I am not sure what is going on. Are you sure the open file descriptor (32k)
limit is active for the user running solr?

--
 Sami Siren
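Independent of the broken ulimit reporting, the kernel can be asked directly which limits the already-running JVM inherited; init scripts often reset them, so a fresh shell's "ulimit -n" can differ from what the service actually has. A sketch, reusing the hypothetical $PID from above (requires Linux 2.6.24+ for /proc/<pid>/limits):

    # effective soft and hard fd limits of the live process
    grep 'Max open files' /proc/$PID/limits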
Re: [SolrCloud] Too many open files - internal server error
On Wed, Feb 29, 2012 at 10:32 AM, Markus Jelsma markus.jel...@openindex.io wrote:
> The Linux machines have proper settings for ulimit and friends, 32k open
> files allowed

Maybe you can expand on this point.

cat /proc/sys/fs/file-max
cat /proc/sys/fs/nr_open

Those take precedence over ulimit. Not sure if there are others...

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10
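To expand on Yonik's point: fs.file-max is the system-wide cap on open file handles, and fs.nr_open is the ceiling any single process's "ulimit -n" can be raised to, so a 32k ulimit is meaningless if either sits lower. A sketch with illustrative values:

    cat /proc/sys/fs/file-max   # system-wide handle cap
    cat /proc/sys/fs/nr_open    # max value "ulimit -n" may be set to

    # raise at runtime (values are illustrative); persist in /etc/sysctl.conf
    sudo sysctl -w fs.file-max=262144
    sudo sysctl -w fs.nr_open=131072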
Re: [SolrCloud] Too many open files - internal server error
On Wednesday 29 February 2012 17:52:55 Sami Siren wrote:
> On Wed, Feb 29, 2012 at 5:53 PM, Markus Jelsma markus.jel...@openindex.io wrote:
>> Sami,
>>
>> As superuser: $ lsof | wc -l
>>
>> But, just now, I also checked the system handler and it told me:
>>
>>  <str name="ulimit">(error executing: ulimit -n)</str>
>
> That's odd, you should see something like this there:
>
>  openFileDescriptorCount: 131,
>  maxFileDescriptorCount: 4096,
>
> Which jvm do you have?

Standard issue Sun Java 6 on Debian. We run that JVM on all machines. But I
see the same (error executing: ulimit -n) locally with Jetty on Solr trunk
and Solr 3.5, and on a production server with Solr 3.2 on Tomcat 6.

>> This is rather strange, it seems. lsof | wc -l is not higher than 6k right
>> now and ulimit -n is 32k. Is lsof not to be trusted in this case, or...
>> something else?
>
> I am not sure what is going on, are you sure the open file descriptor
> (32k) limit is active for the user running solr?

I get the correct output for ulimit -n as the tomcat6 user. However, I did
find a mistake in /etc/security/limits.conf where I misspelled the tomcat6
username (shame). On recent systems ulimit and sysctl alone are not enough,
so spelling tomcat6 correctly should fix the open files issue. Now we only
have the issue of (error executing: ulimit -n).

> --
>  Sami Siren

--
Markus Jelsma - CTO - Openindex
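For completeness, the limits.conf entries in question look like the lines below, and a misspelled username silently disables them, which is exactly what bit here. A hedged sketch assuming the tomcat6 service user and that pam_limits is enabled for su sessions, as on stock Debian; pam_limits only applies to new sessions, so the container must be restarted afterwards:

    # /etc/security/limits.conf -- the username must match exactly:
    #
    #   tomcat6  soft  nofile  32768
    #   tomcat6  hard  nofile  32768

    # verify as the service user (its login shell may be nologin, hence -s)
    su -s /bin/sh tomcat6 -c 'ulimit -n'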
Re: [SolrCloud] Too many open files - internal server error
Thanks. They are set properly. But I misspelled the tomcat6 username in
limits.conf :(

On Wednesday 29 February 2012 18:08:55 Yonik Seeley wrote:
> On Wed, Feb 29, 2012 at 10:32 AM, Markus Jelsma markus.jel...@openindex.io wrote:
>> The Linux machines have proper settings for ulimit and friends, 32k open
>> files allowed
>
> Maybe you can expand on this point.
>
> cat /proc/sys/fs/file-max
> cat /proc/sys/fs/nr_open
>
> Those take precedence over ulimit. Not sure if there are others...
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10

--
Markus Jelsma - CTO - Openindex
Re: [SolrCloud] Too many open files - internal server error
I had this problem some time ago; it happened on our staging (homologation)
machine. There were 3 Solr instances running: 1 master and 2 slaves.

My solution was: I stopped the slaves, deleted both data folders, ran an
optimize, and then started them again.

I tried to raise the OS open file limit first, but I think it was not a good
idea... so I tried this.

On Wed, Feb 29, 2012 at 2:07 PM, Markus Jelsma markus.jel...@openindex.io wrote:
> I get the correct output for ulimit -n as the tomcat6 user. However, I did
> find a

--
Carlos Alberto Schneider
Informant - (47) 38010919 - 9904-5517
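For anyone wanting to reproduce the optimize step: it can be issued over HTTP against the update handler. A sketch; host, port, and path are placeholders for your own installation, and note that an optimize is I/O-heavy and temporarily needs extra disk space:

    # merge the index down to a single segment -- far fewer open files afterwards
    curl 'http://localhost:8983/solr/update' \
         -H 'Content-Type: text/xml' \
         --data-binary '<optimize/>'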