[SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Hi,

We're doing some tests with the latest trunk revision on a cluster of five
high-end machines. There is one collection with five shards, and each shard
has one replica on another node.

We're filling the index from a MapReduce job with 18 processes running
concurrently. This is plenty when indexing to a single high-end node, but
with SolrCloud things go down pretty soon.

First we get a Too Many Open Files error on all nodes at almost the same time.
After shutting down the indexer the nodes won't respond anymore, except with an
Internal Server Error.

First the too many open files stack trace:

2012-02-29 15:22:51,067 ERROR [solr.core.SolrCore] - [http-80-6] - :
java.io.FileNotFoundException: /opt/solr/openindex_b/data/index/_h5_0.tim (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:149)
at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:320)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:389)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1533)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1505)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:56)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:53)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:354)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:451)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:258)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:118)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)



A similar exception sometimes begins with:

2012-02-29 15:25:36,137 ERROR [solr.update.CommitTracker] - [pool-5-thread-1] - : auto commit

Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Sami,

As superuser:
$ lsof | wc -l

But, just now, I also checked the system handler and it told me:
<str name="ulimit">(error executing: ulimit -n)</str>

This is rather strange, it seems. lsof | wc -l is not higher than 6k right now 
and ulimit -n is 32k. Is lsof not to be trusted in this case or... something 
else? 
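
For what it's worth, ulimit -n is a per-process limit, so counting the fds held
by the Tomcat/Solr process itself says more than a global lsof count. A rough
sketch (the pgrep pattern is only an assumption, adjust it to match your
container process):

$ pid=$(pgrep -f -o tomcat)          # assumed way of finding the container PID
$ ls /proc/$pid/fd | wc -l           # fds actually held by that process
$ cat /proc/sys/fs/file-nr           # allocated / free / max, system-wide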

Thanks

On Wednesday 29 February 2012 16:44:58 Sami Siren wrote:
 Hi Markus,
 
  The Linux machines have proper settings for ulimit and friends, 32k open
  files allowed, so I suspect there's another limit which I am unaware of.
  I also listed the number of open files while the errors were coming in
  but it did not exceed 11k at any given time.
 
 How did you check the number of file descriptors used? Did you get this
 number from the system info handler
 (http://hostname:8983/solr/admin/system?indent=on&wt=json) or somehow
 differently?
 
 --
  Sami Siren

-- 
Markus Jelsma - CTO - Openindex


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Sami Siren
On Wed, Feb 29, 2012 at 5:53 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
 Sami,

 As superuser:
 $ lsof | wc -l

 But, just now, I also checked the system handler and it told me:
 <str name="ulimit">(error executing: ulimit -n)</str>

That's odd, you should see something like this there:

openFileDescriptorCount:131,
maxFileDescriptorCount:4096,
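
Assuming Solr is reachable on localhost:8983 and under the default core path
(both assumptions on my side), you can pull those numbers with something like:

$ curl 'http://localhost:8983/solr/admin/system?wt=json&indent=on' | grep -i descriptor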

Which JVM do you have?

 This is rather strange, it seems. lsof | wc -l is not higher than 6k right now
 and ulimit -n is 32k. Is lsof not to be trusted in this case or... something
 else?

I am not sure what is going on; are you sure the open file descriptor
limit (32k) is active for the user running Solr?

--
 Sami Siren


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Yonik Seeley
On Wed, Feb 29, 2012 at 10:32 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
 The Linux machines have proper settings for ulimit and friends, 32k open files
 allowed

Maybe you can expand on this point.

cat /proc/sys/fs/file-max
cat /proc/sys/fs/nr_open

Those take precedence over ulimit.  Not sure if there are others...
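
If one of those turns out to be the ceiling, it can be raised at runtime with
sysctl; the values below are only an illustration, not a recommendation:

$ sudo sysctl -w fs.file-max=2097152   # system-wide cap on open file handles
$ sudo sysctl -w fs.nr_open=1048576    # per-process ceiling that ulimit -n cannot exceed
# add the same keys to /etc/sysctl.conf to persist them across reboots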

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
On Wednesday 29 February 2012 17:52:55 Sami Siren wrote:
 On Wed, Feb 29, 2012 at 5:53 PM, Markus Jelsma
 
 markus.jel...@openindex.io wrote:
  Sami,
  
  As superuser:
  $ lsof | wc -l
  
  But, just now, I also checked the system handler and it told me:
  <str name="ulimit">(error executing: ulimit -n)</str>
 
 That's odd, you should see something like this there:
 
 openFileDescriptorCount:131,
 maxFileDescriptorCount:4096,
 
 Which JVM do you have?

Standard-issue Sun Java 6 on Debian; we run that JVM on all machines. But I
see the same (error executing: ulimit -n) locally with Jetty on Solr trunk
and Solr 3.5, and on a production server with Solr 3.2 and Tomcat 6.

 
  This is rather strange, it seems. lsof | wc -l is not higher than 6k
  right now and ulimit -n is 32k. Is lsof not to be trusted in this case
  or... something else?
 
 I am not sure what is going on; are you sure the open file descriptor
 limit (32k) is active for the user running Solr?

I get the correct output for ulimit -n as the tomcat6 user. However, I did find
a mistake in /etc/security/limits.conf where I misspelled the tomcat6 user name
(shame). On recent systems ulimit and sysctl alone are not enough, so spelling
tomcat6 correctly in limits.conf should fix the open files issue.
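
For reference, the limits.conf entries in question look roughly like this (a
sketch; 32768 simply mirrors the 32k limit mentioned above):

# /etc/security/limits.conf -- the first column must match the account the container runs as
tomcat6  soft  nofile  32768
tomcat6  hard  nofile  32768
# pam_limits must be enabled for this to apply; daemons started from init scripts
# sometimes need the limit set in the init script itself (e.g. ulimit -n 32768)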

Now we only have the issue of (error executing: ulimit -n).

 
 --
  Sami Siren

-- 
Markus Jelsma - CTO - Openindex


Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Thanks. They are set properly. But I misspelled the tomcat6 user name in
limits.conf :(

On Wednesday 29 February 2012 18:08:55 Yonik Seeley wrote:
 On Wed, Feb 29, 2012 at 10:32 AM, Markus Jelsma
 
 markus.jel...@openindex.io wrote:
  The Linux machines have proper settings for ulimit and friends, 32k open
  files allowed
 
 Maybe you can expand on this point.
 
 cat /proc/sys/fs/file-max
 cat /proc/sys/fs/nr_open
 
 Those take precedence over ulimit.  Not sure if there are others...
 
 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10

-- 
Markus Jelsma - CTO - Openindex



Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Carlos Alberto Schneider
I had this problem some time ago; it happened on our staging (homolog) machine.

There were 3 Solr instances running: 1 master and 2 slaves.
My solution was: I stopped the slaves, deleted both data folders, ran an
optimize, and then started them again.

I tried to raise the OS open file limit first, but I think it was not a
good idea... so I tried this instead.
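
If it helps, the optimize itself can be triggered over HTTP against the master,
assuming the default /solr/update handler path (adjust host and port to yours):

$ curl 'http://localhost:8983/solr/update?optimize=true'
# or, equivalently, post an explicit optimize command:
$ curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'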


On Wed, Feb 29, 2012 at 2:07 PM, Markus Jelsma
markus.jel...@openindex.iowrote:

 I get the correct output for ulimit -n as the tomcat6 user. However, I did
 find a



-- 
Carlos Alberto Schneider
Informant -(47) 38010919 - 9904-5517