[SoldCloud] leaking file descriptors
Hi,

Yesterday we had an issue with too many open files, which was resolved once we found a misspelled username. But there is still a problem with open files. We cannot successfully index a few million documents from MapReduce to a 5-node Solr cloud cluster. One of the problems is that after a while ClassNotFoundErrors and other similar weirdness begin to appear. This does not resolve itself when indexing is stopped.

With lsof I found that Solr still keeps roughly 9k files open 8 hours after indexing failed. Of those 9k, roughly 7.5k are deleted files that still have a file descriptor open for the tomcat6 user. These are all segment files:

/opt/solr/openindex_a/data/index.20120228101550/_34s.tvd
java 10049 tomcat6 DEL REG 9,0 515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
java 10049 tomcat6 DEL REG 9,0 515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
java 10049 tomcat6 DEL REG 9,0 515735 /opt/solr/openindex_a/data/index.20120228101550/_34s_nrm.cfs
java 10049 tomcat6 DEL REG 9,0 515595 /opt/solr/openindex_a/data/index.20120228101550/_34v_nrm.cfs
java 10049 tomcat6 DEL REG 9,0 515592 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.tim
java 10049 tomcat6 DEL REG 9,0 515591 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.prx
java 10049 tomcat6 DEL REG 9,0 515590 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.frq

and many more.

Did I misconfigure anything? This is a pretty standard configuration (no changes to the IndexDefaults section) on a recent Solr trunk revision. Is there a bug somewhere?

Thanks,
Markus
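To put numbers on a listing like the one above, the lsof output can be summarized per file extension. The following is only a minimal sketch: the function name is hypothetical, the column order (COMMAND PID USER FD TYPE DEVICE NODE NAME) is assumed from the rows quoted in the post, and paths containing spaces are not handled.

```python
import os
from collections import Counter


def count_deleted_by_extension(lsof_text):
    """Count lsof rows whose FD column is DEL, grouped by file extension."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Assumed column order: COMMAND PID USER FD TYPE DEVICE NODE NAME
        if len(fields) < 5 or fields[3] != "DEL":
            continue
        # NAME is the last column; splitext only inspects the final path part.
        counts[os.path.splitext(fields[-1])[1]] += 1
    return counts


# A few of the rows from the post, used as sample input.
sample = """\
java 10049 tomcat6 DEL REG 9,0 515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
java 10049 tomcat6 DEL REG 9,0 515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
java 10049 tomcat6 DEL REG 9,0 515591 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.prx
java 10049 tomcat6 DEL REG 9,0 515590 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.frq
"""

for ext, n in sorted(count_deleted_by_extension(sample).items()):
    print(ext, n)
```

On a live system the text could come from something like `lsof -p <pid>` saved to a file and read back in. Comparing the totals between two runs shows whether the number of deleted-but-held files is still growing.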
Re: [SoldCloud] leaking file descriptors
What is netstat telling you about the connections on the servers? Are any connections hanging in CLOSE_WAIT (passive close)?

I saw this on my servers last week. I used a little program to spoof a local connection on those servers' ports and was able to trick the TCP stack into closing those connections. That also immediately released all open fds marked DEL and cleaned everything up without a restart.

Regards
Bernd
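The CLOSE_WAIT check suggested here can also be scripted instead of read off netstat by eye. This is a rough sketch that assumes a Linux host, where /proc/net/tcp lists IPv4 sockets with a hexadecimal state column and 08 stands for CLOSE_WAIT. The function name and the sample rows are made up for illustration, and port 8080 is only an assumed Tomcat port.

```python
CLOSE_WAIT = "08"  # state code for CLOSE_WAIT in /proc/net/tcp on Linux


def close_wait_by_local_port(proc_net_tcp_text):
    """Count CLOSE_WAIT sockets per local port from /proc/net/tcp content."""
    counts = {}
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        # fields: sl, local_address, rem_address, st, ...
        if len(fields) < 4 or fields[3] != CLOSE_WAIT:
            continue
        # local_address looks like HEXIP:HEXPORT
        port = int(fields[1].rsplit(":", 1)[1], 16)
        counts[port] = counts.get(port, 0) + 1
    return counts


# Made-up sample: two CLOSE_WAIT sockets and one ESTABLISHED on port 8080 (0x1F90).
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 0100007F:C350 08 00000000:00000000 00:00000000 00000000   106        0 11111
   1: 0100007F:1F90 0100007F:C351 08 00000000:00000000 00:00000000 00000000   106        0 11112
   2: 0100007F:1F90 0100007F:C352 01 00000000:00000000 00:00000000 00000000   106        0 11113
"""

print(close_wait_by_local_port(sample))
```

To run it against a real host, pass the contents of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets) instead of the sample string.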
Re: [SoldCloud] leaking file descriptors
Do you have autocommit enabled?

I tested this with 1m docs indexed using the default example config and saw the number of used file descriptors go up to 2400 (it did not come down even after the final commit at the end). Then I disabled autocommit, reindexed, and the descriptor count stayed pretty much flat at around 400-500.

--
Sami Siren
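A comparison like this (descriptor count with and without autocommit) is easier to repeat if the count is sampled automatically during an indexing run. The sketch below assumes Linux, where each open descriptor of a process appears as an entry under /proc/<pid>/fd. Both function names and the proc_root parameter are hypothetical helpers, not part of Solr. Note that memory-mapped files, which lsof reports with DEL in the FD column once deleted, are likely to show up in /proc/<pid>/maps rather than in this directory.

```python
import os
import time


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of entries in <proc_root>/<pid>/fd."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def sample_fd_counts(pid, samples=5, interval=1.0, proc_root="/proc"):
    """Collect `samples` descriptor counts, sleeping `interval` seconds between them."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid, proc_root))
        if i < samples - 1:
            time.sleep(interval)
    return counts


# Example (needs permission to read the target process's fd directory):
#   sample_fd_counts(10049, samples=60, interval=10.0)
```

Running it once with autocommit enabled and once with it disabled should show whether the count keeps climbing or stays roughly flat.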
Re: [SoldCloud] leaking file descriptors
On Thursday 01 March 2012 13:03:18 Bernd Fehling wrote:
> What is netstat telling you about the connections on the servers?
> Any connections in CLOSE_WAIT (passive close) hanging?

I can't give exact numbers right now, but there were a lot between all the cores and the indexing clients.

> Saw this on my servers last week. Used a little proggi to spoof a local
> connection on those servers ports and was able to fake the TCP-stack to
> close those connections. It also immediately released all open fd's set
> to DEL and cleaned everything up without restarting.

Interesting! But it sounds like a sneaky workaround :)

--
Markus Jelsma - CTO - Openindex