[SolrCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma

Hi,

Yesterday we had an issue with too many open files, which we solved by
fixing a misspelled username. But there is still a problem with
open files.


We cannot successfully index a few million documents from MapReduce into
a 5-node SolrCloud cluster. One of the problems is that after a while
ClassNotFoundErrors and other similar weirdness begin to appear. This
does not go away when indexing is stopped.


With lsof I found that Solr still keeps roughly 9k files open 8 hours after
indexing failed. Of those 9k, roughly 7.5k are deleted files that
still have a file descriptor open for the tomcat6 user; these are all
segment files:


/opt/solr/openindex_a/data/index.20120228101550/_34s.tvd
java  10049 tomcat6  DEL   REG   9,0   515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
java  10049 tomcat6  DEL   REG   9,0   515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
java  10049 tomcat6  DEL   REG   9,0   515735 /opt/solr/openindex_a/data/index.20120228101550/_34s_nrm.cfs
java  10049 tomcat6  DEL   REG   9,0   515595 /opt/solr/openindex_a/data/index.20120228101550/_34v_nrm.cfs
java  10049 tomcat6  DEL   REG   9,0   515592 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.tim
java  10049 tomcat6  DEL   REG   9,0   515591 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.prx
java  10049 tomcat6  DEL   REG   9,0   515590 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.frq

and many more
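A minimal Python sketch for tallying entries like these, assuming the output of `lsof -p <PID>` (for example PID 10049 from the listing) has been saved to a text file. According to the lsof man page, `DEL` in the FD column marks a Linux map file that has been deleted, so these rows are deleted segment files the JVM still has memory-mapped. The function name `deleted_files_by_dir` and the script name `lsof_deleted.py` are hypothetical, and the parser assumes DEL rows leave the SIZE/OFF column empty, as in the output shown here.

```python
import sys
from collections import Counter
from pathlib import Path, PurePosixPath
from typing import Optional


def deleted_files_by_dir(lsof_output: str, user: Optional[str] = None) -> Counter:
    """Count distinct deleted, still-mapped files (FD column "DEL") per parent directory."""
    paths = set()
    for line in lsof_output.splitlines():
        # Columns: COMMAND PID USER FD TYPE DEVICE [SIZE/OFF] NODE NAME.
        # DEL rows leave SIZE/OFF empty, so NAME is the 8th field; it may contain spaces.
        parts = line.split(maxsplit=7)
        if len(parts) != 8 or parts[3] != "DEL":
            continue  # header, other FD types, or wrapped continuation lines
        if user is not None and parts[2] != user:
            continue
        paths.add(parts[7])  # a set, so a path listed more than once counts once
    return Counter(str(PurePosixPath(p).parent) for p in paths)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: lsof -p 10049 > lsof.txt && python3 lsof_deleted.py lsof.txt
    report = deleted_files_by_dir(Path(sys.argv[1]).read_text(), user="tomcat6")
    for directory, n in report.most_common():
        print(f"{n:6d}  {directory}")
```

Grouping by parent directory shows at a glance whether the leaked mappings all belong to one index directory or are spread across several cores.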

Did I misconfigure anything? This is a pretty standard setup (no changes to
the IndexDefaults section) on a recent Solr trunk revision. Is there a bug
somewhere?


Thanks,
Markus


Re: [SolrCloud] leaking file descriptors

2012-03-01 Thread Bernd Fehling

What does netstat tell you about the connections on the servers?

Are any connections hanging in CLOSE_WAIT (passive close)?

I saw this on my servers last week.
I used a small program to spoof a local connection on those servers' ports
and was able to trick the TCP stack into closing those connections.
That also immediately released all open fds marked DEL and cleaned
everything up without a restart.

Regards
Bernd


On 01.03.2012 11:36, Markus Jelsma wrote:

[...]


Re: [SolrCloud] leaking file descriptors

2012-03-01 Thread Sami Siren
Do you have autocommit enabled? I tested this by indexing 1M docs with
the default example config and saw the number of used file descriptors
climb to 2400 (it did not come down even after the final commit at the end).
Then I disabled autocommit and reindexed, and the descriptor count stayed
pretty much flat at around 400-500.
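A comparison like this one can be reproduced by sampling the number of entries in /proc/<pid>/fd during an indexing run. A minimal Python sketch, assuming Linux and permission to read the Solr process's /proc entry (running as the same user as tomcat6, or as root); `count_open_fds`, `sample_fd_counts` and the script name `fd_watch.py` are hypothetical. Memory-mapped files, such as the DEL entries in the lsof output, are listed in /proc/<pid>/maps rather than /proc/<pid>/fd, so this counts open descriptors only.

```python
import os
import sys
import time
from pathlib import Path


def count_open_fds(pid, proc_root: str = "/proc") -> int:
    """Number of entries in <proc_root>/<pid>/fd, i.e. descriptors the process holds open."""
    return len(os.listdir(Path(proc_root) / str(pid) / "fd"))


def sample_fd_counts(pid, samples: int, interval_s: float, proc_root: str = "/proc"):
    """Yield (elapsed_seconds, fd_count) pairs, one every interval_s seconds."""
    start = time.monotonic()
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        yield time.monotonic() - start, count_open_fds(pid, proc_root)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 fd_watch.py 10049   (one sample per minute for an hour)
    for elapsed, n in sample_fd_counts(sys.argv[1], samples=60, interval_s=60):
        print(f"{elapsed:8.0f}s  {n} open fds", flush=True)
```

Running it once with autocommit enabled and once with it disabled, during otherwise identical indexing runs, would show whether the growth described here is reproducible on a given setup.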

--
 Sami Siren



On Thu, Mar 1, 2012 at 12:36 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
 [...]


Re: [SolrCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma


On Thursday 01 March 2012 13:03:18 Bernd Fehling wrote:
 What does netstat tell you about the connections on the servers?
 
 Are any connections hanging in CLOSE_WAIT (passive close)?

I can't give exact numbers right now, but there were a lot of them between
all the cores and the indexing clients.

 
 I saw this on my servers last week.
 I used a small program to spoof a local connection on those servers' ports
 and was able to trick the TCP stack into closing those connections.
 That also immediately released all open fds marked DEL and cleaned
 everything up without a restart.

Interesting! But that sounds like a sneaky workaround :)

 
 Regards
 Bernd
 
 On 01.03.2012 11:36, Markus Jelsma wrote:
  [...]

-- 
Markus Jelsma - CTO - Openindex