It could be related to the ulimit on your machines, specifically the
open-file limit (ulimit -n), since the error is "Too many open files".
A good value to start with is around 65000.
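
To see which limit a process actually gets, a small check along these
lines may help. This is only a sketch: it assumes a Linux/Unix host and
Python's standard resource module, the function names are made up for
illustration, and it reports on the process that runs it, not on the
Storm worker itself.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) RLIMIT_NOFILE values for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Count file descriptors currently open in this process.

    Relies on /proc/self/fd, so it only works on Linux; returns None elsewhere.
    """
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("open descriptors:", open_fd_count())
    # 65000 is the starting value suggested in this thread, not a hard rule.
    if soft != resource.RLIM_INFINITY and soft < 65000:
        print("soft limit is below the 65000 suggested above")
```

The limit is per process and is inherited from whatever started it, so
run the check as the same user and in the same way the Storm daemons are
started. An interactive shell can show a different value. On Linux,
/proc/<pid>/limits shows the limits of an already running process.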





On Tue, Jun 10, 2014, at 10:40 AM, Sean Allen wrote:

On a 0.9.0.1 cluster.

Everything was fine until last week. No changes were made, and we now
regularly have nodes dying with the following exception. Note that the
number of open files is really low; we aren't out of file handles. Has
anyone else encountered this?

2014-06-10 13:34:04 b.s.d.worker [ERROR] Error when processing event
java.io.FileNotFoundException: /opt/storm/var/storm/workers/b9ec5518-9430-4275-9844-e2f6e203e3ce/heartbeats/1402421644201 (Too many open files)
    at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_17]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:212) ~[na:1.7.0_17]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:165) ~[na:1.7.0_17]
    at org.apache.commons.io.FileUtils.openOutputStream(FileUtils.java:179) ~[commons-io-1.4.jar:1.4]
    at org.apache.commons.io.FileUtils.writeByteArrayToFile(FileUtils.java:1282) ~[commons-io-1.4.jar:1.4]
    at backtype.storm.utils.LocalState.persist(LocalState.java:69) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.utils.LocalState.put(LocalState.java:49) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.worker$do_heartbeat.invoke(worker.clj:51) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.worker$fn__5882$exec_fn__1229__auto____5883$heartbeat_fn__5884.invoke(worker.clj:339) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.timer$schedule_recurring$this__3019.invoke(timer.clj:77) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.timer$mk_timer$fn__3002$fn__3003.invoke(timer.clj:33) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.timer$mk_timer$fn__3002.invoke(timer.clj:26) [storm-core-0.9.0.1.jar:na]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
    at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]

--

Ce n'est pas une signature
