We are running a Storm 0.9.0.1 cluster. Everything was fine until last week. No changes were made, but nodes now regularly die with the following exception. Note that the number of open files is really low, so we aren't out of file handles. Has anyone else encountered this?

2014-06-10 13:34:04 b.s.d.worker [ERROR] Error when processing event
java.io.FileNotFoundException: /opt/storm/var/storm/workers/b9ec5518-9430-4275-9844-e2f6e203e3ce/heartbeats/1402421644201 (Too many open files)
    at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_17]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:212) ~[na:1.7.0_17]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:165) ~[na:1.7.0_17]
    at org.apache.commons.io.FileUtils.openOutputStream(FileUtils.java:179) ~[commons-io-1.4.jar:1.4]
    at org.apache.commons.io.FileUtils.writeByteArrayToFile(FileUtils.java:1282) ~[commons-io-1.4.jar:1.4]
    at backtype.storm.utils.LocalState.persist(LocalState.java:69) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.utils.LocalState.put(LocalState.java:49) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.worker$do_heartbeat.invoke(worker.clj:51) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.worker$fn__5882$exec_fn__1229__auto____5883$heartbeat_fn__5884.invoke(worker.clj:339) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.timer$schedule_recurring$this__3019.invoke(timer.clj:77) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.timer$mk_timer$fn__3002$fn__3003.invoke(timer.clj:33) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.timer$mk_timer$fn__3002.invoke(timer.clj:26) [storm-core-0.9.0.1.jar:na]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
    at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]

--
This is not a signature
