It's fair to say that's true, but Thrift monitoring in CDH reports no errors, and the Thrift server itself is not outputting any errors. I do notice that after about 30 minutes of Storm's "thrift connection timeout" errors, the HBase Thrift server slowly increases its open file count to dangerously high values.
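One way to confirm that behavior is to sample the Thrift server's open file descriptor count over time. A minimal sketch, assuming a Linux host with /proc and pgrep available; the process pattern and the 4096 warning threshold are illustrative assumptions, not values from this thread:

```shell
#!/bin/sh
# Sketch: report the HBase Thrift server's current open-fd count.
# ASSUMPTION: the process matches 'hbase.thrift.ThriftServer'; adjust
# the pattern and the threshold for your deployment.
PID=$(pgrep -f 'hbase.thrift.ThriftServer' | head -n1)
if [ -n "$PID" ]; then
  # Each entry in /proc/<pid>/fd is one open file descriptor.
  COUNT=$(ls "/proc/$PID/fd" 2>/dev/null | wc -l)
  echo "pid $PID open fds: $COUNT"
  if [ "$COUNT" -gt 4096 ]; then
    echo "WARNING: fd count above 4096" >&2
  fi
else
  echo "HBase Thrift server not running" >&2
fi
```

Running it from cron (or under `watch`) while the topology is up would show whether the fd count climbs in step with the timeout errors.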
I also noticed that after killing the topology and all of Storm's jps processes (nimbus, ui, and worker), the memory footprint on the server is not released. I did not record what was consuming the Storm server's memory before restarting the machine. So far I have restarted HBase, restarted the Storm server, cleared out the working directory, restarted Storm, and redeployed the topology, and it seems to be working again.

On Tue, May 12, 2015 at 2:25 PM, Jeffery Maass <[email protected]> wrote:

> The supervisor log you posted covers multiple different workers... This
> looks expected to me. If an unhandled exception occurs in a worker, it
> will die. Then either nimbus or the supervisor will cease to see its
> heartbeats, the supervisor will attempt to kill it, then nimbus will ask a
> supervisor to start a new worker.
>
> The most relevant logs are in the worker log. I'm betting the problem is
> connected to "thrift connection timeout".
>
> Sorry I couldn't be of more help.

--
Abraham Tom
Data Architect - RippleLabs.com
