Hi!

I need your help.

I'm upgrading our Storm production cluster from version 0.9.4 to 1.0.1.

All the 0.9.4 Storm processes run under the supervision of
Monit 5.12.2. All the start scripts that Monit uses launch the
Nimbus, Supervisor and UI daemons in the *background* (&).

For example, start-supervisor.sh contains only:

/usr/bin/nohup ${STORM_HOME}/bin/storm supervisor &

This configuration is well proven: it has been running in our production
environment for almost 2 years.
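
For comparison, a background start script that also redirects the daemon's output and records its PID could look like the sketch below. This is not one of our current scripts: the function name start_daemon, the pidfile and log paths, and the use of plain nohup from PATH are assumptions made for illustration. The PID written is that of the launched command, which may or may not be the final JVM depending on how bin/storm starts it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Storm or Monit).
# Usage: start_daemon PIDFILE LOGFILE COMMAND [ARGS...]
start_daemon() {
    pidfile=$1
    logfile=$2
    shift 2
    # Redirect stdin, stdout and stderr so the background process does not
    # keep the caller's file descriptors open after the script returns.
    nohup "$@" >"$logfile" 2>&1 </dev/null &
    # $! is the PID of the command just placed in the background.
    echo $! >"$pidfile"
}

# Example call (paths are assumptions):
# start_daemon /var/run/storm-supervisor.pid /var/log/storm-supervisor.out \
#     "${STORM_HOME}/bin/storm" supervisor
```

With a pidfile in place, the supervising tool can be pointed at it instead of having to match on the process name.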

When I try the same configuration with Storm 1.0.1 and Monit 5.17.1, the
system doesn't run properly. Although all the start scripts are able to
bring up Nimbus, UI and Supervisor when I run them by hand, they fail when
executed by Monit. Monit is not able to start any of them.

I have tried modifying the start scripts so that all the Storm
daemons run in the *foreground*, for example:

/usr/bin/nohup ${STORM_HOME}/bin/storm supervisor

In this case, the Monit daemon is able to restart all of them, but only on
the second attempt. On the first attempt, all of them fail because of a timeout.

These are the Monit logs:

For the 1st attempt:
error    : 'nimbus' failed to start (exit status -1) --
....../start-nimbus.sh: Program timed out .....
info     : 'nimbus' restart action failed

For the 2nd attempt:
info     : 'nimbus' process is running after previous exec error (slow
starting or manually recovered?)

Although all the processes end up running when I force failures (kill -9),
a "defunct" process is left on the machine because of the first failed
attempt.
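
To check whether a failed start attempt has left a defunct process behind, the process table can be filtered by state. This is a minimal sketch, assuming a ps that accepts -eo pid,ppid,stat,comm (as procps on Linux does); the helper name filter_zombies is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID PPID STAT COMMAND" lines on stdin and
# prints PID, PPID and COMMAND for every process whose state starts with Z
# (zombie / defunct). The first line is assumed to be the ps header.
filter_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Example call:
# ps -eo pid,ppid,stat,comm | filter_zombies
```

The PPID column shows which parent has not yet reaped the defunct child.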

Is this problem related to the use of the conf/storm.py file? (Of course,
I have Python 2.6.6 and JDK 1.8 running on the machines, and all processes
run OK without Monit.)

Should I run Nimbus, Supervisor and UI in the background or in the foreground?

I don't know why this is happening, but I would definitely like to run a
Storm 1.0.1 cluster that is as solid and stable as our Storm 0.9.4 cluster.

Could you help me, guys?

Thank you so much in advance!


*JULIÁN BERMEJO FERREIRO*
*Technology Department*
*[email protected] <[email protected]>*
<http://www.beeva.com/>
