I just added a little bulletproofing to the router code. Before I describe the change, I must explain a bit about how the router works.
1. In osrf_ctl.sh, we invoke the router as the executable opensrf_router.
2. The router spawns two child processes.
3. Each of the child processes spawns a grandchild and then immediately exits.
4. Each grandchild turns itself into a daemon and hangs around to route things.

In the old code, the parent process would exit immediately after spawning its children.

The osrf_ctl.sh script runs a ps to capture the process IDs of the running routers. However, when the parent exits, the grandchildren might not be running yet. To compensate, the script inserts a sleep between opensrf_router and ps, so that the grandchildren have time to get spawned before ps goes looking for them.

That sleep is no longer necessary. Now the parent router process waits for all of its immediate children to terminate before exiting. (It does *not* wait for the grandchildren to terminate; that would be a long wait.) As a result, the grandchildren should be running by the time the parent exits.

If a child process terminates abnormally -- i.e., it exits with a non-zero exit status, or it is terminated by a signal -- the parent issues a warning message to that effect.

That message, if issued, goes to standard error, not to a log file. The reason is that each child process opens its own separate log file, as defined in the configuration file. The parent has no log file defined for it, and never opens one.

If you run osrf_ctl.sh from the command line, these messages, if issued, will appear immediately after the "Starting OpenSRF Router" message issued by the shell script to standard output.

If you run osrf_ctl.sh from another layer of scripting, you may want to redirect standard error so as to capture these messages if they occur.

Scott McKellar
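
For illustration only, here is a minimal sketch of the spawn-and-wait pattern described in this message. It is not the actual router code: the function names spawn_child, reap_children and run_daemon are hypothetical, the daemon step is reduced to a setsid() call, and the routing loop is replaced by a short sleep.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the grandchild's long-running routing loop. */
static void run_daemon(void) {
    sleep(1);               /* a real router would keep routing indefinitely */
    _exit(0);
}

/* Fork one child. The child forks a grandchild and exits at once;
 * the grandchild detaches and carries on.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t spawn_child(void) {
    pid_t child = fork();
    if (child != 0)
        return child;       /* parent: child's PID, or -1 on failure */

    /* We are in the child. */
    pid_t grandchild = fork();
    if (grandchild < 0)
        _exit(1);           /* non-zero status: the parent will warn */
    if (grandchild > 0)
        _exit(0);           /* child exits immediately after spawning */

    /* We are in the grandchild. */
    setsid();               /* assumption: simplified daemonization */
    run_daemon();
    _exit(0);               /* not reached */
}

/* Wait for each immediate child (not the grandchildren) and warn on
 * standard error about any abnormal termination.
 * Returns the number of children that did not exit cleanly. */
static int reap_children(const pid_t *children, int n_children) {
    assert(n_children > 0);
    int failures = 0;

    for (int i = 0; i < n_children; i++) {
        int status = 0;
        if (waitpid(children[i], &status, 0) < 0) {
            perror("waitpid");
            failures++;
        } else if (WIFSIGNALED(status)) {
            fprintf(stderr, "warning: child %ld terminated by signal %d\n",
                    (long) children[i], WTERMSIG(status));
            failures++;
        } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
            fprintf(stderr, "warning: child %ld exited with status %d\n",
                    (long) children[i], WEXITSTATUS(status));
            failures++;
        }
    }
    return failures;
}
```

A parent following this pattern would call spawn_child() twice, pass the two PIDs to reap_children(), and only then exit. Because each child exits only after its fork() of the grandchild has returned, the grandchildren already exist by the time the parent's waits complete.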
