On 2013-05-31 13:41, Edward Ned Harvey (openindiana) wrote:
If LOCKDIR becomes stale (for example, the system was power-cycled while the lock existed), note that any script that *has* the lock is guaranteed to release it in less than 60 seconds. So if the BREAKLOCK script detects that the lock has existed for more than 60 seconds, it can assume the lock is stale and forcibly steal it.
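A minimal POSIX sh sketch of the age-based stale-lock breaking described in the quoted text. It folds the BREAKLOCK step into the acquire function and assumes GNU `stat -c %Y` and `date +%s` are available (on OpenIndiana these may require /usr/gnu/bin in PATH). The default LOCKDIR path, MAX_AGE and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: mkdir-based lock with age-based stale-lock breaking.
# Assumes GNU stat (-c %Y) and date (+%s); path and names are hypothetical.
LOCKDIR=${LOCKDIR:-/tmp/myjob.lock}
MAX_AGE=${MAX_AGE:-60}   # seconds a legitimate holder may keep the lock

# Print the age of the lock directory in seconds (from its mtime).
lock_age() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$LOCKDIR" 2>/dev/null) || return 1
    echo $((now - mtime))
}

# Try to take the lock; break it if it is older than MAX_AGE.
# Returns 0 if we now hold the lock, 1 otherwise.
acquire_lock() {
    # mkdir is atomic: only one caller can create the directory.
    if mkdir "$LOCKDIR" 2>/dev/null; then
        return 0
    fi
    # Lock exists (or vanished just now, in which case lock_age fails).
    age=$(lock_age) || return 1
    if [ "$age" -gt "$MAX_AGE" ]; then
        # No legitimate holder keeps it this long: treat it as stale.
        # NOTE: check-then-remove is not atomic; two breakers racing
        # here can both end up "holding" the lock.
        rmdir "$LOCKDIR" 2>/dev/null
        mkdir "$LOCKDIR" 2>/dev/null && return 0
    fi
    return 1
}

release_lock() {
    rmdir "$LOCKDIR"
}
```

Because the check-then-remove step is not atomic, this only narrows the race window rather than closing it, which is part of why the reply below suggests recording and verifying the holder's PID.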
I've had my share of lock files and lock dirs, usually to forbid simultaneous executions of a script (e.g. an init script for an appserver that can take quite a while to run; don't let impatient admins on different consoles start it many times at once). The mere existence of an agreed-on filename is usually not enough. The file can hold a useful piece of information, such as the PID of the process that created it (echo $$ > $LOCKFILE), so that other instances of the script can verify (with ps or /proc/$pid) that the lock holder really exists and is working.

Expectations that a job will finish within some predefined time usually become outdated (maybe in a year or two), especially for things like backups, scrubs or appserver init, whose duration may be proportional to the amount of stored data. Regular jobs without such locks to abort a concurrent new execution tend to pile up, slow the system down, take longer to execute, pile up more... you get the picture. It leads to resource exhaustion and OOM kills on Linux, or scan-rate hell on Solaris... the end of a peaceful life one way or another :)

Another thing to look out for is dependence on potentially slow resources, such as NFS, which, depending on your setup, may be unavailable for a while and simply block on access. If you expect such troublemakers, wrap the calls in a lifetime-limiting script (e.g. let it run for 60 seconds, then have an external killer terminate it ungracefully).

HTH,
//Jim

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
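One way to sketch Jim's PID-in-lockfile check in POSIX sh. It swaps in `kill -0` for the ps or /proc/$pid check he mentions, and uses `set -C` (noclobber) so that creating the lock file is atomic. The default LOCKFILE path and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: lock file that records the owner's PID.
# LOCKFILE path and function names are hypothetical.
LOCKFILE=${LOCKFILE:-/tmp/myjob.pid}

# Return 0 if the PID stored in LOCKFILE belongs to a live process.
lock_owner_alive() {
    [ -f "$LOCKFILE" ] || return 1
    pid=$(cat "$LOCKFILE" 2>/dev/null)
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or garbage: treat as stale
    esac
    # kill -0 sends no signal, it only tests whether the process exists.
    # Caveats: it also fails (EPERM) for another user's process, so all
    # instances should run as the same user; and after a reboot the PID
    # may have been reused by an unrelated process.
    kill -0 "$pid" 2>/dev/null
}

# Try to take the lock. Returns 0 if we now hold it, 1 otherwise.
acquire_pid_lock() {
    # set -C (noclobber) makes '>' fail if the file already exists.
    # $$ is the PID of the main shell, even inside this subshell.
    if ( set -C; echo $$ > "$LOCKFILE" ) 2>/dev/null; then
        return 0
    fi
    if lock_owner_alive; then
        return 1   # a live instance holds the lock
    fi
    # Owner is gone (e.g. power cycle): remove the stale file, retry once.
    rm -f "$LOCKFILE"
    ( set -C; echo $$ > "$LOCKFILE" ) 2>/dev/null
}

release_pid_lock() {
    rm -f "$LOCKFILE"
}
```

After a successful acquire, `trap 'release_pid_lock' EXIT` is a common way to make sure the file is removed when the script ends. The stale-removal step has the same small race as the directory version above. For the lifetime-limiting wrapper, GNU coreutils provides `timeout` where it is installed; for example, `timeout -s KILL 60 some_command` kills the command ungracefully after 60 seconds.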