Chris Jewell <[email protected]> writes:

> I quite like the &3 idea.  One reason I slightly shy away from emails
> is that they're not a default option (though easy enough for an
> administrator to configure that way), and it may often be more useful
> to have job context output kept with the stdout and stderr files

Agreed.  I keep meaning to put together a script to extract the
available info for a job post mortem from qacct, the GE log files, and
possibly syslog (assuming shared classic spooling).  Does anyone else
fancy having a go?

> Agreed. The standard 137 exit code is not really quite enough.  For
> the administrator, it could be *very* useful to know how many jobs are
> being killed as a result of exceeding resource requests,

(In case people don't know:)  You can arrange for the admin to get more
useful mail about at least some failed jobs, but I can't remember
off-hand what configures it.  That's at least partially broken in
6.2u5, where I typically just get mailed the hostfile.  I have a fix,
though it hasn't been properly tested:
<https://arc.liv.ac.uk/trac/SGE/ticket/1307>.  Such mail can also swamp
you if an array job fails because its working directory disappeared,
for instance.
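
As a possible starting point for the post-mortem script, here is a minimal Python sketch covering only the qacct part.  It assumes that `qacct -j` prints one "name value" pair per line, with records separated by a line of `=` characters, and that the fields `jobnumber`, `taskid`, `hostname`, `exit_status`, `failed` and `maxvmem` are present; check that against your own installation.  The function names are made up for illustration, and the GE log files and syslog are not touched at all.

```python
import subprocess

SEPARATOR = "=" * 10  # assumption: records start with a line of '=' characters


def parse_qacct(text):
    """Split `qacct -j` style output into one dict (field -> value) per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith(SEPARATOR):
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)  # field name, then the rest of the line
        if parts:
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if current:
        records.append(current)
    return records


def describe_exit(status):
    """Interpret a shell-style exit status; values above 128 mean 128 + signal."""
    if status > 128:
        return "killed by signal %d" % (status - 128)
    return "exited with status %d" % status


def summarise(records):
    """Return one human-readable line per job/task record."""
    lines = []
    for rec in records:
        status = int(rec.get("exit_status", "0"))
        lines.append("job %s task %s on %s: %s (failed=%s, maxvmem=%s)" % (
            rec.get("jobnumber", "?"), rec.get("taskid", "?"),
            rec.get("hostname", "?"), describe_exit(status),
            rec.get("failed", "?"), rec.get("maxvmem", "?")))
    return lines


def qacct_job(job_id):
    """Run `qacct -j JOB_ID` and return its text output (needs qacct on PATH)."""
    return subprocess.check_output(["qacct", "-j", str(job_id)],
                                   universal_newlines=True)
```

Something like `print("\n".join(summarise(parse_qacct(qacct_job(12345)))))` would then give one line per task, with 12345 standing in for a real job number.  An exit status of 137 shows up as "killed by signal 9", which on its own still doesn't say whether a resource limit was the cause, so comparing `maxvmem` against the job's request would be the obvious next step.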
