On 18.06.2014 at 02:56, Jesse Becker wrote:
> On Wed, Jun 18, 2014 at 07:56:36AM +1000, Ian Mortimer wrote:
>> I've had to disable outgoing mail from our cluster after an incident
>> when an array job sent thousands of notifications to an external
>> address.
>>
>> I assume if I set mailer to /bin/true that will stop any notifications
>> being sent, but is there a way to prevent notifications just for array
>> jobs while allowing notifications for regular jobs
>
> There isn't a "built in" way to do that, but I can think of two
> different ways of handling the spamming problem.
>
> The mailer setting can be any program or script. Three arguments are
> passed:
>   $1 - A literal "-s" (to set the subject for when mail(1) is called)
>   $2 - The "subject" line (examples below)
>   $3 - The recipient(s) of the message
>
> The body of the email is fed to the script on STDIN.
>
> The subject is useful for this. Here are two examples, one a normal
> job, the other an array task:
>
>   Job 369 (date) Complete
>   Job-array task 210356.5 (worker.sh) Complete
>
> The script can pretty easily figure out if it's an array job, and handle
> it appropriately.
>
> However, it's pretty easy for a user to queue up 10,000 jobs with email
> notification enabled that run /bin/false to cause similar problems, so
> you might want to consider option #2:
>
> Configure your mail server to not send email from SGE to "outside"
> addresses. This shouldn't be overly difficult, and you can "whitelist"
> designated addresses if you so choose.
>
> As an aside, you can do lots of "interesting" things in the mailer,
> including pull certain information about the just-completed job. We use
> it occasionally to trap errors that the prologue can miss.
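For illustration, a minimal sketch of such a mailer wrapper, based only on the three arguments and the two sample subject lines quoted above. The subject pattern is inferred from those two samples and may need adjusting for other Grid Engine versions. The names `parse_array_task` and `REAL_MAILER`, and the path `/bin/mail`, are assumptions made for this example.

```python
#!/usr/bin/env python3
"""Sketch of a mailer wrapper that drops array-task notifications."""
import re
import subprocess
import sys

# Assumed location of the real mail program; adjust for the local system.
REAL_MAILER = "/bin/mail"

# Pattern inferred from the sample subject
# "Job-array task 210356.5 (worker.sh) Complete".
ARRAY_SUBJECT = re.compile(r"^Job-array task (\d+)\.(\d+)\b")


def parse_array_task(subject):
    """Return (job_id, task_id) for an array-task subject, else None."""
    match = ARRAY_SUBJECT.match(subject)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def main(argv):
    # Called as: <mailer> -s <subject> <recipient(s)>, body on stdin.
    _flag, subject, recipients = argv[1:4]
    body = sys.stdin.buffer.read()
    if parse_array_task(subject) is not None:
        return 0  # silently discard notifications for array tasks
    # Regular jobs: hand everything through to the real mailer unchanged.
    result = subprocess.run([REAL_MAILER, "-s", subject, recipients], input=body)
    return result.returncode


# Only act when invoked with exactly the three arguments described above.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

Pointing the mailer setting at a script like this would keep notifications for regular jobs while dropping those for array tasks. It does nothing about the 10,000-single-jobs case mentioned in the quote.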
This could also be used to send an array digest: in the mail wrapper, create a directory for each job in a shared location and `cat` the email content into one file per task.

At submission time, also submit a follow-up job with -hold_jid to an always-available dummy queue (with a CPU time limit of 60 seconds or so, and accessible only via a BOOLEAN complex). This follow-up job will run only after the array job has finished. It can then `cat` all the files found for the $JOB_ID it waited for and send the result as a single email from inside the dummy job. Afterwards, the follow-up job can delete the directory with the collected emails.

-- Reuti

> --
> Jesse Becker (Contractor)
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
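A rough sketch of the two halves of the digest idea, assuming a shared spool directory with one subdirectory per job and one file per task. The function names `store_task_mail` and `build_digest` and the directory layout are hypothetical. The first function would be called from the mail wrapper, the second from the -hold_jid follow-up job. Sending the digest itself is left out.

```python
import shutil
from pathlib import Path


def store_task_mail(spool_root, job_id, task_id, body):
    """Mail-wrapper side: append one task's mail body to <root>/<job>/<task>.txt."""
    job_dir = Path(spool_root) / str(job_id)
    job_dir.mkdir(parents=True, exist_ok=True)
    path = job_dir / f"{task_id}.txt"
    # Append, so several mails for the same task are all kept.
    with path.open("a") as handle:
        handle.write(body)
    return path


def build_digest(spool_root, job_id):
    """Follow-up-job side: join all task mails in task order, then clean up."""
    job_dir = Path(spool_root) / str(job_id)
    # Sort numerically so task 10 comes after task 2.
    files = sorted(job_dir.glob("*.txt"), key=lambda p: int(p.stem))
    digest = "".join(p.read_text() for p in files)
    # Remove the collected mails; a missing directory is not an error.
    shutil.rmtree(job_dir, ignore_errors=True)
    return digest
```

In the follow-up job, the id passed to `build_digest` would be that of the array job it was held on, and the returned text would be mailed once to the user.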
