Can anyone tell me what is going on here?

The job itself exited with status 0, but the shepherd then could not close its
usage file. The machine has plenty of disk space, and so does the directory the
job was set to run from.
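One thing worth ruling out, as far as I understand SGE: the shepherd writes its
`usage` file in the execution daemon's spool area (the job script path in the
trace points under /home/gridengine/blue/spool/h2-c6-64-1), not in the job's
working directory. Also, "No space left on device" (ENOSPC) is returned when a
filesystem runs out of inodes, even if free blocks remain. The sketch below
checks both on a given path; `free_space` and `report` are hypothetical helper
names, and it only shows the state now, not at 01:20:51 when the job ended.

```python
import os


def free_space(path):
    """Return (free_bytes, free_inodes) for the filesystem holding `path`.

    f_bavail and f_favail are the counts available to unprivileged
    processes; the shepherd was running with a non-root euid here.
    """
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize, st.f_favail


def report(path):
    """One-line summary; ENOSPC can mean either number has reached zero."""
    free_bytes, free_inodes = free_space(path)
    return f"{path}: {free_bytes / 2**30:.2f} GiB free, {free_inodes} inodes free"
```

On the execution host, something like
`print(report("/home/gridengine/blue/spool/h2-c6-64-1"))` would cover the spool
filesystem rather than the job's directory.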

Also, is there any way to prevent failures like this from putting the queue
into the ERROR state?

Simon


Job 8057282 caused action: Queue "[email protected]" set to ERROR
 User        = build
 Queue       = [email protected]
 Start Time  = <unknown>
 End Time    = <unknown>
failed before job:08/03/2012 01:20:51 [600:25020]: can't close file usage: No space left on device
Shepherd trace:
08/03/2012 01:02:01 [600:25020]: shepherd called with uid = 0, euid = 600
08/03/2012 01:02:01 [600:25020]: starting up 6.2u4
08/03/2012 01:02:01 [600:25020]: setpgid(25020, 25020) returned 0
08/03/2012 01:02:01 [600:25020]: no prolog script to start
08/03/2012 01:02:01 [600:25020]: parent: forked "job" with pid 25023
08/03/2012 01:02:01 [600:25023]: child: starting son(job, /home/gridengine/blue/spool/h2-c6-64-1/job_scripts/8057282, 0);
08/03/2012 01:02:01 [600:25020]: parent: job-pid: 25023
08/03/2012 01:02:01 [600:25023]: pid=25023 pgrp=25023 sid=25023 old pgrp=25020 getlogin()=root
08/03/2012 01:02:01 [600:25023]: reading passwd information for user 'build'
08/03/2012 01:02:01 [600:25023]: setosjobid: uid = 0, euid = 600
08/03/2012 01:02:01 [600:25023]: setting limits
08/03/2012 01:02:01 [600:25023]: RLIMIT_CPU setting: (soft 7200 hard 7200) resulting: (soft 7200 hard 7200)
08/03/2012 01:02:01 [600:25023]: RLIMIT_FSIZE setting: (soft 0^HINFINITY hard 0^HINFINITY) resulting: (soft 0^HINFINITY hard 0^HINFINITY)
08/03/2012 01:02:01 [600:25023]: RLIMIT_DATA setting: (soft 0^HINFINITY hard 0^HINFINITY) resulting: (soft 0^HINFINITY hard 0^HINFINITY)
08/03/2012 01:02:01 [600:25023]: RLIMIT_STACK setting: (soft 0^HINFINITY hard 0^HINFINITY) resulting: (soft 0^HINFINITY hard 0^HINFINITY)
08/03/2012 01:02:01 [600:25023]: RLIMIT_CORE setting: (soft 0^HINFINITY hard 0^HINFINITY) resulting: (soft 0^HINFINITY hard 0^HINFINITY)
08/03/2012 01:02:01 [600:25023]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 0^HINFINITY hard 0^HINFINITY) resulting: (soft 0^HINFINITY hard 0^HINFINITY)
08/03/2012 01:02:01 [600:25023]: RLIMIT_RSS setting: (soft 0^HINFINITY hard 0^HINFINITY) resulting: (soft 0^HINFINITY hard 0^HINFINITY)
08/03/2012 01:02:01 [600:25023]: setting environment
08/03/2012 01:02:01 [600:25023]: Initializing error file
08/03/2012 01:02:01 [600:25023]: switching to intermediate/target user
08/03/2012 01:02:01 [2002:25023]: closing all filedescriptors
08/03/2012 01:02:01 [2002:25023]: further messages are in "error" and "trace"
08/03/2012 01:02:01 [2002:25023]: now running with uid=2002, euid=2002
08/03/2012 01:02:01 [2002:25023]: execvp(/bin/bash, "bash" "-s")
08/03/2012 01:20:51 [600:25020]: wait3 returned 25023 (status: 0; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 0)
08/03/2012 01:20:51 [600:25020]: job exited with exit status 0
08/03/2012 01:20:51 [600:25020]: reaped "job" with pid 25023
08/03/2012 01:20:51 [600:25020]: job exited not due to signal
08/03/2012 01:20:51 [600:25020]: job exited with status 0
08/03/2012 01:20:51 [600:25020]: now sending signal KILL to pid -25023
08/03/2012 01:20:51 [600:25020]: writing usage file to "usage"
08/03/2012 01:20:51 [600:25020]: can't close file usage: No space left on device

Shepherd error:
08/03/2012 01:20:51 [600:25020]: can't close file usage: No space left on device
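Note that the failure is reported at close, not at the write. The close(2)
man page warns that errors from earlier write(2) calls may only be reported on
the final close(), especially with NFS; buffered I/O adds to this, because a
small write can sit in a user-space buffer until the file is flushed at close.
The sketch below illustrates the buffering case only. It is Linux-specific,
using /dev/full (every write to it fails with ENOSPC), and `write_then_close`
is a hypothetical helper. Whether the shepherd writes `usage` through a
buffered stream is an assumption, not something the trace shows.

```python
import errno


def write_then_close(path, data):
    """Return 0 on success, or the errno raised when the file is closed."""
    f = open(path, "wb")
    f.write(data)      # small write: normally just lands in the user-space buffer
    try:
        f.close()      # buffer is flushed here, so a device error surfaces now
    except OSError as exc:
        return exc.errno
    return 0


# Linux only: the write "succeeds", the close reports ENOSPC.
# write_then_close("/dev/full", b"usage data\n") == errno.ENOSPC
```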