> 
> I started with a search of the SGE mailing list archive, and found your
> post. :)
> 
> Have you found a solution?


Hello all,

Sorry for the long absence.  I've been testing my system thoroughly for
this issue.  I checked my RAID1 for consistency and ran xfs_repair to make
doubly sure my filesystem was okay.  It was.  I also disabled SELinux in
case that was the problem.

In reply to Reuti:

> The directories (/opt/sge/default/spool/it060123/active_jobs/...) are 
> normally created by the admin user - is this root or any other one with 
> normal rights (which would be fine)?
> 
> Nevertheless also "other users" must be allowed to read this directory and 
> the files inside. Is there any special `umask` in place and/or does it only 
> happen to parallel jobs and/or only certain users?

There is no special umask in place, and no parallel jobs are being used.
The problem seems more apparent when lots of very short jobs are sent to
the system.
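
For anyone who wants to double-check the umask and permissions point on
their own box, here is a rough Python sketch.  The path is the one from
Reuti's message; the function names (current_umask, readable_by_others,
unreadable_entries) are made up for illustration and are not part of SGE.
Entries that vanish mid-scan are skipped, since short jobs come and go
quickly.

```python
import os
import stat

# Path taken from Reuti's message; adjust for your own cell and host.
ACTIVE_JOBS = "/opt/sge/default/spool/it060123/active_jobs"


def current_umask():
    """Return the process umask without permanently changing it."""
    mask = os.umask(0)  # os.umask sets a new mask and returns the old one
    os.umask(mask)      # restore the original straight away
    return mask


def readable_by_others(path):
    """True if 'other' users may read path (and enter it, if a directory)."""
    mode = os.stat(path).st_mode
    if not mode & stat.S_IROTH:
        return False
    if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
        return False
    return True


def unreadable_entries(root):
    """Walk root and list directories and files that others cannot read."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            try:
                if not readable_by_others(path):
                    bad.append(path)
            except OSError:
                # The entry disappeared (or broke) while we were scanning.
                continue
    return bad


if __name__ == "__main__":
    print(f"umask: {current_umask():04o}")
    if os.path.isdir(ACTIVE_JOBS):
        for path in unreadable_entries(ACTIVE_JOBS):
            print("not readable by others:", path)
```

Run it as the user whose jobs fail, ideally while jobs are active, since
the umask reported is that of the shell it is started from.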

The one thing that Mark and I have in common is machines with high CPU
counts.  My box is currently configured to provide 28 slots out of 32
logical cores.  I wonder if this might be exposing a race condition in the
creation of the pe_hostfile?
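
One way to probe the race-condition idea from the job side is to poll for
the file rather than assume it is already there.  The sketch below is only
a diagnostic under that assumption; it says nothing about how Grid Engine
actually creates the file, and wait_for_readable is a hypothetical helper,
not an SGE facility.  PE_HOSTFILE is the environment variable Grid Engine
sets for parallel jobs; substitute whichever file your jobs fail to read.

```python
import os
import time


def wait_for_readable(path, timeout=5.0, interval=0.05):
    """Poll until path can be opened for reading, or the timeout expires.

    Returns True if the file became readable in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "rb"):
                return True
        except (FileNotFoundError, PermissionError):
            # Not there yet, or created but not yet readable by us.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Only meaningful inside a job where the variable is set.
    hostfile = os.environ.get("PE_HOSTFILE")
    if hostfile:
        start = time.monotonic()
        ok = wait_for_readable(hostfile)
        print(f"readable={ok} after {time.monotonic() - start:.3f}s")
```

If the file only becomes readable after a measurable delay, that would
point towards a timing problem rather than a permissions one.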

Cheers,

Chris


--
Dr Chris Jewell
Lecturer in Biostatistics
Institute of Fundamental Sciences
Massey University
Private Bag 11222
Palmerston North 4442
New Zealand
Tel: +64 (0) 6 350 5701 Extn: 3586


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
