> I started with a search of the SGE mailing list archive, and found your
> post. :)
>
> Have you found a solution?
Hello all,

Sorry for the long absence. I've been testing my system thoroughly for this issue. I checked my RAID1 array for consistency and ran xfs_repair to make doubly sure the filesystem was okay. It was. I also disabled SELinux in case that was the problem.

In reply to Reuti:

> The directories (/opt/sge/default/spool/it060123/active_jobs/...) are
> normally created by the admin user - is this root or any other one with
> normal rights (which would be fine)?
>
> Nevertheless also "other users" must be allowed to read this directory and
> the files inside. Is there any special `umask` in place and/or does it only
> happen to parallel jobs and/or only certain users?

There is no special umask, and no parallel jobs are being used. The problem shows up more often when lots of very short jobs are submitted to the system.

The one thing Mark and I have in common is machines with a high CPU count. My box is currently configured to provide 28 slots out of 32 logical cores. I wonder whether this might be exposing a race condition in the creation of the pe_hostfile?

Cheers,
Chris

-- 
Dr Chris Jewell
Lecturer in Biostatistics
Institute of Fundamental Sciences
Massey University
Private Bag 11222
Palmerston North 4442
New Zealand
Tel: +64 (0) 6 350 5701 Extn: 3586

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
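Reuti's point is that "other" users must be able to read the job spool directory and the files inside it. The following is a minimal Python sketch for checking that on an execution host. It is an illustrative assumption, not a Grid Engine tool. It inspects only the classic permission bits (not ACLs or SELinux labels). It treats a directory as readable by others only if it also has the search (execute) bit. It also skips entries that disappear mid-scan, which is likely when many very short jobs are running. The default root is the non-elided part of the spool path quoted in Reuti's message. Spool layouts differ between installations, so pass the real path as the first argument.

```python
import os
import stat
import sys


def _mode(path):
    # Short-lived jobs may remove their spool entries while we scan,
    # so a vanished path is reported as None and skipped by the caller.
    try:
        return os.stat(path).st_mode
    except FileNotFoundError:
        return None


def find_unreadable_by_others(root):
    """Return paths under root that 'other' users cannot read.

    Directories need both the read and the search (execute) bit for
    others; plain files need only the read bit. Checks mode bits only.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        mode = _mode(dirpath)
        if mode is not None and not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = _mode(path)
            if mode is not None and not mode & stat.S_IROTH:
                problems.append(path)
    return problems


if __name__ == "__main__":
    # Assumed default: the spool path quoted in this thread, without the
    # elided per-job part. Override it with the first command-line argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/opt/sge/default/spool/it060123/active_jobs"
    for path in find_unreadable_by_others(root):
        print(path)
```

As a usage example, you might run it as root on the execution host while a burst of short jobs is executing, for instance `python3 check_spool_perms.py /path/to/active_jobs` (the script name is hypothetical). Empty output means every directory and file present at scan time passed the check. Because it only takes a snapshot, a clean run does not rule out a transient race.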