Hello,

On 07.10.2011 at 16:50, Reuti wrote:
> On 04.10.2011 at 12:05, Schmidt, Burkhard wrote:
>
>> On 29.09.2011 at 20:50, Reuti wrote:
>>
>>> On 28.09.2011 at 15:41, Schmidt, Burkhard wrote:
>>>
>>>> I'm running SGE 6.2u5 on an Xserve cluster running Mac OS X Server v10.6
>>>> Snow Leopard with Open Directory network accounts. All users belong to the
>>>> same default group staff.
>>>
>>> Is the complete cluster OS X, or only the master node, or only the slaves?
>>
>> The complete cluster is running Mac OS X Server v10.6 Snow Leopard.
>>
>>> There were issues in the past as a result of an account having too many
>>> additional groups, but I'm not sure whether it applies here, as the error
>>> message was different.
>>>
>>> http://gridengine.org/pipermail/users/2011-March/000447.html
>>>
>>> Nevertheless: can you check the group count of the users in question?
>>
>> I did so, and it is less than 14 on the execution hosts for all of my users.
>> But it is more than 14 on the head node, due to the presence of 31
>> com.apple.sharepoint.* derived group memberships with IDs in the range 101
>> -- 178.
>>
>> However, as all my users (not only those added after the upgrade to 10.6)
>> have the same (large number of) group memberships on the head node, this
>> doesn't seem to be the origin of my problem.
>>
>> What's confusing me is the shepherd error message [1319:18564]: can't open
>> file job_pid: Permission denied. These files have permissions 644, owner is
>> the local user running Grid Engine, group is the local admin group. So they
>> should be readable for everybody.
>
> I was on vacation for some days, hence the delay. Yes, the files, but what
> about the enclosing directory? Usually it's something like
> /var/spool/sge/node01/active_jobs/12345.1 or similar in the spool directory
> of the node.

All intermediate directories have permissions 755, with owner:group set to
local_admin:local_admin_group. local_admin is the user executing Grid Engine's
execd.
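For anyone repeating this check, here is a minimal sketch of how the permissions along the whole directory chain could be listed in one go. It is not part of Grid Engine; the function name describe_path_chain is made up for illustration, and it only reports the "others" search bit, which is what a 755 directory grants.

```python
import stat
import sys
from pathlib import Path


def describe_path_chain(path):
    """Return (directory, octal mode, searchable by others?) for `path`
    and every parent directory up to the filesystem root.

    Hypothetical helper for illustration; not a Grid Engine tool.
    """
    rows = []
    resolved = Path(path).resolve()
    for directory in [resolved, *resolved.parents]:
        mode = stat.S_IMODE(directory.stat().st_mode)
        # S_IXOTH is the execute/search bit for "others" (the last 5 in 755).
        rows.append((str(directory), oct(mode), bool(mode & stat.S_IXOTH)))
    return rows


if __name__ == "__main__":
    # Pass one or more directories on the command line, e.g. a job's
    # active_jobs directory inside the node's spool directory.
    for arg in sys.argv[1:]:
        for name, mode, searchable in describe_path_chain(arg):
            print(f"{mode:>7}  searchable_by_others={searchable}  {name}")
```

Any row reporting False for the last column would mean that users outside the owner and group cannot traverse that directory, no matter how open the files inside it are.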
> Do you have the spool directory local on each machine or in a shared space?

The spool directories are local, located on the boot volume of each node.

Your question regarding permissions led me to verify *all* permissions of
*all* files a user touches when setting up a job, including the executables,
in particular the user's shell. And I guess I have found the culprit: all the
problematic users had a default shell set to none in Open Directory. My fault;
I should have realized that Workgroup Manager in Mac OS X Server v10.6 doesn't
set a default shell for new users with the default template, in contrast to
the behavior of WGM in Mac OS X Server v10.5. However, logging in via SSH to
the nodes always worked, so apparently SSH is "blind" to this problem, and
therefore so was I.

Best regards, and many thanks for your help,

Burkhard.

P.S.: Whenever you find yourself in Dresden, please contact me. I owe you a
favour.

_______________________________________________
Dr. Burkhard Schmidt
Max-Planck-Institut für Chemische Physik fester Stoffe
Nöthnitzer Str. 40 * 01187 Dresden * Tel. +49 351 4646-2235
Sekretariat: Tel. +49 351 4646-3231, Fax +49 351 4646-3232
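A missing login shell like the one described above could be spotted with a small script along the following lines. This is only a sketch under assumptions: it relies on Python's pwd module, and whether pwd.getpwall() lists Open Directory network accounts on a given node is not confirmed in this thread. The min_uid threshold of 500 and both function names are made up for illustration. How a shell set to "none" shows up in the account record is not stated here either, so the sketch simply treats an empty, nonexistent, or non-executable shell as a problem.

```python
import os
import pwd


def shell_problem(shell):
    """Return a short description of what is wrong with a login shell
    path, or None if it looks usable. Hypothetical helper."""
    if not shell:
        return "no login shell set"
    if not os.path.isfile(shell):
        return "shell does not exist: " + shell
    if not os.access(shell, os.X_OK):
        return "shell is not executable: " + shell
    return None


def users_with_bad_shell(min_uid=500):
    """Yield (user name, problem) for every account returned by the
    account database whose login shell looks unusable.

    min_uid is an assumed cut-off to skip system accounts.
    """
    for entry in pwd.getpwall():
        if entry.pw_uid < min_uid:
            continue
        problem = shell_problem(entry.pw_shell)
        if problem:
            yield entry.pw_name, problem


if __name__ == "__main__":
    for name, problem in users_with_bad_shell():
        print(f"{name}: {problem}")
```

Running it on an execution host prints one line per suspicious account; an empty output only means that none of the accounts the account database enumerates has an obviously broken shell.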
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
