Hello,

On 07.10.2011 at 16:50, Reuti wrote:

> On 04.10.2011 at 12:05, Schmidt, Burkhard wrote:
> 
>> On 29.09.2011 at 20:50, Reuti wrote:
>> 
>>> On 28.09.2011 at 15:41, Schmidt, Burkhard wrote:
>>> 
>>>> I'm running SGE 6.2u5 on an Xserve cluster running Mac OS X Server v10.6 
>>>> Snow Leopard with Open Directory network accounts. All users belong to the 
>>>> same default group staff.
>>> 
>>> Is the complete cluster OS X, or only the master node, or only the slaves?
>> 
>> The complete cluster is running Mac OS X Server v10.6 Snow Leopard.
>> 
>>> There were issues in the past as a result of an account having too many 
>>> additional groups, but I'm not sure whether that applies here, as the error 
>>> message was different.
>>> 
>>> http://gridengine.org/pipermail/users/2011-March/000447.html
>>> 
>>> Nevertheless: can you check the group count of the users in question?
>> 
>> I did so, and it is less than 14 on the execution hosts for all of my users. 
>> But it is more than 14 on the head node, due to the presence of 31 
>> com.apple.sharepoint.* derived group memberships with IDs in the range 101 
>> -- 178.
>> 
>> However, as all my users (not only those added after the upgrade to 10.6) 
>> have the same (large number of) group memberships on the head node, this 
>> doesn't seem to be the origin of my problem.
>> 
>> What's confusing me is the shepherd error message [1319:18564]: can't open 
>> file job_pid: Permission denied. These files have permissions 644, owner is 
>> the local user running Grid Engine, group is the local admin group. So they 
>> should be readable for everybody.
> 
> I was on vacation for a few days, hence the delay. Yes, the files are, but 
> what about the enclosing directory? Usually it's something like 
> /var/spool/sge/node01/active_jobs/12345.1 or similar in the spool directory 
> of the node.

All intermediate directories have permissions 755, with owner:group set to 
local_admin:local_admin_group. local_admin is the user executing Grid Engine's 
execd.
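
For anyone retracing this check, here is a minimal sketch of how the directory chain could be verified. It is not part of Grid Engine; the function name world_access_problems is hypothetical, and it only inspects the classic "other" permission bits, so ACLs are ignored.

```python
import os
import stat
from typing import List


def world_access_problems(path: str) -> List[str]:
    """List permission problems that keep 'other' users from reading `path`.

    Every ancestor directory needs the o+x (search) bit, and the file
    itself needs the o+r bit. ACLs are not considered (assumption).
    """
    path = os.path.abspath(path)

    # Collect all ancestor directories, from the parent up to the root.
    ancestors = []
    current = path
    while True:
        parent = os.path.dirname(current)
        if parent == current:
            break
        ancestors.append(parent)
        current = parent

    problems = []
    # Report from the root downwards, which is the order of traversal.
    for directory in reversed(ancestors):
        if not os.stat(directory).st_mode & stat.S_IXOTH:
            problems.append(f"{directory}: missing o+x")

    if not os.stat(path).st_mode & stat.S_IROTH:
        problems.append(f"{path}: missing o+r")
    return problems
```

Called on a job file inside the active_jobs directory of a node, an empty list would mean that neither the file nor its enclosing directories are the obstacle, which matches what I found here.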

> Do you have the spool directory local on each machine or in a shared space?

The spool directories are local, located on the boot volume on each node.

Your question regarding permissions led me to verify *all* permissions of *all* 
files a user touches when setting up a job, including the executables, in 
particular the user's shell. And I guess I have found the culprit: all the 
problematic users had their default shell set to none in Open Directory.
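
A check like this could be scripted. The following is only a sketch under assumptions: the helper names shell_problem and users_with_bad_shell are made up for illustration, an unset shell is assumed to show up as an empty string or the literal "none", and pwd.getpwall() may not enumerate directory-service accounts on every system.

```python
import os
import pwd
from typing import Iterable, List, Tuple


def shell_problem(shell: str) -> str:
    """Describe what is wrong with a login shell, or return '' if it looks usable."""
    # Assumption: an unset shell appears as an empty string or the word "none".
    if not shell or shell.lower() == "none":
        return "no shell set"
    if not os.path.isfile(shell):
        return "shell does not exist"
    if not os.access(shell, os.X_OK):
        return "shell is not executable"
    return ""


def users_with_bad_shell(entries: Iterable) -> List[Tuple[str, str, str]]:
    """Collect (user, shell, problem) for each passwd entry with an unusable shell."""
    bad = []
    for entry in entries:
        problem = shell_problem(entry.pw_shell)
        if problem:
            bad.append((entry.pw_name, entry.pw_shell, problem))
    return bad


if __name__ == "__main__":
    # If network accounts are not listed here, look them up one by one
    # with pwd.getpwnam() and pass the results in instead.
    for name, shell, problem in users_with_bad_shell(pwd.getpwall()):
        print(f"{name}: {shell!r} ({problem})")
```

Run on an execution host, this prints one line per account whose shell is missing or unusable. System accounts with a deliberately disabled shell may show up as well and can be ignored.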

My fault, I should have realized that Workgroup Manager in Mac OS X Server 
v10.6 doesn't set a default shell for new users with the default template, in 
contrast to the behavior of WGM in Mac OS X Server v10.5. However, logging in 
via SSH to the nodes always worked, so apparently SSH is "blind" to this 
problem, and therefore so was I.

Best regards, and many thanks for your help,

Burkhard.

P.S.: Whenever you find yourself in Dresden, please contact me. I owe you a 
favour.

_______________________________________________
Dr. Burkhard Schmidt
Max-Planck-Institut für Chemische Physik fester Stoffe
Nöthnitzer Str. 40 * 01187 Dresden * Tel. +49 351 4646-2235
Sekretariat: Tel. +49 351 4646-3231, Fax +49 351 4646-3232


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
