Hi,
thanks for your quick reply. Yes, we have a small cluster with 4 servers.
The problem occurs only on this machine, but there it affects all users.
The only difference is that this particular machine is the master host.
The other machines are just execution hosts, and one of them is also a
shadow master. Otherwise they are pretty much identical, software-wise.
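To narrow it down on the master host, the failing check from the trace can be repeated by hand as the affected user. The trace shows the shepherd running as uid 1111 ("starting not as superuser") and failing with "can't stat() ... Permission denied". A minimal Python sketch of such a check follows; the script and function names are placeholders, not part of Grid Engine. It relies on the POSIX rule that stat() returns EACCES when search (execute) permission is missing on a directory in the path prefix, so on that error it lists the prefix directories the current user cannot search.

```python
import errno
import os
import sys


def check_stat(path: str) -> bool:
    """Repeat a stat() on path, as the shepherd does for stdout_path.

    Hypothetical helper for diagnosis only. On EACCES, report which
    directories in the path prefix lack search (x) permission for the
    real uid/gid of the current process.
    """
    ids = f"uid={os.getuid()} euid={os.geteuid()} gid={os.getgid()}"
    try:
        os.stat(path)
    except OSError as exc:
        print(f'can\'t stat() "{path}": {exc.strerror} ({ids})')
        if exc.errno == errno.EACCES:
            # Collect the prefix directories, e.g. "/home" and "/".
            prefix = os.path.dirname(os.path.abspath(path))
            parts = []
            while True:
                parts.append(prefix)
                parent = os.path.dirname(prefix)
                if parent == prefix:
                    break
                prefix = parent
            # Check them from the root downwards.
            for directory in reversed(parts):
                if not os.access(directory, os.X_OK):
                    print(f'  no search permission on "{directory}"')
        return False
    print(f'stat() "{path}" ok ({ids})')
    return True


if __name__ == "__main__":
    # Default to the invoking user's home directory, like the failing job.
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~")
    check_stat(target)
```

Running it as the job owner on the master host (e.g. `python3 check_stat.py /home/xxx`) and then on one of the execution hosts would show whether the difference lies in the home directory path itself or in how the shepherd is started on the master.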

till

On 05/09/12 17:09, Reuti wrote:
> Hi,
>
> On 05.09.2012 at 15:06, Tillmann Stieger wrote:
>
>   
>> I have been searching for a solution to my problem for quite a while, but 
>> with no luck so far. I hope that someone here can help me out.
>>
>> On one specific machine I always get the following error message:
>>     
> Do you mean you have a larger cluster and the problem exists only on one 
> machine? And there, for one user or for all users?
>
> Is this node configured differently from the others in some way?
>
> -- Reuti
>
>
>   
>> Job 6941 caused action: Job 6941 set to ERROR
>>
>> User        = xxx
>> Queue       = all.q@xxx
>> Start Time  = <unknown>
>> End Time    = <unknown>
>> failed opening input/output file:09/05/2012 12:48:00 [1111:12201]: can't 
>> stat() "/home/xxx" as stdout_path
>> Shepherd trace:
>> 09/05/2012 12:48:00 [1111:12200]: shepherd called with uid = 1111, euid = 
>> 1111
>> 09/05/2012 12:48:00 [1111:12200]: starting up 2011.11
>> 09/05/2012 12:48:00 [1111:12200]: warning: starting not as superuser 
>> (uid=1111)
>> 09/05/2012 12:48:00 [1111:12200]: setpgid(12200, 12200) returned 0
>> 09/05/2012 12:48:00 [1111:12200]: do_core_binding: "binding" parameter not 
>> found in config file
>> 09/05/2012 12:48:00 [1111:12200]: no prolog script to start
>> 09/05/2012 12:48:00 [1111:12200]: parent: forked "job" with pid 12201
>> 09/05/2012 12:48:00 [1111:12200]: parent: job-pid: 12201
>> 09/05/2012 12:48:00 [1111:12201]: child: starting son(job, 
>> /opt/sge6_2/default/spool/crab2/job_scripts/6941, 0);
>> 09/05/2012 12:48:00 [1111:12201]: pid=12201 pgrp=12201 sid=12201 old 
>> pgrp=12200 getlogin()=<no login set>
>> 09/05/2012 12:48:00 [1111:12201]: reading passwd information for user 'xxx'
>> 09/05/2012 12:48:00 [1111:12201]: setosjobid: uid = 1111, euid = 1111
>> 09/05/2012 12:48:00 [1111:12201]: setting limits
>> 09/05/2012 12:48:00 [1111:12201]: RLIMIT_CPU setting: (soft 
>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) 
>> resulting: (soft 18446744073709551615(INFINITY), hard 
>> 18446744073709551615(INFINITY))
>> 09/05/2012 12:48:00 [1111:12201]: RLIMIT_FSIZE setting: (soft 
>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) 
>> resulting: (soft 18446744073709551615(INFINITY), hard 
>> 18446744073709551615(INFINITY))
>> 09/05/2012 12:48:00 [1111:12201]: RLIMIT_DATA setting: (soft 
>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) 
>> resulting: (soft 18446744073709551615(INFINITY), hard 
>> 18446744073709551615(INFINITY))
>> 09/05/2012 12:48:00 [1111:12201]: RLIMIT_STACK setting: (soft 
>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) 
>> resulting: (soft 18446744073709551615(INFINITY), hard 
>> 18446744073709551615(INFINITY))
>> 09/05/2012 12:48:00 [1111:12201]: RLIMIT_CORE setting: (soft 
>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) 
>> resulting: (soft 18446744073709551615(INFINITY), hard 
>> 18446744073709551615(INFINITY))
>> 09/05/2012 12:48:00 [1111:12201]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 
>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) 
>> resulting: (soft 18446744073709551615(INFINITY), hard 
>> 18446744073709551615(INFINITY))
>> 09/05/2012 12:48:00 [1111:12201]: RLIMIT_RSS setting: (soft 
>> 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) 
>> resulting: (soft 18446744073709551615(INFINITY), hard 
>> 18446744073709551615(INFINITY))
>> 09/05/2012 12:48:00 [1111:12201]: setting environment
>> 09/05/2012 12:48:00 [1111:12201]: Initializing error file
>> 09/05/2012 12:48:00 [1111:12201]: switching to intermediate/target user
>> 09/05/2012 12:48:00 [1111:12201]: tried to change uid/gid without being root
>> 09/05/2012 12:48:00 [1111:12201]: try running further with uid=1111
>> 09/05/2012 12:48:00 [1111:12201]: closing all filedescriptors
>> 09/05/2012 12:48:00 [1111:12201]: further messages are in "error" and "trace"
>> 09/05/2012 12:48:00 [1111:12201]: can't stat() "/home/xxx" as stdout_path: 
>> Permission denied KRB5CCNAME=none uid=1111 gid=100 100 
>> 09/05/2012 12:48:00 [1111:12200]: wait3 returned 12201 (status: 6656; 
>> WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 26)
>> 09/05/2012 12:48:00 [1111:12200]: job exited with exit status 26
>> 09/05/2012 12:48:00 [1111:12200]: reaped "job" with pid 12201
>> 09/05/2012 12:48:00 [1111:12200]: job exited not due to signal
>> 09/05/2012 12:48:00 [1111:12200]: job exited with status 26
>> 09/05/2012 12:48:00 [1111:12200]: now sending signal KILL to pid -12201
>> 09/05/2012 12:48:00 [1111:12200]: no tasker to notify
>> 09/05/2012 12:48:00 [1111:12200]: failed starting job
>> 09/05/2012 12:48:00 [1111:12200]: no epilog script to start
>>
>> Shepherd error:
>> 09/05/2012 12:48:00 [1111:12201]: can't stat() "/home/xxx" as stdout_path: 
>> Permission denied KRB5CCNAME=none uid=1111 gid=100 100 
>>
>> Shepherd pe_hostfile:
>> xxx 1 all.q@xxx UNDEFINED
>>
>>
>>
>> Has someone experienced similar issues?
>>
>> I would really appreciate any advice. Thanks.
>> till
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>     
>
>   
