Re: [gridengine users] User job fails silently

2018-08-08 Thread Derrick Lin
> What state of the job do you see in this line? Is it just hanging there and doing nothing? They do not appear in `top`? And it never vanishes automatically but you have to kill the job by hand?

Sorry for the confusion. The job state is "r" according to SGE, but as you mentioned qstat output is

Re: [gridengine users] User job fails silently

2018-08-08 Thread Feng Zhang
I also did some tests. If I add extra signal handling to all of these scripts (to catch any TERM signal from the OS), it more or less solves the issue.

On Wed, Aug 8, 2018 at 11:04 PM, Feng Zhang wrote:
> I am guessing it may be very similar to what I encountered before. My issue was:
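Feng Zhang's workaround is to handle TERM in every script of the chain. A minimal sketch of what that could look like in a Python middle layer, assuming the layer launches the next script with `subprocess` and that the TERM reaches this layer: the handler forwards the signal to the child, and the layer then returns 128 + the signal number, the status bash reports for a signal-killed process. The function name `run_child_forwarding_term` is hypothetical.

```python
import signal
import subprocess


def run_child_forwarding_term(cmd):
    """Run cmd as a child; if SIGTERM arrives, pass it on to the child.

    Returns the child's exit status, or 128 + signal number if this
    process itself was sent SIGTERM (the convention bash uses).
    """
    child = subprocess.Popen(cmd)
    received = []

    def handle_term(signum, frame):
        # Record the signal and forward it to the next script in the chain.
        # Each layer needs the same handling for the signal to reach the bottom.
        received.append(signum)
        child.send_signal(signum)

    previous = signal.signal(signal.SIGTERM, handle_term)
    try:
        # Reaping happens only here, never inside the handler.
        status = child.wait()
    finally:
        signal.signal(signal.SIGTERM, previous)

    if received:
        return 128 + received[0]
    return status
```

The handler only forwards and records the signal; reaping stays in the main `wait()` call, because calling `Popen.wait()` from a handler that interrupted another `wait()` on the same object can deadlock on its internal lock.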

Re: [gridengine users] User job fails silently

2018-08-08 Thread Feng Zhang
I am guessing it may be very similar to what I encountered before. My issue was: one user used a bash script as the batch job script; it calls another script (Python), which in turn calls a third script (and maybe more...). For this kind of job, if anything goes wrong, it can
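The rest of Feng Zhang's message is cut off here. Independently of what it said, one way a chain like this can hide a failure is a middle script that ignores the exit status of the script it calls. A minimal sketch of a Python middle layer that checks each stage and propagates a failure instead, assuming the stages are launched with `subprocess`; the helper name `run_stage` is hypothetical.

```python
import subprocess
import sys


def run_stage(cmd):
    """Run one stage of a nested script chain and fail loudly if it fails.

    A negative returncode from subprocess means the child was killed by
    that signal; it is mapped to 128 + signal number, as bash would report.
    """
    result = subprocess.run(cmd)
    code = result.returncode
    if code < 0:
        code = 128 - code
    if code != 0:
        # Write to stderr so the failure shows up in the job's stderr file.
        print(f"stage {cmd!r} failed with exit status {code}", file=sys.stderr)
        raise SystemExit(code)
    return code
```

Raising `SystemExit` with the stage's status lets the calling bash script see a non-zero exit code rather than a silent success.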

Re: [gridengine users] User job fails silently

2018-08-08 Thread Reuti
Hi,

On 09.08.2018 at 01:26, Derrick Lin wrote:
> > What state of the job do you see in this line? Is it just hanging there and doing nothing? They do not appear in `top`? And it never vanishes automatically but you have to kill the job by hand?
>
> Sorry for the confusion. The job

Re: [gridengine users] User job fails silently

2018-08-08 Thread Reuti
> On 08.08.2018 at 08:15, Derrick Lin wrote:
>
> Hi guys,
>
> I have a user who reported that his jobs are stuck running for much longer than usual.
>
> So I went to the exec host and checked the processes; all processes owned by that user look like:
>
> `- -bash

Re: [gridengine users] Start jobs on exec host in sequential order

2018-08-08 Thread Derrick Lin
Thanks, guys, I will take a look at each option.

On Mon, Aug 6, 2018 at 9:52 PM, William Hay wrote:
> On Wed, Aug 01, 2018 at 11:06:19AM +1000, Derrick Lin wrote:
> > Hi Reuti,
> > The prolog script is indeed set to run as root. The XFS quota requires root privilege.
> > I also

[gridengine users] User job fails silently

2018-08-08 Thread Derrick Lin
Hi guys,

I have a user who reported that his jobs are stuck running for much longer than usual. So I went to the exec host and checked the processes; all processes owned by that user look like:

`- -bash /opt/gridengine/default/spool/omega-6-20/job_scripts/1187671`

In qstat, the job is still shown in the running state.
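To check what such leftover processes are actually doing (Reuti's question about `top`), the state letter that `top` and `ps` display can also be read directly from `/proc`. A minimal sketch, assuming a Linux exec host; the function name `process_states` is hypothetical. Per proc(5), common states are R (running), S (sleeping), D (uninterruptible disk sleep), Z (zombie) and T (stopped).

```python
import os


def process_states(uid):
    """Return {pid: (state, cmdline)} for processes owned by uid (Linux /proc only)."""
    states = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            # /proc/<pid> is owned by the process's effective UID.
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited (or became inaccessible) during the scan.
            continue
        # comm is wrapped in parentheses and may contain spaces,
        # so the state letter is the field right after the last ')'.
        state = stat[stat.rfind(")") + 2]
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        states[int(entry)] = (state, cmdline)
    return states
```

For example, `process_states(pwd.getpwnam("someuser").pw_uid)` (with a hypothetical user name) would list each of that user's processes with its state and command line.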