Re: [gridengine users] Barrier job

2012-07-10 Thread Jesse Becker
On Tue, Jul 10, 2012 at 06:51:03PM -0400, David Erickson wrote: Hi all- Following up with a slightly different question from yesterday, I have another GE installation that has many hosts grabbing jobs from a single queue. These jobs are logically grouped together, although they are not submitted

[gridengine users] Barrier job

2012-07-10 Thread David Erickson
Hi all- Following up with a slightly different question from yesterday, I have another GE installation that has many hosts grabbing jobs from a single queue. These jobs are logically grouped together, although they are not submitted together (but actually when a job finishes from another GE cluste

Re: [gridengine users] Stop executing jobs on job error?

2012-07-10 Thread David Erickson
Great info, will be hacking on this this afternoon. Thanks! On Tue, Jul 10, 2012 at 11:43 AM, Rayson Ho wrote: > On Tue, Jul 10, 2012 at 5:45 AM, Reuti wrote: >> >> Just to note, that the path can be accessed by $SGE_JOB_SPOOL_DIR. > > > Thanks Reuti - it will be useful to David. > > I forgot t

Re: [gridengine users] queues behaving differently

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 4:23 PM, John Young wrote: > With this in place, it seems odd that from one of my queues I > get a default setting for the number of descriptors of 1024. > > So I have two questions really: > > 1. Why am I getting different behavior from the two queues? Could be OS issue -

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 3:51 PM, Reuti wrote: >> Failure was that user's .bashrc were not being read. In each user's >> account, I have it sourcing a system wide shell script which sets certain >> things up, like our module environment configuration and so nothing was >> being setup. > > Yep,

Re: [gridengine users] queues behaving differently

2012-07-10 Thread John Young
On 07/10/2012 04:14 PM, Rayson Ho wrote: On Tue, Jul 10, 2012 at 4:02 PM, John Young wrote: If you really have a real use-case for setting the # of descriptors in the queue config, then let us know and we can implement that in OGS/GE (... when time permits). Well... I have an engineer here w

Re: [gridengine users] queues behaving differently

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 4:02 PM, John Young wrote: >> If you really have a real use-case for setting the # of descriptors in >> the queue config, then let us know and we can implement that in OGS/GE >> (... when time permits). >> > Well... I have an engineer here who want to run a 2048 core job.

Re: [gridengine users] queues behaving differently

2012-07-10 Thread John Young
On 07/10/2012 03:47 PM, Rayson Ho wrote: The number of file descriptors is not part of the queue limit, see the message I sent to the list 2 months ago: http://gridengine.org/pipermail/users/2012-May/003705.html If you really have a real use-case for setting the # of descriptors in the queue co

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Reuti
On 10.07.2012 at 21:20, Joseph Farran wrote: > On 07/10/2012 11:38 AM, Rayson Ho wrote: >> On Tue, Jul 10, 2012 at 1:48 PM, Joseph Farran wrote: >>> I was using the same identical script, so it's still a mystery why the >>> script ran on some nodes while it failed on others, but now with this ch

Re: [gridengine users] queues behaving differently

2012-07-10 Thread Rayson Ho
The number of file descriptors is not part of the queue limit, see the message I sent to the list 2 months ago: http://gridengine.org/pipermail/users/2012-May/003705.html If you really have a real use-case for setting the # of descriptors in the queue config, then let us know and we can implement

[gridengine users] queues behaving differently

2012-07-10 Thread John Young
I have a short test job that I can submit to different queues on my cluster that appear to be configured the same, but I get different results. Here is the job:
---
#!/bin/tcsh
#
#$ -N show-limits
#$ -S /bin/tcsh
#$ -o show-limits.out
#$ -e show-limits.err
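For comparing queues, the descriptor limit a job actually receives can also be read from inside the job with a few lines of Python. This is only a sketch, not part of the original message: it assumes Python is available on the execution hosts, and it relies on the standard-library `resource` module rather than anything specific to Grid Engine. The helper names `descriptor_limits` and `format_limit` are made up for illustration.

```python
import resource


def descriptor_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def format_limit(value):
    """Render a limit the way shells do: 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    # Run this as the job body in each queue and compare the output files.
    soft, hard = descriptor_limits()
    print("descriptors soft=%s hard=%s" % (format_limit(soft), format_limit(hard)))
```

Submitting the same script to each queue and diffing the output files should show whether the difference comes from the queue or from the host the job landed on.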

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Joseph Farran
On 07/10/2012 11:38 AM, Rayson Ho wrote: On Tue, Jul 10, 2012 at 1:48 PM, Joseph Farran wrote: I was using the same identical script, so it's still a mystery why the script ran on some nodes while it failed on others, but now with this change it works on all nodes and that is good enough. Th

Re: [gridengine users] Stop executing jobs on job error?

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 5:45 AM, Reuti wrote: > > Just to note, that the path can be accessed by $SGE_JOB_SPOOL_DIR. Thanks Reuti - it will be useful to David. I forgot this environment var as I have not used this hack for almost a year... basically since getting the job exit status in epilog w

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Rayson Ho
On Tue, Jul 10, 2012 at 1:48 PM, Joseph Farran wrote: > I was using the same identical script, so it's still a mystery why the > script ran on some nodes while it failed on others, but now with this change > it works on all nodes and that is good enough. Those settings affect whether the global l

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Joseph Farran
On 07/10/2012 02:37 AM, Reuti wrote: On 10.07.2012 at 09:29, Hung-Sheng Tsao Ph.D. wrote: hi to use -S /bin/bash need shell_start_mode posix_behavior you may also want to add bash to login_shells in qconf -mconf global regards On 7/10/2012 1:21 AM, Joseph A. Farran wrote: Hello. I

Re: [gridengine users] export of environment variables from start_proc_args

2012-07-10 Thread Mark Dixon
On Tue, 10 Jul 2012, Reuti wrote: ... * PE_HOSTFILE rewriting This I would suggest to do in the start_proc_args of the PE, but it might be personal taste of course. -- Reuti The advantage of doing PE_HOSTFILE rewriting in the starter_method is that we can modify the job's PE_HOSTFILE vari

Re: [gridengine users] integrate BLCR and SGE

2012-07-10 Thread Reuti
On 10.07.2012 at 07:42, mahbube rustaee wrote: > > > > > Yes, I configured BLCR checkpoint . when a job suspend (qmod -sj) state be > > "s" and will be queue (Rq ) automatically. > > This is the normal behavior. > > > How can do that manually? I mean job be in "s" state until unsuspend it >

Re: [gridengine users] Stop executing jobs on job error?

2012-07-10 Thread Reuti
On 10.07.2012 at 05:10, Rayson Ho wrote: > 1) There's the "exit_status" file in the job's spool directory, and > you can just go to the active_jobs directory in the execd's spool, and > in there you will find a subdirectory for each job. So you can just > parse the file to get the exit status of
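The parsing step described in the quoted message could look roughly like the following. This is a hedged sketch and not taken from the thread: it assumes the file is named `exit_status` as quoted, that it holds a single integer (worth verifying on your own installation), and that the job's spool directory is exposed as `$SGE_JOB_SPOOL_DIR`, the variable mentioned elsewhere in this thread. The function name `read_exit_status` is hypothetical.

```python
import os


def read_exit_status(spool_dir):
    """Return the job's exit status as an int, or None if it cannot be read.

    Assumption: the file is called "exit_status" and contains one integer.
    """
    path = os.path.join(spool_dir, "exit_status")
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        # Missing file, unreadable file, or unexpected contents.
        return None


if __name__ == "__main__":
    # SGE_JOB_SPOOL_DIR is expected to be set in the job's environment.
    spool_dir = os.environ.get("SGE_JOB_SPOOL_DIR")
    status = read_exit_status(spool_dir) if spool_dir else None
    print("exit status:", status)
```

Returning `None` instead of raising keeps a calling script (for example an epilog) from failing when the file is absent or has a different layout than assumed here.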

Re: [gridengine users] export of environment variables from start_proc_args

2012-07-10 Thread Reuti
On 10.07.2012 at 10:25, Mark Dixon wrote: > On Mon, 9 Jul 2012, Dave Love wrote: > ... >> You essentially need to mimic the built-in starter, which is why it's >> best to hook into it instead. The mimic is straightforward as long as >> you don't use shell and can follow the built-in code. >> >>

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Reuti
On 10.07.2012 at 09:29, Hung-Sheng Tsao Ph.D. wrote: > hi > to use -S /bin/bash > need > shell_start_mode posix_behavior > > you may also want to add bash to login_shells in qconf -mconf global > > regards > > On 7/10/2012 1:21 AM, Joseph A. Farran wrote: >> Hello. >> >> I have a cluste

Re: [gridengine users] export of environment variables from start_proc_args

2012-07-10 Thread Mark Dixon
On Mon, 9 Jul 2012, Dave Love wrote: ... I'm not sure I understand the problem. Is it specific to using the starter method? Fluent 12 apparently works here with the standard wrapper as "rsh". All I know is that it didn't work for me :) ... I guess it depends on the specification. I doubt i

Re: [gridengine users] export of environment variables from start_proc_args

2012-07-10 Thread Mark Dixon
On Mon, 9 Jul 2012, Dave Love wrote: ... You essentially need to mimic the built-in starter, which is why it's best to hook into it instead. The mimic is straightforward as long as you don't use shell and can follow the built-in code. What sort of things do people need to do other than firkle w

Re: [gridengine users] Default Shell bash not always found

2012-07-10 Thread Hung-Sheng Tsao Ph.D.
hi to use -S /bin/bash need shell_start_mode posix_behavior you may also want to add bash to login_shells in qconf -mconf global regards On 7/10/2012 1:21 AM, Joseph A. Farran wrote: Hello. I have a cluster with Rocks 5.4.3 (SL) and I believe this is a Rocks issue but not sure. I have

Re: [gridengine users] Using Galaxy with SGE: Job output not returned from Cluster

2012-07-10 Thread Sascha Kastens
Hi Reuti, I have contacted the original poster. Unfortunately he cannot remember the exact solution. He wrote something about changes to the queueing system... but this does not help anyway. Cheers, Sascha Original Message processed by CONSOLIDATE Subject: Re: [gridengine users] U