Re: [gridengine users] running job holds and restart

2013-10-28 Thread Reuti
Hi, On 28.10.2013 at 06:40, Sangmin Park wrote: Thanks, Adam. I configured the SGE queue configuration following the second link you mentioned, but it does not work. I made 4 queues: short.q, normal.q, long.q and matlab.q. The short.q, normal.q and long.q queue instances are running all computing

Re: [gridengine users] BLCR starter_method

2013-10-28 Thread Reuti
Hi, On 28.10.2013 at 01:21, Joseph Farran wrote: We have set up BLCR (Berkeley Lab Checkpoint/Restart) on our cluster with Grid Engine ckpt scripts to process the checkpoints and restart methods. In an effort to make things as easy as possible for our user base, I am using Grid Engine

Re: [gridengine users] running job holds and restart

2013-10-28 Thread Sangmin Park
Dear Reuti, I've edited the negative value in the priority section: short.q is 4, normal.q is 6 and long.q is 8, respectively. And I configured 72 cores for each queue. Below are the matlab.q instance details. qname matlab.q hostlist @matlabhosts seq_no 0

Re: [gridengine users] running job holds and restart

2013-10-28 Thread Reuti
On 28.10.2013 at 12:30, Sangmin Park wrote: I've edited the negative value in the priority section: short.q is 4, normal.q is 6 and long.q is 8, respectively. And I configured 72 cores for each queue. But you didn't answer the question: How do you limit the overall slot count? RQS or

Re: [gridengine users] running job holds and restart

2013-10-28 Thread Reuti
On 28.10.2013 at 13:45, Sangmin Park wrote: This is the RQS: limit hosts {@parallelhosts} to slots=$num_proc limit queues !matlab.q hosts {@matlabhosts} to slots=$num_proc parallelhosts includes matlabhosts. The slots value in matlab.q means the number of cores per

Re: [gridengine users] running job holds and restart

2013-10-28 Thread Sangmin Park
Yes, suspending the job when all 12 slots are used on a particular host. This is what I want. So, I tried to submit a job using 12 slots, but it did not work. Still not working. --Sangmin On Mon, Oct 28, 2013 at 9:47 PM, Reuti re...@staff.uni-marburg.de wrote: On 28.10.2013 at 13:45,

Re: [gridengine users] running job holds and restart

2013-10-28 Thread Reuti
On 28.10.2013 at 13:59, Sangmin Park wrote: Yes, suspending the job when all 12 slots are used on a particular host. This is what I want. So, I tried to submit a job using 12 slots, but it did not work. Aha, it might be necessary to change the order of rules in your RQS. The first

Re: [gridengine users] running job holds and restart

2013-10-28 Thread Reuti
On 28.10.2013 at 14:05, Reuti wrote: On 28.10.2013 at 13:59, Sangmin Park wrote: Yes, suspending the job when all 12 slots are used on a particular host. This is what I want. So, I tried to submit a job using 12 slots, but it did not work. Aha, it might be necessary to change the

[gridengine users] Son of GridEngine binaries compatibility

2013-10-28 Thread Txema Heredia
Hi all, Are SoGE 8.1.5 qmaster binaries compatible with SGE6.2u5 or OGS2011.11p1 installations? I have tried running it in a 6.2u5 cluster and all I get is a sge_qmaster problem / sge_qmaster didn't start! message leaving this in /var/log/messages: Oct 28 14:43:55 floquet kernel:

Re: [gridengine users] BLCR starter_method

2013-10-28 Thread Joseph Farran
Thanks, Reuti, as always. If you have a *default* starter_method script, please post it, as it will help many, since it's tricky to get everything right for those of us who don't know GE inside-out. Best, Joseph On 10/28/2013 12:12 AM, Reuti wrote: Hi, On 28.10.2013 at 01:21, Joseph

Re: [gridengine users] running job holds and restart

2013-10-28 Thread Sangmin Park
I tried to set the limit in the RQS as you said, but it is not working. The only RQS related to matlabhosts is slotcap, as below: { name slotcap description maximum usable slots per node enabled TRUE limit hosts {*} to slots=$num_proc limit queues !matlab.q hosts {*} to