[gridengine users] SoGE 8.1.8 - Forking a shell script from within prolog script - will it work?

2015-04-14 Thread Yuri Burmachenko
Hello, distinguished forum members,

Is it possible to fork a different script from within an SGE prolog script? Will
it work?
Like below:

prolog.sh:
#!/bin/bash
my_script.sh 

We are using SoGE 8.1.8.
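In general a prolog is an ordinary script run by the execution daemon, so it can spawn other processes; the usual caveat is that anything meant to outlive the prolog must be backgrounded with its stdio redirected, or the prolog will block until it finishes (or the forked process will hold the job's streams open). A minimal sketch, with `sleep 1` standing in for my_script.sh so it is self-contained:

```shell
#!/bin/bash
# prolog.sh -- sketch of forking a helper without blocking the prolog.
# 'sleep 1' stands in for my_script.sh; all paths are placeholders.
LOG=$(mktemp)
PIDFILE=$(mktemp)

# nohup + '&' detach the helper so the prolog can return immediately;
# redirecting stdio keeps the helper from holding the job's streams open.
nohup sleep 1 > "$LOG" 2>&1 &
echo $! > "$PIDFILE"

# A real prolog would now 'exit 0' -- a nonzero prolog exit status is
# treated as an error by Grid Engine.
```

An epilog could later read the PID file to clean the helper up when the job ends.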

Any tips will be greatly appreciated.
Thank You.


Yuri Burmachenko | Sr. Engineer | IT | Mellanox Technologies Ltd.
Work: +972 74 7236386 | Cell: +972 54 7542188 | Fax: +972 4 959 3245
Follow us on Twitter: http://twitter.com/mellanoxtech and
Facebook: http://www.facebook.com/pages/Mellanox-Technologies/223164879116

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] SoGE 8.1.8 - Forking script from within prolog does not work for interactive jobs while worked with SGE 6.1u6

2015-04-14 Thread Yuri Burmachenko
Can anyone assist?

Any tips on how to resolve this will be greatly appreciated.
Thank You.


From: Yuri Burmachenko
Sent: Tuesday, April 07, 2015 4:41 PM
To: users@gridengine.org
Cc: Dmitry Leibovich; Yuval Leader
Subject: SoGE 8.1.8 - Forking script from within prolog does not work for 
interactive jobs while worked with SGE 6.1u6

Hello, distinguished forum members,

I hope you can assist me.

Since we migrated from SGE 6.1u6 to SoGE 8.1.8, we have been experiencing an
issue when trying to fork a script from within a prolog script.

We have the following line of code in the prolog script:

/usr/bin/sudo -u sgeadmin /home/sgeadmin/bin/getmemusage.sh ${JOB_ID} 
${SGE_ROOT} 

This forked script monitors the job's memory usage over the job's lifecycle.
While it worked with SGE 6.1u6, under SoGE 8.1.8 we don't see it working for
interactive jobs - it is simply not invoked.

It does work with regular batch jobs.
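One thing worth checking first is whether the prolog fires at all for interactive jobs under 8.1.8 -- the default interactive-job mechanism changed between 6.1 (rsh wrappers) and later releases (builtin), which can change how prolog/epilog are applied. A logging sketch (file locations are assumptions):

```shell
#!/bin/bash
# Debugging sketch: record every prolog invocation so you can tell
# whether the prolog runs for qlogin/qrsh jobs at all (paths assumed).
DEBUG_LOG="${TMPDIR:-/tmp}/prolog_debug.$$.log"
{
  echo "invoked: $(date)"
  echo "job_id:  ${JOB_ID:-unset}"
  echo "has_tty: $(test -t 0 && echo yes || echo no)"
} >> "$DEBUG_LOG"

# The real prolog would then fork the monitor, e.g.:
# /usr/bin/sudo -u sgeadmin /home/sgeadmin/bin/getmemusage.sh "$JOB_ID" "$SGE_ROOT" &
```

If the log shows no entries for interactive jobs, the problem is the prolog not being invoked rather than the forked script itself.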

Any tips on how to resolve this will be greatly appreciated.
Thank You.





Re: [gridengine users] project quota details

2015-04-14 Thread Michael Stauffer
Ed, thanks for the reply and offer.

Here's an example of what I want to do (for slot quotas; I figure the setup
for h_vmem and s_vmem RQSs will be similar):

project name   # slots aggregate   # slots per user   users
------------   -----------------   ----------------   ----------------
lab1           100                 40                 ted,ann,bob,jim
lab2           80                  15                 ann,cin,fred,jen
                                   50                 lan (a power user in lab2)

For now each project will be assigned exclusively to one of two qsub
queues, but that part should be straightforward if you just show me how to
handle the above for one queue. Each project will also go to the qlogin
queue with different slot limits, but that should be straightforward too
once I know how to do it for a qsub queue.

The front end is about 2 years old, with 16 2.2 GHz Xeon cores and 64 GB of RAM.

Thanks!

-M

On Mon, Apr 13, 2015 at 9:05 PM, Ed Lauzier elauzi...@perlstar.com wrote:

 Hi Michael,

 Send some basic examples of what you want to do and I'll fire off a basic
 RQS config
 that will get you going.

 There is a lot to it, especially for project-level fairshare settings.

 Also, remember that if you want decent response times, you need a scheduler
 host with at least 2 CPUs. Best to have 4 CPUs so that decisions can be made
 faster during the scheduling cycle and the worker threads can do their thing.

 Also consider looking at using the Perl JSV for runtime limit enforcement.

 It may be best to get Univa to assist you for a day, even over the phone,
 if you can justify the expense.  It is well worth it to get the new Univa
 Grid Engine.


 -Ed

 -Original Message-
 *From:* Reuti [mailto:re...@staff.uni-marburg.de]
 *Sent:* Monday, April 13, 2015 06:22 PM
 *To:* 'Michael Stauffer'
 *Cc:* 'Gridengine Users Group'
 *Subject:* Re: [gridengine users] project quota details

 Hi,

 On 13.04.2015 at 23:24, Michael Stauffer wrote:

  OGS/GE 2011.11p1 (Rocks 6.1)

  Hi,

  I'm looking to set up project-based quota management. I'm at a university
  and different labs will be signing up for different quotas for the users
  in their labs. I understand that I can:

  - add users to one or more projects
  - assign quotas (slots and memory in my case) to projects that will limit
    the total concurrent resource usage by project
  - have users choose a particular project when they submit a job (needed
    for users who do work for multiple labs)

  I'm wondering if I can also set a per-user quota within a project quota
  that will limit how much of a resource any individual from the project
  can use at once. That is, I'd like that limit to be lower than the
  project's limit on all project users, so that no one user in a project
  can use all the project's resources at once.

  Could different per-user quotas be assigned for different users within a
  project? E.g. a power user in a project might generally need more slots
  than other users.

 Yes. You need to phrase these individual limits in a second RQS. I.e. one
 RQS will limit the overall consumption per project, the second one will
 limit the combinations of projects and users to varying limits.

 -- Reuti

  Any suggestions on strategies for this kind of resource management would
  be a great help.

  Thanks

  -M
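Reuti's two-RQS approach, applied to the slot numbers in the example above, might look like the following sketch (rule-set names are invented, and the exact filter syntax should be checked against sge_resource_quota(5); within a rule set the first matching rule wins, so the specific rule for lan must precede the wildcard):

```
{
   name         project_totals
   description  aggregate slot limit per project
   enabled      TRUE
   limit        projects lab1 to slots=100
   limit        projects lab2 to slots=80
}
{
   name         project_user_slots
   description  per-user slot limit within each project
   enabled      TRUE
   limit        users lan projects lab2 to slots=50
   limit        users {*} projects lab1 to slots=40
   limit        users {*} projects lab2 to slots=15
}
```

The braces in `users {*}` make the limit apply to each user individually, while the bare `users lan` rule gives that one user his own, higher cap.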




[gridengine users] limit slots to core count no longer works

2015-04-14 Thread John Young
Hello,

   We (fairly) recently upgraded our cluster to Rocks 6.1.1
and we now seem to be having problems with RQS.  On our old
cluster, we had an RQS quota set as follows:

{
   name         host-slots
   description  restrict slots to core count
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}

The reason for this was to try to prevent oversubscription
of the processors on the clients.  Now, if I have this quota
enabled, submitted jobs don't start, and if I run
'qstat -j <job-number>', under "scheduling info" I see things like:

cannot run because it exceeds limit compute-0-7/ in rule host-slots/1
cannot run because it exceeds limit compute-0-7/ in rule host-slots/1
(-l slots=1) cannot run in queue compute-0-39.local because it offers only 
hc:slots=0.00
cannot run because it exceeds limit compute-0-78/ in rule host-slots/1
cannot run because it exceeds limit compute-0-78/ in rule host-slots/1
cannot run because it exceeds limit compute-0-55/ in rule host-slots/1
cannot run because it exceeds limit compute-0-55/ in rule host-slots/1
cannot run because it exceeds limit compute-0-74/ in rule host-slots/1
cannot run because it exceeds limit compute-0-74/ in rule host-slots/1
cannot run because it exceeds limit compute-2-7/ in rule host-slots/1
cannot run because it exceeds limit compute-2-1/ in rule host-slots/1
cannot run because it exceeds limit compute-2-2/ in rule host-slots/1
cannot run because it exceeds limit compute-0-22/ in rule host-slots/1
cannot run because it exceeds limit compute-0-22/ in rule host-slots/1
cannot run because it exceeds limit compute-1-2/ in rule host-slots/1
cannot run in PE mpich because it only offers 0 slots

But as soon as I run 'qconf -mrqs' and change TRUE to FALSE, the job runs.

Has the process for preventing oversubscription changed?  Any ideas?
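As a first check, it may help to compare the processor count each execution host actually reports with what the RQS limit would compute. A small helper sketch -- the awk pattern assumes the `processors` line of `qconf -se <host>` output, demonstrated here on canned output so the sketch is self-contained:

```shell
#!/bin/bash
# Sketch: extract the processor count from 'qconf -se <host>' output,
# to compare against the slots=$num_proc limit in the RQS.
parse_num_proc() {
    awk '/^processors/ {print $2}'
}

# Canned sample of 'qconf -se' output; on a live cluster you would run:
#   for h in $(qconf -sel); do qconf -se "$h" | parse_num_proc; done
sample_output="hostname          compute-0-7
processors        16
load_scaling      NONE"
printf '%s\n' "$sample_output" | parse_num_proc
```

`qquota -l slots` would additionally show how the quota is currently being accounted per host.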

JY



Re: [gridengine users] limit slots to core count no longer works

2015-04-14 Thread Feng Zhang
What is $num_proc on those hosts? Did you try setting a literal number, like
'limit hosts {*} to slots=12'?
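To test that suggestion without the interactive editor, one could dump the rule set, substitute a literal, and load it back with `qconf -Mrqs`; sketched here against a canned copy of the rule set so the substitution itself is visible:

```shell
#!/bin/bash
# Sketch: swap $num_proc for a literal value to test whether the dynamic
# variable is the problem. On a live system (rule-set name assumed):
#   qconf -srqs host-slots | sed 's/\$num_proc/12/' > /tmp/host-slots.rqs
#   qconf -Mrqs /tmp/host-slots.rqs
# Demonstrated on a canned copy of the rule set:
rqs='{
   name         host-slots
   description  restrict slots to core count
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}'
printf '%s\n' "$rqs" | sed 's/\$num_proc/12/'
```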

On Tue, Apr 14, 2015 at 3:32 PM, John Young j.e.yo...@larc.nasa.gov wrote:
 Hello,

    We (fairly) recently upgraded our cluster to Rocks 6.1.1
 and we now seem to be having problems with RQS.  On our old
 cluster, we had an RQS quota set as follows:

 {
    name         host-slots
    description  restrict slots to core count
    enabled      TRUE
    limit        hosts {*} to slots=$num_proc
 }

 [...]



-- 
Best,

Feng