[gridengine users] SoGE 8.1.8 - Forking a shell script from within prolog script - will it work?
Hello to distinguished forum members,

Is it possible to fork a different script from within an SGE prolog script? Will it work? Something like the following:

prolog.sh:
#!/bin/bash
my_script.sh

We are using SoGE 8.1.8. Any tips will be greatly appreciated.

Thank you.

Yuri Burmachenko | Sr. Engineer | IT | Mellanox Technologies Ltd.
Work: +972 74 7236386 | Cell: +972 54 7542188 | Fax: +972 4 959 3245
Follow us on Twitter (http://twitter.com/mellanoxtech) and Facebook (http://www.facebook.com/pages/Mellanox-Technologies/223164879116)

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
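One way to make the fork explicit is to background the helper with nohup and detach its standard descriptors -- a common precaution, since a child that inherits the prolog's stdout/stderr can keep those descriptors open after the prolog exits. A self-contained sketch: the generated helper script here is a stand-in for my_script.sh, and the wait/cat at the end exists only to make the demo deterministic (a real prolog would simply exit 0 without waiting):

```shell
#!/bin/bash
# Sketch: fork a helper from a prolog-style script and return immediately.
# The helper below is a generated stand-in for my_script.sh so the demo
# is self-contained.
helper="$(mktemp)"
printf '#!/bin/bash\necho "helper ran with JOB_ID=$1"\n' > "$helper"
chmod +x "$helper"

logfile="$(mktemp)"
# Detach the helper: background it, close stdin, and redirect stdout and
# stderr to its own log so it does not hold the prolog's descriptors open.
nohup "$helper" 12345 < /dev/null > "$logfile" 2>&1 &

# A real prolog would "exit 0" here without waiting; we wait only so the
# demo output is deterministic.
wait
result="$(cat "$logfile")"
echo "$result"   # prints: helper ran with JOB_ID=12345
rm -f "$helper" "$logfile"
```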
Re: [gridengine users] SoGE 8.1.8 - Forking script from within prolog does not work for interactive jobs while worked with SGE 6.1u6
Can anyone assist? Any tips on how to resolve this will be greatly appreciated. Thank you.

From: Yuri Burmachenko
Sent: Tuesday, April 07, 2015 4:41 PM
To: users@gridengine.org
Cc: Dmitry Leibovich; Yuval Leader
Subject: SoGE 8.1.8 - Forking script from within prolog does not work for interactive jobs while worked with SGE 6.1u6

Hello to distinguished forum members,

I hope you can assist me. Since we migrated from SGE 6.1u6 to SoGE 8.1.8, we have had an issue when trying to fork a script from within the prolog script. The prolog contains the following line:

/usr/bin/sudo -u sgeadmin /home/sgeadmin/bin/getmemusage.sh ${JOB_ID} ${SGE_ROOT}

The forked script monitors the job's memory usage over its lifetime. This worked with SGE 6.1u6, but under SoGE 8.1.8 it is not invoked for interactive jobs; it still works for regular batch jobs. Any tips on how to resolve this will be greatly appreciated.

Thank you.

Yuri Burmachenko | Sr. Engineer | IT | Mellanox Technologies Ltd.
Work: +972 74 7236386 | Cell: +972 54 7542188 | Fax: +972 4 959 3245
Follow us on Twitter (http://twitter.com/mellanoxtech) and Facebook (http://www.facebook.com/pages/Mellanox-Technologies/223164879116)
Re: [gridengine users] project quota details
Ed, thanks for the reply and offer. Here's an example of what I want to do (for slot quotas; I figure the setup for h_vmem and s_vmem RQSs will be similar):

project    # slots aggregate    # slots/user    users
-------    -----------------    ------------    -----
lab1       100                  40              ted,ann,bob,jim
lab2       80                   15              ann,cin,fred,jen
                                50              lan (lan is a power user in lab2)

For now each project will be assigned exclusively to one of two qsub queues, but that part should be straightforward if you just show me how to handle the above for one queue. Each project will also go to the qlogin queue with different slot limits, but that should be straightforward too once I know how to do it for a qsub queue. The FE is 2 years old, has 16 2.2 GHz Xeon cores, and 64 GB RAM.

Thanks!
-M

On Mon, Apr 13, 2015 at 9:05 PM, Ed Lauzier elauzi...@perlstar.com wrote:

Hi Michael,

Send some basic examples of what you want to do and I'll fire off a basic RQS config that will get you going. There is a lot to it, especially for project-level fairshare settings. Also, remember that if you want decent response, you need a scheduler with at least 2 CPUs; best to have 4 CPUs, so that decisions can be made faster during the scheduling cycle and the worker threads can do their thing. Also consider looking into using the Perl JSV for runtime limit enforcement.

It may be best to get Univa to assist you for a day, even over the phone, if you can justify the expense. It is well worth it to get the new Univa Grid Engine.

-Ed

-----Original Message-----
From: Reuti [mailto:re...@staff.uni-marburg.de]
Sent: Monday, April 13, 2015 06:22 PM
To: 'Michael Stauffer'
Cc: 'Gridengine Users Group'
Subject: Re: [gridengine users] project quota details

Hi,

Am 13.04.2015 um 23:24 schrieb Michael Stauffer:

OGS/GE 2011.11p1 (Rocks 6.1)

Hi, I'm looking to set up project-based quota management. I'm at a university, and different labs will be signing up for different quotas for the users in their labs.
I understand that I can:

- add users to one or more projects
- assign quotas (slots and memory in my case) to projects that will limit the total concurrent resource usage per project
- have users choose a particular project when they submit a job (needed for users who do work for multiple labs)

I'm wondering if I can also set a per-user quota within a project quota that will limit how much of a resource any individual from the project can use at once. That is, I'd like that limit to be lower than the project's limit, so that no one user in a project can use all the project's resources at once. Could different per-user quotas be assigned for different users within a project? E.g. a power user in a project might generally need more slots than other users.

Yes. You need to phrase these individual limits in a second RQS. I.e. one RQS will limit the overall consumption per project, and the second one will limit the combinations of projects and users to varying limits.

-- Reuti

Any suggestions on strategies for this kind of resource management would be a great help.

Thanks
-M
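Reuti's two-RQS approach, applied to the lab1/lab2 numbers from earlier in the thread, might look roughly like the sketch below. The RQS names are illustrative, not prescribed; see sge_resource_quota(5) for exact syntax. Within a single RQS the first matching rule wins (so lan's higher limit must precede the generic {*} rule), and where several RQSs apply, the most restrictive limit is enforced.

```
# RQS 1: overall cap per project
{
   name         project_slots
   description  total concurrent slots per project
   enabled      TRUE
   limit        projects lab1 to slots=100
   limit        projects lab2 to slots=80
}
# RQS 2: per-user caps within each project
# (rule order matters: lan's specific rule must come before the {*} rules)
{
   name         project_user_slots
   description  per-user slot caps within each project
   enabled      TRUE
   limit        users lan projects lab2 to slots=50
   limit        users {*} projects lab1 to slots=40
   limit        users {*} projects lab2 to slots=15
}
```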
[gridengine users] limit slots to core count no longer works
Hello,

We (fairly) recently upgraded our cluster to Rocks 6.1.1 and we now seem to be having problems with RQS. On our old cluster, we had an RQS quota set as follows:

{
   name         host-slots
   description  restrict slots to core count
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}

The reason for this was to try to prevent oversubscription of the processors on the clients. Now, if I have this quota enabled, submitted jobs don't start, and if I do a 'qstat -j job-number' I see things like the following under "scheduling info":

cannot run because it exceeds limit compute-0-7/ in rule host-slots/1
cannot run because it exceeds limit compute-0-7/ in rule host-slots/1 (-l slots=1)
cannot run in queue compute-0-39.local because it offers only hc:slots=0.00
cannot run because it exceeds limit compute-0-78/ in rule host-slots/1
cannot run because it exceeds limit compute-0-78/ in rule host-slots/1
cannot run because it exceeds limit compute-0-55/ in rule host-slots/1
cannot run because it exceeds limit compute-0-55/ in rule host-slots/1
cannot run because it exceeds limit compute-0-74/ in rule host-slots/1
cannot run because it exceeds limit compute-0-74/ in rule host-slots/1
cannot run because it exceeds limit compute-2-7/ in rule host-slots/1
cannot run because it exceeds limit compute-2-1/ in rule host-slots/1
cannot run because it exceeds limit compute-2-2/ in rule host-slots/1
cannot run because it exceeds limit compute-0-22/ in rule host-slots/1
cannot run because it exceeds limit compute-0-22/ in rule host-slots/1
cannot run because it exceeds limit compute-1-2/ in rule host-slots/1
cannot run in PE mpich because it only offers 0 slots

But as soon as I run 'qconf -mrqs' and change TRUE to FALSE, the job runs. Has the process for preventing oversubscription changed? Any ideas?

JY
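For context, $num_proc in this rule is what sge_resource_quota(5) calls a dynamical limit: when the rule is evaluated for a host matched by {*}, $num_proc expands to that host's own num_proc load value, so each host is capped at its own core count rather than at a single cluster-wide number. A minimal sketch of such a rule (names as in the poster's quota; not a verified fix for the problem described):

```
# Dynamic per-host cap: $num_proc expands, per matched host, to that
# host's num_proc load value (see "dynamical limits" in
# sge_resource_quota(5)).
{
   name         host-slots
   description  restrict slots to core count
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
```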
Re: [gridengine users] limit slots to core count no longer works
What is $num_proc? Did you try to set a real number, like 'limit hosts {*} to slots=12'?

On Tue, Apr 14, 2015 at 3:32 PM, John Young j.e.yo...@larc.nasa.gov wrote:

Hello,

We (fairly) recently upgraded our cluster to Rocks 6.1.1 and we now seem to be having problems with RQS. On our old cluster, we had an RQS quota set as follows:

{
   name         host-slots
   description  restrict slots to core count
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}

The reason for this was to try to prevent oversubscription of the processors on the clients. Now, if I have this quota enabled, submitted jobs don't start, and under 'qstat -j job-number' scheduling info I see messages like "cannot run because it exceeds limit compute-0-7/ in rule host-slots/1".

[...]

But as soon as I run 'qconf -mrqs' and change TRUE to FALSE, the job runs. Has the process for preventing oversubscription changed? Any ideas?
JY

--
Best,
Feng
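Feng's suggestion, written out as a concrete RQS, would be a static per-host cap. The sketch below uses the value 12 from his question purely as an example; unlike the dynamic $num_proc form, it caps every matched host at the same number regardless of its actual core count:

```
# Static per-host cap: every host matched by {*} is limited to 12 slots,
# whatever its core count (12 is an arbitrary example value).
{
   name         host-slots
   description  restrict slots to a fixed per-host cap
   enabled      TRUE
   limit        hosts {*} to slots=12
}
```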