Re: [gridengine users] Requesting multiple cores on one node
On 16.01.2014 at 00:24, Joseph Farran wrote:

> Allison, I love Grid Engine but this is the one feature I truly miss from Torque: -l nodes=x:ppn=[count]

An RFE could be a PE setting allocation_rule $user_defined, which would still have to be requested during job submission.

<snip>

> The MPI job will NOT suspend jobs on the free64 queue. The job waits until the free64 jobs are done, and then it runs and grabs the entire nodes correctly using the exclusive consumable. Is there a fix for this, so that jobs on free64 ARE suspended when using -l exclusive=true and the mpi PE on our pub64 queue?

The suspension is the result of a job having started on this particular node; for a fraction of a second the node is oversubscribed. This is also the reason why you may need more slots defined on the exechost level, as otherwise the limited slot count wouldn't allow the superordinated job to start.

What you can try: as the jobs in free64 are suspended anyway, attach the exclusive attribute on the queue level, i.e. attach it to pub64 instead of to the exechost. The job will then start in pub64 and suspend the free64 jobs.

> Using other PEs like openmp works just fine and jobs are suspended correctly. So it's only with this combo.

Interesting - which version of SGE are you running?

-- Reuti
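Reuti's suggestion - moving the consumable from the exechost level to the queue level - might look roughly like the following. This is an untested sketch: the host name is illustrative, and the exact argument form of `-aattr`/`-dattr` for list attributes can vary between SGE versions, so verify against your cell before using it.

```shell
# Attach the "exclusive" consumable to the pub64 queue instead of to each
# exechost, so the superordinated job can start in pub64 and trigger the
# suspension of jobs in the subordinate free64 queue.
qconf -aattr queue complex_values exclusive=true pub64

# Remove the consumable from each exechost it was attached to
# (host name "node01" is illustrative):
qconf -dattr exechost complex_values exclusive=true node01
```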
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
Re: [gridengine users] Requesting multiple cores on one node
On 16.01.2014 at 05:24, tele...@shaw.ca wrote:

> Sorry - you're correct, I meant -l nodes=1:ppn=[count]. :-)
>
> Hmmm... we've had some requests from clients specifically to support SGE, but this is a pretty key part of our functionality. Currently we can submit, but without a way to specify cores, the clients won't get the timing results they expect at all. Out of curiosity, does anyone have any good references for *why* this isn't the paradigm

I would say: the idea was that the admin of the cluster should provide all the necessary setup and scripts to achieve a tight integration of a parallel application into the cluster. This is nothing the end user of a cluster should have to deal with. AFAIK there is no start_proc_args or control_slaves in Torque to reformat the generated list of machines or to catch a plain rsh/ssh call of a user's application. Ten years ago LAM/MPI or MPICH(1) weren't running out of the box in a cluster in a tightly integrated way, and there the PE's definition was a great help (it still is for others like Linda or IBM's Platform MPI). Sure, this changed over time, as nowadays Open MPI and MPICH2/3 support SGE's `qrsh` and Torque's TM (Task Manager) directly.

Another reason may have been to limit certain applications (i.e. PEs) to a subset of nodes, and to limit how many slots are used by a particular application. This is nothing that can be controlled in Torque, I think. On the other hand: if you have an application which needs 4 cores on every machine (i.e. allocation_rule 4, resp. -l nodes=5:ppn=4), how can you prevent a user from requesting -l nodes=2:ppn=4+6:ppn=2 in Torque for a 20-core job?

> (it's certainly the atypical choice, from all the schedulers I've worked with), with more detail on how/why this system works in its place? That might give me some insights on how to proceed.
Taking Open MPI and MPICH2/3 into account, an RFE could be that a PE called smp is always defined by default (and it's not even possible to remove it), but it's up to the admin to attach it to certain machines/hostgroups. Nevertheless, as the necessary settings in such a PE are the ones SGE offers by default when you create a new PE, its definition is quite easy anyway.

> The only thing I can think of right now is some sort of script that they can run as part of installing our program that automatically sets up the necessary PEs for them, but I'm still foggy enough on PEs that I'm not sure if that's possible without knowing the details of their hardware.

The hardware doesn't matter much for such a PE; what matters more are the various defined queues: to which one will you attach the newly created PE? During installation you can check whether there is such a PE (scan all PEs and look for allocation_rule $pe_slots and a proper slot count therein => use this name for the PE request) and whether it's attached to any queue. Well, this doesn't prevent the admin of the cluster from changing the setting after installation of your software, though.

-- Reuti

> Thoughts? Thank you!
>
> -Allison
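The install-time check Reuti sketches - scan all PEs and pick one whose allocation_rule is $pe_slots - could be implemented roughly as below. The qconf calls are shown only in comments, since they need a live cluster; the parsing step is demonstrated on a sample PE definition. All names are illustrative.

```shell
# Extract the allocation_rule field from a PE definition read on stdin.
pe_allocation_rule() {
  awk '$1 == "allocation_rule" { print $2 }'
}

# On a live cluster the scan would be (untested sketch):
#   for pe in $(qconf -spl); do
#     rule=$(qconf -sp "$pe" | pe_allocation_rule)
#     [ "$rule" = '$pe_slots' ] && { echo "usable PE: $pe"; break; }
#   done

# Demonstration on a sample PE definition:
sample='pe_name            smp
slots              9999
allocation_rule    $pe_slots
control_slaves     TRUE'
rule=$(printf '%s\n' "$sample" | pe_allocation_rule)
echo "$rule"   # prints: $pe_slots
```

As Reuti notes, this only checks the PE itself; you would still have to confirm that the PE is attached to a queue the user can actually submit to.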
Re: [gridengine users] Requesting multiple cores on one node
On 15/01/14 22:28, Allison Walters wrote:

> We have OpenMP jobs that need a user-defined (usually more than one but less than all) number of cores on a single node for each job. In addition to running these jobs, our program has an interface to the cluster so they can submit jobs through a custom GUI (and we build the qsub command in the background for the submission). I'm trying to find a way for the job to request those multiple cores that does not depend on the cluster being configured a certain way, since we have no control over whether the client has a parallel environment created, how it's named, etc...
>
> Basically, I'm just looking for the equivalent of -l nodes=[count] in PBS/Torque, or -n [count] in LSF, etc... The program will use the correct number of cores we pass to it, but we need to pass that parameter to the cluster as well to ensure it only gets sent to a node with the correct number of cores available. This works fine in the other clusters we support, but I'm completely at a loss as to how to do it in Grid Engine. I feel like I must be missing something! :-) Thank you.

I think the difference is the underlying philosophy. Grid Engine:

1) Gives a lot of flexibility to the administrator. This makes it easier for the admin to do things the designers didn't anticipate. It is less a batch scheduler and more a batch scheduler construction kit with a reasonable sample configuration. Despite attempts to port it to Windows, Grid Engine is basically a Unix batch scheduler and follows the Unix philosophy.

2) For the most part, you request the resources you need rather than trying to tell the scheduler how to allocate jobs to nodes. Which resources you can request is up to the admin.

Your actual problem isn't soluble in a configuration-agnostic way, but on many SGE clusters: there is a PE called smp with an allocation rule of $pe_slots, and there is also commonly a resource called exclusive that can be requested to get exclusive access to a host.
However, this is just a common configuration and wouldn't work on the cluster I admin, for example. You might choose to assume that anyone who doesn't use the configuration outlined above is familiar with their own configuration and can adapt your job submission mechanism if you let them.

If you want the more general ppn, then multiple PEs with allocation rules 1, 2, 3, ... up to the largest number of cores on a single node should do it (never tried this, just going by the man page).

William
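On clusters that do follow the common configuration William describes, the submission side might look like this. Note that the PE name `smp`, the `exclusive` resource, and the `ppn4` PE are site conventions or hypothetical names, not SGE defaults:

```shell
# Request 4 slots on a single host via a $pe_slots PE commonly named "smp":
qsub -pe smp 4 job.sh

# The same, additionally asking for exclusive use of the host, where a
# boolean "exclusive" consumable has been configured by the admin:
qsub -pe smp 4 -l exclusive=true job.sh

# William's more general ppn-style idea: one PE per fixed per-host slot
# count (ppn1, ppn2, ...), so that e.g. 8 slots in a PE with
# allocation_rule 4 land as 4 slots on each of 2 hosts:
qsub -pe ppn4 8 job.sh
```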
Re: [gridengine users] Requesting multiple cores on one node
On 16.01.2014 at 09:54, Reuti wrote:

> On the other hand: if you have an application which needs 4 cores on every machine (i.e. allocation_rule 4, resp. -l nodes=5:ppn=4), how can you prevent a user from requesting -l nodes=2:ppn=4+6:ppn=2 in Torque for a 20-core job?

NB: I just recall that I once faced the issue in Torque where -l nodes=5:ppn=4 gave me 8 cores on one machine; it was a (clusterwide) setting somewhere in Torque/Maui that allowed bunches of 4 to be allocated more than once on a machine, which we didn't want (in SGE you only get the 4 once on each machine).
Different queuing systems have different features - different constraints.

-- Reuti
Re: [gridengine users] Requesting multiple cores on one node
----- Original Message -----
From: Reuti re...@staff.uni-marburg.de

> The hardware doesn't matter much for such a PE; what matters more are the various defined queues: to which one will you attach the newly created PE? During installation you can check whether there is such a PE (scan all PEs and look for allocation_rule $pe_slots and a proper slot count therein => use this name for the PE request) and whether it's attached to any queue. Well, this doesn't prevent the admin of the cluster from changing the setting after installation of your software, though.

I'm not so much worried about whether they change things afterwards - then it's something unsupported they've done. :-) I'm mostly worried about the scenario of them having to do as little as possible in the first place to get it to work.

Thank you for the information! I'm going to take some more time to digest, and then see what I can come up with.

-Allison
Re: [gridengine users] Requesting multiple cores on one node
Great info - thank you!

-Allison

----- Original Message -----
From: William Hay w@ucl.ac.uk
To: Allison Walters tele...@shaw.ca, users@gridengine.org
Sent: Thursday, January 16, 2014 2:22:04 AM
Subject: Re: [gridengine users] Requesting multiple cores on one node
Re: [gridengine users] Requesting multiple cores on one node
On 15.01.2014 at 23:28, Allison Walters wrote:

> We have OpenMP jobs that need a user-defined (usually more than one but less than all) number of cores on a single node for each job. In addition to running these jobs, our program has an interface to the cluster so they can submit jobs through a custom GUI (and we build the qsub command in the background for the submission). I'm trying to find a way for the job to request those multiple cores that does not depend on the cluster being configured a certain way, since we have no control over whether the client has a parallel environment created, how it's named, etc...

This is not in the paradigm of SGE. You can only create a consumable complex, attach it to each exechost, and request the correct amount for each job, even serial ones (by a default of 1). But in this case the memory requests (or others) won't be multiplied, as SGE always thinks it's a serial job. You then replace the custom PE by a custom complex.

> Basically, I'm just looking for the equivalent of -l nodes=[count]

Wouldn't it be: -l nodes=1:ppn=[count]? For -l nodes=[count] it's like SGE's allocation_rule $round_robin or $fill_up - depending on a setting somewhere in Torque (i.e. the same will be applied to all types of jobs all the time). It could span more than one node in either case.

-- Reuti

> in PBS/Torque, or -n [count] in LSF, etc... The program will use the correct number of cores we pass to it, but we need to pass that parameter to the cluster as well to ensure it only gets sent to a node with the correct number of cores available. This works fine in the other clusters we support, but I'm completely at a loss as to how to do it in Grid Engine. I feel like I must be missing something! :-) Thank you.
>
> -Allison
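The consumable-complex workaround Reuti describes might look like this in practice. The complex name `cores`, the host name, and the counts are all illustrative; treat this as a sketch, not a verified configuration:

```shell
# 1. Define a consumable integer complex via `qconf -mc`, adding a line like:
#      cores  c  INT  <=  YES  YES  1  0
#    (default 1, so plain serial jobs consume one unit automatically)

# 2. Attach it to every exechost with that host's core count:
qconf -aattr exechost complex_values cores=16 node01

# 3. Request it at submission time instead of a PE:
qsub -l cores=4 job.sh
```

As Reuti points out, SGE still treats such a job as serial, so per-slot requests like memory limits are not multiplied by the requested core count.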
Re: [gridengine users] Requesting multiple cores on one node
Allison, I love Grid Engine, but this is the one feature I truly miss from Torque: -l nodes=x:ppn=[count]

Reuti, we have a complex setup trying to accomplish this same thing, and it kind of works, but we have an issue with jobs not starting when jobs are running on a subordinate queue. First, here is our setup:

$ qconf -sc | egrep '#|exclu'
#name       shortcut  type  relop  requestable  consumable  default  urgency
#---------------------------------------------------------------------------
exclusive   excl      BOOL  EXCL   YES          YES         FALSE    1000

Our MPI PE has:

$ qconf -sp mpi
pe_name            mpi
slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
qsort_args         NONE

Our two queues:

$ qconf -sq free64 | grep sub
subordinate_list      NONE
$ qconf -sq pub64 | grep sub
subordinate_list      free64=1

When we submit our MPI jobs to pub64 with:

#!/bin/bash
#$ -q pub64
#$ -pe mpi 256
#$ -l exclusive=true

the MPI job will NOT suspend jobs on the free64 queue. The job waits until the free64 jobs are done, and then it runs and grabs the entire nodes correctly using the exclusive consumable. Is there a fix for this, so that jobs on free64 ARE suspended when using -l exclusive=true and the mpi PE on our pub64 queue?

Using other PEs like openmp works just fine and jobs are suspended correctly. So it's only with this combo.

Joseph
Re: [gridengine users] Requesting multiple cores on one node
Sorry - you're correct, I meant -l nodes=1:ppn=[count]. :-)

Hmmm... we've had some requests from clients specifically to support SGE, but this is a pretty key part of our functionality. Currently we can submit, but without a way to specify cores, the clients won't get the timing results they expect at all.

Out of curiosity, does anyone have any good references for *why* this isn't the paradigm (it's certainly the atypical choice, from all the schedulers I've worked with), with more detail on how/why this system works in its place? That might give me some insights on how to proceed.

The only thing I can think of right now is some sort of script that they can run as part of installing our program that automatically sets up the necessary PEs for them, but I'm still foggy enough on PEs that I'm not sure if that's possible without knowing the details of their hardware.

Thoughts? Thank you!

-Allison