Re: [galaxy-dev] suggestion for multithreading
Andrew Warren wrote:
> So would the current correct method for setting up multi-threaded jobs on a
> cluster be to specify custom runners in the [galaxy:tool_runners] section of
> the universe config file for EVERY tool that uses multiple threads
> (assuming the default is set to one)?
>
> For example, for the bowtie program and a queue named "galaxy":
> bowtie = pbs:///galaxy/-l ppn=4,mem=16gb/

Hi Andrew,

You'll need to use the tool id from the <tool> tag in the XML config file.
The bowtie file is 'tools/sr_mapping/bowtie_wrapper.xml' and the tool id is
'bowtie_wrapper'.

Unfortunately, you also need to set the number of threads in the same XML
file, although 4 happens to be the default:

    --threads="4"

Unfortunately this value isn't currently read from the config.

--nate

> Is this currently the only way for Galaxy to inform the queuing system how
> many threads a program will use?
> And does this mean that without custom runners in the config file any
> multi-threaded program that has multiple instances in an asynchronous
> workflow has the opportunity to overload a cluster node, since the queuing
> system doesn't "know" how many threads the program will be using?
>
> Just want to make sure I'm not missing out on the latest and greatest method
> for process management. :)
>
> Thanks,
> Andrew
>
> Louise-Amélie Schmitt wrote:
> > > default_cluster_job_runner will remain for backwards compatibility, but
> > > we'll ship a sample job_conf.xml that runs everything locally by
> > > default.
> > >
> > > --nate
> >
> > Haha, and I did that before realizing I could do just what I needed by
> > writing tool-specific pbs:// URLs at the end of the config file... I'm
> > such an idiot.
>
> Haha, okay, I don't think I even noticed since I was distracted by your
> implementation being a step in the way we want to go with it.
>
> > But I really like what you did with it and I have a couple of questions.
> >
> > Concerning the single-threaded tools, what would happen if the number of
> > threads set in the xml file was >1 ?
>
> It'd consume extra slots, but the tool itself would just run as usual.
>
> > Could it be possible to forbid a tool from running on a given node?
>
> Hrm. In PBS you could do it using node properties/neednodes or resource
> requirements. I'd have to think a bit about how to do this in a more
> general way in the XML.
>
> --nate
>
> > Thanks,
> > L-A
> >
> > >> Peter

___
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
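
To make the shape of the runner URL in this message concrete, here is a minimal Python sketch that splits such a URL into its parts. It is not Galaxy's own parser: the helper name split_runner_url is hypothetical, and the reading of the three slash-separated fields as server, queue and scheduler options is an assumption inferred from the 'pbs:///galaxy/-l ppn=4,mem=16gb/' example in the thread.

```python
def split_runner_url(url):
    """Split 'scheme://server/queue/options/' into its parts.

    Hypothetical helper for illustration only; the field layout is an
    assumption based on the example URL quoted in this thread.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("not a runner URL: %r" % url)
    # At most three fields; anything after the second '/' is the options string.
    fields = rest.split("/", 2)
    fields += [""] * (3 - len(fields))
    server, queue, options = fields
    return {
        "scheme": scheme,
        "server": server or None,   # empty server means "use the default"
        "queue": queue or None,
        "options": options.rstrip("/") or None,
    }


if __name__ == "__main__":
    # Tool id 'bowtie_wrapper' and the URL are the ones discussed above.
    print("bowtie_wrapper ->", split_runner_url("pbs:///galaxy/-l ppn=4,mem=16gb/"))
```

With the URL from the thread, the empty first field is read as "no explicit server", the queue is "galaxy", and the remainder is handed to the scheduler as options.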
Re: [galaxy-dev] suggestion for multithreading
Well, you can still use my method, which I described at the beginning of the
thread. But that means modifying some code.

If I'm not mistaken, Galaxy's built-in scheduler is a simple FIFO scheduler
with no means of tuning the needed resources. So if you set up multithreaded
tools, yeah, I guess the nodes can expect surprises. That could happen with
PBS too if you don't set the proper number of CPUs per node and the necessary
amount of memory. Or I missed something too.

Best,
L-A

On 08/08/2011 21:07, Andrew Warren wrote:
> So would the current correct method for setting up multi-threaded jobs on a
> cluster be to specify custom runners in the [galaxy:tool_runners] section of
> the universe config file for EVERY tool that uses multiple threads
> (assuming the default is set to one)?
>
> For example, for the bowtie program and a queue named "galaxy":
> bowtie = pbs:///galaxy/-l ppn=4,mem=16gb/
>
> Is this currently the only way for Galaxy to inform the queuing system how
> many threads a program will use?
> And does this mean that without custom runners in the config file any
> multi-threaded program that has multiple instances in an asynchronous
> workflow has the opportunity to overload a cluster node, since the queuing
> system doesn't "know" how many threads the program will be using?
>
> Just want to make sure I'm not missing out on the latest and greatest method
> for process management. :)
>
> Thanks,
> Andrew
>
> Louise-Amélie Schmitt wrote:
> > > default_cluster_job_runner will remain for backwards compatibility, but
> > > we'll ship a sample job_conf.xml that runs everything locally by
> > > default.
> > >
> > > --nate
> >
> > Haha, and I did that before realizing I could do just what I needed by
> > writing tool-specific pbs:// URLs at the end of the config file... I'm
> > such an idiot.
>
> Haha, okay, I don't think I even noticed since I was distracted by your
> implementation being a step in the way we want to go with it.
>
> > But I really like what you did with it and I have a couple of questions.
> >
> > Concerning the single-threaded tools, what would happen if the number of
> > threads set in the xml file was >1 ?
>
> It'd consume extra slots, but the tool itself would just run as usual.
>
> > Could it be possible to forbid a tool from running on a given node?
>
> Hrm. In PBS you could do it using node properties/neednodes or resource
> requirements. I'd have to think a bit about how to do this in a more
> general way in the XML.
>
> --nate
>
> > Thanks,
> > L-A
> >
> > >> Peter
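
The overload scenario raised in this exchange can be put in numbers. The following is a small Python sketch with made-up figures (an 8-core node, jobs that each start 4 threads); it assumes a scheduler that simply packs one requested slot per core, which is a simplification and not a description of any particular queuing system.

```python
def threads_on_node(node_cores, slots_requested_per_job, threads_per_job):
    """Return (jobs placed on the node, threads actually running there).

    Assumes the scheduler fills the node purely by requested slots,
    one slot per core (illustrative model only).
    """
    jobs = node_cores // slots_requested_per_job
    return jobs, jobs * threads_per_job


if __name__ == "__main__":
    # Scheduler is told nothing: each 4-thread job looks like a single slot.
    print(threads_on_node(node_cores=8, slots_requested_per_job=1, threads_per_job=4))
    # Scheduler is told each job needs 4 slots (e.g. via ppn=4).
    print(threads_on_node(node_cores=8, slots_requested_per_job=4, threads_per_job=4))
```

In the first case eight jobs land on the node and start 32 threads on 8 cores; in the second only two jobs are placed and the 8 threads match the 8 cores.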
Re: [galaxy-dev] suggestion for multithreading
So would the current correct method for setting up multi-threaded jobs on a
cluster be to specify custom runners in the [galaxy:tool_runners] section of
the universe config file for EVERY tool that uses multiple threads
(assuming the default is set to one)?

For example, for the bowtie program and a queue named "galaxy":
bowtie = pbs:///galaxy/-l ppn=4,mem=16gb/

Is this currently the only way for Galaxy to inform the queuing system how
many threads a program will use?
And does this mean that without custom runners in the config file any
multi-threaded program that has multiple instances in an asynchronous
workflow has the opportunity to overload a cluster node, since the queuing
system doesn't "know" how many threads the program will be using?

Just want to make sure I'm not missing out on the latest and greatest method
for process management. :)

Thanks,
Andrew

Louise-Amélie Schmitt wrote:
> > default_cluster_job_runner will remain for backwards compatibility, but
> > we'll ship a sample job_conf.xml that runs everything locally by
> > default.
> >
> > --nate
>
> Haha, and I did that before realizing I could do just what I needed by
> writing tool-specific pbs:// URLs at the end of the config file... I'm such
> an idiot.

Haha, okay, I don't think I even noticed since I was distracted by your
implementation being a step in the way we want to go with it.

> But I really like what you did with it and I have a couple of questions.
>
> Concerning the single-threaded tools, what would happen if the number of
> threads set in the xml file was >1 ?

It'd consume extra slots, but the tool itself would just run as usual.

> Could it be possible to forbid a tool from running on a given node?

Hrm. In PBS you could do it using node properties/neednodes or resource
requirements. I'd have to think a bit about how to do this in a more
general way in the XML.

--nate

> Thanks,
> L-A
>
> >> Peter
Re: [galaxy-dev] suggestion for multithreading
Assaf Gordon wrote:
> (moved to galaxy-dev)
>
> Nate Coraor wrote, On 06/02/2011 01:31 PM:
> > Peter Cock wrote:
> >> On Thu, Jun 2, 2011 at 6:23 PM, Nate Coraor wrote:
> >>>
> >>> pbs.py then knows to translate '8' to
> >>> '-l nodes=1:ppn=8'.
> >>>
> >>> Your tool can access that value a bunch, like $__resources__.cores.
> >>>
> >>> The same should be possible for other consumables.
> >>>
>
> Just a thought here:
>
> The actual parameters that are passed to the scheduler are not necessarily
> hard-coded.
> Meaning, at least with SGE, specifying the number of cores can be:
> qsub -pe threads=8
> or
> qsub -pe cores=8
> or
> qsub -pe jiffies=8
>
> and the same goes for memory limits (e.g. "-l virtual_free=800M").
>
> The reason is that those resources (e.g. "threads", "cores", "virtual_free")
> are just identifiers, created and configured by whoever installed SGE
> (they are not built-in or hard-coded).
>
> So just be careful in your design/implementation when automatically
> translating XML resources to hard-coded parameters.
>
> If you do hard-code them, just make sure to document it specifically (i.e.
> Galaxy expects the SGE threads parameter to be "-pe threads=8" and nothing
> else).

Hrm, I didn't realize that SGE didn't have a standard resource name for
this. It's probably something we can just add into the XML as "cores" in
Galaxy == "threads" in my SGE install. Thanks for the heads up.

> -gordon
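
The point about site-specific resource names can be sketched as a small lookup in Python. Nothing here is Galaxy code: SUBMIT_TEMPLATES, submit_args and pe_name are hypothetical names. The PBS form is the one quoted in the thread ('-l nodes=1:ppn=8'); the SGE form assumes that qsub takes the parallel environment name and slot count as separate arguments ('-pe <name> <slots>'), whereas the thread writes it as '-pe threads=8', so the template is deliberately left configurable per site.

```python
# Hypothetical per-scheduler templates for turning a generic "cores" request
# into submit arguments. These are assumptions for illustration, not Galaxy's.
SUBMIT_TEMPLATES = {
    "pbs": "-l nodes=1:ppn={cores}",   # form quoted in the thread
    "sge": "-pe {pe_name} {cores}",    # PE name is chosen by the SGE admin
}


def submit_args(scheduler, cores, pe_name="threads"):
    """Render the scheduler-specific argument string for a core count."""
    template = SUBMIT_TEMPLATES[scheduler]
    # str.format ignores keyword arguments a template does not use,
    # so the PBS template simply drops pe_name.
    return template.format(cores=cores, pe_name=pe_name)


if __name__ == "__main__":
    print(submit_args("pbs", 8))
    print(submit_args("sge", 8))                   # site calls its PE "threads"
    print(submit_args("sge", 8, pe_name="cores"))  # another site calls it "cores"
```

Keeping the site-specific name in configuration rather than in code is one way to follow the advice above: the generic "cores" value stays the same while each install supplies its own identifier.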
Re: [galaxy-dev] suggestion for multithreading
(moved to galaxy-dev)

Nate Coraor wrote, On 06/02/2011 01:31 PM:
> Peter Cock wrote:
>> On Thu, Jun 2, 2011 at 6:23 PM, Nate Coraor wrote:
>>>
>>> pbs.py then knows to translate '8' to
>>> '-l nodes=1:ppn=8'.
>>>
>>> Your tool can access that value a bunch, like $__resources__.cores.
>>>
>>> The same should be possible for other consumables.
>>>

Just a thought here:

The actual parameters that are passed to the scheduler are not necessarily
hard-coded.
Meaning, at least with SGE, specifying the number of cores can be:
qsub -pe threads=8
or
qsub -pe cores=8
or
qsub -pe jiffies=8

and the same goes for memory limits (e.g. "-l virtual_free=800M").

The reason is that those resources (e.g. "threads", "cores", "virtual_free")
are just identifiers, created and configured by whoever installed SGE
(they are not built-in or hard-coded).

So just be careful in your design/implementation when automatically
translating XML resources to hard-coded parameters.

If you do hard-code them, just make sure to document it specifically (i.e.
Galaxy expects the SGE threads parameter to be "-pe threads=8" and nothing
else).

-gordon