Re: [galaxy-dev] Using $NSLOTS in tools to control thread number
Re: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-June/010153.html

On Fri, Jun 15, 2012 at 4:52 PM, Peter Cock wrote:
> On Fri, Jun 15, 2012 at 4:38 PM, James Taylor wrote:
>> This is exactly what I think we should do (and have for a long time), but I
>> think the variable should be something like:
>>
>> GALAXY_CPUS
>>
>> (threads is not accurate, a multithread or multiprocess job might want to use
>> this info, something even more abstract than CPUS might make sense, but
>> SLOTS has never made sense to me).
>
> I agree that a Galaxy specific name makes a lot of sense, and that
> the SGE term "slots" is a bit odd. Using CPUS however is potentially
> ambiguous with CPUs vs cores - my desktop has two quad core CPUs,
> i.e. 2 CPUs but 8 cores.
>
> Where do you think this number should come from? A new entry in the
> runner URL is simple albeit potentially redundant with cluster-specific
> entries in the runner URL. As to the alternative (doing it automatically),
> for PBS and SGE determining the number of cores from the cluster
> configuration and/or parsing the cluster runner URL sounds doable -
> what about the other backends?
>
> Peter

Has the Galaxy team had any further thoughts on this topic, i.e. providing
an environment variable or Cheetah variable that tool authors can use to
set the number of threads/CPU cores? (Ideally the value would come from a
default setting unless overridden via the [galaxy:tool_runners] entry in
universe_wsgi.ini for that tool.)

Thanks,

Peter
___
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/
Re: [galaxy-dev] Using $NSLOTS in tools to control thread number
Hi,

Just to add some info about this: I have attempted to do so on an SGE
cluster. I found that $NSLOTS works only for MPI jobs, as it's part of
the MPI integration in SGE; it doesn't work for non-MPI jobs. For example,
with:

python parallel_groomer.py input output $NSLOTS

SGE does not replace $NSLOTS with the specified number.

Regards,
Derrick

On Sat, Jun 16, 2012 at 1:52 AM, Peter Cock wrote:

> On Fri, Jun 15, 2012 at 4:38 PM, James Taylor
> wrote:
> > This is exactly what I think we should do (and have for a long time), but I
> > think the variable should be something like:
> >
> > GALAXY_CPUS
> >
> > (threads is not accurate, a multithread or multiprocess job might want to use
> > this info, something even more abstract than CPUS might make sense, but
> > SLOTS has never made sense to me).
>
> I agree that a Galaxy specific name makes a lot of sense, and that
> the SGE term "slots" is a bit odd. Using CPUS however is potentially
> ambiguous with CPUs vs cores - my desktop has two quad core CPUs,
> i.e. 2 CPUs but 8 cores.
>
> Where do you think this number should come from? A new entry in the
> runner URL is simple albeit potentially redundant with cluster-specific
> entries in the runner URL. As to the alternative (doing it automatically),
> for PBS and SGE determining the number of cores from the cluster
> configuration and/or parsing the cluster runner URL sounds doable -
> what about the other backends?
>
> Peter
Re: [galaxy-dev] Using $NSLOTS in tools to control thread number
On Fri, Jun 15, 2012 at 4:38 PM, James Taylor wrote:
> This is exactly what I think we should do (and have for a long time), but I
> think the variable should be something like:
>
> GALAXY_CPUS
>
> (threads is not accurate, a multithread or multiprocess job might want to use
> this info, something even more abstract than CPUS might make sense, but
> SLOTS has never made sense to me).

I agree that a Galaxy specific name makes a lot of sense, and that
the SGE term "slots" is a bit odd. Using CPUS however is potentially
ambiguous with CPUs vs cores - my desktop has two quad core CPUs,
i.e. 2 CPUs but 8 cores.

Where do you think this number should come from? A new entry in the
runner URL is simple albeit potentially redundant with cluster-specific
entries in the runner URL. As to the alternative (doing it automatically),
for PBS and SGE determining the number of cores from the cluster
configuration and/or parsing the cluster runner URL sounds doable -
what about the other backends?

Peter
Re: [galaxy-dev] Using $NSLOTS in tools to control thread number
On Jun 15, 2012, at 11:27 AM, Peter Cock wrote:

> On Fri, Jun 15, 2012 at 4:06 PM, Nate Coraor wrote:
>> On Jun 15, 2012, at 9:05 AM, Peter Cock wrote:
>>
>>> On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock
>>> wrote:
>>>> Hello all,
>>>>
>>>> I'm wondering if it is sensible to make Galaxy tools automatically use
>>>> the environment variable $NSLOTS to automatically adjust their
>>>> number of threads?
>>>>
>>>> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>>>>
>>>> The idea here is rather than hard coding the number of threads in a tool
>>>> or its XML file, which may need to be altered for different local setups, and
>>>> it can be specified in universe_wsgi.ini under [galaxy:tool_runners]
>>>
>>> Actually thinking about this over lunch, you wouldn't want to evaluate
>>> the $NSLOTS variable when the XML is processed, as
>>> that would be done on the server not the cluster node. In some cases
>>> then embedding $NSLOTS in the command string (suitably escaped)
>>> should work, otherwise doing it in a wrapper script seems best.
>>
>> Hi Peter,
>>
>> $NSLOTS is SGE-specific.
>
> That is a shame, it is working nicely for the tools I have tried it on -
> You just put "\$NSLOTS" (with a backslash to escape the dollar) in
> the <command> tag.
>
>> Torque uses a file whose path is set in
>> $PBS_NODEFILE to list out the nodes you've been allocated (the
>> node name is repeated for each slot you have on it).
>>
>> A couple of DRM-agnostic solutions: A common variable set by the
>> job template before the tool runs.
>
> By that do you mean Galaxy could do some magic in the shell scripts
> it generates and submits to the cluster?

Yes, exactly.

> i.e. Set up an environment variable, e.g. $THREADS. In the case of
> Torque/PBS, it could parse the $PBS_NODEFILE which sounds nasty
> - or can you get this from the PBS runner URL?

You could, but I think it'd be easier to read the $PBS_NODEFILE than
attempt to parse PBS arguments.

> In the case of SGE,
> all the DRMAA wrapper needs to do is:
>
> export THREADS="$NSLOTS"
>
>> Or, the ability to set tool parameters from the runner URL in
>> universe_wsgi.ini.
>
> Setting things via the runner URL in universe_wsgi.ini seems better,
> especially as it could be used for "local" runners too.
>
> Peter
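Since Torque lists each node once per allocated slot in `$PBS_NODEFILE`, reading that file reduces to counting lines. The sketch below is a hypothetical helper: the name `count_pbs_slots` and its optional arguments are illustrative, not part of Galaxy or Torque. It counts only the lines naming the current host, assuming those entries match the output of `hostname` (sites differ on short versus fully qualified names), and it falls back to 1 outside a PBS job.

```shell
#!/bin/sh
# Hypothetical helper: how many slots did Torque/PBS give us on this host?
# $PBS_NODEFILE holds one line per allocated slot, repeating the node name.

count_pbs_slots() {
    nodefile=${1:-${PBS_NODEFILE:-}}   # defaults to the file Torque provides
    host=${2:-$(hostname)}             # assumption: entries match `hostname`
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo 1                         # not inside a PBS job: assume one slot
        return 0
    fi
    # -x: whole-line match, -F: literal string. grep -c prints 0 (and exits 1)
    # when nothing matches, hence "|| true".
    n=$(grep -c -x -F "$host" "$nodefile") || true
    if [ "${n:-0}" -gt 0 ]; then
        echo "$n"
    else
        echo 1                         # host not listed: fall back to one slot
    fi
}
```

For a multithreaded tool the per-host count is usually what matters. An MPI job spanning several nodes would instead want the total, i.e. `wc -l < "$PBS_NODEFILE"`.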
Re: [galaxy-dev] Using $NSLOTS in tools to control thread number
On Fri, Jun 15, 2012 at 4:06 PM, Nate Coraor wrote:
> On Jun 15, 2012, at 9:05 AM, Peter Cock wrote:
>
>> On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock
>> wrote:
>>> Hello all,
>>>
>>> I'm wondering if it is sensible to make Galaxy tools automatically use
>>> the environment variable $NSLOTS to automatically adjust their
>>> number of threads?
>>>
>>> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>>>
>>> The idea here is rather than hard coding the number of threads in a tool
>>> or its XML file, which may need to be altered for different local setups, and
>>> it can be specified in universe_wsgi.ini under [galaxy:tool_runners]
>>
>> Actually thinking about this over lunch, you wouldn't want to evaluate
>> the $NSLOTS variable when the XML is processed, as
>> that would be done on the server not the cluster node. In some cases
>> then embedding $NSLOTS in the command string (suitably escaped)
>> should work, otherwise doing it in a wrapper script seems best.
>
> Hi Peter,
>
> $NSLOTS is SGE-specific.

That is a shame, it is working nicely for the tools I have tried it on -
You just put "\$NSLOTS" (with a backslash to escape the dollar) in
the <command> tag.

> Torque uses a file whose path is set in
> $PBS_NODEFILE to list out the nodes you've been allocated (the
> node name is repeated for each slot you have on it).
>
> A couple of DRM-agnostic solutions: A common variable set by the
> job template before the tool runs.

By that do you mean Galaxy could do some magic in the shell scripts
it generates and submits to the cluster?

i.e. Set up an environment variable, e.g. $THREADS. In the case of
Torque/PBS, it could parse the $PBS_NODEFILE which sounds nasty
- or can you get this from the PBS runner URL? In the case of SGE,
all the DRMAA wrapper needs to do is:

export THREADS="$NSLOTS"

> Or, the ability to set tool parameters from the runner URL in
> universe_wsgi.ini.

Setting things via the runner URL in universe_wsgi.ini seems better,
especially as it could be used for "local" runners too.

Peter
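The backslash-escaped `\$NSLOTS` works because of *when* the variable is expanded. Cheetah treats an unescaped `$NSLOTS` in a tool's `<command>` template as a template placeholder, filled in on the Galaxy server while the command line is built; the escape leaves a literal `$NSLOTS` in the generated command, which the shell on the cluster node expands when the job runs. This shell sketch imitates the two cases with double versus single quotes; `run_on_node` is a hypothetical stand-in for the job script executing on a node, not a Galaxy function.

```shell
#!/bin/sh
# Sketch: $NSLOTS must be expanded on the node, at run time.

unset NSLOTS   # on the Galaxy server, NSLOTS is normally not set

# "Early" expansion: double quotes substitute NSLOTS while the command is
# being built on the server, so an empty value is baked into the string.
early="echo \"running with $NSLOTS threads\""

# "Late" expansion: single quotes keep a literal $NSLOTS in the string,
# playing the role of Cheetah's \$NSLOTS escape.
late='echo "running with $NSLOTS threads"'

# Hypothetical stand-in for the job script running on a cluster node,
# where the scheduler has placed NSLOTS in the environment.
run_on_node() {
    env NSLOTS="$1" sh -c "$2"
}

run_on_node 4 "$early"   # prints: running with  threads
run_on_node 4 "$late"    # prints: running with 4 threads
```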
Re: [galaxy-dev] Using $NSLOTS in tools to control thread number
On Jun 15, 2012, at 9:05 AM, Peter Cock wrote:

> On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock
> wrote:
>> Hello all,
>>
>> I'm wondering if it is sensible to make Galaxy tools automatically use
>> the environment variable $NSLOTS to automatically adjust their
>> number of threads?
>>
>> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>>
>> The idea here is rather than hard coding the number of threads in a tool
>> or its XML file, which may need to be altered for different local setups, and
>> it can be specified in universe_wsgi.ini under [galaxy:tool_runners]
>
> Actually thinking about this over lunch, you wouldn't want to evaluate
> the $NSLOTS variable when the XML is processed, as
> that would be done on the server not the cluster node. In some cases
> then embedding $NSLOTS in the command string (suitably escaped)
> should work, otherwise doing it in a wrapper script seems best.

Hi Peter,

$NSLOTS is SGE-specific. Torque uses a file whose path is set in
$PBS_NODEFILE to list out the nodes you've been allocated (the
node name is repeated for each slot you have on it).

A couple of DRM-agnostic solutions: A common variable set by the
job template before the tool runs.

Or, the ability to set tool parameters from the runner URL in
universe_wsgi.ini.

--nate

>
>> Would this work in principle on other cluster setups? i.e. Is $NSLOTS
>> sufficiently general?
>
> Peter
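Nate's first DRM-agnostic option, a common variable set by the job template before the tool runs, could look roughly like the following shell preamble. It is a sketch under assumptions: the variable name `GALAXY_CPUS` is the one James proposed on this thread, while the function `detect_galaxy_cpus`, the precedence (SGE's `$NSLOTS`, then Torque's `$PBS_NODEFILE`, then a default of 1) and the validation are illustrative choices, not Galaxy's actual job template.

```shell
#!/bin/sh
# Hypothetical job-script preamble: derive one DRM-neutral variable,
# GALAXY_CPUS, before the tool command runs.

detect_galaxy_cpus() {
    if [ -n "${NSLOTS:-}" ]; then
        # SGE: slots granted to this job
        echo "$NSLOTS"
    elif [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # Torque/PBS: one line per allocated slot, summed over all nodes
        # (equal to the local slot count for a single-node job)
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        # local runner or unrecognised DRM: be conservative
        echo 1
    fi
}

GALAXY_CPUS=$(detect_galaxy_cpus)
case $GALAXY_CPUS in
    ''|*[!0-9]*) GALAXY_CPUS=1 ;;            # empty or non-numeric value
esac
[ "$GALAXY_CPUS" -ge 1 ] || GALAXY_CPUS=1    # reject 0
export GALAXY_CPUS
```

A tool's command could then refer to `\$GALAXY_CPUS`, escaped for Cheetah so that the node's shell expands it, without knowing which DRM ran the job.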
Re: [galaxy-dev] Using $NSLOTS in tools to control thread number
On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock wrote:
> Hello all,
>
> I'm wondering if it is sensible to make Galaxy tools automatically use
> the environment variable $NSLOTS to automatically adjust their
> number of threads?
>
> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>
> The idea here is rather than hard coding the number of threads in a tool
> or its XML file, which may need to be altered for different local setups, and
> it can be specified in universe_wsgi.ini under [galaxy:tool_runners]

Actually thinking about this over lunch, you wouldn't want to evaluate
the $NSLOTS variable when the XML is processed, as
that would be done on the server not the cluster node. In some cases
then embedding $NSLOTS in the command string (suitably escaped)
should work, otherwise doing it in a wrapper script seems best.

> Would this work in principle on other cluster setups? i.e. Is $NSLOTS
> sufficiently general?

Peter
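The wrapper-script option moves the decision onto the cluster node: the wrapper reads the slot count at run time and passes it on to the real program. A minimal sketch, assuming the tool takes its thread count as the final positional argument (as in Derrick's `python parallel_groomer.py input output $NSLOTS` example on this thread). The name `run_with_slots` is hypothetical, and the fallback to 1 for unset, non-numeric or zero values is an illustrative choice for runners that do not set `$NSLOTS`.

```shell
#!/bin/sh
# Hypothetical wrapper, usable as:  run_with_slots COMMAND [ARGS...]
# It reads the slot count on the execution host when the job runs and
# appends it as the last argument, so
#   run_with_slots python parallel_groomer.py input output
# runs "python parallel_groomer.py input output N".

run_with_slots() {
    slots=${NSLOTS:-1}                # use $NSLOTS when set (SGE), else 1
    case $slots in
        *[!0-9]*) slots=1 ;;          # non-numeric value: ignore it
    esac
    [ "$slots" -ge 1 ] || slots=1     # a count of 0 makes no sense for threads
    "$@" "$slots"                     # run the real command with the count appended
}
```

As a standalone script rather than a shell function, the last line would typically become `exec "$@" "$slots"`, so the wrapper process is replaced by the tool itself.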