Re: [galaxy-dev] Using $NSLOTS in tools to control thread number

2012-10-19 Thread Peter Cock
Re: http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-June/010153.html

On Fri, Jun 15, 2012 at 4:52 PM, Peter Cock  wrote:
> On Fri, Jun 15, 2012 at 4:38 PM, James Taylor  wrote:
>> This is exactly what I think we should do (and have for a long time), but I
>> think the variable should be something like:
>>
>> GALAXY_CPUS
>>
>> (threads is not accurate, a multithread or multiprocess job might want to use
>> this info, something even more abstract than CPUS might make sense, but
>> SLOTS has never made sense to me).
>
> I agree that a Galaxy specific name makes a lot of sense, and that
> the SGE term "slots" is a bit odd. Using CPUS however is potentially
> ambiguous with CPUs vs cores - my desktop has two quad core CPUs,
> i.e. 2 CPUs but 8 cores.
>
> Where do you think this number should come from? A new entry in the
> runner URL is simple albeit potentially redundant with cluster-specific
> entries in the runner URL. As to the alternative (doing it automatically),
> for PBS and SGE determining the number of cores from the cluster
> configuration and/or parsing the cluster runner URL sounds doable -
> what about the other backends?
>
> Peter

Has the Galaxy team had any further thoughts on this topic, i.e.
providing an environment variable or Cheetah variable that tool
authors can use to set the number of threads/CPU cores?

(With the value ideally coming from a default setting unless
overridden via the [galaxy:tool_runners] entry in universe_wsgi.ini
for that tool.)

Thanks,

Peter
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Using $NSLOTS in tools to control thread number

2012-07-16 Thread Derrick Lin
Hi,

Just to add some info about this: I have attempted to do so on an SGE cluster.

I found that $NSLOTS works only for MPI jobs as it's part of the MPI
integration in SGE.

It doesn't work for non-MPI jobs. For example, in

    python parallel_groomer.py input output $NSLOTS

$NSLOTS won't be replaced by SGE with the specified number.

Regards,
Derrick

On Sat, Jun 16, 2012 at 1:52 AM, Peter Cock wrote:

> On Fri, Jun 15, 2012 at 4:38 PM, James Taylor wrote:
> > This is exactly what I think we should do (and have for a long time), but I
> > think the variable should be something like:
> >
> > GALAXY_CPUS
> >
> > (threads is not accurate, a multithread or multiprocess job might want to use
> > this info, something even more abstract than CPUS might make sense, but
> > SLOTS has never made sense to me).
>
> I agree that a Galaxy specific name makes a lot of sense, and that
> the SGE term "slots" is a bit odd. Using CPUS however is potentially
> ambiguous with CPUs vs cores - my desktop has two quad core CPUs,
> i.e. 2 CPUs but 8 cores.
>
> Where do you think this number should come from? A new entry in the
> runner URL is simple albeit potentially redundant with cluster-specific
> entries in the runner URL. As to the alternative (doing it automatically),
> for PBS and SGE determining the number of cores from the cluster
> configuration and/or parsing the cluster runner URL sounds doable -
> what about the other backends?
>
> Peter

Re: [galaxy-dev] Using $NSLOTS in tools to control thread number

2012-06-15 Thread Peter Cock
On Fri, Jun 15, 2012 at 4:38 PM, James Taylor  wrote:
> This is exactly what I think we should do (and have for a long time), but I
> think the variable should be something like:
>
> GALAXY_CPUS
>
> (threads is not accurate, a multithread or multiprocess job might want to use
> this info, something even more abstract than CPUS might make sense, but
> SLOTS has never made sense to me).

I agree that a Galaxy specific name makes a lot of sense, and that
the SGE term "slots" is a bit odd. Using CPUS however is potentially
ambiguous with CPUs vs cores - my desktop has two quad core CPUs,
i.e. 2 CPUs but 8 cores.

Where do you think this number should come from? A new entry in the
runner URL is simple albeit potentially redundant with cluster-specific
entries in the runner URL. As to the alternative (doing it automatically),
for PBS and SGE determining the number of cores from the cluster
configuration and/or parsing the cluster runner URL sounds doable -
what about the other backends?

Peter
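
A rough sketch of the "parse the cluster runner URL" option mentioned above, in Python. The function name and the example URLs are assumptions made for illustration only; this is not existing Galaxy code, and real runner URLs may carry other options.

```python
import re

# Hypothetical helper: guess a core count from a cluster runner URL.
# The patterns are assumptions about typical native options, not a Galaxy API.
_PATTERNS = [
    re.compile(r"-pe\s+\S+\s+(\d+)"),  # SGE: -pe <parallel_env> <slots>
    re.compile(r"\bppn=(\d+)"),        # PBS/Torque: -l nodes=1:ppn=<cores>
]


def cores_from_runner_url(url, default=1):
    """Return the first core count found in the URL, else the default."""
    for pattern in _PATTERNS:
        match = pattern.search(url)
        if match:
            return int(match.group(1))
    return default
```

Note that an SGE slot range such as "-pe smp 4-8" would be read as 4 here, so an explicit setting is probably still needed for anything beyond the simple cases.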


Re: [galaxy-dev] Using $NSLOTS in tools to control thread number

2012-06-15 Thread Nate Coraor
On Jun 15, 2012, at 11:27 AM, Peter Cock wrote:

> On Fri, Jun 15, 2012 at 4:06 PM, Nate Coraor  wrote:
>> On Jun 15, 2012, at 9:05 AM, Peter Cock wrote:
>> 
>>> On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock wrote:
>>>> Hello all,
>>>>
>>>> I'm wondering if it is sensible to make Galaxy tools automatically use
>>>> the environment variable $NSLOTS to automatically adjust their
>>>> number of threads?
>>>>
>>>> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>>>>
>>>> The idea here is rather than hard coding the number of threads in a tool
>>>> or its XML file, which may need to be altered for different local setups,
>>>> and it can be specified in universe_wsgi.ini under [galaxy:tool_runners]
>>> 
>>> Actually thinking about this over lunch, you wouldn't want to evaluate
>>> the $NSLOTS variable when the XML <command> is processed, as
>>> that would be done on the server not the cluster node. In some cases
>>> then embedding $NSLOTS in the command string (suitably escaped)
>>> should work, otherwise doing it in a wrapper script seems best.
>> 
>> Hi Peter,
>> 
>> $NSLOTS is SGE-specific.
> 
> That is a shame, it is working nicely for the tools I have tried it on -
> you just put "\$NSLOTS" (with a backslash to escape the dollar) in
> the <command> tag.
> 
>>  Torque uses a file whose path is set in
>> $PBS_NODEFILE to list out the nodes you've been allocated (the
>> node name is repeated for each slot you have on it).
>> 
>> A couple of DRM-agnostic solutions: A common variable set by the
>> job template before the tool runs.
> 
> By that do you mean Galaxy could do some magic in the shell scripts
> it generates and submits to the cluster?

Yes, exactly.

> i.e. Set up an environment variable, e.g. $THREADS. In the case of
> Torque/PBS, it could parse the $PBS_NODEFILE which sounds nasty
> - or can you get this from the PBS runner URL?

You could, but I think it'd be easier to read the $PBS_NODEFILE than attempt to 
parse PBS arguments.

> In the case of SGE,
> all the DRMAA wrapper needs to do is:
> 
> export THREADS="$NSLOTS"
> 
>>  Or, the ability to set tool  parameters from the runner URL in
>> universe_wsgi.ini.
> 
> Setting things via the runner URL in universe_wsgi.ini seems better,
> especially as it could be used for "local" runners too.
> 
> Peter
> 
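
Reading $PBS_NODEFILE, as suggested above, could look roughly like this in a Python wrapper. This is a minimal sketch: the function name is made up, and it assumes the file holds one line per allocated slot, as described in this thread.

```python
import os


def slots_from_pbs_nodefile(env=os.environ, default=1):
    """Count the slots Torque/PBS allocated by reading $PBS_NODEFILE."""
    path = env.get("PBS_NODEFILE")
    if not path:
        # Not running under Torque/PBS (or the variable was not exported).
        return default
    try:
        with open(path) as handle:
            # One line per allocated slot; a node name repeats for each slot.
            count = sum(1 for line in handle if line.strip())
    except OSError:
        return default
    return count or default
```

For a job spread over several nodes this gives the total across all of them, which is more than a single multithreaded process can use, so it fits single-node allocations best.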




Re: [galaxy-dev] Using $NSLOTS in tools to control thread number

2012-06-15 Thread Peter Cock
On Fri, Jun 15, 2012 at 4:06 PM, Nate Coraor  wrote:
> On Jun 15, 2012, at 9:05 AM, Peter Cock wrote:
>
>> On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock  
>> wrote:
>>> Hello all,
>>>
>>> I'm wondering if it is sensible to make Galaxy tools automatically use
>>> the environment variable $NSLOTS to automatically adjust their
>>> number of threads?
>>>
>>> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>>>
>>> The idea here is rather than hard coding the number of threads in a tool
>>> or its XML file, which may need to be altered for different local setups, 
>>> and
>>> it can be specified in universe_wsgi.ini under [galaxy:tool_runners]
>>
>> Actually thinking about this over lunch, you wouldn't want to evaluate
>> the $NSLOTS variable when the XML <command> is processed, as
>> that would be done on the server not the cluster node. In some cases
>> then embedding $NSLOTS in the command string (suitably escaped)
>> should work, otherwise doing it in a wrapper script seems best.
>
> Hi Peter,
>
> $NSLOTS is SGE-specific.

That is a shame, it is working nicely for the tools I have tried it on -
you just put "\$NSLOTS" (with a backslash to escape the dollar) in
the <command> tag.

> Torque uses a file whose path is set in
> $PBS_NODEFILE to list out the nodes you've been allocated (the
> node name is repeated for each slot you have on it).
>
> A couple of DRM-agnostic solutions: A common variable set by the
> job template before the tool runs.

By that do you mean Galaxy could do some magic in the shell scripts
it generates and submits to the cluster?

i.e. Set up an environment variable, e.g. $THREADS. In the case of
Torque/PBS, it could parse the $PBS_NODEFILE which sounds nasty
- or can you get this from the PBS runner URL? In the case of SGE,
all the DRMAA wrapper needs to do is:

export THREADS="$NSLOTS"

> Or, the ability to set tool  parameters from the runner URL in
> universe_wsgi.ini.

Setting things via the runner URL in universe_wsgi.ini seems better,
especially as it could be used for "local" runners too.

Peter
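
To make the "magic in the generated shell script" idea concrete, here is a hedged Python sketch of how a job template might emit one common variable per DRM. The function, the "sge"/"pbs" labels and the fallback of 1 are all assumptions for illustration; nothing here is existing Galaxy code.

```python
def threads_preamble(drm, local_threads=1, var="THREADS"):
    """Return a shell line for a job script that sets a common thread variable.

    Hypothetical helper: the DRM labels and the variable name are assumptions.
    """
    if drm == "sge":
        # SGE exports NSLOTS on the execution node; fall back to 1 if unset.
        value = '"${NSLOTS:-1}"'
    elif drm == "pbs":
        # Torque/PBS: one line per allocated slot in the node file.
        value = '"$(wc -l < "$PBS_NODEFILE")"'
    else:
        # Local (or unknown) runner: use a number taken from configuration.
        value = '"%d"' % local_threads
    return "export %s=%s" % (var, value)
```

The returned line would be written near the top of the submitted script, so the tool's command line only ever needs to refer to \$THREADS (escaped in the XML as discussed above).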



Re: [galaxy-dev] Using $NSLOTS in tools to control thread number

2012-06-15 Thread Nate Coraor
On Jun 15, 2012, at 9:05 AM, Peter Cock wrote:

> On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock  
> wrote:
>> Hello all,
>> 
>> I'm wondering if it is sensible to make Galaxy tools automatically use
>> the environment variable $NSLOTS to automatically adjust their
>> number of threads?
>> 
>> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>> 
>> The idea here is rather than hard coding the number of threads in a tool
>> or its XML file, which may need to be altered for different local setups, and
>> it can be specified in universe_wsgi.ini under [galaxy:tool_runners]
> 
> Actually thinking about this over lunch, you wouldn't want to evaluate
> the $NSLOTS variable when the XML <command> is processed, as
> that would be done on the server not the cluster node. In some cases
> then embedding $NSLOTS in the command string (suitably escaped)
> should work, otherwise doing it in a wrapper script seems best.

Hi Peter,

$NSLOTS is SGE-specific.  Torque uses a file whose path is set in $PBS_NODEFILE 
to list out the nodes you've been allocated (the node name is repeated for each 
slot you have on it).

A couple of DRM-agnostic solutions: A common variable set by the job template 
before the tool runs.  Or, the ability to set tool parameters from the runner 
URL in universe_wsgi.ini.

--nate

> 
>> Would this work in principle on other cluster setups? i.e. Is $NSLOTS
>> sufficiently general?
> 
> Peter


Re: [galaxy-dev] Using $NSLOTS in tools to control thread number

2012-06-15 Thread Peter Cock
On Fri, Jun 15, 2012 at 12:17 PM, Peter Cock  wrote:
> Hello all,
>
> I'm wondering if it is sensible to make Galaxy tools automatically use
> the environment variable $NSLOTS to automatically adjust their
> number of threads?
>
> Using $NSLOTS works on SGE, but is it generally used on other clusters?
>
> The idea here is rather than hard coding the number of threads in a tool
> or its XML file, which may need to be altered for different local setups, and
> it can be specified in universe_wsgi.ini under [galaxy:tool_runners]

Actually thinking about this over lunch, you wouldn't want to evaluate
the $NSLOTS variable when the XML <command> is processed, as
that would be done on the server not the cluster node. In some cases
then embedding $NSLOTS in the command string (suitably escaped)
should work, otherwise doing it in a wrapper script seems best.

> Would this work in principle on other cluster setups? i.e. Is $NSLOTS
> sufficiently general?

Peter
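
The wrapper script route might look something like the Python sketch below, which reads $NSLOTS on the cluster node at run time instead of when the XML is processed. The function names and the "--threads" flag are placeholders, not part of any real tool.

```python
import os


def thread_count(env=os.environ, default=1):
    """Read $NSLOTS where the job runs; fall back if it is unset or invalid."""
    try:
        count = int(env.get("NSLOTS", default))
    except ValueError:
        return default
    return count if count > 0 else default


def build_command(tool, args, env=os.environ):
    """Build the argument list for the wrapped tool."""
    # "--threads" is a placeholder; real tools name this option differently.
    return [tool] + list(args) + ["--threads", str(thread_count(env))]
```

A wrapper would then hand the result of build_command(...) to subprocess.call(). Because the value is read inside the job, this also behaves sensibly where $NSLOTS is never set.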