Re: [galaxy-dev] Cluster setup - shared temporary directory

2011-08-19 Thread Nate Coraor
Peter Cock wrote:
> On Wed, Jul 27, 2011 at 9:06 AM, Peter Cock  wrote:
> > On Tue, Jul 26, 2011 at 8:16 PM, Luobin Yang  wrote:
> >> I benchmarked the MrBayes 3.1.2 program on my cluster for two cases:
> >> 1. use local /tmp for temporary files
> >> 2. use the network shared /home/galaxy/galaxy-dist/database/tmp
> >> MrBayes is about 10 times slower for case 2 than for case 1.  What I did was
> >> set the network shared folder as the default, but in the MrBayes wrapper
> >> I changed the environment variable TEMP to point to a local folder.
> >> Luobin
> >
> > Does that mean Galaxy will configure the TEMP environment variable for
> > tools to point at the universe_wsgi.ini new_file_path setting? In your case,
> > /home/galaxy/galaxy-dist/database/tmp
> >
> > This is the kind of thing I think should be documented somewhere for
> > tool authors ...
> 
> ... and mentioned in the universe_wsgi.ini text for the new_file_path setting.
> 
> I've just seen Shantanu Pavgi's thread which also covers this issue, and
> the related TMP and TMPDIR environment variables set up or overridden
> by SGE. I think we agree that Galaxy needs some documentation and
> guidance in this area for tool authors.

Galaxy doesn't modify $TEMP, but in the past, setting $TEMP would cause
Galaxy to use that directory for all files created by Python's tempfile
module.  Since there was confusion over the difference between $TEMP
(which was documented as the way to control the creation of temp files)
and new_file_path, Galaxy was changed to put all temp files in the
location of new_file_path, ignoring $TEMP.
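
For illustration, a Python application can make all of its own tempfile calls use one directory, whatever $TEMP says, by assigning tempfile.tempdir. The sketch below shows that general technique; it is an assumption about the mechanism, not Galaxy's actual code, and new_file_path is simply a parameter standing in for the config value.

```python
import os
import tempfile


def pin_tempdir(new_file_path):
    """Send this process's own tempfile calls to new_file_path, ignoring $TEMP etc."""
    os.makedirs(new_file_path, exist_ok=True)
    # A non-None tempfile.tempdir is returned by gettempdir() as-is, so the
    # TMPDIR/TEMP/TMP environment variables are no longer consulted.
    tempfile.tempdir = os.path.abspath(new_file_path)
    return tempfile.gettempdir()
```

This only affects the current Python process: a tool started as a child process inherits the environment, not this module-level setting.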

However, this only applies to the framework.  Tools run in a separate
process and are not forced to use new_file_path.  If you want your tool
to use it, you could pass the value of new_file_path to your tool.
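
A minimal sketch of that option for a Python tool: the script accepts the directory as a command-line argument and hands it to tempfile explicitly. The --tmpdir option name and the run() helper are hypothetical; how new_file_path is substituted into the tool's command line depends on your Galaxy setup.

```python
import argparse
import os
import tempfile


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example tool with an explicit temp dir")
    # Hypothetical option: the tool's command line would pass new_file_path here.
    parser.add_argument("--tmpdir", default=None,
                        help="directory for scratch files (default: tempfile's usual rules)")
    parser.add_argument("output", help="file to write the result to")
    return parser.parse_args(argv)


def run(argv=None):
    """Do some scratch work under --tmpdir and return the scratch path used."""
    args = parse_args(argv)
    # dir=None falls back to tempfile.gettempdir(), i.e. the environment-based rules.
    with tempfile.TemporaryDirectory(dir=args.tmpdir) as scratch:
        scratch_file = os.path.join(scratch, "intermediate.txt")
        with open(scratch_file, "w") as handle:
            handle.write("intermediate data\n")
        with open(scratch_file) as src, open(args.output, "w") as dst:
            dst.write(src.read())
    # The scratch directory has been deleted at this point.
    return scratch
```

For example, run(["--tmpdir", "/home/galaxy/galaxy-dist/database/tmp", "out.txt"]) keeps the scratch files under new_file_path, while omitting --tmpdir leaves the choice to the environment.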

Otherwise, where temp files go is determined by the rules of whatever
library or program the tool uses to create them; in Python, it's this:

http://docs.python.org/library/tempfile.html#tempfile.tempdir
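
In short, Python's tempfile.gettempdir() checks the TMPDIR, TEMP and TMP environment variables in that order, then falls back to platform defaults such as /tmp, and caches the result. Because TMPDIR is checked first, setting only TEMP will not take effect for Python tools if TMPDIR is also set (for example by SGE). The resolve_tempdir() helper below is a hypothetical sketch that makes the precedence visible.

```python
import os
import tempfile

_TEMP_VARS = ("TMPDIR", "TEMP", "TMP")


def resolve_tempdir(env):
    """Return the directory tempfile would choose if only `env` set the temp variables.

    `env` maps any of TMPDIR, TEMP, TMP to a directory path.
    """
    saved = {name: os.environ.get(name) for name in _TEMP_VARS}
    try:
        for name in _TEMP_VARS:
            os.environ.pop(name, None)
            if name in env:
                os.environ[name] = env[name]
        tempfile.tempdir = None  # discard the cached choice so it is recomputed
        return tempfile.gettempdir()
    finally:
        # Restore the caller's environment and let tempfile recompute lazily.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        tempfile.tempdir = None
```

Assuming /space exists and is writable, resolve_tempdir({"TEMP": "/space"}) returns /space, but adding a TMPDIR entry makes TMPDIR's value win.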

Our cluster nodes have the majority of their local disk partitioned as
/space, so we set TEMP=/space in the environment used on the nodes.

--nate

> 
> Peter
> 
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>   http://lists.bx.psu.edu/
> 


Re: [galaxy-dev] Cluster setup - shared temporary directory

2011-07-27 Thread Peter Cock
On Wed, Jul 27, 2011 at 9:06 AM, Peter Cock  wrote:
> On Tue, Jul 26, 2011 at 8:16 PM, Luobin Yang  wrote:
>> I benchmarked the MrBayes 3.1.2 program on my cluster for two cases:
>> 1. use local /tmp for temporary files
>> 2. use the network shared /home/galaxy/galaxy-dist/database/tmp
>> MrBayes is about 10 times slower for case 2 than for case 1.  What I did was
>> set the network shared folder as the default, but in the MrBayes wrapper
>> I changed the environment variable TEMP to point to a local folder.
>> Luobin
>
> Does that mean Galaxy will configure the TEMP environment variable for
> tools to point at the universe_wsgi.ini new_file_path setting? In your case,
> /home/galaxy/galaxy-dist/database/tmp
>
> This is the kind of thing I think should be documented somewhere for
> tool authors ...

... and mentioned in the universe_wsgi.ini text for the new_file_path setting.

I've just seen Shantanu Pavgi's thread which also covers this issue, and
the related TMP and TMPDIR environment variables set up or overridden
by SGE. I think we agree that Galaxy needs some documentation and
guidance in this area for tool authors.

Peter



Re: [galaxy-dev] Cluster setup - shared temporary directory

2011-07-27 Thread Peter Cock
On Tue, Jul 26, 2011 at 8:16 PM, Luobin Yang  wrote:
> I benchmarked the MrBayes 3.1.2 program on my cluster for two cases:
> 1. use local /tmp for temporary files
> 2. use the network shared /home/galaxy/galaxy-dist/database/tmp
> MrBayes is about 10 times slower for case 2 than for case 1.  What I did was
> set the network shared folder as the default, but in the MrBayes wrapper
> I changed the environment variable TEMP to point to a local folder.
> Luobin

Does that mean Galaxy will configure the TEMP environment variable for
tools to point at the universe_wsgi.ini new_file_path setting? In your case,
/home/galaxy/galaxy-dist/database/tmp

This is the kind of thing I think should be documented somewhere for
tool authors.

Thanks

Peter



Re: [galaxy-dev] Cluster setup - shared temporary directory

2011-07-26 Thread Luobin Yang
I benchmarked the MrBayes 3.1.2 program on my cluster for two cases:

1. use local /tmp for temporary files
2. use the network shared /home/galaxy/galaxy-dist/database/tmp

MrBayes is about 10 times slower for case 2 than for case 1.  What I did was
set the network shared folder as the default, but in the MrBayes wrapper
I changed the environment variable TEMP to point to a local folder.
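
A minimal sketch of that wrapper approach, assuming the real tool is launched as a subprocess: the wrapper copies its environment, points TEMP (and TMP and TMPDIR, since programs differ in which one they honour) at a fresh node-local directory, and runs the command with that environment. The function names and the /tmp default are hypothetical; the actual MrBayes wrapper is not shown in this thread. If the scheduler already provides a suitable per-job $TMPDIR, as Peter notes SGE sets one, reusing it may be simpler.

```python
import os
import shutil
import subprocess
import tempfile


def local_temp_env(local_root="/tmp"):
    """Return (env, scratch): a copy of os.environ whose temp variables point at a
    fresh directory created under local_root (assumed to be node-local disk)."""
    scratch = tempfile.mkdtemp(prefix="tool_", dir=local_root)
    env = dict(os.environ)
    # Programs differ in which variable they honour, so set all three.
    for name in ("TEMP", "TMP", "TMPDIR"):
        env[name] = scratch
    return env, scratch


def run_with_local_temp(cmd, local_root="/tmp"):
    """Run cmd (a list of arguments) with node-local temp dirs; return its exit code."""
    env, scratch = local_temp_env(local_root)
    try:
        return subprocess.call(cmd, env=env)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
```

Using a fresh subdirectory rather than /tmp itself keeps concurrent jobs on one node from colliding and makes cleanup a single rmtree.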

Luobin

On Tue, Jul 26, 2011 at 10:39 AM, Peter Cock wrote:

> On Tue, Jul 26, 2011 at 5:16 PM, Duddy, John  wrote:
> > I can give you a very good example - if you are doing alignment and for some
> > reason need to convert the input files before operating on them, such that you
> > need a complete copy, /tmp may not have enough room. I have had this happen
> > to me running lots of instances of an aligner, temporarily using 100G+ of
> > temp space.
> >
> > I don't see the need to have a "shared" temp space, but I do see the need to
> > be able to tell the tools where you want them to put temp files.
>
> So in your setup, the cluster nodes are not likely to have 100G+ on /tmp (i.e.
> the local hard drive of the node), so you want them to use a temp folder on the
> cluster shared storage?
>
> I think needs will differ between tools - in some cases you really want
> a fast local drive for temp files, and putting them on a network drive
> will just kill performance. Using /tmp seems a safe default.
>
> Is there any guidance for tool authors on where to put temp files, and
> how to access any related Galaxy settings? There is nothing currently
> listed here: http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax
>
> Peter

Re: [galaxy-dev] Cluster setup - shared temporary directory

2011-07-26 Thread Peter Cock
On Tue, Jul 26, 2011 at 5:16 PM, Duddy, John  wrote:
> I can give you a very good example - if you are doing alignment and for some
> reason need to convert the input files before operating on them, such that you
> need a complete copy, /tmp may not have enough room. I have had this happen
> to me running lots of instances of an aligner, temporarily using 100G+ of
> temp space.
>
> I don't see the need to have a "shared" temp space, but I do see the need to
> be able to tell the tools where you want them to put temp files.

So in your setup, the cluster nodes are not likely to have 100G+ on /tmp (i.e.
the local hard drive of the node), so you want them to use a temp folder on the
cluster shared storage?

I think needs will differ between tools - in some cases you really want
a fast local drive for temp files, and putting them on a network drive
will just kill performance. Using /tmp seems a safe default.
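
The trade-off here (fast local disk versus enough room, as in John's 100G+ case) can be sketched as a small helper that prefers the fastest directory and falls back to shared storage only when the local disk lacks the space a job needs. pick_tempdir(), the candidate list and the size estimate are hypothetical inputs the tool author would supply.

```python
import os
import shutil


def pick_tempdir(candidates, needed_bytes):
    """Return the first existing candidate directory with at least needed_bytes free.

    List candidates fastest first, e.g. node-local disk before a network share.
    """
    for path in candidates:
        if os.path.isdir(path) and shutil.disk_usage(path).free >= needed_bytes:
            return path
    raise RuntimeError("no candidate temp directory has %d bytes free" % needed_bytes)
```

For example, pick_tempdir(["/tmp", "/home/galaxy/galaxy-dist/database/tmp"], 100 * 1024 ** 3) would use /tmp when it has 100 GiB free and the shared folder otherwise.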

Is there any guidance for tool authors on where to put temp files, and
how to access any related Galaxy settings? There is nothing currently
listed here: http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax

Peter


Re: [galaxy-dev] Cluster setup - shared temporary directory

2011-07-26 Thread Duddy, John
I can give you a very good example - if you are doing alignment and for some
reason need to convert the input files before operating on them, such that you
need a complete copy, /tmp may not have enough room. I have had this happen to
me running lots of instances of an aligner, temporarily using 100G+ of temp
space.

I don't see the need to have a "shared" temp space, but I do see the need to be
able to tell the tools where you want them to put temp files.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

-----Original Message-----
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Peter Cock
Sent: Tuesday, July 26, 2011 8:10 AM
To: Galaxy Dev
Subject: [galaxy-dev] Cluster setup - shared temporary directory

Hi all,

I'm reading http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Cluster

Could someone expand a little on this section please:

> Create a shared temporary directory
>
> Some tools make use of temporary files created on the server,
> but accessed on the nodes. For this, you'll need to make a
> directory (galaxy_dist/database/tmp by default) ...

I presume this is talking about the universe_wsgi.ini setting
new_file_path = database/tmp (if so, could that be explicit)?

I would like to know more about this from a tool author's point
of view. Could you at least give one example of a tool that uses
this temporary folder? As a tool author, I am unclear about its
purpose (and it would be a shock if I accidentally used this
mapped folder instead of the local temp drive of a node).

Thanks,

Peter



[galaxy-dev] Cluster setup - shared temporary directory

2011-07-26 Thread Peter Cock
Hi all,

I'm reading http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Cluster

Could someone expand a little on this section please:

> Create a shared temporary directory
>
> Some tools make use of temporary files created on the server,
> but accessed on the nodes. For this, you'll need to make a
> directory (galaxy_dist/database/tmp by default) ...

I presume this is talking about the universe_wsgi.ini setting
new_file_path = database/tmp (if so, could that be explicit)?
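
To check which directory a given installation actually uses, the value can be read from universe_wsgi.ini. A sketch with Python's configparser, assuming the setting lives in the [app:main] section, that a relative value is relative to the Galaxy root directory, and that database/tmp is the default when the line is absent; the new_file_path() helper and its galaxy_root argument are hypothetical.

```python
import configparser
import os


def new_file_path(ini_text, galaxy_root):
    """Return the absolute new_file_path from universe_wsgi.ini text."""
    # Interpolation is disabled because Paste-style ini files may contain %(here)s.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    value = parser.get("app:main", "new_file_path", fallback="database/tmp")
    # os.path.join keeps an absolute value unchanged and anchors a relative one.
    return os.path.normpath(os.path.join(galaxy_root, value))
```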

I would like to know more about this from a tool author's point
of view. Could you at least give one example of a tool that uses
this temporary folder? As a tool author, I am unclear about its
purpose (and it would be a shock if I accidentally used this
mapped folder instead of the local temp drive of a node).

Thanks,

Peter