Re: [galaxy-dev] Working with files outside galaxy without copying?

2012-11-15 Thread Samuel Lampa
Many thanks for the links, exactly what I need! (Somehow I failed to look
in the right place, obviously ...)


Best
// Samuel

On 11/14/2012 04:02 PM, Hans-Rudolf Hotz wrote:

Hi Samuel

Have you looked at Data Libraries?
(http://wiki.galaxyproject.org/Admin/Data%20Libraries/Libraries)

Also look at these settings in universe_wsgi.ini:

library_import_dir
user_library_import_dir
allow_library_path_paste

These give you several options for accessing data outside of the Galaxy
database/ folder without duplicating the data.


Regards, Hans-Rudolf


On 11/14/2012 03:23 PM, Samuel Lampa wrote:

Hi Jorrit, thanks for the input!

On 11/14/2012 03:12 PM, Jorrit Boekel wrote:
This may be implemented, I believe some people run Galaxy on the 
cluster,


Yes, but AFAIK, most cluster job runners either use some kind of file
staging (copying) into local storage on the compute nodes, or require you
to have the data uploaded to the Galaxy file area and to have this file
area available, NFS-mounted, on all compute nodes.

It would be very interesting if there is an existing solution for
running jobs on files without moving/copying them at all, though!

Best
// Samuel


Galaxy normally operates on files stored in the database/files folder.
If your Galaxy instance has access to the files you need on your
parallel fs, I guess you could start by writing a tool that creates
links to your files when given the path to the input file. Another
tool may then move the dataset into the desired place while creating a
link in the database folder.

Sounds like a hack to me, but may work.

cheers,
jorrit




On Wed, Nov 14, 2012 at 2:52 PM, Samuel Lampa samuel.la...@gmail.com wrote:

Hi,

We are looking into ways to integrate Galaxy into the workflows of
users at our cluster, with lots of NGS users running all kinds of
analyses on their typically huge amounts of data. For this
we use a parallel file system, available on all compute nodes.

This file system, although approx. 1 PB in size, is constantly
filling up, and thus we are not very attracted by the idea of
copying files into/out of Galaxy for each analysis.

Thus, we would be interested to know what the options are for
working with existing file systems external to Galaxy.

E.g., would it be possible to link files from outside into some kind
of Galaxy file system? (I'm not totally clear about how Galaxy stores
its data, although I found out that stuff is created in
database/files.)

... or is there any work going on for selecting any file system
path as input in Galaxy workflows?

... or any other hints?

As said, I'm quite new to Galaxy and trying to wrap my head around
how we can use it, so all hints are welcome.

Best Regards
// Samuel


--
Developer at SNIC-UPPMAX www.uppmax.uu.se
Developer at Dept of Pharm Biosciences www.farmbio.uu.se
Twitter - twitter.com/samuellampa
Blog - saml.rilspace.org
G+ - gplus.to/saml

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/


Re: [galaxy-dev] Working with files outside galaxy without copying?

2012-11-14 Thread Jorrit Boekel
This may be implemented; I believe some people run Galaxy on the cluster.
But if not:

Galaxy normally operates on files stored in the database/files folder. If
your Galaxy instance has access to the files you need on your parallel fs,
I guess you could start by writing a tool that creates links to your files
when given the path to the input file. Another tool may then move the
dataset into the desired place while creating a link in the database folder.

Sounds like a hack to me, but may work.
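
The linking idea above could be sketched roughly as follows. This is only a minimal illustration in Python, not how Galaxy itself handles datasets: link_external_file, the directory names and the sample file are all hypothetical, and a real tool would also have to register the dataset with Galaxy.

```python
import os
import tempfile


def link_external_file(source_path, link_dir):
    """Create a symlink in link_dir pointing at source_path; return the link path.

    Hypothetical helper: nothing is copied, only a link is created.
    """
    source_path = os.path.abspath(source_path)
    if not os.path.isfile(source_path):
        raise FileNotFoundError(source_path)
    link_path = os.path.join(link_dir, os.path.basename(source_path))
    os.symlink(source_path, link_path)
    return link_path


if __name__ == "__main__":
    # Two temporary directories stand in for the parallel file system
    # ("external") and a Galaxy-managed folder ("managed").
    with tempfile.TemporaryDirectory() as external, \
            tempfile.TemporaryDirectory() as managed:
        data = os.path.join(external, "reads.fastq")
        with open(data, "w") as handle:
            handle.write("@read1\nACGT\n+\nIIII\n")
        link = link_external_file(data, managed)
        print(link, "->", os.readlink(link))
```

The data stays where it is on the shared file system; only the link lives in the managed folder, so this works only if every compute node sees the same path.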

cheers,
jorrit

Re: [galaxy-dev] Working with files outside galaxy without copying?

2012-11-14 Thread Samuel Lampa

Hi Jorrit, thanks for the input!

On 11/14/2012 03:12 PM, Jorrit Boekel wrote:
This may be implemented, I believe some people run Galaxy on the cluster, 


Yes, but AFAIK, most cluster job runners either use some kind of file
staging (copying) into local storage on the compute nodes, or require you
to have the data uploaded to the Galaxy file area and to have this file
area available, NFS-mounted, on all compute nodes.


It would be very interesting if there is an existing solution for 
running jobs on files without moving/copying them at all, though!


Best
// Samuel

Galaxy normally operates on files stored in the database/files folder. 
If your Galaxy instance has access to the files you need on your 
parallel fs, I guess you could start by writing a tool that creates 
links to your files when given the path to the input file. Another 
tool may then move the dataset into the desired place while creating a 
link in the database folder.


Sounds like a hack to me, but may work.

cheers,
jorrit


Re: [galaxy-dev] Working with files outside galaxy without copying?

2012-11-14 Thread Hans-Rudolf Hotz

Hi Samuel

Have you looked at Data Libraries?
(http://wiki.galaxyproject.org/Admin/Data%20Libraries/Libraries)

Also look at these settings in universe_wsgi.ini:

library_import_dir
user_library_import_dir
allow_library_path_paste

These give you several options for accessing data outside of the Galaxy
database/ folder without duplicating the data.
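
To make the three settings concrete, here is a small Python sketch that parses a sample fragment and prints the library-related options. The option names are the ones listed above; the [app:main] section name, the paths and the values are assumptions for illustration only, so check them against your own universe_wsgi.ini.

```python
import configparser

# Hypothetical fragment of universe_wsgi.ini. The section name, paths and
# values are placeholders for illustration, not recommended settings.
SAMPLE_INI = """
[app:main]
library_import_dir = /proj/shared/galaxy_import
user_library_import_dir = /proj/shared/galaxy_user_import
allow_library_path_paste = True
"""

LIBRARY_OPTIONS = (
    "library_import_dir",
    "user_library_import_dir",
    "allow_library_path_paste",
)


def read_library_options(ini_text, section="app:main"):
    """Return the library-related options found in the given INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        name: parser.get(section, name)
        for name in LIBRARY_OPTIONS
        if parser.has_option(section, name)
    }


if __name__ == "__main__":
    for name, value in read_library_options(SAMPLE_INI).items():
        print(f"{name} = {value}")
```

Running the same function on the text of a real config file is a quick way to see which of the three options are actually set.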


Regards, Hans-Rudolf


On 11/14/2012 03:23 PM, Samuel Lampa wrote:

Hi Jorrit, thanks for the input!

On 11/14/2012 03:12 PM, Jorrit Boekel wrote:

This may be implemented, I believe some people run Galaxy on the cluster,


Yes, but AFAIK, most cluster job runners either use some kind of file
staging (copying) into local storage on the compute nodes, or require you
to have the data uploaded to the Galaxy file area and to have this file
area available, NFS-mounted, on all compute nodes.

It would be very interesting if there is an existing solution for
running jobs on files without moving/copying them at all, though!

Best
// Samuel


Galaxy normally operates on files stored in the database/files folder.
If your Galaxy instance has access to the files you need on your
parallel fs, I guess you could start by writing a tool that creates
links to your files when given the path to the input file. Another
tool may then move the dataset into the desired place while creating a
link in the database folder.

Sounds like a hack to me, but may work.

cheers,
jorrit