Re: [galaxy-dev] Galaxy and object stores

2014-08-26 Thread Enis Afgan
Hi Inge,
There is an implementation for using the AWS S3 object store as the data
store for a given Galaxy instance. The implementation is located here
https://bitbucket.org/galaxy/galaxy-central/src/3a51eaf209f2502bf32dbb421ecabb7fe46243ea/lib/galaxy/objectstore/s3.py?at=default
and it offers several config options in universe_wsgi.ini.

The data stored in S3 is locally cached while it's being operated on but
always synced with the back end object store.
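
To make the caching behaviour above concrete, here is a minimal sketch of a write-through cache in front of an object store. It is only an illustration: the class, its method names, and the dict standing in for a bucket are assumptions for this example and are not taken from s3.py.

```python
import os
import tempfile


class CachedObjectStore:
    """Toy write-through cache in front of a remote object store.

    `backend` is a plain dict standing in for a bucket (an assumption for
    this sketch); a real store would make network calls instead.
    """

    def __init__(self, backend, cache_dir):
        self.backend = backend
        self.cache_dir = cache_dir

    def _cache_path(self, key):
        # Flatten the key so it maps to a single file in the cache directory.
        return os.path.join(self.cache_dir, key.replace("/", "_"))

    def get_filename(self, key):
        """Return a local path for `key`, pulling from the backend on a miss."""
        path = self._cache_path(key)
        if not os.path.exists(path):
            with open(path, "wb") as fh:
                fh.write(self.backend[key])
        return path

    def update_from_file(self, key, source_path):
        """Copy new data into the cache, then push it to the backend."""
        with open(source_path, "rb") as fh:
            data = fh.read()
        with open(self._cache_path(key), "wb") as fh:
            fh.write(data)
        self.backend[key] = data


if __name__ == "__main__":
    bucket = {"datasets/1.dat": b"ACGT"}
    with tempfile.TemporaryDirectory() as cache_dir:
        store = CachedObjectStore(bucket, cache_dir)
        local = store.get_filename("datasets/1.dat")  # cached on first access
        with open(local, "ab") as fh:                 # a tool modifies the local copy
            fh.write(b"TTAA")
        store.update_from_file("datasets/1.dat", local)  # synced back to the store
        print(bucket["datasets/1.dat"])
```

The point of the sketch is that tools only ever see the local path, while the object store remains the authoritative copy.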

Pulsar seems to have some support for S3 but, as the docs say in the
implementation, it's explicitly beta:
https://github.com/galaxyproject/pulsar/blob/b32b7caafc6582a3a28e694e2dbb75e7a8f2bffc/galaxy/objectstore/pulsar.py

As a side note, there are some planned enhancements to how the object store
implementation is handled and there will hopefully be quite a bit of
activity on this topic in the near future (e.g., https://trello.com/c/YynQKq8m).

Hope this at least clarifies the state of object store support,
Enis


On Mon, Aug 25, 2014 at 10:24 AM, Raknes Inge Alexander
inge.a.rak...@uit.no wrote:

   I have a few questions about object stores in Galaxy:

  1: Can all Galaxy data sets be stored in an object store?
  2: If so,  does Galaxy still need to maintain a local copy of the data?
  3: Is LWR or Pulsar able to get the data directly from the object store,
 or does it still have to go through Galaxy?

  We are planning to let users of our Galaxy installation handle large
 input/output files (~30G) and we expect that the VM containing our Galaxy
 installation will become a bottleneck if all data needs to travel
 through that node.

  - Inge Alexander Raknes

 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
   http://lists.bx.psu.edu/

 To search Galaxy mailing lists use the unified search at:
   http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Galaxy and object stores

2014-08-26 Thread John Chilton
Thanks Enis, just to elaborate on Pulsar - I suspect it would work
with something like configuring Galaxy with S3 object store right now
- but it would do so by having Galaxy cache the data locally and then
Pulsar would negotiate the transfer with Galaxy (many different ways
this could occur depending on how things are mounted). Ideally - it
wouldn't happen this way though - I would love it if Galaxy could
determine the job is going to be run remotely and not attempt the
cache and then configure the remote Pulsar to cache the file directly
from the object store abstraction. In addition to eliminating the
extra cache and transfer, it could allow Pulsar and Galaxy to have
different views of the underlying data sources (e.g. here the data is
mounted as X and there the data is mounted as Y - or here the data is
directly available and there get it via IRODS, etc...).
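
As a rough sketch of the idea above (not something Galaxy or Pulsar implements today), the staging decision could look like the following. Every name here is hypothetical, including the `objectstore://` scheme and the prefix-mapping format; it only shows the order of the checks.

```python
from dataclasses import dataclass


@dataclass
class StagingPlan:
    fetched_by: str  # "galaxy" or "pulsar": which side materialises the file
    source: str      # where that side reads the dataset from


def rewrite_path(path, mappings):
    """Translate a path from Galaxy's view to the remote view, if a mapping applies."""
    for local_prefix, remote_prefix in mappings:
        if path.startswith(local_prefix):
            return remote_prefix + path[len(local_prefix):]
    return None


def plan_input(object_key, galaxy_path, runs_remotely, mappings, remote_has_store_access):
    """Decide how one job input should be staged (hypothetical logic)."""
    if not runs_remotely:
        # Local job: Galaxy uses its own cached copy.
        return StagingPlan("galaxy", galaxy_path)
    remote_path = rewrite_path(galaxy_path, mappings)
    if remote_path is not None:
        # Same data mounted under a different path on the remote side.
        return StagingPlan("pulsar", remote_path)
    if remote_has_store_access:
        # Remote side caches straight from the object store; Galaxy skips its cache.
        return StagingPlan("pulsar", "objectstore://" + object_key)
    # Fallback: the current behaviour, Galaxy caches and then transfers.
    return StagingPlan("galaxy", galaxy_path)


if __name__ == "__main__":
    mappings = [("/galaxy/data/", "/mnt/shared/")]
    print(plan_input("000/dataset_1.dat", "/galaxy/data/000/dataset_1.dat",
                     True, mappings, False))
    print(plan_input("000/dataset_2.dat", "/galaxy/cache/000/dataset_2.dat",
                     True, mappings, True))
```

Only the last branch routes the data through the Galaxy node, which is the case the original question is worried about.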

There are some ... initial grasps... at this sort of thing in Pulsar
and Galaxy but it is not fully (or even substantially) implemented
currently.

-John



[galaxy-dev] Galaxy and object stores

2014-08-25 Thread Raknes Inge Alexander
I have a few questions about object stores in Galaxy:


1: Can all Galaxy data sets be stored in an object store?
2: If so,  does Galaxy still need to maintain a local copy of the data?
3: Is LWR or Pulsar able to get the data directly from the object store, or 
does it still have to go through Galaxy?

We are planning to let users of our Galaxy installation handle large 
input/output files (~30G) and we expect that the VM containing our Galaxy 
installation will become a bottleneck if all data needs to travel through that 
node.

- Inge Alexander Raknes