Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2014-06-17 Thread Kyle Ellrott
Glad to see someone else is playing around with Mesos.
I have a Mesos branch that is getting a little long in the tooth. I'd like
to get a straight job runner (non-LWR, with a shared file system) running
under Mesos for Galaxy before I submit that work as a pull request.

The hackathon is only 12 days away! Hopefully we'll be able to make some
progress on these sorts of projects.

Kyle




Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2014-06-15 Thread John Chilton
Hey Kyle, all,

  If anyone wants to play with running Galaxy jobs within an Apache
Mesos environment I have added a prototype of this feature to the LWR.

https://bitbucket.org/jmchilton/lwr/commits/555438d2fe266899338474b25c540fef42bcece7
https://bitbucket.org/jmchilton/lwr/commits/9748b3035dbe3802d4136a6a1028df8395a9aeb3

This work distributes jobs across a Mesos cluster and injects a
MESOS_URL environment variable into the job runtime environment in
case the jobs themselves want to take advantage of Mesos.
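
A minimal sketch of the injection described above, in Python. This is not the LWR code: `run_job_with_mesos_url` and the example master address are hypothetical names chosen for illustration. The idea is simply to copy the parent environment, add MESOS_URL, and launch the job command as a subprocess so the job can read the variable if it wants to.

```python
import os
import subprocess
import sys


def run_job_with_mesos_url(command, mesos_url):
    """Run `command` with MESOS_URL added to a copy of the current environment.

    Hypothetical helper for illustration; not part of the LWR or Galaxy.
    """
    env = dict(os.environ)          # copy so the parent environment is untouched
    env["MESOS_URL"] = mesos_url    # the variable the job may choose to use
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The "job" here is a one-line Python script that echoes the variable back.
    job = [sys.executable, "-c", "import os; print(os.environ['MESOS_URL'])"]
    print(run_job_with_mesos_url(job, "mesos-master.example.org:5050"))
```

Copying the environment rather than mutating `os.environ` keeps the variable scoped to the one job, which matters if the same process launches jobs for more than one cluster.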

The advantage of the LWR versus a traditional Galaxy runner is that
the job can be staged to remote resources without shared disk. Prior
to this I was imagining the LWR to be useful in cases where Galaxy and
the remote cluster don't share a common disk but where there is in fact
a shared scratch directory or something similar across the remote
cluster, as well as a resource manager. The LWR Mesos framework,
however, has the actual compute servers themselves stage the job up and
down - so you could imagine distributing Galaxy across large clusters
without any shared disk whatsoever - that could be very cool and help
scale, say, cloud applications.

The downside of an LWR-based approach versus a Galaxy approach is that
it is less mature and there is more to configure - you need to
configure a Galaxy job_conf plugin and destination, the LWR itself, and
a message queue (for this variant of LWR operation anyway - it should
be possible to drive this via the LWR in web server mode but I haven't
added that yet). I would be more than happy to continue to see progress
toward Mesos support in Galaxy proper.

It is strictly a prototype so far - a sort of playground if anyone
wants to play with these ideas and build something cool. It really is
a framework right - not so much a job scheduler so I am not sure it
is very immediately useful - but I imagine one could build cool stuff
on top of it.

Next, I think I would like to add Apache Aurora
(http://aurora.incubator.apache.org/) support - because it seems like
a much more traditional resource manager but built on top of Mesos so
it would be more practical for traditional Galaxy-style jobs. Doesn't
buy you anything in terms of parallelization but it would fit better
with Galaxy.

-John



Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2013-10-29 Thread Ketan Maheshwari
Hi Kyle,

Swift is indeed a complete framework for distributed computing.
Distributing files out to cluster nodes, starting processes, and bringing
result files back to the submit host are all done out of the box (the
stagein-exec-stageout cycle).
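
For readers new to the term, here is a minimal Python sketch of a stagein-exec-stageout cycle. It is not how Swift implements it: `stage_exec_stageout` is a hypothetical helper, and a local temporary directory stands in for a remote node's scratch space (a real system would move the files over the network).

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def stage_exec_stageout(inputs, command, outputs, results_dir):
    """Copy inputs into a scratch dir, run command there, copy outputs back.

    Hypothetical helper for illustration only.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch:
        scratch = Path(scratch)
        # Stage in: place every input file in the job's working directory.
        for src in inputs:
            shutil.copy(src, scratch / Path(src).name)
        # Exec: run the command with the scratch directory as its cwd.
        subprocess.run(command, cwd=scratch, check=True)
        # Stage out: bring the named result files back to the submit side.
        for name in outputs:
            shutil.copy(scratch / name, results_dir / name)
    return [results_dir / name for name in outputs]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as submit_dir:
        src = Path(submit_dir) / "in.txt"
        src.write_text("acgt\n")
        # The "tool" upper-cases in.txt into out.txt using relative paths only.
        job = [sys.executable, "-c",
               "open('out.txt', 'w').write(open('in.txt').read().upper())"]
        (result,) = stage_exec_stageout(
            [src], job, ["out.txt"], Path(submit_dir) / "results"
        )
        print(result.read_text())
```

Because the job only ever sees relative paths inside its scratch directory, nothing in it depends on a file system shared with the submit host.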

We can discuss offline if you are interested in giving it a shot.

Best,
Ketan


 On Mon, Oct 28, 2013 at 1:51 PM, Ketan Maheshwari 
 ketancmaheshw...@gmail.com wrote:

 Hi Kyle,

 We have a similar ongoing development wherein we are working on
 integrating our Swift framework ( swift-lang.org ) with Galaxy. The goal
 is to enable Galaxy based applications to run on a variety of distributed
 resources via various integration schemes as suitable to application and
 underlying execution environment.

 Here is an abstract of a paper (co-authored with Ravi, who responded on
 this thread) we will be presenting in a workshop at the upcoming SC 13
 conference:

 The Galaxy platform is a web-based science portal for scientific
 computing supporting Life Sciences users community. While user-friendly and
 intuitive for doing small to medium scale computations, it currently has a
 limited support for large-scale, parallel and distributed computing. The
 Swift parallel scripting framework is capable of composing ordinary
 applications into parallel scripts that can be run on multi-scale
 distributed and performance computing platforms. In complex distributed
 environments, often the user end of application lifecycle slows down
 because of the technical complexities brought in by the scale, access
 methods and resource management nuances. Galaxy offers a simple way of
 designing, composing, executing, reusing, and reproducing application runs.
 An integration between Swift and Galaxy systems can accelerate science as
 well as bring the respective user communities together in an interactive,
 user-friendly, parallel and distributed data analysis environment enabled
 on a broad range of computational infrastructures.

 Kindly let us know if you need a hands on for the various tools we have
 already developed.


 Best,
 Ketan




Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2013-10-28 Thread Kyle Ellrott
I don't think implementation will be very difficult. The bigger question
is whether this is a technology people are open to.
The nearest competitor is YARN
(http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
Mesos seems a bit more geared toward general purpose usage (with several
existing frameworks), while YARN seems more specific to Hadoop. But I'd be
glad to hear some other thoughts.

Kyle


On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri madd...@mcs.anl.gov wrote:

 Kyle
 This is something I am very interested in. The three parts below make
 sense to me. I would be very happy to discuss further and provide any help
 to move this forward.

 Regards

 --
 Ravi K Madduri
 MCS, Argonne National Laboratory
 Computation Institute, University of Chicago


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2013-10-28 Thread Kyle Ellrott
You are probably a good person to get an opinion from. My plan isn't to
write new frameworks, but rather to use existing libraries that can
communicate with Mesos to set up their parallel environments.
But for Swift, you would probably want to write a new framework. Just
looking at Swift, I imagine one of the harder parts is getting the
system set up on a cluster (i.e. distributing files out to remote nodes,
making sure that you have a way to start processes on those nodes, and
having them know where to find the master). It seems like Swift could
benefit from having a Mesos-based framework. Do you think it would enable
you to have a 'zero-config' startup of a distributed Swift application?

Kyle




[galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2013-10-26 Thread Kyle Ellrott
I think one of the aspects where Galaxy is a bit soft is the ability to do
distributed tasks. The current system of split/replicate/merge tasks based
on file type is a bit limited and hard for tool developers to expand upon.
Distributed computing is a non-trivial thing to implement and I think it
would be a better use of our time to use an already existing framework. And
it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos (http://mesos.apache.org/).
You can see an overview of the Mesos architecture at
https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++,
Java/Scala and Python to write distributed frameworks. There are already
implementations of frameworks for common parallel programming systems such
as:
 - Hadoop (https://github.com/mesos/hadoop)
 - MPI (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
 - Spark (http://spark-project.org)
And you can find an example Python framework at
https://github.com/apache/mesos/tree/master/src/examples/python
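
For readers unfamiliar with the split/replicate/merge pattern mentioned at the top of this message, here is a generic Python sketch of the idea. It is not Galaxy's implementation (which works on file types); the function names and the GC-counting task are made up for illustration, and threads stand in for separate cluster jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, n_chunks):
    """Split a list into at most n_chunks contiguous, nearly equal parts."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:               # skip empty chunks for short inputs
            chunks.append(records[start:end])
        start = end
    return chunks


def count_gc(chunk):
    """The replicated task: count G and C bases in each sequence of a chunk."""
    return [sum(base in "GC" for base in seq) for seq in chunk]


def split_replicate_merge(records, task, n_chunks=2):
    """Run `task` on each chunk concurrently and concatenate the results."""
    chunks = split(records, n_chunks)
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(task, chunks))  # input order is kept
    # Merge: flatten the per-chunk results back into one list.
    return [item for part in partial_results for item in part]


if __name__ == "__main__":
    print(split_replicate_merge(["GATTACA", "GGCC", "AT"], count_gc))
```

The limitation raised above shows up even here: the split and merge steps have to know the structure of the data, which is why tying them to file types is hard for tool developers to extend.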

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
passed to tool wrappers and allows them to contact the local Mesos
infrastructure (assuming the system has been configured), or pass a null if
the system isn't available.
2) Write a tool runner that works as a Mesos framework to execute
single-CPU jobs on the distributed system.
3) For instances where Mesos is not available at a system-wide level (say
they only have access to an SGE-based cluster), but the user wants to run
distributed jobs, write a wrapper that can create a Mesos cluster using the
existing queueing system. For example, right now I run a Mesos system under
the SGE queue system.
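
As a rough illustration of part 1 from the tool wrapper's side, here is a Python sketch of how a wrapper might decide between distributed and local execution. Everything in it is an assumption: the function names are hypothetical, and how Galaxy would encode the "null" case (unset, empty, or a literal string) is not settled above, so the sketch accepts all three.

```python
import os


def resolve_mesos_url(environ=None):
    """Return the Mesos master address from MESOS_URL, or None if unset/null.

    Hypothetical helper; the accepted "null" spellings are an assumption.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("MESOS_URL", "").strip()
    # Treat an empty value or common "null" spellings as "Mesos not available".
    if value.lower() in ("", "null", "none"):
        return None
    return value


def choose_execution_mode(environ=None):
    """Pick ("mesos", url) when a master is configured, else ("local", None)."""
    url = resolve_mesos_url(environ)
    if url is None:
        return ("local", None)
    return ("mesos", url)


if __name__ == "__main__":
    print(choose_execution_mode({"MESOS_URL": "mesos-master.example.org:5050"}))
    print(choose_execution_mode({}))
```

Falling back to local execution when the variable is null means the same wrapper works on Galaxy instances with and without Mesos configured.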

I'm curious to see what other people think.

Kyle