Re: [galaxy-dev] Better packaging for toolshed binaries

2013-08-29 Thread Peter Cock
On Thu, Aug 29, 2013 at 5:45 AM, Guest, Simon
simon.gu...@agresearch.co.nz wrote:
 Dear Galaxians,

 This email is about difficulties with the current approach for installing 
 tool dependency binaries from the Galaxy Toolshed, and what might be done to 
 improve the situation.  It comes down to this:  packaging software to run on 
 different systems is tricky.  It is a problem that has been solved by various 
 Linux distributions with their packaging systems (RPM, deb, etc.), and 
 package archives.  The Galaxy Toolshed is trying to solve this problem again, 
 but so far it doesn't work very well.  There must be something better we can 
 do.


I agree with you, and as more people try to package their
tools and dependencies, I think more will too :(

 Since gaining a better understanding from the Galaxy Community Conference of 
 what the Toolshed is trying to do (versioned tools, reproducibility), I have 
 been working on switching over from locally installed tools to Toolshed 
 versions.  However, it has not gone well, and I think I am about to revert to 
 my previous approach.  Here's the problem:  building software from source on 
 any system requires certain tweaks to the build process which are dependent 
 on the target platform.  An example is the NCBI BLAST+ suite, which failed to 
 build on my (EL6) system, because it couldn't run /usr/bin/touch.  That's 
 pretty dumb, and pretty simple to solve in isolation - it needs to be running 
 /bin/touch instead.


Can we continue this specific example here?
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-August/015890.html
...
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-August/016287.html

Short answer: yes, I know. A new install XML process is being
used on the Test Tool Shed which fixes this (but breaks in
a not yet understood way on the Galaxy team's test cluster),
and it is awaiting release to the main Tool Shed.

 But the general point is this:  it's not feasible (i.e. too much work, too 
 hard) to produce build scripts to build software from source that work on any 
 platform, even the common ones.  Packaging source code for a given platform 
 is a non-trivial task.  The RPM and deb packagers are doing a good job here.  
 It's a significant amount of work.  I know that, as I've been packaging 
 bioinformatics software as binary RPMs for EL6 for 18 months or so now, and 
 have done nearly 300 packages.

 What do we want?  Simply to be able to install a given version of some 
 software, and all its dependencies, with a single click, or a single command, 
 and have it Just Work (tm).  It's the dependencies that make this hard.  
 Things get installed in different ways on different systems.  Does your 
 platform need #include bam.h, or #include bam/bam.h?  If the former, then 
 you'll have to patch tophat, say, (in a trivial way) before building it.  I 
 think this is simply too hard to do by embedding some commands and 
 conditionals in Toolshed XML build files.


Indeed - well-behaved tools will have something like
a ./configure script to take care of that, but not all do :(
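
For illustration, the kind of probe such a script performs could be
sketched roughly as follows. This is a minimal sketch only: the function
name find_header_layout and the search order are assumptions made for the
example, and a real ./configure normally test-compiles a small program
rather than just checking that a file exists.

```python
import os
import tempfile


def find_header_layout(include_dirs, candidates=("bam.h", "bam/bam.h")):
    """Return (include_dir, relative_header) for the first header found.

    Hypothetical helper: checks each include directory for each candidate
    layout and returns None when no candidate exists anywhere.
    """
    for inc in include_dirs:
        for rel in candidates:
            if os.path.isfile(os.path.join(inc, *rel.split("/"))):
                return inc, rel
    return None


if __name__ == "__main__":
    # Simulate a platform that installs the header as bam/bam.h.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "bam"))
        open(os.path.join(root, "bam", "bam.h"), "w").close()
        print(find_header_layout([root]))
```

A build recipe could use the returned relative path to decide whether a
patch to the #include line is needed on the current platform.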

 It seems to me that a number of people out there are currently having some 
 issues installing tool dependencies from the Toolshed, because things are not 
 building as expected.  I think it's much easier for just one person to 
 troubleshoot why things go wrong when they are packaging the software for a 
 given platform, rather than for each end user (Galaxy admin) to wonder why a 
 tool failed to install.

 So, what to do?  My starting point is that I have packaged a large amount of 
 bioinformatics software for EL6, which is freely available at 
 http://rpm.agresearch.co.nz/.  I'm after some Galaxy tool wrappers for the 
 tools that we use here at AgResearch, which can simply make use of packages 
 installed from this repo.

 Is there any interest in exploring the merits or otherwise of this approach 
 in the Galaxy community?


There is a similar but probably larger set of Debian packages
available via Debian-Med and Bio-Linux too. The catch here
is can you install arbitrary versions of a tool in parallel? And
I think the answer sadly is no.

The idea of standard recipe templates (e.g. typical Python
install) James outlined here might help:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-August/016273.html

Peter

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Better packaging for toolshed binaries

2013-08-29 Thread James Taylor
On Thu, Aug 29, 2013 at 3:36 AM, Peter Cock p.j.a.c...@googlemail.com wrote:
 There is a similar but probably larger set of Debian packages
 available via Debian-Med and Bio-Linux too. The catch here
 is can you install arbitrary versions of a tool in parallel? And
 I think the answer sadly is no.

This is the crucial concern for us. The standard OS packaging
approaches (RPM and DEB) do not support this except very poorly. This
is something we absolutely need. There are other package managers that
do a better job (I'm quite fond of Homebrew on OS X, NIX also looks
nice) but would add more dependencies.

 The idea of standard recipe templates (e.g. typical Python
 install) James outlined here might help:
 http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-August/016273.html

I (as always) think a lot can be solved through abstraction. I'm
envisioning a very high level description of what it takes to install
a package, and then have different adapters to take that and install
it for a given OS.
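
A rough sketch of what that could look like, purely as an illustration:
PackageSpec, the adapter functions and install_command are hypothetical
names, not an existing Galaxy or Tool Shed API. The adapters only build
command lines and run nothing, and real package names and version strings
often differ between distributions (Debian versions, for instance, usually
carry a revision suffix).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageSpec:
    """High-level description of what to install (hypothetical)."""
    name: str
    version: str


def rpm_command(spec):
    # yum accepts "name-version" to request a specific version.
    return ["yum", "install", "-y", f"{spec.name}-{spec.version}"]


def deb_command(spec):
    # apt-get accepts "name=version" to request a specific version.
    return ["apt-get", "install", "-y", f"{spec.name}={spec.version}"]


# One adapter per packaging system; others could be registered here.
ADAPTERS = {"rpm": rpm_command, "deb": deb_command}


def install_command(spec, platform):
    """Translate the same spec into a platform-specific command."""
    try:
        adapter = ADAPTERS[platform]
    except KeyError:
        raise ValueError(f"no adapter for platform {platform!r}") from None
    return adapter(spec)


if __name__ == "__main__":
    spec = PackageSpec("tophat", "2.0.9")
    for platform in ADAPTERS:
        print(platform, install_command(spec, platform))
```

The point of the design is that the description stays the same and only
the adapter changes; it does not by itself solve installing several
versions in parallel.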

--
James Taylor, Associate Professor, Biology/CS, Emory University


Re: [galaxy-dev] Better packaging for toolshed binaries

2013-08-29 Thread John Chilton
On Thu, Aug 29, 2013 at 4:17 PM, Guest, Simon simon.gu...@agresearch.co.nz wrote:
  There is a similar but probably larger set of Debian packages
  available via Debian-Med and Bio-Linux too. The catch here is can you
  install arbitrary versions of a tool in parallel? And I think the
  answer sadly is no.

 This is the crucial concern for us. The standard OS packaging approaches
 (RPM and DEB) do not support this except very poorly. This is something we
 absolutely need. There are other package managers that do a better job
 (I'm quite fond of Homebrew on OS X, NIX also looks nice) but would add
 more dependencies.

 There are possibilities here, similar to things I've already been doing
 in my RPM packaging.

 If you want to install multiple versions side by side, when you (or more
 likely, me) are making the packages, you just make the version number part
 of the package name, and install it out of the way somewhere (e.g.
 /usr/libexec/tophat-2.0.9, rather than /usr/bin). Then, the package can
 provide a versioned environment module as per
 http://modules.sourceforge.net/. There could be a non-versioned environment
 module which just gives you the latest and greatest version. So:

This is the same thing that the tool shed does. From Greg:

Following best practices, repositories of type Tool dependency definition
are named something like package_name_version (e.g.,
package_amos_3_1_0, package_ape_3_0, package_atlas_3_10, etc) and are
contained in the Tool Dependency Packages category in the Tool Shed. The
name of the repository contains the package name as well as the version
because the contents of the repository must contain only the recipe for
installing that specific version of that package. If a new version (say
3.1) of the ape package is introduced some time in the future, then a new
repository named package_ape_3_1 should be created to contain the recipe
for installing that version.

Tool dependency definition repositories may only have one installable
revision.
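
The naming convention Greg describes can be sketched as below. This is
only an illustration: dependency_repo_name is a hypothetical helper, and
replacing dots with underscores is inferred from the examples quoted
above rather than taken from the Tool Shed code.

```python
def dependency_repo_name(package, version):
    """Build a name like package_ape_3_0 from ("ape", "3.0").

    Assumption: dots in the version become underscores, as in the
    examples package_amos_3_1_0, package_ape_3_0 and package_atlas_3_10.
    """
    return "package_{}_{}".format(package, version.replace(".", "_"))


if __name__ == "__main__":
    for pkg, ver in [("amos", "3.1.0"), ("ape", "3.0"), ("atlas", "3.10")]:
        print(dependency_repo_name(pkg, ver))
```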

The Toolshed has some advantages over OS packages, but I do not understand
why handling of multiple versions is considered by some to be among them.


 $ module load tophat/2.0.9
 # now that version is on the path

 # start again ...
 $ module load tophat
 # the latest and greatest tophat becomes available
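
The lookup behind those two commands can be sketched as follows. This is
a sketch of the idea only, not of how Environment Modules itself picks a
default: resolve_module and parse_version are hypothetical names, versions
are assumed to be purely numeric and dot-separated, and the
/usr/libexec/NAME-VERSION layout is the one suggested above.

```python
def parse_version(text):
    """Turn "2.0.10" into (2, 0, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve_module(request, installed):
    """Map "tophat/2.0.9" or "tophat" to an install prefix.

    `installed` maps a tool name to the list of installed version strings.
    A request without a version selects the highest installed version.
    """
    name, _, version = request.partition("/")
    versions = installed.get(name, [])
    if not versions:
        raise LookupError(f"no versions of {name!r} are installed")
    if not version:
        version = max(versions, key=parse_version)
    elif version not in versions:
        raise LookupError(f"{name} {version} is not installed")
    return f"/usr/libexec/{name}-{version}"


if __name__ == "__main__":
    installed = {"tophat": ["2.0.8", "2.0.9", "2.0.10"]}
    print(resolve_module("tophat/2.0.9", installed))
    print(resolve_module("tophat", installed))
```

Comparing parsed tuples rather than raw strings matters here: as plain
strings, "2.0.9" would sort above "2.0.10".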

 We've been using this to provide multiple versions of small tools, but
 also bigger things like a version of Python more recent than the system
 one. (Software Collections may be better for the latter though -
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Developer_Toolset/1/html/Software_Collections_Guide/)

 I'm willing to explore the feasibility of overhauling the AgResearch RPM
 repo to support multiple versions of packages in this or a similar way if
 there's interest. There's clearly value in being able to select what
 version of a tool you run, if it can be done in a way that doesn't encumber
 those who just want to run a recent good version.

 Is there interest in this approach? (Note: I'm not committing to doing it
 just yet.)

 cheers,
 Simon


 ===
 Attention: The information contained in this message and/or attachments
 from AgResearch Limited is intended only for the persons or entities
 to which it is addressed and may contain confidential and/or privileged
 material. Any review, retransmission, dissemination or other use of, or
 taking of any action in reliance upon, this information by persons or
 entities other than the intended recipients is prohibited by AgResearch
 Limited. If you have received this message in error, please notify the
 sender immediately.
 ===


[galaxy-dev] Better packaging for toolshed binaries

2013-08-28 Thread Guest, Simon
Dear Galaxians,

This email is about difficulties with the current approach for installing tool 
dependency binaries from the Galaxy Toolshed, and what might be done to improve 
the situation.  It comes down to this:  packaging software to run on different 
systems is tricky.  It is a problem that has been solved by various Linux 
distributions with their packaging systems (RPM, deb, etc.), and package 
archives.  The Galaxy Toolshed is trying to solve this problem again, but so 
far it doesn't work very well.  There must be something better we can do.

Since gaining a better understanding from the Galaxy Community Conference of 
what the Toolshed is trying to do (versioned tools, reproducibility), I have 
been working on switching over from locally installed tools to Toolshed 
versions.  However, it has not gone well, and I think I am about to revert to 
my previous approach.  Here's the problem:  building software from source on 
any system requires certain tweaks to the build process which are dependent on 
the target platform.  An example is the NCBI BLAST+ suite, which failed to 
build on my (EL6) system, because it couldn't run /usr/bin/touch.  That's 
pretty dumb, and pretty simple to solve in isolation - it needs to be running 
/bin/touch instead.
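
One way a build recipe could avoid hard-coding either location is to look
the program up on the search path. The following is a minimal sketch under
that assumption; find_tool is a hypothetical helper and is not how the
BLAST+ build or the Toolshed recipe actually locates touch.

```python
import os
import shutil
import tempfile


def find_tool(name, search_path=None):
    """Return the full path of an executable found on the search path.

    With search_path=None the PATH environment variable is used, so the
    same recipe works whether the tool lives in /bin or /usr/bin.
    """
    path = shutil.which(name, path=search_path)
    if path is None:
        raise FileNotFoundError(f"{name!r} not found on the search path")
    return path


if __name__ == "__main__":
    # Demonstrate with a stand-in executable in a temporary directory.
    with tempfile.TemporaryDirectory() as bindir:
        fake = os.path.join(bindir, "touch")
        with open(fake, "w") as handle:
            handle.write("#!/bin/sh\n")
        os.chmod(fake, 0o755)
        print(find_tool("touch", search_path=bindir))
```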

But the general point is this:  it's not feasible (i.e. too much work, too 
hard) to produce build scripts to build software from source that work on any 
platform, even the common ones.  Packaging source code for a given platform is 
a non-trivial task.  The RPM and deb packagers are doing a good job here.  It's 
a significant amount of work.  I know that, as I've been packaging 
bioinformatics software as binary RPMs for EL6 for 18 months or so now, and 
have done nearly 300 packages.

What do we want?  Simply to be able to install a given version of some 
software, and all its dependencies, with a single click, or a single command, 
and have it Just Work (tm).  It's the dependencies that make this hard.  Things 
get installed in different ways on different systems.  Does your platform need 
#include bam.h, or #include bam/bam.h?  If the former, then you'll have to 
patch tophat, say, (in a trivial way) before building it.  I think this is 
simply too hard to do by embedding some commands and conditionals in Toolshed 
XML build files.

It seems to me that a number of people out there are currently having some 
issues installing tool dependencies from the Toolshed, because things are not 
building as expected.  I think it's much easier for just one person to 
troubleshoot why things go wrong when they are packaging the software for a 
given platform, rather than for each end user (Galaxy admin) to wonder why a 
tool failed to install.

So, what to do?  My starting point is that I have packaged a large amount of 
bioinformatics software for EL6, which is freely available at 
http://rpm.agresearch.co.nz/.  I'm after some Galaxy tool wrappers for the 
tools that we use here at AgResearch, which can simply make use of packages 
installed from this repo.

Is there any interest in exploring the merits or otherwise of this approach in 
the Galaxy community?

cheers,
Simon


Simon Guest
Senior UNIX Technical Consultant
AgResearch, New Zealand


