Re: [galaxy-dev] tool_dependencies.xml format

2013-09-17 Thread Bjoern Gruening
Hi James,

thanks for your thoughts on abstracting these common tasks.
For most of them we now have patches on Bitbucket.

 Similar recipes could be:
 
 autoconf: default to configure; make; make install, allow providing
 configuration options

https://bitbucket.org/galaxy/galaxy-central/pull-request/218/implementation-of-the-configure-make-make/diff
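To illustrate, usage should reduce to a one-liner along these lines (assuming
the action supplies --prefix=$INSTALL_DIR itself; the configure option shown
is hypothetical):

    <action type="autoconf">--enable-shared</action>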

 make_install: just make; make install; allow providing make options

https://bitbucket.org/galaxy/galaxy-central/pull-request/217/implementation-of-the-make_install-action/diff
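A sketch of the corresponding declaration, e.g.:

    <action type="make_install" />

with any make options presumably passed in the element body (that detail is
an assumption on my part).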

 python_virtualenv

Isn't that already supposed to work with the 'setup_virtualenv' action?
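My understanding is that setup_virtualenv takes a requirements.txt-style
listing, roughly like this (the version pin is just an example):

    <action type="setup_virtualenv">requests==1.2.3</action>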

 ruby_rbenv

From John:
https://bitbucket.org/galaxy/galaxy-central/pull-request/207/john-chiltons-august-2013-tool-shed/diff
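From a quick skim of that PR, I would expect usage roughly along these lines
(repository, owner, and gem names here are hypothetical):

    <action type="setup_ruby_environment">
        <repository name="package_ruby_2_0" owner="iuc">
            <package name="ruby" version="2.0" />
        </repository>
        <!-- gem to install into the new Ruby environment -->
        <package>protk</package>
    </action>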


 r_package

https://bitbucket.org/galaxy/galaxy-central/pull-request/219/implementation-of-the-a-r_environment-to/diff
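A sketch of how an R package install might then be declared (the repository
details and tarball URL are hypothetical):

    <action type="setup_r_environment">
        <repository name="package_r_3_0_1" owner="bgruening">
            <package name="R" version="3.0.1" />
        </repository>
        <!-- R package tarball to install into the environment -->
        <package>http://cran.r-project.org/src/contrib/getopt_1.20.0.tar.gz</package>
    </action>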


Cheers,
Bjoern

 ...
 
 Basically, most of the time the steps to install a particular package
 are boilerplate; this would remove a ton of duplication in the recipe
 files. Also, a likely less popular proposal would be to go one step
 further, tool_dependencies.yaml:
 
 recipe: python_package_setuptools
 name: requests
 version: 1.2.3
 url: http://pypi.python.org/packages/source/r/requests/requests-${version}.tar.gz
 
 -- jt


Re: [galaxy-dev] tool_dependencies.xml format

2013-08-27 Thread Nate Coraor
On Aug 26, 2013, at 11:59 AM, James Taylor wrote:

 On Mon, Aug 26, 2013 at 11:48 AM, John Chilton chil...@msi.umn.edu wrote:
 
 I think it is interesting that there was push back on providing
 infrastructure (tool actions) for obtaining CBL from github and
 performing installs based on it because it was not in the tool shed
 and therefore less reproducible, but the team believes infrastructure
 should be put in place to support pypi.
 
 Well, first, I'm not sure what the team believes; I'm stating what I
 believe and engaging in a discussion with the community. At some
 point this should evolve into what we are actually going to do and be
 codified in a spec as a Trello card, which even then is not set in
 stone.
 
 Second, I'm not suggesting we depend on PyPI. The nice thing about the
 second format I proposed on galaxy-dev is that we can easily parse out
 the URL and archive that file. Then someday we could provide a
 fallback repository so that if the PyPI URL no longer works, we still
 have the file stored.

I concur here; the experience and lessons learned by long-established package 
and dependency managers can provide some useful guidance for us going forward.  
APT has long relied on a model of archiving upstream source (as well as 
distro-generated binary (dpkg) packages), cataloging changes as a set of 
patches, and maintaining an understanding of installed files, even those meant 
to be user-edited.  I think there is a strong advantage for us in doing this as 
well.

 
 I think we all value reproducibility here, but we make different
 calculations on what is reproducible. I think in terms of implementing
 the ideas James has laid out or similar things I have proposed, it
 might be beneficial to have some final answers on what external
 resources are allowed - both for obtaining a Galaxy IUC gold star and
 for the tool shed providing infrastructure to support their usage.
 
 My focus is ensuring that we can archive things that pass through the
 toolshed. Tarballs from *anywhere* are easy enough to deal with.
 External version control repositories are a bit more challenging,
 especially when you are pulling just a particular file out, so that's
 where things got a little hinky for me.
 
 Since we don't have the archival mechanism in place yet anyway, this
 is more a philosophical discussion and setting the right precedent.
 
 And yes, keeping an archive of all the software in the world is a
 scary prospect, though compared to the amount of data we currently
 keep for people it is a blip. And I'm not sure how else we can really
 achieve the level of reproducibility we desire.

One additional step that will assist with long-term archival is generating 
static metadata and allowing the packaging and dependency systems to work 
outside of the Galaxy and Tool Shed applications.  A package metadata catalog 
and package format that provide descriptions of packages on a generic 
webserver, and that are installable without a running Galaxy instance, are 
components that I believe are fairly important.
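To make that concrete, a catalog entry could be as small as a static XML
fragment served next to the archived tarball; every element name here is
hypothetical:

    <package name="requests" version="1.2.3">
        <!-- checksum would let an installer verify the archived source -->
        <archive href="requests-1.2.3.tar.gz" sha256="..." />
        <dependencies />
    </package>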

As for user-edited files: the env.sh files, which are generated at install 
time and then essentially untracked afterward, scare me a bit.  I think it'd 
be useful for the packaging system to have a tighter concept of environment 
management.
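For reference, a generated env.sh is typically just a couple of shell lines
like these (with $INSTALL_DIR resolved to the concrete dependency path), and
nothing tracks what happens to the file afterward:

    PYTHONPATH=$INSTALL_DIR/lib/python:$PYTHONPATH; export PYTHONPATH
    PATH=$INSTALL_DIR/bin:$PATH; export PATH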

These are just my opinions, of course, and are going to be very APT/dpkg-biased 
simply due to my experience with and favor for Debian-based distros and 
dependency/package management, but I think there are useful concepts in this 
and other systems that we can draw from.

Along those lines, one more idea I had thrown out a while ago was coming up 
with a way to incorporate (or at least automatically process so that we can 
convert to our format) the build definitions for other systems like MacPorts, 
BSD ports/pkgsrc, dpkg, rpm, etc. so that we can leverage the existing rules 
for building across our target platforms that have already been worked out by 
other package maintainers with more time.  I think this aligns pretty well with 
Brad's thinking with CloudBioLinux, the difference in implementation being that 
we require multiple installable versions and platform independence.

I am a bit worried that as we go down the repackage (almost) all dependencies 
path (which I do think is the right path), we also run the risk of most of our 
packages being out of date.  That's almost a guaranteed outcome when even the 
huge packaging projects (Debian, Ubuntu, etc.) are rife with out-of-date 
packages.  So being able to incorporate upstream build definitions may help us 
package dependencies quickly.

--nate




Re: [galaxy-dev] tool_dependencies.xml format

2013-08-27 Thread John Chilton
Before I went on that tangent, I should have said that I of course agree
with 100% of what James said in the original e-mail on this thread.
For what it is worth, I believe the higher-level constructs he
outlined are essential to the long-term adoption of the tool shed.

On Tue, Aug 27, 2013 at 1:59 PM, Nate Coraor n...@bx.psu.edu wrote:
 On Aug 26, 2013, at 11:59 AM, James Taylor wrote:

 On Mon, Aug 26, 2013 at 11:48 AM, John Chilton chil...@msi.umn.edu wrote:

 I think it is interesting that there was push back on providing
 infrastructure (tool actions) for obtaining CBL from github and
 performing installs based on it because it was not in the tool shed
 and therefore less reproducible, but the team believes infrastructure
 should be put in place to support pypi.

 Well, first, I'm not sure what the team believes; I'm stating what I
 believe and engaging in a discussion with the community. At some
 point this should evolve into what we are actually going to do and be
 codified in a spec as a Trello card, which even then is not set in
 stone.

 Second, I'm not suggesting we depend on PyPI. The nice thing about the
 second format I proposed on galaxy-dev is that we can easily parse out
 the URL and archive that file. Then someday we could provide a
 fallback repository so that if the PyPI URL no longer works, we still
 have the file stored.

 I concur here; the experience and lessons learned by long-established package 
 and dependency managers can provide some useful guidance for us going 
 forward.  APT has long relied on a model of archiving upstream source (as 
 well as distro-generated binary (dpkg) packages), cataloging changes as a set 
 of patches, and maintaining an understanding of installed files, even those 
 meant to be user-edited.  I think there is a strong advantage for us in doing 
 this as well.


 I think we all value reproducibility here, but we make different
 calculations on what is reproducible. I think in terms of implementing
 the ideas James has laid out or similar things I have proposed, it
 might be beneficial to have some final answers on what external
 resources are allowed - both for obtaining a Galaxy IUC gold star and
 for the tool shed providing infrastructure to support their usage.

 My focus is ensuring that we can archive things that pass through the
 toolshed. Tarballs from *anywhere* are easy enough to deal with.
 External version control repositories are a bit more challenging,
 especially when you are pulling just a particular file out, so that's
 where things got a little hinky for me.

 Since we don't have the archival mechanism in place yet anyway, this
 is more a philosophical discussion and setting the right precedent.

 And yes, keeping an archive of all the software in the world is a
 scary prospect, though compared to the amount of data we currently
 keep for people it is a blip. And I'm not sure how else we can really
 achieve the level of reproducibility we desire.

 One additional step that will assist with long-term archival is generating 
 static metadata and allowing the packaging and dependency systems to work 
 outside of the Galaxy and Tool Shed applications.  A package metadata catalog 
 and package format that provide descriptions of packages on a generic 
 webserver, and that are installable without a running Galaxy instance, are 
 components that I believe are fairly important.

 As for user-edited files: the env.sh files, which are generated at install 
 time and then essentially untracked afterward, scare me a bit.  I think it'd 
 be useful for the packaging system to have a tighter concept of environment 
 management.

 These are just my opinions, of course, and are going to be very 
 APT/dpkg-biased simply due to my experience with and favor for Debian-based 
 distros and dependency/package management, but I think there are useful 
 concepts in this and other systems that we can draw from.

 Along those lines, one more idea I had thrown out a while ago was coming up 
 with a way to incorporate (or at least automatically process so that we can 
 convert to our format) the build definitions for other systems like MacPorts, 
 BSD ports/pkgsrc, dpkg, rpm, etc. so that we can leverage the existing rules 
 for building across our target platforms that have already been worked out by 
 other package maintainers with more time.  I think this aligns pretty well 
 with Brad's thinking with CloudBioLinux, the difference in implementation 
 being that we require multiple installable versions and platform independence.

The CloudBioLinux Galaxy tool machinery used by Galaxy-P and CloudMan, and
integrated into tool shed installs with pull request 207, is platform
independent (or as platform independent as the tool shed itself) and allows
multiple installable versions.


 I am a bit worried that as we go down the repackage (almost) all 
 dependencies path (which I do think is the right path), we also run the risk 
 of most of our packages being out of date.  That's almost a guaranteed 
 

[galaxy-dev] tool_dependencies.xml format

2013-08-26 Thread James Taylor
All,

I've been seeing some examples of tool_dependencies.xml come across
the list, and I'm wondering if there are ways that it can be
simplified. When we were first defining these features, we talked
about having high-level recipes for certain types of installs. This
could greatly simplify things. For example, can this:

<tool_dependency>
    <package name="requests" version="1.2.3">
        <install version="1.0">
            <actions>
                <action type="download_by_url">http://pypi.python.org/packages/source/r/requests/requests-1.2.3.tar.gz</action>
                <action type="make_directory">$INSTALL_DIR/lib/python</action>
                <action type="shell_command">export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python &amp;&amp; python setup.py install --home $INSTALL_DIR --install-scripts $INSTALL_DIR/bin</action>
                <action type="set_environment">
                    <environment_variable name="PYTHONPATH" action="append_to">$INSTALL_DIR/lib/python</environment_variable>
                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>
                </action>
            </actions>
        </install>
        <readme>
        </readme>
    </package>
</tool_dependency>

be simplified to:

<tool_dependency>
    <package name="requests" version="1.2.3">
        <install recipe="python_package_setuptools"
                 url="http://pypi.python.org/packages/source/r/requests/requests-1.2.3.tar.gz" />
    </package>
</tool_dependency>

The assumptions are: when the install version is not provided, it defaults to
1.0 (we've always maintained compatibility for config files in the past, so
hopefully this never changes); and when installing a Python package, the
install directories and environment variables that need to be set are always
the same.
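So the fully explicit form of the example above would be (my reading of the
proposal):

    <install recipe="python_package_setuptools" version="1.0"
             url="http://pypi.python.org/packages/source/r/requests/requests-1.2.3.tar.gz" />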

Similar recipes could be:

autoconf: default to configure; make; make install, allow providing
configuration options
make_install: just make; make install; allow providing make options
python_virtualenv
ruby_rbenv
r_package
...
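To illustrate the first item above, an autoconf recipe allowing configuration
options might look like this (the attribute names and URL are hypothetical):

    <install recipe="autoconf" url="http://example.org/foo-1.0.tar.gz"
             configure_options="--enable-shared" />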

Basically, most of the time the steps to install a particular package
are boilerplate; this would remove a ton of duplication in the recipe
files. Also, a likely less popular proposal would be to go one step
further, tool_dependencies.yaml:

recipe: python_package_setuptools
name: requests
version: 1.2.3
url: http://pypi.python.org/packages/source/r/requests/requests-${version}.tar.gz

-- jt


Re: [galaxy-dev] tool_dependencies.xml format

2013-08-26 Thread John Chilton
James et al.,

I think it is interesting that there was push back on providing
infrastructure (tool actions) for obtaining CBL from github and
performing installs based on it because it was not in the tool shed
and therefore less reproducible, but the team believes infrastructure
should be put in place to support pypi.

I understand there are any number of distinctions that could be made
here - perhaps you have made the calculation that pypi is more stable than
github (either in terms of immutability or funding), or perhaps the
setuptools mechanism is more general and could potentially support
grabbing these tarballs from the tool shed (or a tool shed adjacent
object store).

I think we all value reproducibility here, but we make different
calculations on what is reproducible. I think in terms of implementing
the ideas James has laid out or similar things I have proposed, it
might be beneficial to have some final answers on what external
resources are allowed - both for obtaining a Galaxy IUC gold star and
for the tool shed providing infrastructure to support their usage.

I don't know if this takes the form of the IUC voting or James
and/or Greg issuing a proclamation, but it would be good to get firm
answers on these two questions for the following sites: RubyGems, PyPI,
GitHub, Bitbucket, CPAN, CRAN, SourceForge, and Google Code. It would
also be great to have a process in place for deciding these questions
for future repositories.

Thanks,
-John

On Mon, Aug 26, 2013 at 9:05 AM, James Taylor ja...@jamestaylor.org wrote:
 All,

 I've been seeing some examples of tool_dependencies.xml come across
 the list, and I'm wondering if there are ways that it can be
 simplified. When we were first defining these features, we talked
 about having high-level recipes for certain types of installs. This
 could greatly simplify things. For example, can this:

 <tool_dependency>
     <package name="requests" version="1.2.3">
         <install version="1.0">
             <actions>
                 <action type="download_by_url">http://pypi.python.org/packages/source/r/requests/requests-1.2.3.tar.gz</action>
                 <action type="make_directory">$INSTALL_DIR/lib/python</action>
                 <action type="shell_command">export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python &amp;&amp; python setup.py install --home $INSTALL_DIR --install-scripts $INSTALL_DIR/bin</action>
                 <action type="set_environment">
                     <environment_variable name="PYTHONPATH" action="append_to">$INSTALL_DIR/lib/python</environment_variable>
                     <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>
                 </action>
             </actions>
         </install>
         <readme>
         </readme>
     </package>
 </tool_dependency>

 be simplified to:

 <tool_dependency>
     <package name="requests" version="1.2.3">
         <install recipe="python_package_setuptools"
                  url="http://pypi.python.org/packages/source/r/requests/requests-1.2.3.tar.gz" />
     </package>
 </tool_dependency>

 The assumptions are: when the install version is not provided, it defaults
 to 1.0 (we've always maintained compatibility for config files in the past,
 so hopefully this never changes); and when installing a Python package, the
 install directories and environment variables that need to be set are
 always the same.

 Similar recipes could be:

 autoconf: default to configure; make; make install, allow providing
 configuration options
 make_install: just make; make install; allow providing make options
 python_virtualenv
 ruby_rbenv
 r_package
 ...

 Basically, most of the time the steps to install a particular package
 are boilerplate; this would remove a ton of duplication in the recipe
 files. Also, a likely less popular proposal would be to go one step
 further, tool_dependencies.yaml:

 recipe: python_package_setuptools
 name: requests
 version: 1.2.3
 url: http://pypi.python.org/packages/source/r/requests/requests-${version}.tar.gz

 -- jt


Re: [galaxy-dev] tool_dependencies.xml format

2013-08-26 Thread James Taylor
On Mon, Aug 26, 2013 at 11:48 AM, John Chilton chil...@msi.umn.edu wrote:

 I think it is interesting that there was push back on providing
 infrastructure (tool actions) for obtaining CBL from github and
 performing installs based on it because it was not in the tool shed
 and therefore less reproducible, but the team believes infrastructure
 should be put in place to support pypi.

Well, first, I'm not sure what the team believes; I'm stating what I
believe and engaging in a discussion with the community. At some
point this should evolve into what we are actually going to do and be
codified in a spec as a Trello card, which even then is not set in
stone.

Second, I'm not suggesting we depend on PyPI. The nice thing about the
second format I proposed on galaxy-dev is that we can easily parse out
the URL and archive that file. Then someday we could provide a
fallback repository so that if the PyPI URL no longer works, we still
have the file stored.

 I think we all value reproducibility here, but we make different
 calculations on what is reproducible. I think in terms of implementing
 the ideas James has laid out or similar things I have proposed, it
 might be beneficial to have some final answers on what external
 resources are allowed - both for obtaining a Galaxy IUC gold star and
 for the tool shed providing infrastructure to support their usage.

My focus is ensuring that we can archive things that pass through the
toolshed. Tarballs from *anywhere* are easy enough to deal with.
External version control repositories are a bit more challenging,
especially when you are pulling just a particular file out, so that's
where things got a little hinky for me.

Since we don't have the archival mechanism in place yet anyway, this
is more a philosophical discussion and setting the right precedent.

And yes, keeping an archive of all the software in the world is a
scary prospect, though compared to the amount of data we currently
keep for people it is a blip. And I'm not sure how else we can really
achieve the level of reproducibility we desire.
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/