Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-23 Thread Björn Grüning
Hello Nate,

 On May 14, 2013, at 10:58 AM, John Chilton wrote:
 
  Hey Nate,
  
  On Tue, May 14, 2013 at 8:40 AM, Nate Coraor n...@bx.psu.edu wrote:
  Hi John,
  
  A few of us in the lab here at Penn State actually discussed automatic 
  creation of virtualenvs for dependency installations a couple weeks ago.  
  This was in the context of Bjoern's request for supporting compile-time 
  dependencies.  I think it's a great idea, but there's a limitation that 
  we'd need to account for.
  
  If you're going to have frequently used and expensive to build libraries 
  (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
  tool(s) depend on those repositories, the activate method won't work.  
  virtualenvs cannot depend on other virtualenvs or be active at the same 
  time as other virtualenvs.  We could work around it by setting PYTHONPATH 
  in the dependencies' env.sh like we do now.  But then, other than making 
  installation a bit easier (e.g. by allowing the use of pip), we have not 
  gained much.
  
  I don't know what to make of your response. It seems like a no, but
  the word no doesn't appear anywhere.
 
 Sorry about being wishy-washy.  Unless anyone has any objections or can 
 foresee other problems, I would say yes to this.  But I believe it should not 
 break the concept of common-dependency-only repositories.
 
 I'm pretty sure that as long as the process of creating a venv also adds the 
 venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem 
 should be automatically dealt with.
 
  I don't know the particulars of rpy, but numpy installs fine via this
  method and I see no problem with each application having its own copy
  of numpy. I think relying on OS-managed Python packages, for instance,
  is something of a bad practice; when developing and distributing
  software I use virtualenvs for everything. I think that stand-alone
  Python package definitions in the tool shed are directly analogous to
  OS-managed packages.
 
 Completely agree that we want to avoid OS-managed python packages.  
 I had, in the past, considered that for something like numpy, we ought to 
 make it easy for an administrator to allow their own version of numpy to be 
 used, 
 since numpy can be linked against a number of optimized libraries for 
 significant performance gains, and this generally won't happen for 
 versions installed from the toolshed unless the system already has stuff like 
 atlas-dev installed.  
 But I think we still allow admins that possibility with reasonable ease since 
 dependency management in Galaxy is not a requirement.

The repository on the Test Tool Shed is now able to compile numpy against
ATLAS and LAPACK. It is a little bit of work, but we can do such things now.
(It still does not disable CPU frequency scaling during compilation, but I
hope that does not have a big impact on performance.)
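For reference, a rough sketch of the kind of recipe this involves, as it
might appear in a shell_command action -- the [atlas] section is standard
numpy site.cfg configuration, but the ATLAS paths, the python2.7 layout and
the $INSTALL_DIR prefix are assumptions that will differ per system:

  cd numpy-1.7.1
  printf '[atlas]\nlibrary_dirs = /usr/lib/atlas-base\ninclude_dirs = /usr/include/atlas\n' > site.cfg
  mkdir -p "$INSTALL_DIR/lib/python2.7/site-packages"
  export PYTHONPATH="$INSTALL_DIR/lib/python2.7/site-packages:$PYTHONPATH"
  python setup.py install --prefix="$INSTALL_DIR"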

 What we do want to avoid is the situation where someone clones a new copy of 
 Galaxy, wants to install 10 different tools that all depend on numpy, 
 and has to wait an hour while 10 versions of numpy compile.  Add that in with 
 other tools that will have a similar process (installing R + packages + rpy) 
 plus the hope that down the line you'll be able to automatically maintain 
 separate builds for remote resources that are not the same (i.e. multiple 
 clusters with differing operating systems) 
 and this hopefully highlights why I think reducing duplication where possible 
 will be important.
 
  I also disagree we have not gained much. Setting up these repositories
  is an onerous, brittle process. This patch provides some high-level
  functionality for creating virtualenvs, which negates the need for
  creating separate repositories per package.
 
 This is a good point.  I probably also sold short the benefit of being able 
 to install with pip, since this does indeed remove a similarly brittle and 
 tedious step of downloading and installing modules.
 
 --nate
 
  
  -John
  
  
  --nate
  
  On May 13, 2013, at 6:49 PM, John Chilton wrote:
  
  The proliferation of individual python package install definitions has
  continued and it has spread to some MSI managed tools. I worry about
  the tedium I will have to endure in the future if that becomes an
  established best practice :) so I have implemented the python version
  of what I had described in this thread:
  
  As patch:
  https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b.patch
  Pretty version:
  https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b
  
  I understand that there are going to be differing opinions as to
  whether this is the best way forward but I thought I would give my
  position a better chance of succeeding by providing an implementation.
  
  Thanks for your consideration,
  -John
  
  
  On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock p.j.a.c...@googlemail.com 
  wrote:
  On Tue, Apr 16, 2013 at 2:46 PM, John 

Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-20 Thread Nate Coraor
John,

Could you create a pull request with your changes from the branch in github?  
I'll accept them and then commit my additions and changes.  Today is the 
freeze so I'd like to get this in to the next release.

Thanks,
---nate

On May 17, 2013, at 11:21 AM, John Chilton wrote:

 Hey All,
 
 There was a long conversation about this topic in IRC yesterday (among
 people who don't actually use the tool shed all that frequently); I
 have posted it to the new unofficial Galaxy Google+ group if anyone
 would like to read and chime in.
 
 https://plus.google.com/111860405027053012444/posts/TkCFwA2jkDN
 
 -John
 
 
 On Tue, May 14, 2013 at 3:59 PM, Nate Coraor n...@bx.psu.edu wrote:
 Greg created the following card, and I'm working on a few changes to your 
 commit:
 
 https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definition-framework-per-john-chilton-s-pull-request/506338ce32ae458f6d15e4b3/848
 
 Thanks,
 --nate
 
 On May 14, 2013, at 1:45 PM, Nate Coraor wrote:
 
 On May 14, 2013, at 10:58 AM, John Chilton wrote:
 
 Hey Nate,
 
 On Tue, May 14, 2013 at 8:40 AM, Nate Coraor n...@bx.psu.edu wrote:
 Hi John,
 
 A few of us in the lab here at Penn State actually discussed automatic 
 creation of virtualenvs for dependency installations a couple weeks ago.  
 This was in the context of Bjoern's request for supporting compile-time 
 dependencies.  I think it's a great idea, but there's a limitation that 
 we'd need to account for.
 
 If you're going to have frequently used and expensive to build libraries 
 (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
 tool(s) depend on those repositories, the activate method won't work.  
 virtualenvs cannot depend on other virtualenvs or be active at the same 
 time as other virtualenvs.  We could work around it by setting PYTHONPATH 
 in the dependencies' env.sh like we do now.  But then, other than making 
 installation a bit easier (e.g. by allowing the use of pip), we have not 
 gained much.
 
 I don't know what to make of your response. It seems like a no, but
 the word no doesn't appear anywhere.
 
 Sorry about being wishy-washy.  Unless anyone has any objections or can 
 foresee other problems, I would say yes to this.  But I believe it should 
 not break the concept of common-dependency-only repositories.
 
 I'm pretty sure that as long as the process of creating a venv also adds 
 the venv's site-packages to PYTHONPATH in that dependency's env.sh, the 
 problem should be automatically dealt with.
 
 I don't know the particulars of rpy, but numpy installs fine via this
 method and I see no problem with each application having its own copy
 of numpy. I think relying on OS-managed Python packages, for instance,
 is something of a bad practice; when developing and distributing
 software I use virtualenvs for everything. I think that stand-alone
 Python package definitions in the tool shed are directly analogous to
 OS-managed packages.
 
 Completely agree that we want to avoid OS-managed python packages.  I had, 
 in the past, considered that for something like numpy, we ought to make it 
 easy for an administrator to allow their own version of numpy to be used, 
 since numpy can be linked against a number of optimized libraries for 
 significant performance gains, and this generally won't happen for versions 
 installed from the toolshed unless the system already has stuff like 
 atlas-dev installed.  But I think we still allow admins that possibility 
 with reasonable ease since dependency management in Galaxy is not a 
 requirement.
 
 What we do want to avoid is the situation where someone clones a new copy 
 of Galaxy, wants to install 10 different tools that all depend on numpy, 
 and has to wait an hour while 10 versions of numpy compile.  Add that in 
 with other tools that will have a similar process (installing R + packages 
 + rpy) plus the hope that down the line you'll be able to automatically 
 maintain separate builds for remote resources that are not the same (i.e. 
 multiple clusters with differing operating systems) and this hopefully 
 highlights why I think reducing duplication where possible will be 
 important.
 
 I also disagree we have not gained much. Setting up these repositories
 is an onerous, brittle process. This patch provides some high-level
 functionality for creating virtualenvs, which negates the need for
 creating separate repositories per package.
 
 This is a good point.  I probably also sold short the benefit of being able 
 to install with pip, since this does indeed remove a similarly brittle and 
 tedious step of downloading and installing modules.
 
 --nate
 
 
 -John
 
 
 --nate
 
 On May 13, 2013, at 6:49 PM, John Chilton wrote:
 
 The proliferation of individual python package install definitions has
 continued and it has spread to some MSI managed tools. I worry about
 the tedium I will have to endure in the future if that becomes an
 established best practice :) so I have 

Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-20 Thread John Chilton
Done.

On the topic of the freeze, if Dannon's change requiring all metadata
to be set externally is going to be included, I would suggest someone
look at build_command_line in runners/__init__.py.

https://bitbucket.org/galaxy/galaxy-central/src/237209336f0337ea9f47df39548df3c11900f182/lib/galaxy/jobs/runners/__init__.py?at=default#cl-178

I think there is a bug where setting metadata externally masks the
return code of the tool (likewise if from_work_dir is used). I had
just created a Trello card (https://trello.com/c/JfB2w1Br) with an
idea for how to address it, but I think the problem is going to be
more severe once everyone is setting metadata externally. I have only
observed this for the from_work_dir case, but based on code inspection
I don't see how setting metadata externally would be different.
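To make the masking concrete, a toy sketch (this is not Galaxy's actual
generated command line; 'false' stands in for a failing tool and the ':'
no-ops for the from_work_dir copy and the external metadata step):

  false; : copy from_work_dir outputs; : set metadata externally
  echo "runner sees exit code: $?"   # prints 0 -- the failure is masked

One shape of a fix, along the lines of the card, is to capture the tool's
status right away and re-raise it after the extra steps:

  false; rc=$?; : copy outputs; : set metadata; ( exit $rc )
  echo "runner sees exit code: $?"   # prints 1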

Also, that same change broke the LWR, so it would be very much appreciated
if PR 166 could be accepted before the release is tagged :) or at least
the first two changesets.

Thanks all,
-John


On Mon, May 20, 2013 at 8:17 AM, Nate Coraor n...@bx.psu.edu wrote:
 John,

 Could you create a pull request with your changes from the branch in github?  
 I'll accept them and then commit my additions and changes.  Today is the 
 freeze so I'd like to get this in to the next release.

 Thanks,
 ---nate

 On May 17, 2013, at 11:21 AM, John Chilton wrote:

 Hey All,

 There was a long conversation about this topic in IRC yesterday (among
 people who don't actually use the tool shed all that frequently); I
 have posted it to the new unofficial Galaxy Google+ group if anyone
 would like to read and chime in.

 https://plus.google.com/111860405027053012444/posts/TkCFwA2jkDN

 -John


 On Tue, May 14, 2013 at 3:59 PM, Nate Coraor n...@bx.psu.edu wrote:
 Greg created the following card, and I'm working on a few changes to your 
 commit:

 https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definition-framework-per-john-chilton-s-pull-request/506338ce32ae458f6d15e4b3/848

 Thanks,
 --nate

 On May 14, 2013, at 1:45 PM, Nate Coraor wrote:

 On May 14, 2013, at 10:58 AM, John Chilton wrote:

 Hey Nate,

 On Tue, May 14, 2013 at 8:40 AM, Nate Coraor n...@bx.psu.edu wrote:
 Hi John,

 A few of us in the lab here at Penn State actually discussed automatic 
 creation of virtualenvs for dependency installations a couple weeks ago. 
  This was in the context of Bjoern's request for supporting compile-time 
 dependencies.  I think it's a great idea, but there's a limitation that 
 we'd need to account for.

 If you're going to have frequently used and expensive to build libraries 
 (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
 tool(s) depend on those repositories, the activate method won't work.  
 virtualenvs cannot depend on other virtualenvs or be active at the same 
 time as other virtualenvs.  We could work around it by setting 
 PYTHONPATH in the dependencies' env.sh like we do now.  But then, other 
 than making installation a bit easier (e.g. by allowing the use of pip), 
 we have not gained much.

 I don't know what to make of your response. It seems like a no, but
 the word no doesn't appear anywhere.

 Sorry about being wishy-washy.  Unless anyone has any objections or can 
 foresee other problems, I would say yes to this.  But I believe it should 
 not break the concept of common-dependency-only repositories.

 I'm pretty sure that as long as the process of creating a venv also adds 
 the venv's site-packages to PYTHONPATH in that dependency's env.sh, the 
 problem should be automatically dealt with.

 I don't know the particulars of rpy, but numpy installs fine via this
 method and I see no problem with each application having its own copy
 of numpy. I think relying on OS-managed Python packages, for instance,
 is something of a bad practice; when developing and distributing
 software I use virtualenvs for everything. I think that stand-alone
 Python package definitions in the tool shed are directly analogous to
 OS-managed packages.

 Completely agree that we want to avoid OS-managed python packages.  I had, 
 in the past, considered that for something like numpy, we ought to make it 
 easy for an administrator to allow their own version of numpy to be used, 
 since numpy can be linked against a number of optimized libraries for 
 significant performance gains, and this generally won't happen for 
 versions installed from the toolshed unless the system already has stuff 
 like atlas-dev installed.  But I think we still allow admins that 
 possibility with reasonable ease since dependency management in Galaxy is 
 not a requirement.

 What we do want to avoid is the situation where someone clones a new copy 
 of Galaxy, wants to install 10 different tools that all depend on numpy, 
 and has to wait an hour while 10 versions of numpy compile.  Add that in 
 with other tools that will have a similar process (installing R + packages 
 + rpy) plus the hope that down the line you'll 

Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-17 Thread John Chilton
Hey All,

There was a long conversation about this topic in IRC yesterday (among
people who don't actually use the tool shed all that frequently); I
have posted it to the new unofficial Galaxy Google+ group if anyone
would like to read and chime in.

https://plus.google.com/111860405027053012444/posts/TkCFwA2jkDN

-John


On Tue, May 14, 2013 at 3:59 PM, Nate Coraor n...@bx.psu.edu wrote:
 Greg created the following card, and I'm working on a few changes to your 
 commit:

 https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definition-framework-per-john-chilton-s-pull-request/506338ce32ae458f6d15e4b3/848

 Thanks,
 --nate

 On May 14, 2013, at 1:45 PM, Nate Coraor wrote:

 On May 14, 2013, at 10:58 AM, John Chilton wrote:

 Hey Nate,

 On Tue, May 14, 2013 at 8:40 AM, Nate Coraor n...@bx.psu.edu wrote:
 Hi John,

 A few of us in the lab here at Penn State actually discussed automatic 
 creation of virtualenvs for dependency installations a couple weeks ago.  
 This was in the context of Bjoern's request for supporting compile-time 
 dependencies.  I think it's a great idea, but there's a limitation that 
 we'd need to account for.

 If you're going to have frequently used and expensive to build libraries 
 (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
 tool(s) depend on those repositories, the activate method won't work.  
 virtualenvs cannot depend on other virtualenvs or be active at the same 
 time as other virtualenvs.  We could work around it by setting PYTHONPATH 
 in the dependencies' env.sh like we do now.  But then, other than making 
 installation a bit easier (e.g. by allowing the use of pip), we have not 
 gained much.

 I don't know what to make of your response. It seems like a no, but
 the word no doesn't appear anywhere.

 Sorry about being wishy-washy.  Unless anyone has any objections or can 
 foresee other problems, I would say yes to this.  But I believe it should 
 not break the concept of common-dependency-only repositories.

 I'm pretty sure that as long as the process of creating a venv also adds the 
 venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem 
 should be automatically dealt with.

 I don't know the particulars of rpy, but numpy installs fine via this
 method and I see no problem with each application having its own copy
 of numpy. I think relying on OS-managed Python packages, for instance,
 is something of a bad practice; when developing and distributing
 software I use virtualenvs for everything. I think that stand-alone
 Python package definitions in the tool shed are directly analogous to
 OS-managed packages.

 Completely agree that we want to avoid OS-managed python packages.  I had, 
 in the past, considered that for something like numpy, we ought to make it 
 easy for an administrator to allow their own version of numpy to be used, 
 since numpy can be linked against a number of optimized libraries for 
 significant performance gains, and this generally won't happen for versions 
 installed from the toolshed unless the system already has stuff like 
 atlas-dev installed.  But I think we still allow admins that possibility 
 with reasonable ease since dependency management in Galaxy is not a 
 requirement.

 What we do want to avoid is the situation where someone clones a new copy of 
 Galaxy, wants to install 10 different tools that all depend on numpy, and 
 has to wait an hour while 10 versions of numpy compile.  Add that in with 
 other tools that will have a similar process (installing R + packages + rpy) 
 plus the hope that down the line you'll be able to automatically maintain 
 separate builds for remote resources that are not the same (i.e. multiple 
 clusters with differing operating systems) and this hopefully highlights why 
 I think reducing duplication where possible will be important.

 I also disagree we have not gained much. Setting up these repositories
 is an onerous, brittle process. This patch provides some high-level
 functionality for creating virtualenvs, which negates the need for
 creating separate repositories per package.

 This is a good point.  I probably also sold short the benefit of being able 
 to install with pip, since this does indeed remove a similarly brittle and 
 tedious step of downloading and installing modules.

 --nate


 -John


 --nate

 On May 13, 2013, at 6:49 PM, John Chilton wrote:

 The proliferation of individual python package install definitions has
 continued and it has spread to some MSI managed tools. I worry about
 the tedium I will have to endure in the future if that becomes an
 established best practice :) so I have implemented the python version
 of what I had described in this thread:

 As patch:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b.patch
 Pretty version:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b

 I understand that there are going 

Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-15 Thread John Chilton
Hey Nate,

On Tue, May 14, 2013 at 8:40 AM, Nate Coraor n...@bx.psu.edu wrote:
 Hi John,

 A few of us in the lab here at Penn State actually discussed automatic 
 creation of virtualenvs for dependency installations a couple weeks ago.  
 This was in the context of Bjoern's request for supporting compile-time 
 dependencies.  I think it's a great idea, but there's a limitation that we'd 
 need to account for.

 If you're going to have frequently used and expensive to build libraries 
 (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
 tool(s) depend on those repositories, the activate method won't work.  
 virtualenvs cannot depend on other virtualenvs or be active at the same time 
 as other virtualenvs.  We could work around it by setting PYTHONPATH in the 
 dependencies' env.sh like we do now.  But then, other than making 
 installation a bit easier (e.g. by allowing the use of pip), we have not 
 gained much.

I don't know what to make of your response. It seems like a no, but
the word no doesn't appear anywhere.

I don't know the particulars of rpy, but numpy installs fine via this
method and I see no problem with each application having its own copy
of numpy. I think relying on OS-managed Python packages, for instance,
is something of a bad practice; when developing and distributing
software I use virtualenvs for everything. I think that stand-alone
Python package definitions in the tool shed are directly analogous to
OS-managed packages.

I also disagree we have not gained much. Setting up these repositories
is an onerous, brittle process. This patch provides some high-level
functionality for creating virtualenvs, which negates the need for
creating separate repositories per package.

-John


 --nate

 On May 13, 2013, at 6:49 PM, John Chilton wrote:

 The proliferation of individual python package install definitions has
 continued and it has spread to some MSI managed tools. I worry about
 the tedium I will have to endure in the future if that becomes an
 established best practice :) so I have implemented the python version
 of what I had described in this thread:

 As patch:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b.patch
 Pretty version:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b

 I understand that there are going to be differing opinions as to
 whether this is the best way forward but I thought I would give my
 position a better chance of succeeding by providing an implementation.

 Thanks for your consideration,
 -John


 On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock p.j.a.c...@googlemail.com 
 wrote:
 On Tue, Apr 16, 2013 at 2:46 PM, John Chilton chil...@msi.umn.edu wrote:
  Stepping back a little, is this the right way to address Python
 dependencies?

 Looks like I missed this thread, hence:
 http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html

 I was a big advocate for inter-repository dependencies,
 but I think taking it to the level of individual python packages might
 be going too far - my thought was they were needed for big 100Mb
 programs and stuff like that.

 It should work but it is a lot of boilerplate for something which
 should be more automated.

 At the Java jar/Python library/Ruby gem
  level I think using some of the platform-specific packaging stuff to
  create isolated environments for each program might be a better way
 to go.

 I agree, the best way forward isn't obvious here, and it may make
 sense to have tailored solutions for Python, Perl, Java, R, Ruby,
 etc packages rather than the current Tool Shed package solution.

  I'd like to be able to just continue to write this kind of thing in my
 tool XML files and have it actually taken care of (rather than ignored):

  <requirements>
      <requirement type="python-module">numpy</requirement>
      <requirement type="python-module">Bio</requirement>
  </requirements>

 Adding a version key would be sensible, handling min/max etc
 as per Python packaging norms.

 Peter




Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-14 Thread Nate Coraor
Hi John,

A few of us in the lab here at Penn State actually discussed automatic creation 
of virtualenvs for dependency installations a couple weeks ago.  This was in 
the context of Bjoern's request for supporting compile-time dependencies.  I 
think it's a great idea, but there's a limitation that we'd need to account for.

If you're going to have frequently used and expensive to build libraries (e.g. 
numpy, R + rpy) in dependency-only repositories and then have your tool(s) 
depend on those repositories, the activate method won't work.  virtualenvs 
cannot depend on other virtualenvs or be active at the same time as other 
virtualenvs.  We could work around it by setting PYTHONPATH in the 
dependencies' env.sh like we do now.  But then, other than making installation 
a bit easier (e.g. by allowing the use of pip), we have not gained much.
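For example (a rough sketch, paths made up), sourcing a second activate
switches environments rather than stacking them:

  virtualenv /tmp/numpy_env && virtualenv /tmp/tool_env
  . /tmp/numpy_env/bin/activate
  . /tmp/tool_env/bin/activate   # python now resolves to tool_env only
  python -c 'import numpy'       # fails unless numpy is also installed in tool_env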

--nate

On May 13, 2013, at 6:49 PM, John Chilton wrote:

 The proliferation of individual python package install definitions has
 continued and it has spread to some MSI managed tools. I worry about
 the tedium I will have to endure in the future if that becomes an
 established best practice :) so I have implemented the python version
 of what I had described in this thread:
 
 As patch:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b.patch
 Pretty version:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b
 
 I understand that there are going to be differing opinions as to
 whether this is the best way forward but I thought I would give my
 position a better chance of succeeding by providing an implementation.
 
 Thanks for your consideration,
 -John
 
 
 On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
 On Tue, Apr 16, 2013 at 2:46 PM, John Chilton chil...@msi.umn.edu wrote:
 Stepping back a little, is this the right way to address Python
 dependencies?
 
 Looks like I missed this thread, hence:
 http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
 
 I was a big advocate for inter-repository dependencies,
 but I think taking it to the level of individual python packages might
 be going too far - my thought was they were needed for big 100Mb
 programs and stuff like that.
 
 It should work but it is a lot of boilerplate for something which
 should be more automated.
 
 At the Java jar/Python library/Ruby gem
 level I think using some of the platform-specific packaging stuff to
 create isolated environments for each program might be a better way
 to go.
 
 I agree, the best way forward isn't obvious here, and it may make
 sense to have tailored solutions for Python, Perl, Java, R, Ruby,
 etc packages rather than the current Tool Shed package solution.
 
 I'd like to be able to just continue to write this kind of thing in my
 tool XML files and have it actually taken care of (rather than ignored):
 
 <requirements>
     <requirement type="python-module">numpy</requirement>
     <requirement type="python-module">Bio</requirement>
 </requirements>
 
 Adding a version key would be sensible, handling min/max etc
 as per Python packaging norms.
 
 Peter




Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-14 Thread Nate Coraor
On May 14, 2013, at 10:58 AM, John Chilton wrote:

 Hey Nate,
 
 On Tue, May 14, 2013 at 8:40 AM, Nate Coraor n...@bx.psu.edu wrote:
 Hi John,
 
 A few of us in the lab here at Penn State actually discussed automatic 
 creation of virtualenvs for dependency installations a couple weeks ago.  
 This was in the context of Bjoern's request for supporting compile-time 
 dependencies.  I think it's a great idea, but there's a limitation that we'd 
 need to account for.
 
 If you're going to have frequently used and expensive to build libraries 
 (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
 tool(s) depend on those repositories, the activate method won't work.  
 virtualenvs cannot depend on other virtualenvs or be active at the same time 
 as other virtualenvs.  We could work around it by setting PYTHONPATH in the 
 dependencies' env.sh like we do now.  But then, other than making 
 installation a bit easier (e.g. by allowing the use of pip), we have not 
 gained much.
 
 I don't know what to make of your response. It seems like a no, but
 the word no doesn't appear anywhere.

Sorry about being wishy-washy.  Unless anyone has any objections or can foresee 
other problems, I would say yes to this.  But I believe it should not break the 
concept of common-dependency-only repositories.

I'm pretty sure that as long as the process of creating a venv also adds the 
venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem 
should be automatically dealt with.
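Roughly, I'd expect the generated env.sh to contain something along these
lines (the venv location and the python2.7 path are assumptions about the
install layout):

  VENV="$INSTALL_DIR/venv"
  PATH="$VENV/bin:$PATH"; export PATH
  PYTHONPATH="$VENV/lib/python2.7/site-packages${PYTHONPATH:+:$PYTHONPATH}"; export PYTHONPATH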

 I don't know the particulars of rpy, but numpy installs fine via this
 method and I see no problem with each application having its own copy
 of numpy. I think relying on OS-managed Python packages, for instance,
 is something of a bad practice; when developing and distributing
 software I use virtualenvs for everything. I think that stand-alone
 Python package definitions in the tool shed are directly analogous to
 OS-managed packages.

Completely agree that we want to avoid OS-managed python packages.  I had, in 
the past, considered that for something like numpy, we ought to make it easy 
for an administrator to allow their own version of numpy to be used, since 
numpy can be linked against a number of optimized libraries for significant 
performance gains, and this generally won't happen for versions installed from 
the toolshed unless the system already has stuff like atlas-dev installed.  But 
I think we still allow admins that possibility with reasonable ease since 
dependency management in Galaxy is not a requirement.
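As an aside, a quick way to check which BLAS/LAPACK a given numpy build was
linked against is:

  python -c 'import numpy; numpy.show_config()'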

What we do want to avoid is the situation where someone clones a new copy of 
Galaxy, wants to install 10 different tools that all depend on numpy, and has 
to wait an hour while 10 versions of numpy compile.  Add that in with other 
tools that will have a similar process (installing R + packages + rpy) plus the 
hope that down the line you'll be able to automatically maintain separate 
builds for remote resources that are not the same (i.e. multiple clusters with 
differing operating systems) and this hopefully highlights why I think reducing 
duplication where possible will be important.

 I also disagree we have not gained much. Setting up these repositories
 is an onerous, brittle process. This patch provides some high-level
 functionality for creating virtualenvs, which negates the need for
 creating separate repositories per package.

This is a good point.  I probably also sold short the benefit of being able to 
install with pip, since this does indeed remove a similarly brittle and tedious 
step of downloading and installing modules.

--nate

 
 -John
 
 
 --nate
 
 On May 13, 2013, at 6:49 PM, John Chilton wrote:
 
 The proliferation of individual python package install definitions has
 continued and it has spread to some MSI managed tools. I worry about
 the tedium I will have to endure in the future if that becomes an
 established best practice :) so I have implemented the python version
 of what I had described in this thread:
 
 As patch:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b.patch
 Pretty version:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b
 
 I understand that there are going to be differing opinions as to
 whether this is the best way forward but I thought I would give my
 position a better chance of succeeding by providing an implementation.
 
 Thanks for your consideration,
 -John
 
 
 On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock p.j.a.c...@googlemail.com 
 wrote:
 On Tue, Apr 16, 2013 at 2:46 PM, John Chilton chil...@msi.umn.edu wrote:
 Stepping back a little, is this the right way to address Python
 dependencies?
 
 Looks like I missed this thread, hence:
 http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html
 
 I was a big advocate for inter-repository dependencies,
 but I think taking it to the level of individual python packages might
 be going too far - my 

Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-05-14 Thread Nate Coraor
Greg created the following card, and I'm working on a few changes to your 
commit:

https://trello.com/card/toolshed-consider-enhancing-tool-dependency-definition-framework-per-john-chilton-s-pull-request/506338ce32ae458f6d15e4b3/848

Thanks,
--nate

On May 14, 2013, at 1:45 PM, Nate Coraor wrote:

 On May 14, 2013, at 10:58 AM, John Chilton wrote:
 
 Hey Nate,
 
 On Tue, May 14, 2013 at 8:40 AM, Nate Coraor n...@bx.psu.edu wrote:
 Hi John,
 
 A few of us in the lab here at Penn State actually discussed automatic 
 creation of virtualenvs for dependency installations a couple weeks ago.  
 This was in the context of Bjoern's request for supporting compile-time 
 dependencies.  I think it's a great idea, but there's a limitation that 
 we'd need to account for.
 
 If you're going to have frequently used and expensive to build libraries 
 (e.g. numpy, R + rpy) in dependency-only repositories and then have your 
 tool(s) depend on those repositories, the activate method won't work.  
 virtualenvs cannot depend on other virtualenvs or be active at the same 
 time as other virtualenvs.  We could work around it by setting PYTHONPATH 
 in the dependencies' env.sh like we do now.  But then, other than making 
 installation a bit easier (e.g. by allowing the use of pip), we have not 
 gained much.
 
 I don't know what to make of your response. It seems like a no, but
 the word no doesn't appear anywhere.
 
 Sorry about being wishy-washy.  Unless anyone has any objections or can 
 foresee other problems, I would say yes to this.  But I believe it should not 
 break the concept of common-dependency-only repositories.
 
 I'm pretty sure that as long as the process of creating a venv also adds the 
 venv's site-packages to PYTHONPATH in that dependency's env.sh, the problem 
 should be automatically dealt with.
 
 I don't know the particulars of rpy, but numpy installs fine via this
 method and I see no problem with each application having its own copy
 of numpy. I think relying on OS-managed Python packages, for instance,
 is something of a bad practice; when developing and distributing
 software I use virtualenvs for everything. I think that stand-alone
 Python package definitions in the tool shed are directly analogous to
 OS-managed packages.
 
 Completely agree that we want to avoid OS-managed python packages.  I had, in 
 the past, considered that for something like numpy, we ought to make it easy 
 for an administrator to allow their own version of numpy to be used, since 
 numpy can be linked against a number of optimized libraries for significant 
 performance gains, and this generally won't happen for versions installed 
 from the toolshed unless the system already has stuff like atlas-dev 
 installed.  But I think we still allow admins that possibility with 
 reasonable ease since dependency management in Galaxy is not a requirement.
 
 What we do want to avoid is the situation where someone clones a new copy of 
 Galaxy, wants to install 10 different tools that all depend on numpy, and has 
 to wait an hour while 10 versions of numpy compile.  Add that in with other 
 tools that will have a similar process (installing R + packages + rpy) plus 
 the hope that down the line you'll be able to automatically maintain separate 
 builds for remote resources that are not the same (i.e. multiple clusters 
 with differing operating systems) and this hopefully highlights why I think 
 reducing duplication where possible will be important.
 
 I also disagree we have not gained much. Setting up these repositories
 is an onerous, brittle process. This patch provides some high-level
 functionality for creating virtualenvs, which negates the need for
 creating separate repositories per package.
 
 This is a good point.  I probably also sold short the benefit of being able 
 to install with pip, since this does indeed remove a similarly brittle and 
 tedious step of downloading and installing modules.
 
 --nate
 
 
 -John
 
 
 --nate
 
 On May 13, 2013, at 6:49 PM, John Chilton wrote:
 
 The proliferation of individual python package install definitions has
 continued and it has spread to some MSI managed tools. I worry about
 the tedium I will have to endure in the future if that becomes an
 established best practice :) so I have implemented the python version
 of what I had described in this thread:
 
 As patch:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b.patch
 Pretty version:
 https://github.com/jmchilton/galaxy-central/commit/161d3b288016077a99fb7196b6e08fe7d690f34b
 
 I understand that there are going to be differing opinions as to
 whether this is the best way forward but I thought I would give my
 position a better chance of succeeding by providing an implementation.
 
 Thanks for your consideration,
 -John
 
 
 On Wed, Apr 17, 2013 at 3:56 PM, Peter Cock p.j.a.c...@googlemail.com 
 wrote:
 On Tue, Apr 16, 2013 at 2:46 PM, John Chilton chil...@msi.umn.edu wrote:
 Stepping 

Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-04-17 Thread Peter Cock
On Tue, Apr 16, 2013 at 2:46 PM, John Chilton chil...@msi.umn.edu wrote:
 Stepping back a little, is this the right way to address Python
 dependencies?

Looks like I missed this thread, hence:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-April/014169.html

 I was a big advocate for inter-repository dependencies,
 but I think taking it to the level of individual python packages might
 be going too far - my thought was they were needed for big 100Mb
 programs and stuff like that.

It should work but it is a lot of boilerplate for something which
should be more automated.

 At the Java jar/Python library/Ruby gem
 level I think using some of the platform-specific packaging stuff to
 create isolated environments for each program might be a better way
 to go.

I agree, the best way forward isn't obvious here, and it may make
sense to have tailored solutions for Python, Perl, Java, R, Ruby,
etc packages rather than the current Tool Shed package solution.

I'd like to be able to just continue to write this kind of thing in my
tool XML files and have it actually taken care of (rather than ignored):

<requirements>
    <requirement type="python-module">numpy</requirement>
    <requirement type="python-module">Bio</requirement>
</requirements>

Adding a version key would be sensible, handling min/max etc
as per Python packaging norms.
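Expressed as pip requirement specifiers, for example, that min/max
handling would look like:

  pip install 'numpy==1.7.1'
  pip install 'numpy>=1.7,<1.8'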

Peter


Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-04-16 Thread Björn Grüning
Hi Greg.

  
  If numpy is not required for compiling matplotlib components (i.e., 
  matplotlib components just use numpy after installation), 
  then you should be able to make this work using a complex repository 
  dependency for numpy in your tool_dependencies.xml definition for 
  matplotlib.  
  The discussion for doing this is at 
  http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_repository_dependencies:_tool_dependency_definitions_that_contain_repository_dependency_definitions
  
  Thanks! But it is required at compile time. 
 
 Ok, we may need to do a bit of work to support this requirement, but I'm not 
 quite sure. 
 What I've described to you should still be your approach, but we'll need to 
 ensure that the package_numpy_1_7_1 repository is installed before the 
 package_matplotlib_1_2_1 is installed. 
 Guaranteeing this is not currently possible, but this is a feature I am hoping
 to have available this week. This is a feature that Ira Cooke has needed for
 his repositories.
 When the feature is available, it will support an attribute named 
 prior_installation_required in the repository tag, so this tag will look 
 something like:
 <repository toolshed="www" name="xxx" owner="yyy" changeset_revision="zzz"
 prior_installation_required="True" />

Such a tag would also ensure that we do not end up in a dependency loop,
right?

 What this will do is skip installation of the repository that contains this 
 dependency until the repository that is associated with the 
 prior_installation_required 
 attribute is installed (unless that repository is not in the current list of 
 repositories being installed).
 
 What I think still needs to be worked out is how to ensure that the 
 tool_dependencies.xml definition that installs the matplotlib package 
 will find the previously installed numpy binary during compilation of 
 matplotlib. 
 Currently, the numpy binary will only be available to the installed and 
 compiled matplotlib binary. 
 I'll create a Trello card for this and let you know an estimate of when it 
 will be available.

I already created a trello card after talking with InitHello in IRC.
https://trello.com/c/QTeSmNSs

My idea would be to populate all env.sh scripts associated with all
<repository toolshed="www" ...> tags during the execution of
<action type="shell_command">...</action> commands.

The tool author can then add the ./lib/ folder to LD_LIBRARY_PATH and use
it in any program that needs the dependency at compile time, as long as
<repository toolshed="www" name="dep_with_populated_LD_LIBRARY_PATH" ...>
is included.
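As a rough sketch of what a matplotlib recipe could then do (the variable
name NUMPY_ROOT_DIR is hypothetical, just standing in for whatever the
populated env.sh would export):

  export LD_LIBRARY_PATH="$NUMPY_ROOT_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export CFLAGS="-I$NUMPY_ROOT_DIR/include $CFLAGS"
  python setup.py install --prefix="$INSTALL_DIR"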

 
  
  By the way,
  
  I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 
  repository on the test tool shed includes the following contents. 
  Is this the repository you are working with?  Strangely, the repository 
  dependency should be invalid because it should not be possible 
  for a repository to define a dependency upon any revision of itself.  You 
  may have uncovered a way to do this using a tool dependency 
  definition with a complex repository dependency.  I'll look into this and 
  make sure to provide a fix for the scenario you used.
  
  Oh, OK. In revision 3 of both packages you should see what I was
  trying.
 
 Ok, revision 3 looks good as long as it correctly installs and compiles numpy.
 
 
  
  Instead of the above approach, your approach here should be to include
  only the tool_dependencies.xml definition file for installing only numpy
  version 1.7.1 in a repository named package_numpy_1_7_1 (use the full
  version in naming the repository).  You should create a separate
  repository named package_matplotlib_1_2_1 that similarly contains a
  single tool_dependencies.xml file that (in addition to defining how to
  install and compile matplotlib) defines a complex repository dependency
  on the package_numpy_1_7_1 repository as described in the wiki at the
  link above.
  
  This approach creates 2 separate orphan tool dependencies, the second of 
  which (matplotlib) has a complex repository dependency on the first 
  (numpy).  
  When you install the package_matplotlib_1_2_1 repository and check the box 
  for handling tool dependencies during the installation, it will install 
  the package_numpy_1_7_1 
  repository and create a pointer to the numpy binary in the env.sh file 
  within the package_matplotlib_1_2_1 repository environment.  This enables 
  matplotlib to locate the required version of numpy.
  
  I know this is a bit tricky, so please let me know if it still does not 
  make sense.
  
  Let's see if I got it right.
  
  repository_dependencies.xml will be parsed first. The defined repos and
  the included and populated system variables will be available in
  tool_dependencies.xml, which is parsed afterwards. Is that correct?
 
 I'm not quite sure I understand your statements above, but I've looked at 
 revision 3 of your package_matplotlib_1_2_1 repository and the 
 tool_dependencies.xml definition looks good (with the 

Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-04-16 Thread John Chilton
Stepping back a little, is this the right way to address Python
dependencies? I was a big advocate for inter-repository dependencies,
but I think taking it to the level of individual python packages might
be going too far - my thought was they were needed for big 100Mb
programs and stuff like that. At the Java jar/Python library/Ruby gem
level I think using some of the platform-specific packaging stuff to
create isolated environments for each program might be a better way
to go.

Brad, Enis, and I came up with this idea to use virtualenv to
automatically create environments for Galaxy tools in CloudBioLinux
based on a requirements file and then activate that environment in
the tool's env.sh file.

https://github.com/chapmanb/cloudbiolinux/commit/0e4489275bba2e8f77e1218e3cc1604afbfb559d

It would be easier for tool authors if they could just say "here is a
requirements.txt file" and have the Python environment automatically
created, or "here is a Gemfile" and have rvm+bundler automatically
configure a Ruby environment.
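On the Python side, a hedged sketch of what that automation boils down to
(the venv location and env.sh path are assumptions, not what the commit
above actually does):

  virtualenv "$INSTALL_DIR/venv"
  "$INSTALL_DIR/venv/bin/pip" install -r requirements.txt
  echo ". $INSTALL_DIR/venv/bin/activate" >> "$INSTALL_DIR/env.sh"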

Thanks,
-John


On Tue, Apr 16, 2013 at 4:50 AM, Björn Grüning
bjoern.gruen...@pharmazie.uni-freiburg.de wrote:
 Hi Greg.

 
  If numpy is not required for compiling matplotlib components (i.e., 
  matplotlib components just use numpy after installation),
  then you should be able to make this work using a complex repository 
  dependency for numpy in your tool_dependencies.xml definition for 
  matplotlib.
  The discussion for doing this is at 
  http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_repository_dependencies:_tool_dependency_definitions_that_contain_repository_dependency_definitions
 
  Thanks! But it is required at compile time.

 Ok, we may need to do a bit of work to support this requirement, but I'm not 
 quite sure.
 What I've described to you should still be your approach, but we'll need to 
 ensure that the package_numpy_1_7_1 repository is installed before the 
 package_matplotlib_1_2_1 is installed.
 Guaranteeing this is not currently possible, but this is a feature I am
 hoping to have available this week. This is a feature that Ira Cooke has
 needed for his repositories.
 When the feature is available, it will support an attribute named 
 prior_installation_required in the repository tag, so this tag will look 
 something like:
 <repository toolshed="www" name="xxx" owner="yyy" changeset_revision="zzz"
 prior_installation_required="True" />

 Such a tag would also ensure that we do not end up in a dependency loop,
 right?

 What this will do is skip installation of the repository that contains this 
 dependency until the repository that is associated with the 
 prior_installation_required
 attribute is installed (unless that repository is not in the current list of 
 repositories being installed).

 What I think still needs to be worked out is how to ensure that the 
 tool_dependencies.xml definition that installs the matplotlib package
 will find the previously installed numpy binary during compilation of 
 matplotlib.
 Currently, the numpy binary will only be available to the installed and 
 compiled matplotlib binary.
 I'll create a Trello card for this and let you know an estimate of when it 
 will be available.

 I already created a trello card after talking with InitHello in IRC.
 https://trello.com/c/QTeSmNSs

 My idea would be to populate all env.sh scripts associated with all
 <repository toolshed="www" ...> tags during the execution of
 <action type="shell_command">...</action> commands.

 The tool author can then add the ./lib/ folder to LD_LIBRARY_PATH and use
 it in any program that needs the dependency at compile time, as long as
 <repository toolshed="www" name="dep_with_populated_LD_LIBRARY_PATH" ...>
 is included.


 
  By the way,
 
  I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 
  repository on the test tool shed includes the following contents.
  Is this the repository you are working with?  Strangely, the repository 
  dependency should be invalid because it should not be possible
  for a repository to define a dependency upon any revision of itself.  You 
  may have uncovered a way to do this using a tool dependency
  definition with a complex repository dependency.  I'll look into this and 
  make sure to provide a fix for the scenario you used.
 
 Oh, OK. In revision 3 of both packages you should see what I was
  trying.

 Ok, revision 3 looks good as long as it correctly installs and compiles 
 numpy.


 
  Instead of the above approach, your approach here should be to include
  only the tool_dependencies.xml definition file for installing only numpy
  version 1.7.1 in a repository named package_numpy_1_7_1 (use the full
  version in naming the repository).  You should create a separate
  repository named package_matplotlib_1_2_1 that
  similarly contains a single tool_dependencies.xml file that (in addition
  to defining how to install and compile matplotlib) defines a complex
  repository dependency
  on the package_numpy_1_7_1 repository as described in 

[galaxy-dev] tool_dependencies inside tool_dependencies

2013-04-15 Thread Björn Grüning
Hi,

is there a general rule to handle dependencies inside of
tool_dependencies.xml? 

Let's assume I write a matplotlib orphan tool_dependencies.xml file.
matplotlib depends on numpy. Numpy already has an orphan definition.

Is there a way to include numpy as a dependency inside the
matplotlib definition, so that I do not need to fetch and compile numpy
inside of matplotlib?

I tried to specify it beforehand but that did not work.

Thanks!
Bjoern



Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-04-15 Thread Björn Grüning
Hi Greg,

 If numpy is not required for compiling matplotlib components (i.e., 
 matplotlib components just use numpy after installation), 
 then you should be able to make this work using a complex repository 
 dependency for numpy in your tool_dependencies.xml definition for matplotlib. 
  
 The discussion for doing this is at 
 http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_repository_dependencies:_tool_dependency_definitions_that_contain_repository_dependency_definitions

Thanks! But it is required at compile time. 

 By the way,
 
 I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 repository 
 on the test tool shed includes the following contents. 
 Is this the repository you are working with?  Strangely, the repository 
 dependency should be invalid because it should not be possible 
 for a repository to define a dependency upon any revision of itself.  You may 
 have uncovered a way to do this using a tool dependency 
 definition with a complex repository dependency.  I'll look into this and 
 make sure to provide a fix for the scenario you used.

Oh, OK. In revision 3 of both packages you should see what I was
trying.

 Instead of the above approach, your approach here should be to include only
 the tool_dependencies.xml definition file for installing only numpy version
 1.7.1 in a repository named package_numpy_1_7_1 (use the full version in
 naming the repository).  You should create a separate repository named
 package_matplotlib_1_2_1 that similarly contains a single
 tool_dependencies.xml file that (in addition to defining how to install and
 compile matplotlib) defines a complex repository dependency on the
 package_numpy_1_7_1 repository as described in the wiki at the link above.
 
 This approach creates 2 separate orphan tool dependencies, the second of 
 which (matplotlib) has a complex repository dependency on the first (numpy).  
 When you install the package_matplotlib_1_2_1 repository and check the box 
 for handling tool dependencies during the installation, it will install the 
 package_numpy_1_7_1 
 repository and create a pointer to the numpy binary in the env.sh file within 
 the package_matplotlib_1_2_1 repository environment.  This enables matplotlib 
 to locate the required version of numpy.
 
 I know this is a bit tricky, so please let me know if it still does not make 
 sense.

Let's see if I got it right.

repository_dependencies.xml will be parsed first. The defined repos and
the included and populated system variables will be available in
tool_dependencies.xml, which is parsed afterwards. Is that correct?

I will try that. Thanks!
Bjoern

 Thanks very much,
 
 
 Greg Von Kuster
 
 
 On Apr 15, 2013, at 3:29 PM, Björn Grüning wrote:
 
  Hi,
  
  is there a general rule to handle dependencies inside of
  tool_dependencies.xml? 
  
  Let's assume I write a matplotlib orphan tool_dependencies.xml file.
  matplotlib depends on numpy. Numpy already has an orphan definition.

  Is there a way to include numpy as a dependency inside the
  matplotlib definition, so that I do not need to fetch and compile numpy
  inside of matplotlib?
  
  I tried to specify it beforehand but that did not work.
  
  Thanks!
  Bjoern
  
 



Re: [galaxy-dev] tool_dependencies inside tool_dependencies

2013-04-15 Thread Greg Von Kuster
Hi Björn,

On Apr 15, 2013, at 6:31 PM, Björn Grüning wrote:

 Hi Greg,
 
 If numpy is not required for compiling matplotlib components (i.e., 
 matplotlib components just use numpy after installation), 
 then you should be able to make this work using a complex repository 
 dependency for numpy in your tool_dependencies.xml definition for 
 matplotlib.  
 The discussion for doing this is at 
 http://wiki.galaxyproject.org/DefiningRepositoryDependencies#Complex_repository_dependencies:_tool_dependency_definitions_that_contain_repository_dependency_definitions
 
 Thanks! But it is required at compile time. 

Ok, we may need to do a bit of work to support this requirement, but I'm not 
quite sure.  What I've described to you should still be your approach, but 
we'll need to ensure that the package_numpy_1_7_1 repository is installed 
before the package_matplotlib_1_2_1 is installed.  Guaranteeing this is not 
currently possible, but this is a feature I am hoping to have available this 
week.  This is a feature that Ira Cooke has needed for his repositories.  When 
the feature is available, it will support an attribute named 
prior_installation_required in the repository tag, so this tag will look 
something like:

<repository toolshed="www" name="xxx" owner="yyy" changeset_revision="zzz"
prior_installation_required="True" />

What this will do is skip installation of the repository that contains this 
dependency until the repository that is associated with the 
prior_installation_required attribute is installed (unless that repository is 
not in the current list of repositories being installed).

What I think still needs to be worked out is how to ensure that the 
tool_dependencies.xml definition that installs the matplotlib package will find 
the previously installed numpy binary during compilation of matplotlib.  
Currently, the numpy binary will only be available to the installed and 
compiled matplotlib binary.  I'll create a Trello card for this and let you 
know an estimate of when it will be available.


 
 By the way,
 
 I noticed that revision 2:c5fbe4aa5a74 of your package_numpy_1_7 repository 
 on the test tool shed includes the following contents. 
 Is this the repository you are working with?  Strangely, the repository 
 dependency should be invalid because it should not be possible 
 for a repository to define a dependency upon any revision of itself.  You 
 may have uncovered a way to do this using a tool dependency 
 definition with a complex repository dependency.  I'll look into this and 
 make sure to provide a fix for the scenario you used.
 
 Oh, OK. In revision 3 of both packages you should see what I was
 trying.

Ok, revision 3 looks good as long as it correctly installs and compiles numpy.


 
 Instead of the above approach, your approach here should be to include only
 the tool_dependencies.xml definition file for installing only numpy version
 1.7.1 in a repository named package_numpy_1_7_1 (use the full version in
 naming the repository).  You should create a separate repository named
 package_matplotlib_1_2_1 that similarly contains a single
 tool_dependencies.xml file that (in addition to defining how to install and
 compile matplotlib) defines a complex repository dependency on the
 package_numpy_1_7_1 repository as described in the wiki at the link above.
 
 This approach creates 2 separate orphan tool dependencies, the second of 
 which (matplotlib) has a complex repository dependency on the first (numpy). 
  
 When you install the package_matplotlib_1_2_1 repository and check the box 
 for handling tool dependencies during the installation, it will install the 
 package_numpy_1_7_1 
 repository and create a pointer to the numpy binary in the env.sh file 
 within the package_matplotlib_1_2_1 repository environment.  This enables 
 matplotlib to locate the required version of numpy.
 
 I know this is a bit tricky, so please let me know if it still does not make 
 sense.
 
 Let's see if I got it right.
 
 repository_dependencies.xml will be parsed first. The defined repos and
 the included and populated system variables will be available in
 tool_dependencies.xml, which is parsed afterwards. Is that correct?

I'm not quite sure I understand your statements above, but I've looked at 
revision 3 of your package_matplotlib_1_2_1 repository and the 
tool_dependencies.xml definition looks good (with the exception of the 
currently unsupported prior_installation_required attribute), so I think 
you've successfully deciphered my documentation.

I'll make sure to keep you informed as I make progress on the missing pieces 
that will support what you need this week.


 
 I will try that. Thanks!
 Bjoern
 
 Thanks very much,
 
 
 Greg Von Kuster
 
 
 On Apr 15, 2013, at 3:29 PM, Björn Grüning wrote:
 
 Hi,
 
 is there a general rule to handle dependencies inside of
 tool_dependencies.xml? 
 
 Let's assume I write a matplotlib orphan tool_dependencies.xml file.
 matplotlib depends on