Re: [galaxy-dev] What is the correct place under Galaxy for a database that's created by a tool?

2014-06-17 Thread Björn Grüning

Hi Melissa,

Am 17.06.2014 01:07, schrieb Melissa Cline:

Hi folks,

Hopefully this is a quick question.  I'm working on a set of tools that
will fire off a VM from within Galaxy and will then communicate with the
VM.  The VM will create a local database.


Are we talking here about a local Galaxy database or a tool-specific
database?


best,
Bjoern

The vision is that this won't be
a shared database; in a shared Galaxy instance, each user will have his or
her own database. What is the best place to create this database under the
Galaxy file system?

Thanks!

Melissa



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
   http://galaxyproject.org/search/mailinglists/




Re: [galaxy-dev] Using Mesos to Enable distributed computing under Galaxy?

2014-06-17 Thread Kyle Ellrott
Glad to see someone else is playing around with Mesos.
I have a mesos branch that is getting a little long in the tooth. I'd like
to get a straight job runner (non-LWR, with a shared file system) running
under mesos for Galaxy before I submit that work for a pull request.

The hackathon is only 12 days away! Hopefully we'll be able to make some
progress on these sorts of projects.

Kyle



On Sun, Jun 15, 2014 at 4:06 PM, John Chilton jmchil...@gmail.com wrote:

 Hey Kyle, all,

   If anyone wants to play with running Galaxy jobs within an Apache
 Mesos environment I have added a prototype of this feature to the LWR.


 https://bitbucket.org/jmchilton/lwr/commits/555438d2fe266899338474b25c540fef42bcece7

 https://bitbucket.org/jmchilton/lwr/commits/9748b3035dbe3802d4136a6a1028df8395a9aeb3

 This work distributes jobs across a Mesos cluster and injects a
 MESOS_URL environment variable into the job runtime environment in
 case the jobs themselves want to take advantage of Mesos.

 The advantage of the LWR versus a traditional Galaxy runner is that
 the job can be staged to remote resources without shared disk. Prior
 to this I was imagining the LWR to be useful in cases where Galaxy and
 the remote cluster don't share common disk but where there is in fact a
 shared scratch directory or something across the remote cluster, as
 well as a resource manager. The LWR Mesos framework, however, has the
 actual compute servers themselves stage the job up and down - so you
 could imagine distributing Galaxy across large clusters without any
 shared disk whatsoever - that could be very cool and help scale, say,
 cloud applications.

 The downsides of an LWR-based approach versus a Galaxy approach are that
 it is less mature and there is more stuff to configure - you need to
 configure a Galaxy job_conf plugin and destination, configure
 the LWR itself, and configure a message queue (for this variant of
 LWR operation anyway - it should be possible to drive this via the LWR
 in web server mode, but I haven't added that yet). I would be more than
 happy to continue to see progress toward Mesos support in Galaxy
 proper.
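 The job_conf wiring described above might look roughly like this - a
 sketch only; the plugin path, destination id, and LWR URL are
 illustrative assumptions, not taken from this thread:

```xml
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- Assumed plugin path for the LWR job runner -->
        <plugin id="lwr" type="runner" load="galaxy.jobs.runners.lwr:LwrJobRunner"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
        <!-- Hypothetical destination pointing at an LWR server; the
             message-queue variant would use AMQP connection parameters
             here instead of a direct URL -->
        <destination id="lwr_mesos" runner="lwr">
            <param id="url">http://lwr.example.org:8913/</param>
        </destination>
    </destinations>
</job_conf>
```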

 It is strictly a prototype so far - a sort of playground if anyone
 wants to play with these ideas and build something cool. It really is
 a framework, right - not so much a job scheduler - so I am not sure it
 is immediately useful, but I imagine one could build cool stuff
 on top of it.

 Next, I think I would like to add Apache Aurora
 (http://aurora.incubator.apache.org/) support - because it seems like
 a much more traditional resource manager but built on top of Mesos so
 it would be more practical for traditional Galaxy-style jobs. Doesn't
 buy you anything in terms of parallelization but it would fit better
 with Galaxy.

 -John


 On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott kellr...@soe.ucsc.edu
 wrote:
  I think one of the aspects where Galaxy is a bit soft is the ability to
  do distributed tasks. The current system of split/replicate/merge tasks
  based on file type is a bit limited and hard for tool developers to
  expand upon. Distributed computing is a non-trivial thing to implement,
  and I think it would be a better use of our time to use an already
  existing framework. It would also mean one less API for tool writers to
  have to develop for.
  I was wondering if anybody has looked at Mesos (http://mesos.apache.org/).
  You can see an overview of the Mesos architecture at
  https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
  The important thing about Mesos is that it provides an API for C/C++,
  Java/Scala and Python to write distributed frameworks. There are already
  implementations of frameworks for common parallel programming systems
  such as:
   - Hadoop (https://github.com/mesos/hadoop)
   - MPI (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
   - Spark (http://spark-project.org)
  And you can find an example Python framework at
  https://github.com/apache/mesos/tree/master/src/examples/python
 
  Integration with Galaxy would have three parts:
  1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
  passed to tool wrappers and allows them to contact the local mesos
  infrastructure (assuming the system has been configured), or pass a null
  value if the system isn't available.
  2) Write a tool runner that works as a mesos framework to execute single
  cpu jobs on the distributed system.
  3) For instances where mesos is not available at a system-wide level (say
  they only have access to an SGE-based cluster), but the user wants to run
  distributed jobs, write a wrapper that can create a mesos cluster using
  the existing queueing system. For example, right now I run a Mesos system
  under the SGE queue system.
 
  I'm curious to see what other people think.
 
  Kyle
 

Re: [galaxy-dev] R bioconductor dependencies when creating toolshed installation

2014-06-17 Thread Stef van Lieshout
Say option 3 is the way to go - would you say every new version of an R
package should be wrapped in a new galaxy package (and give them names
like matrixStats_0_10_0), or should we create one package (matrixStats)
and update that one if a new version is worth an update? In the first case
there would be an enormous number of packages ;)

Also if you do need an external R script as you say, how would I
construct my tool_dependencies.xml to execute R code?

And last, if that approach doesn't work out for me, how can I copy a file
in the repository to the installation dir? (to execute it with Rscript)

Many thanks,
Stef


- Original message -
From: Kandalaft, Iyad iyad.kandal...@agr.gc.ca
To: Stef van Lieshout stefvanliesh...@fastmail.fm,
galaxy-dev@lists.bx.psu.edu galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] R bioconductor dependencies when creating
toolshed installation
Date: Mon, 16 Jun 2014 18:19:46 +

I would typically recommend Option 3 as it is the best practice.
However, human resources limit this as a viable option, even though it
should be the gold standard that you aim for.  This approach allows you to
reuse the dependencies later for other tool wrappers AND you don't have
to re-install dependencies every time you make a modification to your
tool wrapper repository.  While briefly looking at Bioconductor, it
seems that they keep old versions of packages (ex:
http://www.bioconductor.org/packages/2.13/data/experiment/bin/windows/contrib/3.0/AmpAffyExample_1.2.13.zip),
so using the URLs directly might be advantageous if their BiocLite
doesn't allow you to define which version to install.  You don't
necessarily need an external R script for the installation
because many of these commands can be done within the
tool_dependencies.xml.
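For example, such a command can be embedded directly as a shell_command
action in tool_dependencies.xml - a sketch only; the tarball name and the
exact R invocation are illustrative assumptions:

```xml
<!-- Hypothetical fragment of a tool_dependencies.xml <actions> block:
     install a pinned source tarball with R itself, so no separate
     install script is needed -->
<action type="shell_command">
    Rscript -e 'install.packages("AmpAffyExample_1.2.13.tar.gz", repos=NULL, type="source")'
</action>
```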

Regards,


Iyad Kandalaft
Microbial Biodiversity Bioinformatics
Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
960 Carling Ave.| 960 Ave. Carling
Ottawa, ON| Ottawa (ON) K1A 0C6
E-mail Address / Adresse courriel  iyad.kandal...@agr.gc.ca
Telephone | Téléphone 613-759-1228
Facsimile | Télécopieur 613-759-1701
Teletypewriter | Téléimprimeur 613-773-2600
Government of Canada | Gouvernement du Canada 



-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Stef van
Lieshout
Sent: Monday, June 16, 2014 10:04 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] R bioconductor dependencies when creating toolshed
installation

Hi all,

I'm running into some difficulties with how to set up the installation
procedure for a Galaxy tool which executes an R script and has certain
dependencies (mainly Bioconductor packages). R can deal with
dependencies: packages can be installed with install.packages (which has
a dependencies argument) or biocLite() for Bioconductor packages.

Yet, now I want my tool to be available at toolsheds. To do this I see
several options:

1) setting up tool_dependencies.xml with R CMD INSTALL for all
packages. BUT: I need to download all dependencies before install, and can
older versions still be downloaded? Maybe I need to upload them to the
toolshed too..

2) setting up tool_dependencies.xml to call an installation script with
Rscript (where I could use install.packages, which takes care of
dependencies). BUT: how do I select specific (older) versions? If
I don't, installing at different times can give different versions.

3) creating a repository for each package and having all of them as
requirements in my galaxy tool. BUT: a lot of work for a lot of
dependencies

All have pros and cons, how do people deal with this?

Stef



[galaxy-dev] R version 3.1.0

2014-06-17 Thread Stef van Lieshout
Any plans to create an R 3.1.0 repository?

Thanks,
Stef



Re: [galaxy-dev] R bioconductor dependencies when creating toolshed installation

2014-06-17 Thread Björn Grüning

Hi Stef,

for R packages we have a special installation routine that will
(hopefully) make your life easier.



I'm running into some difficulties on how to setup the installation
procedure for a galaxy tool which executes an R script and has certain
dependencies (mainly bioconductor packages). R can deal with
dependencies, packages can be installed with install.packages (has a
dependencies argument) or biocLite() for bioconductor packages.

Yet, now I want my tool to be available at toolsheds. To do this I see
several options:


Great!


1) setting up tool_dependencies.xml with R CMD INSTALL for all
packages. BUT: need to download all dependencies before install, and can
older versions still be downloaded? Maybe need to upload them to
toolshed too..


It all depends on how reproducible you want your tool to be.
If you want 100% reproducibility, you need to mirror the source packages
somehow, because Bioconductor will not store older versions - at least
that is not guaranteed.


I'm using a special github repository for that purpose:
https://github.com/bgruening/download_store

R CMD INSTALL is not needed, see below.


2) setting up tool_dependencies.xml to call an installation script with
Rscript (where I could use install.packages), BUT: Dependencies are
taken care of. But how do I select specific (older) versions, because if
I don't, installing at different times can give different versions.


Installing older versions is not possible as far as I know.


3) creating a repository for each package and have all of them as
requirement in my galaxy tool. BUT: a lot of work for a lot of
dependencies


Imho, we should have one R repository with a handful of standard
packages included in the toolshed, like packages_r_3_0_1. You should
depend on that repository and additionally define a second dependency.
Let's say your tool is called deseq2; then create one additional
tool_dependencies.xml file called package_deseq2_1_2_10. In that
definition you install every dependency you need in addition to R.


Here is one example:
https://github.com/bgruening/galaxytools/blob/master/deseq2/tool_dependencies.xml
https://github.com/bgruening/galaxytools/blob/master/orphan_tool_dependencies/package_deseq2_1_2_10/tool_dependencies.xml

The really nice part is the setup_r_environment function from the 
toolshed. It will install source packages for you automatically. All you 
need to do is to name the package or, as shown in the example, specify 
the location of the source package.
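Put together, a package_deseq2-style tool_dependencies.xml using
setup_r_environment might look like this - a sketch modeled on the linked
examples; the repository owner, versions, and tarball URL are illustrative
assumptions:

```xml
<?xml version="1.0"?>
<tool_dependency>
    <package name="deseq2" version="1.2.10">
        <install version="1.0">
            <actions>
                <!-- Install inside the R provided by the depended-on
                     R package repository (name/owner assumed) -->
                <action type="setup_r_environment">
                    <repository name="package_r_3_0_3" owner="bgruening">
                        <package name="R" version="3.0.3" />
                    </repository>
                    <!-- Source packages are installed in this order;
                         each entry points at a pinned tarball
                         (URL is illustrative) -->
                    <package>https://github.com/bgruening/download_store/raw/master/DESeq2/DESeq2_1.2.10.tar.gz</package>
                </action>
            </actions>
        </install>
    </package>
</tool_dependency>
```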


The only downside is that the order of these packages is important. If 
you are interested we have a script that will give you the correct 
dependency tree of a given package.


Hope that helps,
Bjoern



All have pros and cons, how do people deal with this?

Stef




Re: [galaxy-dev] R bioconductor dependencies when creating toolshed installation

2014-06-17 Thread Björn Grüning

Hi Stef,

fortunately it is way easier than that. Please have a look at the 
setup_r_environment installation routine :)


Cheers,
Bjoern

Am 17.06.2014 11:07, schrieb Stef van Lieshout:

Say option 3 is the way to go, would you say every new version of an R
package should be wrapped in a new galaxy package (and give them names
like matrixStats_0_10_0) or create one package (matrixStats) and
update that one if a new version is worth an update. In the first way
there would be an enormous number of packages ;)

Also if you do need an external R script as you say, how would I
construct my tool_dependencies.xml to execute R code?

And last, if that approach doesn't work out for me, how can I copy a file
in the repository to the installation dir? (to execute it with Rscript)

Many thanks,
Stef


- Original message -
From: Kandalaft, Iyad iyad.kandal...@agr.gc.ca
To: Stef van Lieshout stefvanliesh...@fastmail.fm,
galaxy-dev@lists.bx.psu.edu galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] R bioconductor dependencies when creating
toolshed installation
Date: Mon, 16 Jun 2014 18:19:46 +

I would typically recommend Option 3 as it is the best practice.
However, human resources limit this as a viable option even though this
should be the Gold Standard that you aim for.  This allows you to
reuse the dependencies later for other tool wrappers AND you don't have
to re-install dependencies every time you make a modification to your
tool wrapper repository.  While briefly looking at Bioconductor, it
seems that they keep old versions of packages (ex:
http://www.bioconductor.org/packages/2.13/data/experiment/bin/windows/contrib/3.0/AmpAffyExample_1.2.13.zip),
where using the URLs directly might be advantageous if their BiocLite
doesn't allow you to define which version to install.  You don't
necessarily need to have an external R script for the installation
because many of these commands can be done within the
tool_dependencies.xml.

Regards,


Iyad Kandalaft
Microbial Biodiversity Bioinformatics
Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
960 Carling Ave.| 960 Ave. Carling
Ottawa, ON| Ottawa (ON) K1A 0C6
E-mail Address / Adresse courriel  iyad.kandal...@agr.gc.ca
Telephone | Téléphone 613-759-1228
Facsimile | Télécopieur 613-759-1701
Teletypewriter | Téléimprimeur 613-773-2600
Government of Canada | Gouvernement du Canada



-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Stef van
Lieshout
Sent: Monday, June 16, 2014 10:04 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] R bioconductor dependencies when creating toolshed
installation

Hi all,

I'm running into some difficulties on how to setup the installation
procedure for a galaxy tool which executes an R script and has certain
dependencies (mainly bioconductor packages). R can deal with
dependencies, packages can be installed with install.packages (has a
dependencies argument) or biocLite() for bioconductor packages.

Yet, now I want my tool to be available at toolsheds. To do this I see
several options:

1) setting up tool_dependencies.xml with R CMD INSTALL for all
packages. BUT: need to download all dependencies before install, and can
older versions still be downloaded? Maybe need to upload them to
toolshed too..

2) setting up tool_dependencies.xml to call an installation script with
Rscript (where I could use install.packages), BUT: Dependencies are
taken care of. But how do I select specific (older) versions, because if
I don't, installing at different times can give different versions.

3) creating a repository for each package and have all of them as
requirement in my galaxy tool. BUT: a lot of work for a lot of
dependencies

All have pros and cons, how do people deal with this?

Stef





Re: [galaxy-dev] R bioconductor dependencies when creating toolshed installation

2014-06-17 Thread Stef van Lieshout
Hi Bjoern,

That looks much better indeed ;) The only problem I still have then is
that I need R 3.1.0 for a Bioconductor 2.14 package (I have sent a new
mailing list msg for that). Looking at the xml of other versions, it's
not something I will easily do myself.

What will happen if I do not specify the R dependency (package_r_3_0_3
in your example code) but do specify the download/install of packages?
I guess these get installed in the default R instance?

Related to that, how can I call a specific instance of R in the tool.xml
without specifying the full path to the tool? E.g., in the tool.xml I now
do:

<command>
  /path/to/lib/R/R-3.1.0/bin/Rscript
  /path/to/galaxy-dist/tools/testdir/tool.R $config
</command>

Where normally you can do:

<command interpreter="Rscript">
  tool.R $config
</command>

Thanks again!
Stef
 

- Original message -
From: Björn Grüning bjoern.gruen...@gmail.com
To: Stef van Lieshout stefvanliesh...@fastmail.fm,
galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] R bioconductor dependencies when creating
toolshed installation
Date: Tue, 17 Jun 2014 15:17:36 +0200

Hi Stef,

for R packages we have a special installation routine that will 
(hopefully) make your life easier.

 I'm running into some difficulties on how to setup the installation
 procedure for a galaxy tool which executes an R script and has certain
 dependencies (mainly bioconductor packages). R can deal with
 dependencies, packages can be installed with install.packages (has a
 dependencies argument) or biocLite() for bioconductor packages.

 Yet, now I want my tool to be available at toolsheds. To do this I see
 several options:

Great!

 1) setting up tool_dependencies.xml with R CMD INSTALL for all
 packages. BUT: need to download all dependencies before install, and can
 older versions still be downloaded? Maybe need to upload them to
 toolshed too..

It all depends on how reproducible you want your tool to be.
If you want 100% reproducibility, you need to mirror the source packages 
somehow, because bioc will not store older versions. At least that is 
not guaranteed.

I'm using a special github repository for that purpose:
https://github.com/bgruening/download_store

R CMD INSTALL is not needed, see below.

 2) setting up tool_dependencies.xml to call an installation script with
 Rscript (where I could use install.packages), BUT: Dependencies are
 taken care of. But how do I select specific (older) versions, because if
 I don't, installing at different times can give different versions.

Installing older versions is not possible as far as I know.

 3) creating a repository for each package and have all of them as
 requirement in my galaxy tool. BUT: a lot of work for a lot of
 dependencies

Imho, we should have one R repository with a handful of standard 
packages included in the toolshed.
Like packages_r_3_0_1. You should depend on that repository and 
additionally define a second dependency. Let's say your tool is called
deseq2; then create one additional tool_dependencies.xml file called
package_deseq2_1_2_10. In that definition you will install every 
dependency you need in addition to R.

Here is one example:
https://github.com/bgruening/galaxytools/blob/master/deseq2/tool_dependencies.xml
https://github.com/bgruening/galaxytools/blob/master/orphan_tool_dependencies/package_deseq2_1_2_10/tool_dependencies.xml

The really nice part is the setup_r_environment function from the 
toolshed. It will install source packages for you automatically. All you 
need to do is to name the package or, as shown in the example, specify 
the location of the source package.

The only downside is that the order of these packages is important. If 
you are interested we have a script that will give you the correct 
dependency tree of a given package.

Hope that helps,
Bjoern


 All have pros and cons, how do people deal with this?

 Stef




Re: [galaxy-dev] R bioconductor dependencies when creating toolshed installation

2014-06-17 Thread Björn Grüning

Hi Stef,

Am 17.06.2014 15:40, schrieb Stef van Lieshout:

Hi Bjoern,

That looks much better indeed ;) The only problem I still have then is
that I need R 3.1.0 for a bioconductor 2.14 package (I have sent a new
mailing list msg for that). Looking at the xml of other versions it's
not something I will easily do myself.


If you can wait a little bit, we (the IUC, or more concretely Dave
Bouvier) will take care of that and create such a repository.



What will happen if I do not specify the R dependency (package_r_3_0_3
in your example code) but do specify the download/install of packages,
guess these get installed in the default R instance?


Puh, to be honest, I do not know. I never tested it without a real 
instance. I guess it will pick the default version.



Related to that, how can I call a specific instance of R in the tool.xml
without specifying the full path to the tool? E.g., in the tool.xml I now
do:

<command>
   /path/to/lib/R/R-3.1.0/bin/Rscript
   /path/to/galaxy-dist/tools/testdir/tool.R $config
</command>

Where normally you can do:

<command interpreter="Rscript">
   tool.R $config
</command>


You should always use the latter version, without a path to R. Setting
the correct path or assuming the default should be handled by Galaxy.
The correct R version will be selected via the <requirement> tag. You
can specify 3.1 as soon as we have it :)
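In tool XML terms, pinning the R version then comes down to a
<requirement> tag next to the interpreter-style command - a sketch; the
exact version string depends on the package repository the tool depends
on:

```xml
<requirements>
    <!-- Resolved by Galaxy against the installed toolshed R package;
         3.1.0 is hypothetical until that repository exists -->
    <requirement type="package" version="3.1.0">R</requirement>
</requirements>
<command interpreter="Rscript">
    tool.R $config
</command>
```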


You can thank Dave for the new R packages; he spent much time
creating a big R binary that can run on almost all architectures.


Cheers,
Bjoern


Thanks again!
Stef


- Original message -
From: Björn Grüning bjoern.gruen...@gmail.com
To: Stef van Lieshout stefvanliesh...@fastmail.fm,
galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] R bioconductor dependencies when creating
toolshed installation
Date: Tue, 17 Jun 2014 15:17:36 +0200

Hi Stef,

for R packages we have a special installation routine that will
(hopefully) make your life easier.


I'm running into some difficulties on how to setup the installation
procedure for a galaxy tool which executes an R script and has certain
dependencies (mainly bioconductor packages). R can deal with
dependencies, packages can be installed with install.packages (has a
dependencies argument) or biocLite() for bioconductor packages.

Yet, now I want my tool to be available at toolsheds. To do this I see
several options:


Great!


1) setting up tool_dependencies.xml with R CMD INSTALL for all
packages. BUT: need to download all dependencies before install, and can
older versions still be downloaded? Maybe need to upload them to
toolshed too..


It all depends on how reproducible you want your tool to be.
If you want 100% reproducibility, you need to mirror the source packages
somehow, because bioc will not store older versions. At least that is
not guaranteed.

I'm using a special github repository for that purpose:
https://github.com/bgruening/download_store

R CMD INSTALL is not needed, see below.


2) setting up tool_dependencies.xml to call an installation script with
Rscript (where I could use install.packages), BUT: Dependencies are
taken care of. But how do I select specific (older) versions, because if
I don't, installing at different times can give different versions.


Installing older versions is not possible as far as I know.


3) creating a repository for each package and have all of them as
requirement in my galaxy tool. BUT: a lot of work for a lot of
dependencies


Imho, we should have one R repository with a handful of standard
packages included in the toolshed.
Like packages_r_3_0_1. You should depend on that repository and
additionally define a second dependency. Let's say your tool is called
deseq2; then create one additional tool_dependencies.xml file called
package_deseq2_1_2_10. In that definition you will install every
dependency you need in addition to R.

Here is one example:
https://github.com/bgruening/galaxytools/blob/master/deseq2/tool_dependencies.xml
https://github.com/bgruening/galaxytools/blob/master/orphan_tool_dependencies/package_deseq2_1_2_10/tool_dependencies.xml

The really nice part is the setup_r_environment function from the
toolshed. It will install source packages for you automatically. All you
need to do is to name the package or, as shown in the example, specify
the location of the source package.

The only downside is that the order of these packages is important. If
you are interested we have a script that will give you the correct
dependency tree of a given package.

Hope that helps,
Bjoern



All have pros and cons, how do people deal with this?

Stef




Re: [galaxy-dev] R bioconductor dependencies when creating toolshed installation

2014-06-17 Thread Stef van Lieshout
Bjoern and Dave,

That sounds great. Of course my next question will be: how much is a
little bit? ;) It's just that I have to move on for now and make things at
least work, so I might try it with the default R instance now, but
as soon as a 3.1.0 package is out I will definitely pick it up!

Stef

- Original message -
From: Björn Grüning bjoern.gruen...@gmail.com
To: Stef van Lieshout stefvanliesh...@fastmail.fm,
galaxy-dev@lists.bx.psu.edu, Dave Bouvier d...@bx.psu.edu
Subject: Re: [galaxy-dev] R bioconductor dependencies when creating
toolshed installation
Date: Tue, 17 Jun 2014 17:37:24 +0200

Hi Stef,

Am 17.06.2014 15:40, schrieb Stef van Lieshout:
 Hi Bjoern,

 That looks much better indeed ;) The only problem I still have then is
 that I need R 3.1.0 for a bioconductor 2.14 package (I have sent a new
 mailing list msg for that). Looking at the xml of other versions it's
 not something I will easily do myself.

If you can wait a little bit, we (the IUC, or more concretely Dave
Bouvier) will take care of that and create such a repository.

 What will happen if I do not specify the R dependency (package_r_3_0_3
 in your example code) but do specify the download/install of packages,
 guess these get installed in the default R instance?

Puh, to be honest, I do not know. I never tested it without a real 
instance. I guess it will pick the default version.

 Related to that, how can I call a specific instance of R in de tool.xml
 without specifying the full path to the tool. Eg, in the tool.xml I now
 do:

 <command>
   /path/to/lib/R/R-3.1.0/bin/Rscript
   /path/to/galaxy-dist/tools/testdir/tool.R $config
 </command>

 Where normally you can do:

 <command interpreter="Rscript">
   tool.R $config
 </command>

You should always use the latter version, without a path to R. Setting
the correct path or assuming the default should be handled by Galaxy.
The correct R version will be selected via the <requirement> tag. You
can specify 3.1 as soon as we have it :)

You can thank Dave for the new R packages; he spent much time
creating a big R binary that can run on almost all architectures.

Cheers,
Bjoern

 Thanks again!
 Stef


 - Original message -
 From: Björn Grüning bjoern.gruen...@gmail.com
 To: Stef van Lieshout stefvanliesh...@fastmail.fm,
 galaxy-dev@lists.bx.psu.edu
 Subject: Re: [galaxy-dev] R bioconductor dependencies when creating
 toolshed installation
 Date: Tue, 17 Jun 2014 15:17:36 +0200

 Hi Stef,

 for R packages we have a special installation routine that will
  (hopefully) make your life easier.

 I'm running into some difficulties on how to setup the installation
 procedure for a galaxy tool which executes an R script and has certain
 dependencies (mainly bioconductor packages). R can deal with
 dependencies, packages can be installed with install.packages (has a
 dependencies argument) or biocLite() for bioconductor packages.

 Yet, now I want my tool to be available at toolsheds. To do this I see
 several options:

 Great!

 1) setting up tool_dependencies.xml with R CMD INSTALL for all
 packages. BUT: need to download all dependencies before install, and can
 older versions still be downloaded? Maybe need to upload them to
 toolshed too..

 It is all a matter of how reproducible you want your tool to be.
 If you want 100% reproducibility, you need to mirror the source packages
 somehow, because Bioconductor will not store older versions; at least that
 is not guaranteed.

 I'm using a special github repository for that purpose:
 https://github.com/bgruening/download_store

 R CMD INSTALL is not needed, see below.

 2) setting up tool_dependencies.xml to call an installation script with
 Rscript (where I could use install.packages). BUT: dependencies are
 taken care of, but how do I select specific (older) versions? If
 I don't, installing at different times can give different versions.

 Older versions is not possible as far as I know.

 3) creating a repository for each package and have all of them as
 requirement in my galaxy tool. BUT: a lot of work for a lot of
 dependencies

 Imho, we should have one R repository with a handful of standard
 packages included in the toolshed,
 like packages_r_3_0_1. You should depend on that repository and
 additionally define a second dependency. Let's say your tool is called
 deseq2; then create one additional tool_dependencies.xml file called
 package_deseq2_1_2_10. In that definition you will install every
 dependency you need in addition to R.

 Here is one example:
 https://github.com/bgruening/galaxytools/blob/master/deseq2/tool_dependencies.xml
 https://github.com/bgruening/galaxytools/blob/master/orphan_tool_dependencies/package_deseq2_1_2_10/tool_dependencies.xml

 The really nice part is the setup_r_environment function from the
 toolshed. It will install source packages for you automatically. All you
 need to do is to name the package or, as shown in the example, specify
 the location of the source package.
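For concreteness, a tool_dependencies.xml using setup_r_environment might look like the following sketch. The repository owner/name and the tarball URL are placeholders; the deseq2 definitions linked above are the authoritative examples:

```xml
<?xml version="1.0"?>
<tool_dependency>
    <!-- Depend on the shared R repository, then install the tool-specific
         R source packages into that R environment. Names and the package
         URL below are illustrative placeholders. -->
    <package name="deseq2" version="1.2.10">
        <install version="1.0">
            <actions>
                <action type="setup_r_environment">
                    <repository name="package_r_3_0_3" owner="bgruening">
                        <package name="R" version="3.0.3" />
                    </repository>
                    <!-- Point at a mirrored source tarball for
                         reproducibility, e.g. from a download store. -->
                    <package>https://example.org/src/DESeq2_1.2.10.tar.gz</package>
                </action>
            </actions>
        </install>
    </package>
</tool_dependency>
```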

 The only 

Re: [galaxy-dev] Per-tool configuration

2014-06-17 Thread Jan Kanis
Too bad there aren't any really good options. I will use the environment
variable approach for the query size limit. For the gene bank links I guess
modifying the .loc file is the least bad way. Maybe it can be merged into
galaxy_blast, that would at least solve the interoperability problems.

@Peter: One potential problem in merging my blast2html tool could be that I
have written it in python3, and the current tool wrapper therefore installs
python3 and a host of its dependencies, making for a quite large download.

Jan


On 16 June 2014 09:08, Peter Cock p.j.a.c...@googlemail.com wrote:

 On Mon, Jun 16, 2014 at 4:18 AM, John Chilton jmchil...@gmail.com wrote:
  Hello Jan,
 
  Thanks for the clarification. Not quite what I was expecting so I am
  glad I asked - I don't have great answers for either case so hopefully
  other people will have some ideas.
 
  For the first use case - I would just specify some default limit for
  the tool - let's call this N - add a parameter to
  the tool wrapper --limit-size=N - test that and then allow it to be
  overridden via an environment variable - so in your command block use
  --limit-size=\${BLAST_QUERY_LIMIT:-N}. This will use N if no limit
  is set, but deployers can set limits. There are a number of ways to
  set such variables - DRM-specific environment files, login rc files,
  etc. Just this last release I added the ability to define
  environment variables right in job_conf.xml
  (
 https://bitbucket.org/galaxy/galaxy-central/pull-request/378/allow-specification-of-environment/diff
 ).
  I thought the tool shed might have a way to collect such definitions
  as well and insert them into package files - but Google failed to find
  this for me.
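The shell parameter expansion described above (written here in its standard `${VAR:-default}` form; `BLAST_QUERY_LIMIT` is the hypothetical variable from John's example) behaves like this:

```shell
# Use the deployer-provided BLAST_QUERY_LIMIT if it is set and non-empty;
# otherwise fall back to the wrapper's built-in default (here 1000).
limit="${BLAST_QUERY_LIMIT:-1000}"
echo "limit=$limit"
```

A deployer can then export BLAST_QUERY_LIMIT in a DRM environment file, a login rc file, or (as of the release mentioned above) directly in job_conf.xml to override the wrapper's default.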

 Hmm. Jan emailed me off list earlier about this. We could insert
 a pre-BLAST script to check the size of the query FASTA file,
 and abort if it is too large (e.g. number of queries, total sequence
 length, perhaps scaled according to the database size if we want
 to get clever?).
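A pre-flight check of that sort is only a few lines. This is a hypothetical sketch, not an existing Galaxy feature: the function names are invented, and the 20,000-query default mirrors the example below.

```python
import sys

def count_fasta_records(path):
    """Count '>' header lines, i.e. the number of query sequences."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))

def check_query_size(path, limit=20000):
    """Exit non-zero before launching BLAST if the query file is too large."""
    n = count_fasta_records(path)
    if n > limit:
        sys.exit("Aborting: %d query sequences exceeds the limit of %d"
                 % (n, limit))
    return n
```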

 I was hoping there was a more general mechanism in Galaxy -
 after all, BLAST is by no means the only computationally
 expensive tool ;)

 We have had query files of 20,000 and more genes against NR
 (both BLASTP and BLASTX), but our Galaxy has task-splitting
 enabled so this becomes 20 (or more) individual cluster jobs
 of 1000 queries each. This works fine apart from the occasional
 glitch with the network drive when the data is merged afterwards.
 (We know this failed once shortly after the underlying storage
 had been expanded, and would have been under heavy load
 rebalancing the data across the new disks.)

  Not sure about how to proceed with the second use case - extending the
  .loc file should work locally - I am not sure it is feasible within
  the context of the existing tool shed tools, data manager, etc. You
  could certainly duplicate this stuff with your modifications - this
  has downsides in terms of interoperability though.

 Currently the BLAST wrappers use the *.loc files directly, but
 this is likely to switch to the newer Data Manager approach.
 That may or may not complicate local modifications like adding
 extra columns...

  Sorry I don't have great answers for either question,
  -John

 Thanks John,

 Peter

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] What is the correct place under Galaxy for a database that's created by a tool?

2014-06-17 Thread Melissa Cline
Good thing to clarify: this is a tool-specific database, created by tools
that are running inside Galaxy but that should persist after the individual
tools are done with their execution.


On Mon, Jun 16, 2014 at 11:23 PM, Björn Grüning bjoern.gruen...@gmail.com
wrote:

 Hi Melissa,

 Am 17.06.2014 01:07, schrieb Melissa Cline:

  Hi folks,

 Hopefully this is a quick question.  I'm working on a set of tools that
 will fire off a VM from within Galaxy and will then communicate with the
 VM.  The VM will create a local database.


 Are we talking here about a local Galaxy database or a tool specific
 database?

 best,
 Bjoern


  The vision is that this won't be

 a shared database; in a shared Galaxy instance, each user will have his or
 her own database.  What is the best place to create this database under
 the
 Galaxy file system?

 Thanks!

 Melissa






Re: [galaxy-dev] What is the correct place under Galaxy for a database that's created by a tool?

2014-06-17 Thread Björn Grüning

Hi,

Am 17.06.2014 21:25, schrieb Melissa Cline:

Good thing to clarify: this is a tool-specific database, created by tools
that are running inside Galaxy but that should persist after the individual
tools are done with their execution.


Should it persist as output data or forever even if I start my workflow 
from scratch?


What will happen to the db if I reload a tool and rerun it? Will it 
extend the database or will it use a new one from scratch?


You have access to the user name if you are running the tool. With that 
you can create a table for every user and store your data user-specific.
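That suggestion can be sketched with SQLite. The database path, the table layout, and passing the user identity in as a plain string (e.g. via Galaxy's $__user_email__ template variable) are assumptions for illustration:

```python
import sqlite3

def store_result(db_path, user, key, value):
    """Append one row to a shared database, namespaced by the Galaxy user."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS results
                    (user TEXT, key TEXT, value TEXT)""")
    conn.execute("INSERT INTO results VALUES (?, ?, ?)", (user, key, value))
    conn.commit()
    conn.close()

def user_results(db_path, user):
    """Return only the rows belonging to one user."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT key, value FROM results WHERE user = ?", (user,)).fetchall()
    conn.close()
    return rows
```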


Cheers,
Bjoern



On Mon, Jun 16, 2014 at 11:23 PM, Björn Grüning bjoern.gruen...@gmail.com
wrote:


Hi Melissa,

Am 17.06.2014 01:07, schrieb Melissa Cline:

  Hi folks,


Hopefully this is a quick question.  I'm working on a set of tools that
will fire off a VM from within Galaxy and will then communicate with the
VM.  The VM will create a local database.



Are we talking here about a local Galaxy database or a tool specific
database?

best,
Bjoern


  The vision is that this won't be


a shared database; in a shared Galaxy instance, each user will have his or
her own database.  What is the best place to create this database under
the
Galaxy file system?

Thanks!

Melissa









Re: [galaxy-dev] Per-tool configuration

2014-06-17 Thread Peter Cock
On Tue, Jun 17, 2014 at 4:57 PM, Jan Kanis jan.c...@jankanis.nl wrote:
 Too bad there aren't any really good options. I will use the environment
 variable approach for the query size limit.

Are you using the optional job splitting (parallelism) feature in Galaxy?
That seems to me to be a good place to insert a Galaxy-level
job size limit, e.g. BLAST+ jobs are split into 1000-query chunks,
so you might wish to impose a 25-chunk limit?

Long term being able to set limits on the input file parameters
of each tool would be nicer - e.g. Limit BLASTN to at most
20,000 queries, limit MIRA to at most 50GB FASTQ files, etc.

 For the gene bank links I guess modifying the .loc file is the least
 bad way. Maybe it can be merged into galaxy_blast, that would at
 least solve the interoperability problems.

It would have to be sufficiently general, and backward compatible.

FYI other people have also looked at extending the blast *.loc
files (e.g. adding a category column for helping filter down a
very large BLAST database list).

 @Peter: One potential problem in merging my blast2html tool
 could be that I have written it in python3, and the current tool
 wrapper therefore installs python3 and a host of its dependencies,
 making for a quite large download.

Without seeing your code, it is hard to say, but actually writing
Python code which works unmodified under Python 2.7 and
Python 3 is quite doable (and under Python 2.6 with a few
more provisos). Both NumPy and Biopython do this if you
wanted some reassurance.
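A minimal illustration of the single-codebase approach (a toy function, not taken from any of the tools discussed here): with the right __future__ imports, the same file runs unchanged on Python 2.7 and 3.x.

```python
from __future__ import print_function, division

def gc_fraction(seq):
    """Fraction of G/C characters; true division behaves identically
    on Python 2.7 (with the __future__ import) and Python 3."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_fraction("ACGT"))  # 0.5 under both interpreters
```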

On the other hand, Galaxy itself will need to move to Python 3
at some point, and certainly individual tools will too. This will
probably mean (as with Linux Python packages) having double
entries on the ToolShed (one for Python 2, one for Python 3),

e.g. a ToolShed package for NumPy under Python 2 (done)
and under Python 3 (needed).

Peter


Re: [galaxy-dev] What is the correct place under Galaxy for a database that's created by a tool?

2014-06-17 Thread Melissa Cline
Hi Björn,

The database should persist forever, even if the workflow is restarted.
 I'm not sure about the distinction between forever and output data, but
the database should remain in place until the user takes specific action to
make it go away.

If you reload a tool, it will reload/extend the existing database.

If we have access to the user name, and can create user-specific data
within one single database, that should address our needs.

Thanks!

Melissa



On Tue, Jun 17, 2014 at 12:32 PM, Björn Grüning bjoern.gruen...@gmail.com
wrote:

 Hi,

 Am 17.06.2014 21:25, schrieb Melissa Cline:

  Good thing to clarify: this is a tool-specific database, created by tools
 that are running inside Galaxy but that should persist after the
 individual
 tools are done with their execution.


 Should it persist as output data or forever even if I start my workflow
 from scratch?

 What will happen to the db if I reload a tool and rerun it. Will it extend
 the database or will it use a new one from scratch?

 You have access to the user name if you are running the tool. With that
 you can create a table for every user and store your data user-specific.

 Cheers,
 Bjoern



 On Mon, Jun 16, 2014 at 11:23 PM, Björn Grüning 
 bjoern.gruen...@gmail.com
 wrote:

  Hi Melissa,

 Am 17.06.2014 01:07, schrieb Melissa Cline:

   Hi folks,


 Hopefully this is a quick question.  I'm working on a set of tools that
 will fire off a VM from within Galaxy and will then communicate with the
 VM.  The VM will create a local database.


 Are we talking here about a local Galaxy database or a tool specific
 database?

 best,
 Bjoern


   The vision is that this won't be

  a shared database; in a shared Galaxy instance, each user will have his
 or
 her own database.  What is the best place to create this database under
 the
 Galaxy file system?

 Thanks!

 Melissa








Re: [galaxy-dev] Per-tool configuration

2014-06-17 Thread John Chilton
On Tue, Jun 17, 2014 at 2:55 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
 On Tue, Jun 17, 2014 at 4:57 PM, Jan Kanis jan.c...@jankanis.nl wrote:
 Too bad there aren't any really good options. I will use the environment
 variable approach for the query size limit.

 Are you using the optional job splitting (parallelism) feature in Galaxy?
 That seems to me to be a good place to insert a Galaxy-level
 job size limit, e.g. BLAST+ jobs are split into 1000-query chunks,
 so you might wish to impose a 25-chunk limit?

 Long term being able to set limits on the input file parameters
 of each tool would be nicer - e.g. Limit BLASTN to at most
 20,000 queries, limit MIRA to at most 50GB FASTQ files, etc.

Trello card created, please vote!

https://trello.com/c/0XQXVhRz


 For the gene bank links I guess modifying the .loc file is the least
 bad way. Maybe it can be merged into galaxy_blast, that would at
 least solve the interoperability problems.

 It would have to be sufficiently general, and backward compatible.

 FYI other people have also looked at extending the blast *.loc
 files (e.g. adding a category column for helping filter down a
 very large BLAST database list).

 @Peter: One potential problem in merging my blast2html tool
 could be that I have written it in python3, and the current tool
 wrapper therefore installs python3 and a host of its dependencies,
 making for a quite large download.

 Without seeing your code, it is hard to say, but actually writing
 Python code which works unmodified under Python 2.7 and
 Python 3 is quite doable (and under Python 2.6 with a few
 more provisos). Both NumPy and Biopython do this if you
 wanted some reassurance.

 On the other hand, Galaxy itself will need to move to Python 3
 at some point, and certainly individual tools will too. This will
 probably mean (as with Linux Python packages) having double
 entries on the ToolShed (one for Python 2, one for Python 3),

I certainly hope Galaxy can move to Python 3 at some point... being a
pessimist though I would place bets against it :).


 e.g. a ToolShed package for NumPy under Python 2 (done)
 and under Python 3 (needed).

 Peter


Re: [galaxy-dev] Import of Capsules failing on local toolshed instance

2014-06-17 Thread Will Holtz
It looks like Dave fixed this issue in 4c58aa19a3d7. Thank you!

However I am still having import issues. I am now getting the message:
Archive of repository package_bowtie_0_12_7 owned by devteam
Import failed: repository owner devteam does not have an account in this
Tool Shed.
This is on a local toolshed running 9b78595ec11, where I am performing the
import from an admin account. I'm guessing the issue is that I have
'use_remote_user=True' for LDAP authentication and that means that a
devteam account cannot be automatically created to allow this capsule to be
added without modification. Perhaps on import of a capsule (by an
administrator) they could be given the option of preserving the existing
owners or re-assigning ownership to an existing user (defaulting to self)?

Of course, what I really want is inter-toolshed dependencies. Maybe I'm
missing something, but I'm finding it quite painful just to get a tool
development environment set up that makes use of any existing repositories.

thank you for your help,

-Will



On Wed, Jun 11, 2014 at 12:40 PM, Will Holtz who...@lygos.com wrote:

 I am now able to export capsules from the main/test toolsheds -- thanks
 Dave! When attempting to import these capsules into my local toolshed 
 (latest_2014.06.02
 for changeset fb68af9a775a) I receive the following error:

 URL: https://galaxy.lygos.com:99/repository/import_capsule
 File
 '/home/galaxy/galaxy-dist/lib/galaxy/web/framework/middleware/error.py',
 line 149 in __call__
   app_iter = self.application(environ, sr_checker)
 File
 '/home/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/debug/prints.py',
 line 106 in __call__
   environ, self.app)
 File
 '/home/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/wsgilib.py',
 line 543 in intercept_output
   app_iter = application(environ, replacement_start_response)
 File
 '/home/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/recursive.py',
 line 84 in __call__
   return self.application(environ, start_response)
 File
 '/home/galaxy/galaxy-dist/lib/galaxy/webapps/tool_shed/framework/middleware/remoteuser.py',
 line 74 in __call__
   return self.app( environ, start_response )
 File
 '/home/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpexceptions.py',
 line 633 in __call__
   return self.application(environ, start_response)
 File '/home/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 132
 in __call__
   return self.handle_request( environ, start_response )
 File '/home/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 190
 in handle_request
   body = method( trans, **kwargs )
 File
 '/home/galaxy/galaxy-dist/lib/galaxy/webapps/tool_shed/controllers/repository.py',
 line 1992 in import_capsule
   import_util.check_status_and_reset_downloadable( trans,
 import_results_tups )
 File '/home/galaxy/galaxy-dist/lib/tool_shed/util/import_util.py', line 34
 in check_status_and_reset_downloadable
   tip_changeset_revision = repository.tip( trans.app )
 AttributeError: 'NoneType' object has no attribute 'tip'

 I have seen the same behavior for capsules based on the following
 repositories from the test toolshed: package_biopython_1_62,
 package_vienna_rna_2_1, and package_bowtie_0_12_7. I am logged in as an
 admin user for the import process.

 thanks,
 -Will


[galaxy-dev] Postgresql database time wrong...

2014-06-17 Thread Neil.Burdett
Hi
   I have quite a strange issue. I have a local install of Galaxy setup. When I 
type 'date' on my Ubuntu machine I get something like:

Wed Jun 18 09:25:22 EST 2014

When I then execute a job and look in the database at the create_time, i.e.

# select create_time from job order by create_time;

I get

2014-06-17 23:20:00.133828

So about 10 hours different. Is there some configuration I need to set, as 
Brisbane is 10 hours ahead of GMT (coincidence?)

Thanks
Neil

Re: [galaxy-dev] Postgresql database time wrong...

2014-06-17 Thread Will Holtz
Postgres generally stores datetime fields in GMT, and then translates them
to the local time zone when returning query results. Check the TimeZone
variable in your postgresql.conf.

http://www.postgresql.org/docs/9.3/static/datatype-datetime.html#DATATYPE-TIMEZONES
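Assuming the symptom means create_time is a "timestamp without time zone" column holding UTC values (an assumption; check your schema), you can inspect the server setting and render the stored values in local time without changing the data. A sketch:

```sql
-- Show the session/server time zone setting:
SHOW TimeZone;

-- Render stored UTC timestamps as Brisbane local time
-- (only valid if create_time is a naive timestamp holding UTC):
SELECT create_time AT TIME ZONE 'UTC' AT TIME ZONE 'Australia/Brisbane'
FROM job ORDER BY create_time;
```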

-Will


On Tue, Jun 17, 2014 at 4:29 PM, neil.burd...@csiro.au wrote:

  Hi
I have quite a strange issue. I have a local install of Galaxy setup.
 When I type 'date' on my Ubuntu machine I get something like:

 Wed Jun 18 09:25:22 EST 2014

 When i then execute a job and look in the database at the create_time i.e.

 # select create_time from job order by create_time;

 I get

 2014-06-17 23:20:00.133828

 So about 10 hours different. Is there some configuration I need to set as
 Brisbane is 10hrs ahead of GMT (coincidence?)

 Thanks
 Neil



Re: [galaxy-dev] Import of Capsules failing on local toolshed instance

2014-06-17 Thread Greg Von Kuster
Hello Will,

On Jun 17, 2014, at 6:06 PM, Will Holtz who...@lygos.com wrote:

 It looks like Dave fixed this issue in 4c58aa19a3d7. Thank you!
 
 However I am still having import issues. I am now getting the message:
 Archive of repository package_bowtie_0_12_7 owned by devteam
 Import failed: repository owner devteam does not have an account in this Tool 
 Shed.
 This is on a local toolshed running 9b78595ec11 where I am performing the 
 input from an admin account. I'm guessing the issue is that I have 
 'use_remote_user=True' for LDAP authentication and that means that a devteam 
 account cannot be automatically created to allow this capsule to be added 
 without modification.

Sorry you're still running into problems.  No regular development or testing is 
done using the use_remote_user setting due to resource limitations.

 Perhaps on import of a capsule (by an administrator) they could be given the 
 option of preserving the existing owners or re-assigning ownership to an 
 existing user (defaulting to self)?

This would be non-trivial, and probably would introduce fragility into the 
process.  However, I believe there is a solution (see my next comment) although 
I haven't tested it with the use_remote_user setting.
 
 Of course, what I really want is inter-toolshed dependancies. Maybe I'm 
 missing something, but I'm finding it quite painful just to get a tool 
 development environment setup that makes use of any existing repositories.

There was some work done in this area in the June 2, 2014 release.  Here is 
some information for more easily setting up a local development Tool Shed that 
uses new features introduced in the release.  Hopefully this will help.

Greg Von Kuster
Bootstrapping a New Development Tool Shed

The June 2, 2014 release introduces the ability to easily bootstrap a new 
development Tool Shed to prepare it for importing a repository capsule whose 
contained repositories can be used as the foundation for developing new Galaxy 
tools and other utilities.  This development Tool Shed can be treated as a 
component in a multi-step process that simplifies and streamlines Galaxy tool 
development and validation in the local Tool Shed and moving the validated 
repositories into the test or main public Galaxy Tool Sheds.  Tool Shed 
framework enhancements included in the June 2, 2014 release support this 
overall process, which will be explained fully in a future article.  Here we’ll 
restrict our discussion to highlights of the enhancements.

Several files are included in a new directory named 
~/lib/tool_shed/scripts/api/bootstrap_from_toolshed.  The file named 
user_info.xml.sample should be copied to a file with the same name but 
without the .sample extension (i.e., user_info.xml).  The information in 
local development Tool Shed.  This should be the account you use in the test 
and main public Galaxy Tool Sheds if you plan to export your work from your 
development Tool Shed and import it into one or both of the public Tool Sheds.

If you plan to use this new bootstrapping process, make sure your local 
development Tool Shed environment is pristine:

- The hgweb.config file must be empty or missing (it will automatically get 
  created if it doesn't exist), and the configured location for repositories 
  must be an empty directory.
- The configured database must be new and not yet migrated.
- Make sure the 
  ~/lib/tool_shed/scripts/api/bootstrap_from_toolshed/user_info.xml 
  file contains the desired account information.

The ~/run_tool_shed.sh script, used for starting up a Tool Shed, has been 
enhanced to enable this bootstrapping process by using a new 
-bootstrap_from_tool_shed flag.  Here’s an example.

%sh run_tool_shed.sh -bootstrap_from_tool_shed http://toolshed.g2.bx.psu.edu
The above example will initialize a local development Tool Shed (here we’ll 
assume its URL is http://localhost:9009) by bootstrapping from the main public 
Galaxy Tool Shed.  The bootstrapping process will perform the following actions 
in the order listed.

1. Ensure the Tool Shed environment is pristine (e.g., empty hgweb.config file 
   and a new database that has not been migrated).
2. Copy all .sample files configured in run_tool_shed.sh.
3. Run the database migration process.
4. Execute the script 
   ~/lib/tool_shed/scripts/bootstrap_tool_shed/create_user_with_api_key.py 
   ~/tool_shed_wsgi.ini to create a new user and an associated API key using 
   the information defined in 
   ~/lib/tool_shed/scripts/bootstrap_tool_shed/user_info.xml.
5. Automatically modify the ~/tool_shed_wsgi.ini file to configure the above 
   user as an admin_user for the Tool Shed.
6. Start the Tool Shed web server.
7. Execute the script ~/lib/tool_shed/scripts/api/create_users.py -a api_key 
   -f http://toolshed.g2.bx.psu.edu -t http://localhost:9009.
8. Execute the script ~/lib/tool_shed/scripts/api/create_categories.py -a 
   api_key -f http://toolshed.g2.bx.psu.edu -t 

Re: [galaxy-dev] Postgresql database time wrong...

2014-06-17 Thread Neil.Burdett
Thanks, but looking at /etc/postgresql/9.1/main/postgresql.conf I have the 
following:
#timezone = '(defaults to server environment setting)'
#timezone_abbreviations = 'Default' # Select the set of available time zone
# abbreviations.  Currently, there are
#   Default
#   Australia
#   India
# You can create your own file in
# share/timezonesets/.

So I assume it should get the time from the Ubuntu machine it runs on. I have 
not done any configuration of the PostgreSQL database; I only installed it.

Neil

From: Will Holtz [who...@lygos.com]
Sent: Wednesday, June 18, 2014 10:25 AM
To: Burdett, Neil (CCI, Herston - RBWH)
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Postgresql database time wrong...

Postgres generally stores datetime fields in GMT, and then translates them to 
the local time zone when returning query results. Check the TimeZone variable 
in your postgresql.conf.

http://www.postgresql.org/docs/9.3/static/datatype-datetime.html#DATATYPE-TIMEZONES

-Will


On Tue, Jun 17, 2014 at 4:29 PM, neil.burd...@csiro.au wrote:
Hi
   I have quite a strange issue. I have a local install of Galaxy setup. When I 
type 'date' on my Ubuntu machine I get something like:

Wed Jun 18 09:25:22 EST 2014

When i then execute a job and look in the database at the create_time i.e.

# select create_time from job order by create_time;

I get

2014-06-17 23:20:00.133828

So about 10 hours different. Is there some configuration I need to set as 
Brisbane is 10hrs ahead of GMT (coincidence?)

Thanks
Neil

