Re: [galaxy-dev] Examples of Galaxy tools in the toolsheds that install and run JAR files properly?

2014-09-03 Thread Lukasse, Pieter
Hi Melissa,

Just commenting on point 1 of your email: did you try the “Reset metadata”
option? See screenshot below:

[cid:image001.jpg@01CFC756.F193E4B0]

For points 2 and 3: I normally look at the status (Installed/Green) and test 
whether the correct version appears in the menu for the users. I tend to ignore 
other messages or temporary(?) hiccups in the installation process web page.


Regards,

Pieter.



From: melissa.s.cl...@gmail.com [mailto:melissa.s.cl...@gmail.com] On Behalf Of 
Melissa Cline
Sent: Wednesday, 3 September 2014 3:23
To: Dave Bouvier
Cc: Lukasse, Pieter; Peter Cock; Galaxy Dev
Subject: Re: [galaxy-dev] Examples of Galaxy tools in the toolsheds that 
install and run JAR files properly?

Peter, Pieter and Dave, thank you for the pointers to your tools - they've been 
extremely helpful!  Now I can see how the process is supposed to work.  I'm not 
really sure where mine is going wrong, but maybe someone here will have ideas.

So folks, I've had partial success with a repository that includes a 
tool_dependencies.xml, which sets two environment variables and moves a JAR 
file from REPOSITORY_INSTALL_DIR to INSTALL_DIR.  But the following things 
suggest to me that it's only a partial success, and I'm very interested in any 
insights to clear them up.


1. I have my repository checked into an internal tool shed.  Somewhere in the 
course of development, I specified a buggy dependency, and now I can't seem to 
clear it up.  When I go to (re)install my tool from the tool shed, here's what 
I see in the way of dependencies:

Tool dependencies - these dependencies may not be required by tools in this 
repository

Name       Version   Type              Orphan
JAR_PAHT             set_environment   yes


and as you might imagine, I've checked the obvious things, and there is no more 
reference to JAR_PAHT (typos and all) anywhere in or around my tool, or in my 
galaxy-dist directory.  It seems like there's some old metadata cruft that 
hasn't been cleared out.  How can I clear it out?


2. When I install my tool, I see the following messages in paster.log:
---
tool_shed.galaxy_install.repository_dependencies.repository_dependency_manager 
DEBUG 2014-09-02 17:13:59,279 Building repository dependency relationships...
172.30.0.22 - - [02/Sep/2014:17:13:59 -0700] POST 
/admin_toolshed/prepare_for_install HTTP/1.1 200 - 
http://tcga1:1235/admin_toolshed/prepare_for_install?tool_shed_url=http://medbook.ucsc.edu:9009/repository_ids=decab5ee1e95b10bchangeset_revisions=ff9b02e50bcf;
 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, 
like Gecko) Chrome/36.0.1985.143 Safari/537.36
172.30.0.22 - - [02/Sep/2014:17:14:02 -0700] POST 
/admin_toolshed/repository_installation_status_updates HTTP/1.1 200 - 
http://tcga1:1235/admin_toolshed/prepare_for_install; Mozilla/5.0 (Macintosh; 
Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/36.0.1985.143 Safari/537.36
tool_shed.util.shed_util_common DEBUG 2014-09-02 17:14:03,477 Error attempting 
to get tool shed status for installed repository start_xena: HTTP Error 404: 
Not Found
Attempting older 'check_for_updates' method.
---
Should I be concerned about these last messages?  I don't remember seeing them 
with Peter, Pieter and Dave's tools.

3. After my tool has installed, there is no INSTALLATION.log in my INSTALL_DIR. 
 Does this mean that the installation process somehow terminated early?  There 
is an env.sh file, with the correct values of the environment variables I'm 
setting in my tool_dependencies.xml, and my jar file is copied to INSTALL_DIR.  
In my Galaxy window, my tool is indicated as Installed, in green.  Here are 
the last messages I see in paster.log:
tool_shed.galaxy_install.install_manager DEBUG 2014-09-02 17:14:04,630 Changing 
status for tool dependency installXena from Installing to Installed.
tool_shed.galaxy_install.install_manager DEBUG 2014-09-02 17:14:04,669 Tool 
dependency installXena version 1.0 has been installed in 
/inside/home/cline/src/galaxy-dist/tool_dependencies/installXena/1.0/melissacline/start_xena/3e4683c94d8d.
172.30.0.22 - - [02/Sep/2014:17:13:59 -0700] POST 
/admin_toolshed/manage_repositories HTTP/1.1 302 - 
http://tcga1:1235/admin_toolshed/prepare_for_install; Mozilla/5.0 (Macintosh; 
Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/36.0.1985.143 Safari/537.36
172.30.0.22 - - [02/Sep/2014:17:14:04 -0700] GET 
/admin_toolshed/monitor_repository_installation?tool_shed_repository_ids=3f5830403180d620
 HTTP/1.1 200 - http://tcga1:1235/admin_toolshed/prepare_for_install; 
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like 
Gecko) Chrome/36.0.1985.143 Safari/537.36
172.30.0.22 - - [02/Sep/2014:17:14:05 -0700] POST 
/admin_toolshed/repository_installation_status_updates HTTP/1.1 200 - 
http://tcga1:1235/admin_toolshed/prepare_for_install; Mozilla/5.0 (Macintosh; 
Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/36.0.1985.143 

[galaxy-dev] Startup error after restoring Galaxy DB from backup

2014-09-03 Thread Graeme Grimes

I have tried to restart Galaxy after restoring my database from a backup.

Here is the error message I get in the log file. Any idea what is wrong 
and how to fix this problem?


--
galaxy.jobs DEBUG 2014-09-03 08:33:46,367 Loading job configuration from 
/export/users/galaxy/galaxy-test/universe_wsgi.ini

galaxy.jobs DEBUG 2014-09-03 08:33:46,367 Done loading job configuration
Traceback (most recent call last):
  File "/export/users/galaxy/galaxy-test/lib/galaxy/webapps/galaxy/buildapp.py", line 35, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/export/users/galaxy/galaxy-test/lib/galaxy/app.py", line 102, in __init__
    self.toolbox = tools.ToolBox( tool_configs, self.config.tool_path, self )
  File "/export/users/galaxy/galaxy-test/lib/galaxy/tools/__init__.py", line 118, in __init__
    self.load_integrated_tool_panel_keys()
  File "/export/users/galaxy/galaxy-test/lib/galaxy/tools/__init__.py", line 283, in load_integrated_tool_panel_keys
    tree = parse_xml( self.integrated_tool_panel_config )
  File "/export/users/galaxy/galaxy-test/lib/galaxy/util/__init__.py", line 132, in parse_xml
    tree = ElementTree.parse(fname)
  File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 859, in parse
    tree.parse(source, parser)
  File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 583, in parse
    parser.feed(data)
  File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 1242, in feed
    self._parser.Parse(data, 0)
ExpatError: not well-formed (invalid token): line 117, column 1
Removing PID file /var/run/paster.pid
-


Thanks,

Graeme


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] directory as an input file

2014-09-03 Thread John Chilton
Collections are one potential answer for how users can specify the set
of stuff that belongs in the directory. For explicitly dealing with
applications that consume directories - I think it is best to just
create the directory and link in files (if possible) before the tool
runs.

<command>mkdir input_dir;
#for $i, $input_file in enumerate($input_files)# ln -s $input_file
input_dir/$i; #end for
my_application input_dir
</command>
<inputs>
  <param name="input_files" type="data" format="bam" multiple="true" />
  ...

You can also do this sort of thing in a wrapper. If needed you can
build more interesting command-lines this way that add extensions, use
names, etc. peptideshaker is an example of a fairly complex tool
that uses an idiom like this and doesn't resort to a helper wrapper
(https://toolshed.g2.bx.psu.edu/repository/browse_repository?id=13a5bad5c984db6f#).

I am currently working on support for tools that actually produce
collections of files this way - I think I will probably land up adding
some high-level utilities for doing stuff like this for those
scenarios. But if your tool just produces a couple of files and consumes a
directory - no need to necessarily resort to collections (as the tool
author - your users will probably want to if they want to use these
tools in workflows).

-John


On Tue, Sep 2, 2014 at 9:51 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
 You might be able to do this by accepting a collection of
 SAM/BAM files as input instead. This is a quite new feature
 in Galaxy, see:

 https://wiki.galaxyproject.org/News/2014_06_02_Galaxy_Distribution

 Peter

 On Wed, Sep 3, 2014 at 10:00 AM, Philippe Moncuquet
 philippe.m...@gmail.com wrote:
 Hi,

 I am trying to write a wrapper for a tool that takes a directory containing
 SAM/BAM files as an input. I am not sure how to do that; is there another
 tool that implements this that I can have a look at? Any suggestions
 would be greatly appreciated.

 Regards,
 Philip



Re: [galaxy-dev] Startup error after restoring Galaxy DB from backup

2014-09-03 Thread Daniel Blankenberg
Hi Graeme,

It looks like your integrated_tool_panel.xml file has been corrupted. You can 
move/remove this file and it will be recreated the next time Galaxy is started 
up.
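
To confirm the diagnosis before moving the file, a small check along
these lines may help. It is only a sketch in Python 3 using the standard
library parser (which raises ParseError, where the old elementtree egg in
the traceback raised ExpatError); the function name check_and_set_aside
and the ".corrupt" suffix are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET


def check_and_set_aside(xml_path):
    """Return None if xml_path parses; otherwise rename it and return the new path."""
    try:
        ET.parse(xml_path)
        return None
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        print("%s is not well-formed at line %d, column %d"
              % (xml_path, line, column))
        backup = xml_path + ".corrupt"  # hypothetical suffix
        os.rename(xml_path, backup)
        return backup
```

Run against integrated_tool_panel.xml, this keeps the damaged copy
around for inspection rather than deleting it outright.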


Thanks for using Galaxy,

Dan


On Sep 3, 2014, at 3:54 AM, Graeme Grimes graeme.gri...@igmm.ed.ac.uk wrote:

 I have tried to restart Galaxy after restoring my database from a backup.
 
 Here is the error message I get in the log file. Any idea what is wrong and 
 how to fix this problem?
 
 --
 galaxy.jobs DEBUG 2014-09-03 08:33:46,367 Loading job configuration from 
 /export/users/galaxy/galaxy-test/universe_wsgi.ini
 galaxy.jobs DEBUG 2014-09-03 08:33:46,367 Done loading job configuration
 Traceback (most recent call last):
   File "/export/users/galaxy/galaxy-test/lib/galaxy/webapps/galaxy/buildapp.py", line 35, in app_factory
     app = UniverseApplication( global_conf = global_conf, **kwargs )
   File "/export/users/galaxy/galaxy-test/lib/galaxy/app.py", line 102, in __init__
     self.toolbox = tools.ToolBox( tool_configs, self.config.tool_path, self )
   File "/export/users/galaxy/galaxy-test/lib/galaxy/tools/__init__.py", line 118, in __init__
     self.load_integrated_tool_panel_keys()
   File "/export/users/galaxy/galaxy-test/lib/galaxy/tools/__init__.py", line 283, in load_integrated_tool_panel_keys
     tree = parse_xml( self.integrated_tool_panel_config )
   File "/export/users/galaxy/galaxy-test/lib/galaxy/util/__init__.py", line 132, in parse_xml
     tree = ElementTree.parse(fname)
   File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 859, in parse
     tree.parse(source, parser)
   File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 583, in parse
     parser.feed(data)
   File "/export/users/galaxy/galaxy-test/eggs/elementtree-1.2.6_20050316-py2.6.egg/elementtree/ElementTree.py", line 1242, in feed
     self._parser.Parse(data, 0)
 ExpatError: not well-formed (invalid token): line 117, column 1
 Removing PID file /var/run/paster.pid
 -
 
 
 Thanks,
 
 Graeme
 
 
 -- 
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.
 


Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data Retrieval Tool

2014-09-03 Thread Dooley, Damion
Hi,

There have been a few comments about how general we could make the system for 
Galaxy use or just as a stand-alone command line driven tool.  So some notes 
below about what I could see it taking on.  Given the scale of the sequencing 
data problem, I'm sure the Galaxy community has important feedback on this.

I looked at git annex and it appears to me that though it promises to keep 
track of and synchronize network located files, it doesn't do versioning on 
them - am I wrong about that?

I also looked at https://code.google.com/p/leveldb/ , also a key value database 
which relies more heavily on indexes - but I see that though this is well-tuned 
to answering key queries, it isn't particularly good at storing and retrieving 
entire versions of a database that could be many gigabytes long, which is our 
mission.

It is relatively easy to generalize the simple keydb prototype I wrote so that 
it can handle any key-value database - including binary content and even binary 
key data, not just text (fasta sequences).  So a name change for the tool is a 
good idea. 

I want a versioning system that doesn't assume the incoming master file of 
key-value pairs is in the same order as it was on a previous import run.  I was 
afraid that any arbitrary change in the order of content on the source server 
could completely destroy the efficiency of a differential approach.  Git 
assumes its content is like a document, so if the fasta entries are rearranged 
it generates a slew of inserts and deletes and in fact provides no benefit.  I 
tested helping git overcome this hurdle by converting the fasta content to 1 
line key/value fasta entries, and sorting them before git processing. That 
seemed to work for some smaller and larger nucleotide fasta files (tested 10m 
to 2gb) but failed when it came to processing protein fasta files; though 
possibly that was because of the fasta data line length.  That became another 
concern - thinking that git was failing because each line of the input file was 
many thousands of characters long.
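
The preprocessing step described above (one-line key/value fasta
entries, sorted before diffing) could look roughly like the following
Python 3 sketch. It is not the code used in those tests: the function
name is hypothetical, and it assumes the first whitespace-delimited token
of each header is a unique key and drops the rest of the description.

```python
def fasta_to_sorted_records(lines):
    """Collapse FASTA text into sorted 'key<TAB>sequence' one-line records."""
    records = {}
    key = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumption: the first token of the header is the unique key.
            fields = line[1:].split(None, 1)
            key = fields[0] if fields else ""
            records[key] = []
        elif key is not None and line:
            records[key].append(line)
    # Sorting by key makes the output independent of the input order.
    return ["%s\t%s" % (k, "".join(parts))
            for k, parts in sorted(records.items())]
```

Because the output order depends only on the keys, two imports of the
same content in a different order produce identical files.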

So having done a keydb versioning engine that works and performs as well as 
git, I am definitely shying away from git now as unreliable on certain kinds of 
data.  The keydb approach is able to generate a version file at about the same 
speed that it takes to read the latest version of the same db, i.e. at about 
50 MB/s on a standard hard drive.

An extension to keydb that enables it to take in just a list of adds or deletes 
or updates is desirable but that can come later.  More efficiency can be had by 
fine-tuning the updates so that one whole line of key-value doesn't have to 
replace the previous one but that's for later too.
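
To make the idea of an order-independent version difference concrete,
here is a minimal Python 3 sketch. It is not the keydb prototype itself:
it assumes both versions fit in memory as dictionaries, and the function
names diff_versions and apply_diff are invented for the example.

```python
def diff_versions(old, new):
    """Compare two key->value mappings and return (adds, deletes, updates)."""
    adds = {k: v for k, v in new.items() if k not in old}
    deletes = sorted(k for k in old if k not in new)
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    return adds, deletes, updates


def apply_diff(old, adds, deletes, updates):
    """Rebuild the newer version from the older one plus a diff."""
    removed = set(deletes)
    result = {k: v for k, v in old.items() if k not in removed}
    result.update(updates)
    result.update(adds)
    return result
```

Since the comparison is by key, the order of records in either input has
no effect on the size of the diff.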

A note on generalization: the keydb approach works where the keys form a sparse 
array.  There's nothing stopping the keys from representing a 2D or 3D sparse 
array of data as long as the coordinates are coded uniquely into the one key 
list.
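
One possible way to code coordinates uniquely into a single key is shown
in this Python 3 sketch. The zero-padded, colon-separated format and the
function names are assumptions for illustration, and the scheme only
handles non-negative integers that fit in the chosen width.

```python
def encode_key(coords, width=8):
    """Encode non-negative integer coordinates as one sortable string key."""
    # Zero padding makes string order match numeric order per coordinate.
    return ":".join("%0*d" % (width, c) for c in coords)


def decode_key(key):
    """Recover the coordinate tuple from a key made by encode_key."""
    return tuple(int(part) for part in key.split(":"))
```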

For those interested in versioning XML data there is an interesting summary of 
the challenges here:  
http://useless-factor.blogspot.ca/2008/01/matching-diffing-and-merging-xml.html 
.  It leaves me thinking that quick versioning of xml data could only be 
accomplished if it could somehow be converted into a key-value db, i.e. with 
each top level xml record identified by a unique key.
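
As a sketch of that conversion, the following Python 3 snippet turns the
top-level records of an XML document into a key-value mapping. It assumes
each top-level record carries a unique "id" attribute, which real data
may well not have; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_records(xml_text, key_attr="id"):
    """Map each top-level child element to its serialized XML, keyed by an attribute."""
    root = ET.fromstring(xml_text)
    records = {}
    for child in root:
        key = child.get(key_attr)
        if key is None:
            raise ValueError("record without a %r attribute" % key_attr)
        if key in records:
            raise ValueError("duplicate key %r" % key)
        # Serialize the whole record so it can be stored as one value.
        records[key] = ET.tostring(child, encoding="unicode").strip()
    return records
```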

I could see breaking larger keydb databases up into smaller chunks for data 
retrieval and fast parallel processing - the usual approach being to separate 
the sorted key-value db out into files based on the first character or two in 
the key of each record.
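
The chunking idea can be sketched in a few lines of Python 3. This is an
in-memory illustration only; a real implementation would write each chunk
to its own file, and the function name chunk_by_prefix is an assumption.

```python
from collections import defaultdict


def chunk_by_prefix(records, prefix_len=1):
    """Group sorted (key, value) pairs by the first prefix_len characters of the key."""
    chunks = defaultdict(list)
    for key, value in sorted(records.items()):
        chunks[key[:prefix_len]].append((key, value))
    return dict(chunks)
```

Each chunk stays sorted internally, so chunks can be processed in
parallel and concatenated in prefix order to recover the full database.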
  
Does this go along with people's expectations?

Cheers,

Damion 


From: Björn Grüning [bjoern.gruen...@gmail.com]
Sent: Monday, September 01, 2014 12:47 PM
To: Dooley, Damion; Björn Grüning; galaxy-dev@lists.bx.psu.edu
Cc: Hsiao, William
Subject: Re: [galaxy-dev] Concept for a Galaxy Versioned Fasta Data Retrieval 
Tool

On 25.08.2014 at 18:05, Dooley, Damion wrote:
 Ok, I'll be very happy to see what you've accomplished there.  I will read 
 through what you've done when I return from vacation in a week!

 A key need is to have whatever data comes in show up as linked data in one's 
 history to avoid server overhead;
a second objective was to not need to modify existing workflows - as
long as they could work off data in history that is typed appropriately.
  So your 'select type' solution sounds intriguing!

 And certainly interested in your use of git - I tried using git, using a 
 1-line fasta data format, but git seemed to choke on protein fasta files?
 And did it run into performance problems with larger files?  That was my 
 experience.  I think I read its authors say that its upper limit was 15gb.

This is probably true for one large file. I have been storing the entire PDB
in git for a few years, with one file per entry, and it works fine.

Do you know git annex? https://git-annex.branchable.com/

That was the motivation for writing a simple