Re: [galaxy-dev] Trackster and gff file with multiple chromosome annotations

2012-10-31 Thread Yec'han Laizet

I will modify the gff file as you mentioned and update galaxy.

Thanks a lot.

Yec'han




Yec'han LAIZET
Ingenieur
Plateforme Genome Transcriptome
Tel: 05 57 12 27 75
_
INRA-UMR BIOGECO 1202
Equipe Genetique
69 route d'Arcachon
33612 CESTAS


On 29/10/2012 15:59, Jeremy Goecks wrote:
Whatever the file type I set for the gff file (gff3, gff or gtf), I 
get the transcript_id error:


Traceback (most recent call last):
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 91, in <module>
    main()
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/converters/interval_to_fli.py", line 30, in main
    for feature in read_unordered_gtf( open( in_fname, 'r' ) ):
  File "/home/pgtgal/galaxy-dist/lib/galaxy/datatypes/util/gff_util.py", line 375, in read_unordered_gtf
    transcript_id = line_attrs[ 'transcript_id' ]
KeyError: 'transcript_id'


This was due to an incomplete feature in Galaxy: it turns out that GFF support 
hadn't been included in feature search. I've added it in -central 
changeset fa045aad74e9:


https://bitbucket.org/galaxy/galaxy-central/changeset/fa045aad74e90f16995e0cbb670a59e6b9becbed


Is the gff file not correct?


I believe there is an issue with your GFF: it is using non-standard 
identifiers in the attributes (last) column. To the best of my 
knowledge, 'name' is not a valid field for connecting features in GFF3 
(which is my best guess for the file version), but your GFF uses this 
field anyways.


To fix this issue, I replaced 'name' with 'ID' (which is compliant 
GFF3) from the command line:


--
% sed s/name/ID/ ~/Downloads/test.gff > ~/Downloads/test_with_ids.gff
--

and this fixed the issue.
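For illustration (with a made-up feature line), the change only needs to touch the attributes (9th) column:

--
ctg123  example  mRNA  1300  9000  .  +  .  name=mrna0001
ctg123  example  mRNA  1300  9000  .  +  .  ID=mrna0001
--

Note that a blanket s/name/ID/ also rewrites the first occurrence of 'name' anywhere on each line, so it's worth spot-checking the output.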

Finally, there is a sed wrapper in the toolshed should you want to do 
this conversion in Galaxy:


http://toolshed.g2.bx.psu.edu/repository/browse_categories?sort=name&operation=view_or_manage_repository&f-deleted=False&f-free-text-search=sed&id=9652a50c5a932f3e

Best,
J.


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-dev] Galaxy local install

2012-10-31 Thread Hans-Rudolf Hotz

Hi Vladimir


I contacted the vendor's tech support (Dell) with this question, but they
could not answer (or did not want to) and directed me to the Galaxy
developers. I am using RHEL58 and SciLinux55 and want to install a local
instance of Galaxy. Both of my systems are based on Python 2.4. Question:
can I install Python 2.6/2.7 locally without messing up the system? I was
advised earlier not to do a system-wide install, but being unhealthily
curious I did, and ended up reinstalling SciLinux 55 from scratch.
How do I make sure 2.6/2.7 will not mess up the system's Python?



just install Python 2.6 somewhere on your box (i.e. parallel to 
the galaxy directory) and follow the steps described under "Check your 
Python version" on this wiki page:

http://wiki.g2.bx.psu.edu/Admin/Get%20Galaxy

I recently did this for one of our development boxes (which has 
Python 2.5) to allow Galaxy to run with Python 2.6.



Regards, Hans-Rudolf




Thanks,

Vladimir



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

   http://lists.bx.psu.edu/


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-dev] Parallelism tag and job splitter

2012-10-31 Thread Peter Cock
On Wednesday, October 31, 2012, Edward Hills wrote:

 Thanks Peter.

 My next question is, I have found that VCF files don't get split properly
 as the header is not included in the second file as is usually required by
 tools (such as vcf-subset). I have read the code and am happy to implement
 this functionality but am not too sure where this would best be done.

 I see a class Text ( data ) which looks like every datatype is sent to.
 Would it be best to implement a VCF class which is called when the datatype
 is VCF?

 Cheers,
 Ed


VCF is I assume defined as a subclass of Text, so inherits the naive simple
splitting implemented for text files (which doesn't know about headers).

Have a look at the SAM splitting code (under lib/galaxy/datatypes/*.py) as
an example where header aware splitting was done. You'll probably need to
implement something similar.
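
As a rough illustration of the idea (this is only a sketch under my own assumptions, not the Galaxy implementation or its splitter API), header-aware splitting boils down to remembering the '#' lines and writing them at the top of every chunk:

--
# Sketch only: repeat the '#' header lines at the top of each VCF chunk.
# See the SAM splitter under lib/galaxy/datatypes/ for how splitting is
# actually hooked into the framework.
def split_vcf(in_path, records_per_chunk, out_prefix):
    header = []          # all '#' lines, kept in order
    chunk = None
    count = part = 0
    for line in open(in_path):
        if line.startswith('#'):
            header.append(line)
            continue
        if chunk is None or count >= records_per_chunk:
            if chunk:
                chunk.close()
            part += 1
            chunk = open('%s_%d.vcf' % (out_prefix, part), 'w')
            chunk.writelines(header)   # every chunk starts with the header
            count = 0
        chunk.write(line)
        count += 1
    if chunk:
        chunk.close()
--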

Peter
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] Resend: Unnamed histories proliferating, can't get to my data

2012-10-31 Thread Karger, Amir
Hi. Resending because I got no response. Can anybody suggest anything that
might explain this, or tell me how I can troubleshoot? Where to look in
the Python code? Whether anybody has seen anything like this? Our beta
tester can't actually test anything. This occurs whether he does the
FTP-style upload or uploads through the browser.

Thanks,

-Amir Karger

On 10/23/12 2:42 PM, Karger, Amir amir_kar...@hms.harvard.edu wrote:

I'm using Galaxy from June, 2012. (Sorry if there's already a fix.)

We've got it working in production. We've gotten whole pipelines to run.
However, we occasionally get situations where we upload file (using the
FTP mechanism), which seems to be fine, but then I can't get to the data.
I went to Saved Histories, and selected Switch, and it outlined the line
in blue and wrote current history next to it. But the right pane still
shows Unnamed history with no data in it. Then if I go back to Saved
Histories, I get one or two new Unnamed histories, created within the last
few minutes.

I just tried to View the history, which worked (in the middle pane) and
clicked import and start using history. This seemed to work, but I got
three panes inside the middle pane! When I go back (again) to saved
histories, there are 3 histories - one the imported one with 2 steps, two
unnamed histories, all created  1 minute ago.

We just asked a beta tester to play with things, and he uploaded two
fastqs, but had what sounds like a similar problem.

Any thoughts on what's happening?

Thanks,

-Amir Karger
Research Computing
Harvard Medical School



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


[galaxy-dev] Incorrect chain order for SSL certificates on Galaxy main

2012-10-31 Thread Brad Chapman

Hi all;
I ran into SSL certification errors when using Java to connect to Galaxy
main via the API. My knowledge of this stuff is minimal, but I did some
searching and discovered that the certificate chain on Galaxy main is a problem:

https://www.ssllabs.com/ssltest/analyze.html?d=main.g2.bx.psu.edu

Looking at the chain with openssl shows a swap of the AddTrust and Internet2
certificates:

$ openssl s_client -connect main.g2.bx.psu.edu:443
CONNECTED(0003)
depth=2 C = SE, O = AddTrust AB, OU = AddTrust External TTP Network, CN = 
AddTrust External CA Root
verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/C=US/postalCode=16802/ST=PA/L=University Park/O=The Pennsylvania State 
University/OU=Center for Comparative Genomics and 
Bioinformatics/CN=bigsky.bx.psu.edu
   i:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
 1 s:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External 
CA Root
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External 
CA Root
 2 s:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External 
CA Root
---

As a result, more picky verification mechanisms fail because of the self
signed certificate in the middle of the chain instead of as the root.

It appears you can fix this by adjusting the order of certificates
in nginx:

http://webmasters.stackexchange.com/questions/27842/how-to-prevent-ssl-certificate-chain-not-sorted/28074#28074
http://nginx.org/en/docs/http/configuring_https_servers.html#chains

Hope this helps,
Brad
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Incorrect chain order for SSL certificates on Galaxy main

2012-10-31 Thread Nate Coraor
On Oct 31, 2012, at 8:55 AM, Brad Chapman wrote:

 
 Hi all;
 I ran into SSL certification errors when using Java to connect to Galaxy
 main via the API. My knowledge of this stuff is minimal, but I did some
 searching and discovered that the certificate chain on Galaxy main is a 
 problem:
 
 https://www.ssllabs.com/ssltest/analyze.html?d=main.g2.bx.psu.edu
 
 Looking at the chain with openssl shows a swap of the AddTrust and Internet2
 certificates:
 
 $ openssl s_client -connect main.g2.bx.psu.edu:443
 CONNECTED(0003)
 depth=2 C = SE, O = AddTrust AB, OU = AddTrust External TTP Network, CN = 
 AddTrust External CA Root
 verify error:num=19:self signed certificate in certificate chain
 verify return:0
 ---
 Certificate chain
 0 s:/C=US/postalCode=16802/ST=PA/L=University Park/O=The Pennsylvania State 
 University/OU=Center for Comparative Genomics and 
 Bioinformatics/CN=bigsky.bx.psu.edu
   i:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
 1 s:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External 
 CA Root
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External 
 CA Root
 2 s:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External 
 CA Root
 ---
 
 As a result, more picky verification mechanisms fail because of the self
 signed certificate in the middle of the chain instead of as the root.
 
 It appears you can fix this by adjusting the order of certificates
 in nginx:
 
 http://webmasters.stackexchange.com/questions/27842/how-to-prevent-ssl-certificate-chain-not-sorted/28074#28074
 http://nginx.org/en/docs/http/configuring_https_servers.html#chains
 
 Hope this helps,
 Brad

Hi Brad,

Thanks for catching this.  It's been fixed.

--nate

 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


[galaxy-dev] Amazon

2012-10-31 Thread Scooter Willis
Started up a cluster on Amazon using "Launch a Galaxy Cloud Instance" and got 
the following message. Since I don't have any control over where the instances 
are run, I am not sure how I can fix this. The last 4 or 5 times I have started 
up an existing instance it has worked with no problem.

Messages (CRITICAL messages cannot be dismissed.)

 1.  [CRITICAL] Volume 'vol-f882ca85' is located in the wrong availability zone 
for this instance. You MUST terminate this instance and start a new one in zone 
'us-east-1a'. (2012-10-31 14:25:20)
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Amazon

2012-10-31 Thread Dannon Baker
For this instance, you'll need to restart using the old method of launching 
via the console, specifying zone us-east-1b. Detecting the zone that an existing 
cluster's volumes are in, and specifying that zone at launch, is on the short 
list of things coming up for cloud launch. 

On Oct 31, 2012, at 10:50 AM, Scooter Willis hwil...@scripps.edu wrote:

 Tried it again and same error message. The volume was originally created in 
 us-east-1b and newly created instances are being started in us-east-1a. 
 Shouldn't the availability zone be set to us-east-1b when the instance is 
 requested or that info stored in the properties file in the S3 bucket?
 
 Any suggestions?
 
 From: Scooter Willis hwil...@scripps.edu
 Date: Wednesday, October 31, 2012 10:32 AM
 To: galaxy-dev@lists.bx.psu.edu galaxy-dev@lists.bx.psu.edu
 Subject: Amazon
 
 Started up a cluster on Amazon using the Launch a Galaxy Cloud Instance and 
 got the following message. Since I don't have any control over where the 
 instances are run not sure how I can control this. The last 4 or 5 times I 
 have started up an existing instance has worked with no problem.
 
 Messages (CRITICAL messages cannot be dismissed.)
 [CRITICAL] Volume 'vol-f882ca85' is located in the wrong availability zone 
 for this instance. You MUST terminate this instance and start a new one in 
 zone 'us-east-1a'. (2012-10-31 14:25:20)
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] which .loc file for SAM to BAM?

2012-10-31 Thread Andreas Kuntzagk

Hi,

I'm still setting up a local Galaxy. Currently I'm testing the setup of NGS tools. If I try SAM to 
BAM for a BAM file that has hg18 set as its build, I get the message "Sequences are not currently 
available for the specified build." I guess that I either have to manipulate one of the .loc files 
(but which?) or have to download additional data from the rsync server.

(I already have the tool-data/shared/hg18 completely)

regards, Andreas

Btw, do you have any plans to ease the pain of adding additional builds? Something simpler than 
having to add one line for each build*tool combo? These lines seem very redundant to me.


--
Andreas Kuntzagk

SystemAdministrator

Berlin Institute for Medical Systems Biology at the
Max-Delbrueck-Center for Molecular Medicine
Robert-Roessle-Str. 10, 13125 Berlin, Germany

http://www.mdc-berlin.de/en/bimsb/BIMSB_groups/Dieterich
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-dev] which .loc file for SAM to BAM?

2012-10-31 Thread Carlos Borroto
On Wed, Oct 31, 2012 at 11:30 AM, Andreas Kuntzagk
andreas.kuntz...@mdc-berlin.de wrote:
 Hi,

 I'm still setting up a local galaxy. Currently I'm testing the setup of NGS
 tools. If I try SAM to BAM for a BAM file that has hg18 set as build I
 get a message that
 Sequences are not currently available for the specified build. I guess
 that I have either to manipulate one of the .loc files (but which?) or have
 to download additional data from rsync server.
 (I already have the tool-data/shared/hg18 completely)


The .loc file you want to modify is 'tool-data/sam_fa_indices.loc'.
You can find information about this subject in the wiki[1]. The table
there is not complete, though, so you can always find the right tool
xml under 'tools' and poke inside for a line like this one:
<validator type="dataset_metadata_in_file"
    filename="sam_fa_indices.loc" metadata_name="dbkey"
    metadata_column="1" message="Sequences are not currently available for
    the specified build." line_startswith="index" />

[1]http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup
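
For reference, a data line in sam_fa_indices.loc is tab-separated and, judging by the validator above (line_startswith="index", metadata_column="1" for the dbkey), looks something like the following; the path is just a made-up example:

--
index   hg18    /path/to/sam_indexes/hg18/hg18.fa
--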

And I agree, dealing with .loc files is quite cumbersome.

Hope it helps,
Carlos
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


[galaxy-dev] Error trying to run functional tests on a single tool

2012-10-31 Thread Dan Tenenbaum
Hi,

I'm trying to test out the functional testing mechanism by running it
on an existing Galaxy tool.

First I ran
./run_functional_tests.sh -list

which produced a list of tools I can test. I chose 'vcf_annotate' and
tested it as follows:

./run_functional_tests.sh -id vcf_annotate

This produced a lot of output which included an exception trace. The
output was not conclusive as to whether the test ran or was
successful.

The output is too long for this mailing list but you can find it here:
https://gist.github.com/3988398

I am reluctant to try and excerpt the relevant bits because it's hard
for me to know what is relevant and what is not.

I am running the latest Galaxy (just did hg pull/hg update and migrated).
This is on a Mac OS X 10.7.4 machine with python 2.7.

When I run the same command on a linux machine, it works (though it
took me a while to find the test output; it was buried in a lot of
output that also contained (apparently irrelevant) stack traces).

So perhaps there is something wrong with my configuration.

Hope someone can help me out.

Also had a couple of newbie questions about the functional test framework.
1) Why does it use tool_conf.xml.sample instead of tool_conf.xml? Can
I change it to use tool_conf.xml? This way I do not need to add tools
to two places in order to test them. (Plus the name of
tool_conf.xml.sample indicates that it is just a demo file).
2) run_functional_tests.sh -list lists tools (such as 'upload1') that
do not have functional tests, so cannot (if my understanding is
correct) be tested with this script. Perhaps it would make more sense
not to list these tools?

Thanks,
Dan
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Accessing Galaxy API from Java

2012-10-31 Thread Brad Chapman

Scooter;
(cc'ing the dev list and updating the subject line in case others are 
interested)

 I have been looking for Java-related APIs to run workflows externally and
 haven't found anything searching message forums etc. I would like to
 automate data coming off a HiSeq being uploaded to Amazon S3 and then,
 programmatically from an external process, import the fastq files and kick off
 a workflow to process them. If you know of any docs or a Java API for doing this
 kind of external control, can you point me to it?

John Chilton has a Java library for accessing the API:

https://github.com/jmchilton/blend4j

which should cover lots of this. If you're interested in other JVM
languages, I built a small Clojure wrapper around this to simplify some
tasks:

https://github.com/chapmanb/clj-blend

We'd definitely love to have more people involved, so if any
functionality you need is missing please feel free to submit
pull requests.

Brad
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] output name of downloaded datasets

2012-10-31 Thread Jeremy Goecks
Downloading data is handled in lib/galaxy/webapps/galaxy/controllers/dataset.py, 
in the method display(), which in turn calls this line:

--
return data.datatype.display_data(trans, data, preview, filename, to_ext, 
chunk, **kwd)
--

This, in most cases, calls display_data() in lib/galaxy/datatypes/data.py.

In this method, you can see how the download name is created:

--
valid_chars = '.,^_-()[]0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
fname = ''.join(c in valid_chars and c or '_' for c in data.name)[0:150]
--
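
For example (with a hypothetical dataset name), anything outside valid_chars becomes an underscore and the result is capped at 150 characters:

--
valid_chars = '.,^_-()[]0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
data_name = 'My reads (sample #1).fastq'   # hypothetical dataset name
fname = ''.join(c in valid_chars and c or '_' for c in data_name)[0:150]
# fname == 'My_reads_(sample__1).fastq'
--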

Best,
J.

On Oct 31, 2012, at 12:37 PM, julie dubois wrote:

 Hello, 
 
 
 My goal is to introduce, in the XML file of one tool (MACS, for example), a 
 supplementary command to redirect the output into another directory (plus create 
 a link between this directory and the directory of Galaxy outputs).
 But I want to rename my output with the same name that the download tool 
 creates, in this form: GALAXY-NumOfDatasetInHistory[NameOfInput].bed
 
 And I can't find where this download tool is, and so I can't find how this 
 name is created.
 
 Thanks.
 julie
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Galaxy processing

2012-10-31 Thread Jeremy Goecks
 Where do I find info if the installed applications make use of multiple nodes 
 via MPI(etc) which would indicate the benefit of starting up X number of 
 nodes for faster processing?

You'll need to look at the individual tool documentation. In general, many 
tools use multiple cores; few use MPI for multi-node computing.

 If a workflow has multiple initial inputs for say processing NGS exome data 
 from tumor and blood(gets compared later in the workflow) will each step get 
 sent to a different node(without a dependency) or will the entire workflow 
 run on one node?

If you've set up Galaxy to use a job scheduler (e.g. SGE/PBS), multiple nodes 
can be used. Multiple nodes will be used on the cloud:

http://wiki.g2.bx.psu.edu/CloudMan

 If I have NGS data for 20 patients sitting in a S3 bucket and want a specific 
 workflow run against each patient data input(s) does this require manual 
 selection of files by a user or can the workflow be automated?

Automation via the API is possible; unfortunately, most API documentation is in 
the Py/Sphinx docs for now, so you'll have to dig and/or use the sample scripts 
in galaxy_dir/scripts/api

 Can I programmatically start a workflow remotely(via REST) where I have 
 automated the process of uploading NGS data to S3 and know the input file(s) 
 per workflow?

Yes.
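
If it's useful, the request boils down to a single POST to /api/workflows. Treat the following as a sketch rather than a reference: the payload keys follow the sample workflow script but may differ by Galaxy version, and all IDs are placeholders.

--
# Sketch: trigger a workflow run via the REST API (Python 2 / urllib2).
# Compare against the sample scripts in galaxy_dir/scripts/api before relying on it.
import json, urllib2

galaxy_url = 'http://127.0.0.1:8080'      # your Galaxy instance
api_key = 'YOUR_API_KEY'

payload = {
    'workflow_id': 'f2db41e1fa331b3e',                # placeholder encoded workflow id
    'history': 'Workflow run from the API',           # new history name (or 'hist_id=<id>')
    'ds_map': {'1': {'src': 'hda', 'id': 'abc123'}},  # workflow step -> input dataset
}
req = urllib2.Request(galaxy_url + '/api/workflows?key=' + api_key,
                      data=json.dumps(payload),
                      headers={'Content-Type': 'application/json'})
print json.load(urllib2.urlopen(req))
--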

 Is it possible to present credentials in a workflow for downloading a file 
 via S3 where I require authentication before a file can be downloaded?

You can restrict dataset access using role-based security.

 Does a roadmap exist for what is planned in the future?

Roadmap at a very high level is in this presentation: 
http://wiki.g2.bx.psu.edu/Documents/Presentations/GCC2012?action=AttachFile&do=get&target=State.pdf

 For example, are any additional NGS tools like Abyss going to make it into the 
 build?

The framework is being separated from tools. The best place to look for tools 
is in the toolshed, where there is an abyss wrapper:

http://toolshed.g2.bx.psu.edu/

 Interested in NGS software that handles the dynamics of cancer for gene 
 fusion events, CNVs(etc) when dealing with NGS data.

There is active work on cancer tools for Galaxy. Keeping an eye on the toolshed 
is a good idea here.


Best,
J.

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] user management problem

2012-10-31 Thread Jordi Vaquero
Hello, 
I am trying to configure my Galaxy instance and I have two problems. The first 
one is that I cannot delete users: I created some users for testing, I enabled 
the option in universe_wsgi.ini, and the button appears, but the users only get 
marked as deleted and don't disappear from the users list. Is that 
normal?

The second problem is that I am trying to set up an email confirmation to ensure 
that a user's email exists; is there any way to do that? I have introduced the 
email information in the ini file, but I cannot see any other option for 
enabling that.

Thanks to everyone for your help

Jordi


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


[galaxy-dev] Join version 1.0.0 error

2012-10-31 Thread Scooter Willis
Using large amazon instance

Trying to do an interval join of SNPs output from pileup, 120,000 
regions (5.5Mb), with snp135Common, 12,000,000 (425Mb), and I get the following 
errors. The goal is to pick up rs IDs for known SNPs in the list of SNPs.

Is this a memory issue?

I was able to do the operation against chr1 as a test. Thought about chaining 
the outputs and running against a file for each chromosome to make smaller files, 
but then I have a mess where rs IDs are in different columns.

71: Join on data 38 and data 36
0 bytes
An error occurred running this job: 
/opt/sge/default/spool/execd/ip-10-191-53-90/job_scripts/14: line 13: 5517 
Killed python /mnt/galaxyTools/galaxy-central/tools/new_operations/gops_join.py 
/mnt/galaxyData/files/000/dataset_75.dat 
/mnt/galaxyData/files/000/dataset_77.dat
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Join version 1.0.0 error

2012-10-31 Thread Scooter Willis
Did a subtract first to get a known list of rs SNPs that will be found in the 
tumor SNPs. That ran without error. Doing a join of the subtracted list of rs 
SNPs and the tumor SNPs.

So something is different in the join code than in the subtract code.

From: Scooter Willis hwil...@scripps.edu
Date: Wednesday, October 31, 2012 5:42 PM
To: galaxy-dev@lists.bx.psu.edu galaxy-dev@lists.bx.psu.edu
Subject: Join version 1.0.0 error

Using large amazon instance

Trying to do an interval join of SNPs output from pileup, 120,000 
regions (5.5Mb), with snp135Common, 12,000,000 (425Mb), and I get the following 
errors. The goal is to pick up rs IDs for known SNPs in the list of SNPs.

Is this a memory issue?

I was able to do the operation against chr1 as a test. Thought about chaining 
the outputs and running against a file for each chromosome to make smaller files, 
but then I have a mess where rs IDs are in different columns.

71: Join on data 38 and data 36
0 bytes
An error occurred running this job: 
/opt/sge/default/spool/execd/ip-10-191-53-90/job_scripts/14: line 13: 5517 
Killed python /mnt/galaxyTools/galaxy-central/tools/new_operations/gops_join.py 
/mnt/galaxyData/files/000/dataset_75.dat 
/mnt/galaxyData/files/000/dataset_77.dat
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Empty TopHat output

2012-10-31 Thread Mohammad Heydarian
We are still getting empty TopHat output files on our Galaxy instance on
the cloud. We see that TopHat is generating data while the tool is running
(by monitoring our disk usage on the Amazon cloud), but the output is
empty files.

Is anyone else having this issue? Does anyone have any suggestions?

Many thanks in advance!


Cheers,
Mo Heydarian



On Mon, Oct 15, 2012 at 4:53 AM, Joachim Jacob joachim.ja...@vib.be wrote:

 The same here.

 Cheers,
 Joachim

 --
 Joachim Jacob, PhD

 Rijvisschestraat 120, 9052 Zwijnaarde
 Tel: +32 9 244.66.34
 Bioinformatics Training and Services (BITS)
 http://www.bits.vib.be
 @bitsatvib


 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Empty TopHat output

2012-10-31 Thread Jeremy Goecks
Given that this doesn't seem to be happening on our public server or on local 
instances, my best guess is that the issue is old code. Are you running the 
most recent dist?

J.

On Oct 31, 2012, at 7:37 PM, Mohammad Heydarian wrote:

 We are still getting empty TopHat output files on our Galaxy instance on the 
 cloud. We see that TopHat is generating data while the tool is running (by 
 monitoring our disk usage on the Amazon cloud), but the output is empty 
 files.
 
 Is anyone else having this issue? Does anyone have any suggestions?
 
 Many thanks in advance!
 
 
 Cheers, 
 Mo Heydarian
 
 
 
 On Mon, Oct 15, 2012 at 4:53 AM, Joachim Jacob joachim.ja...@vib.be wrote:
 The same here.
 
 Cheers,
 Joachim
 
 -- 
 Joachim Jacob, PhD
 
 Rijvisschestraat 120, 9052 Zwijnaarde
 Tel: +32 9 244.66.34
 Bioinformatics Training and Services (BITS)
 http://www.bits.vib.be
 @bitsatvib
 
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/
 
 
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Empty TopHat output

2012-10-31 Thread Jeremy Goecks
In this case, it's useful to differentiate between (i) the AMI that Galaxy 
Cloud uses and (ii) the Galaxy code running on the cloud. I suspect that (ii) 
is out of date for you; this is not (yet) automatically updated, even when 
starting a new instance.

Try using the admin console to update to the most recent Galaxy dist using this 
URL:

https://bitbucket.org/galaxy/galaxy-dist/

(not galaxy-central, as is the default)

Best,
J.

On Oct 31, 2012, at 8:36 PM, Mohammad Heydarian wrote:

 We are running galaxy-cloudman-2011-03-22 (ami-da58aab3). 
 
 Our latest instance was loaded up just last week.
 
 Thanks!
 
 Cheers, 
 Mo Heydarian
 
 PhD candidate 
 The Johns Hopkins School of Medicine
 Department of Biological Chemistry 
 725 Wolfe Street
 414 Hunterian 
 Baltimore, MD 21205
 
 
 
 On Wed, Oct 31, 2012 at 8:30 PM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 Given that this doesn't seem to be happening on our public server or on local 
 instances, my best guess is that the issue is old code. Are you running the 
 most recent dist?
 
 J.
 
 
 On Oct 31, 2012, at 7:37 PM, Mohammad Heydarian wrote:
 
 We are still getting empty TopHat output files on our Galaxy instance on the 
 cloud. We see that TopHat is generating data while the tool is running (by 
 monitoring our disk usage on the Amazon cloud), but the output is empty 
 files.
 
 Is anyone else having this issue? Does anyone have any suggestions?
 
 Many thanks in advance!
 
 
 Cheers, 
 Mo Heydarian
 
 
 
 On Mon, Oct 15, 2012 at 4:53 AM, Joachim Jacob joachim.ja...@vib.be wrote:
 The same here.
 
 Cheers,
 Joachim
 
 -- 
 Joachim Jacob, PhD
 
 Rijvisschestraat 120, 9052 Zwijnaarde
 Tel: +32 9 244.66.34
 Bioinformatics Training and Services (BITS)
 http://www.bits.vib.be
 @bitsatvib
 
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/
 
 
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/
 
 

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Parallelism tag and job splitter

2012-10-31 Thread Edward Hills
Hi Peter, thanks again.

Turns out that it has been implemented by the looks of it in
lib/galaxy/datatypes/tabular.py under class Vcf. However, despite this, it
is always the Text class in data.py that is loaded and not the proper Vcf
one.

Can you point me in the direction of where the type is chosen?

Cheers,
Ed

On Wed, Oct 31, 2012 at 9:46 PM, Peter Cock p.j.a.c...@googlemail.comwrote:



 On Wednesday, October 31, 2012, Edward Hills wrote:

 Thanks Peter.

 My next question is, I have found that VCF files don't get split properly
 as the header is not included in the second file as is usually required by
 tools (such as vcf-subset). I have read the code and am happy to implement
 this functionality but am not too sure where this would best be done.

 I see a class Text ( data ) which looks like every datatype is sent to.
 Would it be best to implement a VCF class which is called when the datatype
 is VCF?

 Cheers,
 Ed


 VCF is I assume defined as a subclass of Text, so inherits the naive
 simple splitting implemented for text files (which doesn't know about
 headers).

 Have a look at the SAM splitting code (under lib/galaxy/datatypes/*.py) as
 an example where header aware splitting was done. You'll probably need to
 implement something similar.

 Peter


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] (no subject)

2012-10-31 Thread Sachit Adhikari
Hello Everyone,

I am about to write a Dropbox-like syncing tool for Galaxy, in Python,
with a progress bar. How do I integrate it with Galaxy? It would be easy
for clients to upload files using the syncing tool. Are there any syncing
tools already available for Galaxy?

Thanks
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] Dataset upload fail

2012-10-31 Thread Vladimir Yamshchikov
Local install of Galaxy on SciLinux55. It fails to upload a 5.2 GB fastq file
from the local HD, while normally loading smaller fastq and fasta datasets
(less than 1 GB). Chunks of 1.2 GB remain in */database/tmp, which
all represent the beginning of the file that fails to upload. Several
upload attempts were made and several chunks of the same size are present.
Can I just copy the dataset file to the database directory instead of
uploading through the web interface?

This shows up when clicking the "Run this job again" button:

⇝  Exception: Failed to get job information for dataset hid 5
URL: http://127.0.0.1:8080/tool_runner/rerun?id=8
Traceback (most recent call last):
  File '/home/yaximik/galaxy-dist/eggs/WebError-0.8a-py2.7.egg/weberror/evalexception/middleware.py', line 364, in respond
    app_iter = self.application(environ, detect_start_response)
  File '/home/yaximik/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/debug/prints.py', line 98, in __call__
    status, headers, body = wsgilib.intercept_output(environ, self.app)
  File '/home/yaximik/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/wsgilib.py', line 539, in intercept_output
    app_iter = application(environ, replacement_start_response)
  File '/home/yaximik/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/recursive.py', line 80, in __call__
    return self.application(environ, start_response)
  File '/home/yaximik/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/httpexceptions.py', line 632, in __call__
    return self.application(environ, start_response)
  File '/home/yaximik/galaxy-dist/lib/galaxy/web/framework/base.py', line 160, in __call__
    body = method( trans, **kwargs )
  File '/home/yaximik/galaxy-dist/lib/galaxy/webapps/galaxy/controllers/tool_runner.py', line 129, in rerun
    raise Exception("Failed to get job information for dataset hid %d" % data.hid)
Exception: Failed to get job information for dataset hid 5