[galaxy-user] problem using Depth of Coverage (GATK)

2013-04-08 Thread Gema Sanz
Hello, I'm trying to use Depth of Coverage to check the coverage of my
reads. I already have the BAM files (created with SAM-to-BAM), but they are
still not recognized by Depth of Coverage and I get this error message:

Sequences are not currently available for the specified build

I used human (Homo sapiens) hg19 full for mapping, but I can't select it;
the tool only allows the b37 version. I tried changing the build to b37 in
edit parameters, and then the file is recognized, but I get another error at
the end of the analysis.

Any suggestions?

Thank you very much in advance

Gema

 

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-user] Peak Overlap Analysis, allowing space between overlapping peaks

2013-04-08 Thread Jennifer Jackson

Hello Eric,

I am not sure if you have already explored the other tools in this same
tool group or not, but if you haven't, the tool "Cluster the intervals
of a dataset" may be what you are looking for. Help/graphics for usage
are on the tool form itself. Note that the coordinates would not be
strictly for the overlapping base regions only, but there are
some output alternatives.


Protocol 4 from this publication also contains a review and example 
usage tutorial for interval operation tools:

https://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012

Often one tool does not fit all cases, and a combination of tools is
best. For example, if you just wanted coordinates for the overlapping
portions (including the specified distance), perhaps start by using a
tool like "Get flanks": set your desired distance, base the flanks on the
peaks, merge the result back with the peaks to create the query interval
sets, then intersect those extended peaks. This is just a general idea
of how to string together tools - other/additional manipulations may be
needed to create a complete workflow to meet your exact goals.
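
The extend-then-intersect idea can also be sketched outside Galaxy in a few
lines of Python. This is only an illustration of the logic, not what the
Galaxy interval tools do internally; the function name overlaps_within and
the (chromosome, start, end) tuple layout are assumptions made for the
example, and the nested loop is only suitable for small peak lists.

```python
from typing import List, Tuple

# Hypothetical layout for one peak: (chromosome, start, end)
Interval = Tuple[str, int, int]


def overlaps_within(peaks_a: List[Interval],
                    peaks_b: List[Interval],
                    max_gap: int = 500) -> List[Interval]:
    """Report merged coordinates for every pair of peaks that overlap
    or lie within max_gap bases of each other on the same chromosome."""
    hits = []
    for chrom_a, start_a, end_a in peaks_a:
        for chrom_b, start_b, end_b in peaks_b:
            if chrom_a != chrom_b:
                continue
            # Distance between the two peaks; zero or negative means
            # they already touch or overlap.
            gap = max(start_a, start_b) - min(end_a, end_b)
            if gap <= max_gap:
                hits.append((chrom_a,
                             min(start_a, start_b),
                             max(end_a, end_b)))
    return hits
```

With max_gap=500, a peak at chr1:100-200 and a peak at chr1:600-700 (400 bases
apart) would be reported as one region, chr1:100-700, while a peak at
chr1:5000-5100 would not be paired with either.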


Take care,

Jen
Galaxy team

On 3/29/13 1:30 PM, Eric Van Otterloo wrote:

Hello -

I have been trying to find a solution to identify overlapping peaks 
between two ChIP-Seq datasets.  I have used the "Intersect the
intervals of two datasets" function, under the _Operate on Genomic
Intervals_ toolset - however, I would like to be able to specify a
given distance between the peaks to still be counted as overlapping, 
and this tool requires at least 1bp overlap between peaks to be 
counted.  For example, even if two peaks are within 500bp of each 
other (but don't overlap) I would like to score this as overlapping 
and get the resulting genomic coordinates for downstream analysis.


Thanks in advance for your help!

Eric Van Otterloo



--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Re: [galaxy-user] FW: Change format with edit attributes

2013-04-08 Thread Jennifer Jackson

Gema,

Are your issues with getting data into the correct format resolved? I see
that Dan and others provided all of the help, but the replies were posted
at different times and across a few threads, so
I wanted to be sure you had what you needed.


To be clear - you will need to submit data in fastqsanger format to
the mapping tool. If you only have FASTA, then using the tool "NGS: QC
and manipulation -> Combine FASTA and QUAL" is the correct choice. You
can do this before or after splitting. The assignment to fastqsanger
can also be done before or after splitting. The issue you were most
likely originally facing was leaving the data assigned as simply fastq
(and possibly assigning FASTA data as fastq).


This wiki has related help about datatypes and tools. I also added in a 
new line to cover this specific use case, should it help others:

http://wiki.galaxyproject.org/Support#Tool_doesn.27t_recognize_dataset

I see that you have another question about genomes and GATK - I will 
respond to that thread separately.


Best,

Jen
Galaxy team
On 4/3/13 7:57 AM, Gema Sanz Santos wrote:

Hi Peter,

Thank you for your fast answer.

I just want to know how I can use the output files from Barcode splitter
in Bowtie for Illumina, because I can't see any tool to
convert FASTA to FASTQ. How can I continue with the mapping using the
files from Barcode splitter?


Best,
Gema

From: Peter Cock p.j.a.c...@googlemail.com
Date: Wednesday, April 3, 2013 4:42 PM
To: Gema Sanz Santos ge2sa...@gmail.com
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Change format with edit attributes



On Wed, Apr 3, 2013 at 3:40 PM, Gema Sanz Santos ge2sa...@gmail.com wrote:


Hello,

I'm trying to change the format of the output files from Barcode
splitter from FASTA to FASTQ so I can use them in Bowtie for
Illumina. I've read that it can be done through the edit
attributes: I go to datatype and select fastq, save, and then go
to convert format and press convert, but the resulting file is 0
bytes and is not recognized by Bowtie.

I've also tried to upload by copying the link and selecting fastq
as the format, but in this case I got the file shown in the picture,
and again it is not recognized by Bowtie.


What can I do? I don't know how to continue because I'm not able
to change the format to fastq!

Thank you very much for your help in advance

Best,
Gema


Hi Gema,

There seem to be several factors confusing you here.

The screenshot shows FASTA data wrongly labelled as FASTQ.

The Galaxy "edit attributes" function does NOT actually edit the data. There
are separate tools which can convert from one format to another, and each
gives you a new entry in the history (another green box on the right).


You can convert from FASTQ to FASTA, but doing the opposite is not 
possible without inventing quality scores (e.g. give everything score 30).
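
To make the "inventing quality scores" point concrete, here is a minimal
Python sketch that writes FASTQ records from FASTA text with every base given
the same made-up score. The function name fasta_to_fastq is hypothetical, and
the Sanger (Phred+33) encoding is an assumption; the invented qualities carry
no real information about the reads.

```python
def fasta_to_fastq(fasta_text: str, score: int = 30) -> str:
    """Convert FASTA text to FASTQ text, assigning the same invented
    Phred score to every base (Sanger / Phred+33 encoding assumed)."""
    qual_char = chr(score + 33)  # score 30 becomes '?'
    records = []
    name = None
    seq_parts = []

    def flush():
        # Emit the record collected so far, if there is one.
        if name is not None:
            seq = "".join(seq_parts)
            records.append(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            name, seq_parts = line[1:], []
        else:
            # Sequence may be wrapped over several lines; join them.
            seq_parts.append(line)
    flush()
    return "".join(records)
```

A record such as ">r1" followed by "ACGT" would come out as "@r1", "ACGT",
"+", "????". If real quality values exist in a separate QUAL file, using
those is preferable to a constant score.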


Does that help?

Peter




--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Re: [galaxy-user] merging fastq files

2013-04-08 Thread Jennifer Jackson

Hi Andrew,

Merging the data prior to upload would probably be simplest. Files in a 
galaxy history are not in .tar format at this time.


Loading forward and reverse separately will most likely be important 
from a scientific perspective for analysis.
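
As a rough sketch of merging before upload, the Python below concatenates the
unpacked files into one forward and one reverse file. It assumes the file
names end in ".fastq" and contain "_R1" or "_R2" to mark the read direction;
that naming, and the function name merge_fastq, are assumptions for the
example and need adjusting to the actual files.

```python
import shutil
from pathlib import Path


def merge_fastq(input_dir, forward_out, reverse_out,
                fwd_tag="_R1", rev_tag="_R2"):
    """Concatenate FASTQ files into one forward and one reverse file.

    Files are taken in sorted name order for both directions, so read
    pairs stay in step as long as the forward and reverse files have
    matching names apart from the tag (an assumption of this sketch).
    """
    files = sorted(Path(input_dir).glob("*.fastq"))
    for tag, out_path in ((fwd_tag, forward_out), (rev_tag, reverse_out)):
        with open(out_path, "wb") as out:
            for path in files:
                if tag in path.name:
                    with open(path, "rb") as src:
                        # Stream the file so large inputs are not
                        # loaded into memory at once.
                        shutil.copyfileobj(src, out)
```

Writing the two output files to a directory other than input_dir avoids any
chance of them being picked up as inputs on a later run.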


Once ready for upload, you can tar or gz - as long as each load is a 
single file - or leave uncompressed - either is fine. Using FTP is 
required for larger data (>= 2G), and using a client that will allow you
to track progress/resume an interrupted load can be helpful. Each file 
can be up to 50G in size if you have an account.

http://wiki.galaxyproject.org/FTPUpload

Hopefully this helps,

Jen
Galaxy team

On 4/5/13 3:20 AM, Thompson, Andrew wrote:


Hi

I have received Illumina paired-end genome sequence data as a .tar 
file. When unpacked the data for each genome accession is split into 
about 100 fastq files. Total of about 37 Gbp per genome.


Can you recommend the best way to organise this data prior to mapping 
to reference genome?


I can concatenate the unpacked files using the DOS command line into forward
and reverse before uploading: is this the best approach? Is there a
tool that will start with the .tar file?


Andrew




--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Re: [galaxy-user] Regarding a cuffdiff output

2013-04-08 Thread Jennifer Jackson

Hi Yona,

Yes, the GTF file is most likely the problem due to it lacking certain 
attributes that Cuffdiff requires to perform these calculations. You 
will also want to double check that the reference genome and GTF file 
(where you source it next) are an exact match - both the genome build 
and the identifier format. If either are not a match, you will not get 
the expected or full results that Cuffdiff can produce.
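
One part of that check, the identifier format, can be sketched in Python by
comparing the sequence names in the first column of the GTF with the names in
the reference FASTA headers. The function names are hypothetical, and this
only catches naming mismatches (for example "chr1" versus "1"); it does not
verify the genome build or the attributes Cuffdiff needs.

```python
def gtf_seqnames(gtf_lines):
    """Collect sequence names from the first column of GTF lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(fasta_lines):
    """Collect sequence names from FASTA header lines (text after '>'
    up to the first whitespace)."""
    return {line[1:].split()[0]
            for line in fasta_lines
            if line.startswith(">") and line[1:].strip()}


def unmatched_gtf_names(gtf_lines, fasta_lines):
    """Return GTF sequence names that are absent from the reference."""
    return gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines)
```

An empty result means every GTF sequence name also appears in the reference;
any names returned point to annotation lines that cannot be matched to the
genome by name.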


This wiki has some help:
http://wiki.galaxyproject.org/Support#Interpreting_scientific_results
See "Tools on the Main server: Example - RNA-seq analysis tools".

The links to the Cufflinks web site explain the attributes that
Cuffdiff is looking for, link to the iGenomes datasets available (best
to use if your genome is represented), and point to the tool's user
group. Two iGenomes GTF files are also already available in Galaxy
(hg19, mm9) in Shared Data -> Data Libraries -> iGenomes. The link to
our tutorial and FAQ has help about how the GTF files are used, along
with troubleshooting advice.


Best,

Jen
Galaxy team

On 4/3/13 8:28 AM, Yona Kim wrote:

Dear galaxy users

Hello. I have a quick question about Cuffdiff analysis.
I have obtained two SRA files and converted them to fastq files, which
were uploaded to Galaxy via the FTP server. My analysis proceeded through
Fastq groomer, Tophat, Cufflinks, Cuffcompare, and eventually
Cuffdiff. (Gene annotation was also downloaded from the UCSC table browser
in GTF format.) I downloaded the gene differential expression testing file,
one of the output files of Cuffdiff, and viewed it in Excel.
However, I have only zeros recorded for value_1, value_2, log2,
test_stat and only ones recorded for p_value and q_value.


Is it likely that I obtained the wrong gene annotation file and
that this caused the problem?


Thank you

Yona Kim
Department of Genetics
Rutgers University - New Brunswick Campus



--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

[galaxy-user] Extracting sequences for transcripts from reference genome

2013-04-08 Thread Lizex Husselmann
Dear Galaxy community

I'm new to galaxy and would like to ask the following:
I have trimmed and QC'ed my data received from an Illumina HiScan SQ (paired and
single end data), mapped it using Tophat, and run Cufflinks, Cuffmerge and Cuffdiff. I
would like to analyze the gene_exp.diff file by extracting the significant
transcripts. I've used grep "yes" to extract only the significant transcripts.
From this info I have the locus start and end coordinates of each transcript,
for example:

XLOC_000544  XLOC_000544  -  chr1:12763969-12765675  C0  C4  OK  3.16487  1628.25  9.00696  -4.57022  4.8722e-06  0.00905256  yes

How can I extract this information, or the corresponding sequence, from the reference
genome?
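
The filtering step described above can be sketched in Python as follows. The
sketch assumes the gene_exp.diff file is tab-separated with a header row that
includes columns named test_id, locus and significant; those column names and
the function name significant_loci are assumptions to check against the
actual file. It returns plain (chromosome, start, end, id) tuples and does
not fetch any sequence itself.

```python
import csv
import io


def significant_loci(diff_text: str):
    """Return (chromosome, start, end, test_id) for every row of a
    Cuffdiff-style table whose 'significant' column is 'yes'."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    loci = []
    for row in reader:
        # Match the column exactly, rather than grepping the whole
        # line for "yes", which could also hit other fields.
        if row["significant"] != "yes":
            continue
        # Locus looks like chr1:12763969-12765675
        chrom, span = row["locus"].rsplit(":", 1)
        start, end = (int(x) for x in span.split("-"))
        loci.append((chrom, start, end, row["test_id"]))
    return loci
```

For the example row above this would give ("chr1", 12763969, 12765675,
"XLOC_000544"). Whether the coordinates need shifting by one base before
being used as an interval file depends on the conventions of the tool that
consumes them, which is not settled here.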

Kind regards

Lizex 

[galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-08 Thread Du, Jianguang
Hi All,

I have a very basic question. I have RNA-seq datasets of several cell types and
want to compare the alternative splicing events between cell types. The reads
are 36nt in length. Are these reads long enough to map to the splicing
junctions accurately when I run Tophat with stringent parameters (no mismatches)?

Thanks.

Best,

Jianguang Du

