[galaxy-user] problem using Depth of Coverage (GATK)
Hello, I'm trying to use Depth of Coverage to check the coverage of my reads. I already have the BAM files (created with SAM-to-BAM), but they are still not recognized by Depth of Coverage, and I get this error message: "Sequences are not currently available for the specified build". I mapped against human (Homo sapiens) hg19 full, but I can't select it; the tool only allows the b37 version. I tried changing the build to b37 in the edit parameters, and then the files are recognized, but I get another error at the end of the analysis. Any suggestions? Thank you very much in advance, Gema

___ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Re: [galaxy-user] Peak Overlap Analysis, allowing space between overlapping peaks
Hello Eric, I am not sure whether you have already explored the other tools in this same tool group, but if you haven't, the tool Cluster the intervals of a dataset may be what you are looking for. Help and graphics for usage are on the tool form itself. Note that the coordinates would not be strictly for the overlapping base regions only, but there are some output alternatives. Protocol 4 from this publication also contains a review and example usage tutorial for interval operation tools: https://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012 Often one tool does not fit all cases, and a combination of tools is best. For example, if you just wanted coordinates for the overlapping portions (including the specified distance), perhaps start by using a tool like Get flanks: set your desired distance based off the peaks, merge the result back with the peaks to create the query interval sets, then intersect those extended peaks. This is just a general idea of how to string tools together; other or additional manipulations may be needed to create a complete workflow that meets your exact goals. Take care, Jen Galaxy team

On 3/29/13 1:30 PM, Eric Van Otterloo wrote: Hello - I have been trying to find a way to identify overlapping peaks between two ChIP-Seq datasets. I have used the Intersect the intervals of two datasets function, under the _Operate on Genomic Intervals_ toolset. However, I would like to be able to specify a distance within which peaks still count as overlapping, and this tool requires at least 1 bp of overlap between peaks. For example, even if two peaks are within 500 bp of each other (but don't overlap), I would like to score this as overlapping and get the resulting genomic coordinates for downstream analysis. Thanks in advance for your help! Eric Van Otterloo
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
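The extend-then-intersect idea from Jen's reply can be sketched outside Galaxy in a few lines of Python. This is only an illustration under stated assumptions: the function names `extend` and `intersect` and the example coordinates are made up here, intervals are treated as half-open (chrom, start, end) tuples, and only one peak set is padded by the full distance. It is not how the Galaxy tools are implemented.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open, as in BED


def extend(peaks: List[Interval], distance: int) -> List[Interval]:
    """Pad each peak by `distance` on both sides, clamping the start at 0."""
    return [(c, max(0, s - distance), e + distance) for c, s, e in peaks]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the shared portion of every pair of intervals on the same chromosome."""
    out = []
    for ca, sa, ea in a:
        for cb, sb, eb in b:
            if ca == cb:
                start, end = max(sa, sb), min(ea, eb)
                if start < end:  # keep only non-empty overlaps
                    out.append((ca, start, end))
    return out


# Hypothetical peaks: the first pair is 400 bp apart, the second is far away.
peaks_a = [("chr1", 1000, 1200)]
peaks_b = [("chr1", 1600, 1800), ("chr1", 5000, 5200)]

nearby = intersect(extend(peaks_a, 500), peaks_b)
print(nearby)  # [('chr1', 1600, 1700)]
```

The pairwise loop is quadratic, which is fine for a small example but not for genome-wide peak sets; sorting the intervals or using an interval tree would be the usual next step.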
Re: [galaxy-user] FW: Change format with edit attributes
Gema, Are your issues with getting data into the correct format resolved? I see that Dan and others provided all of the help, but the replies and your posts arrived at different times and across a few threads, so I wanted to be sure you had what you needed. To be clear: you will need to submit data in fastqsanger format to the mapping tool. If you only have fasta, then using the tool NGS: QC and manipulation - Combine FASTA and QUAL is the correct choice. You can do this before or after splitting. The assignment to fastqsanger can also be done before or after splitting. The issue you were most likely facing originally was leaving the data assigned as simply fastq (and possibly assigning fasta data as fastq). This wiki has related help about datatypes and tools. I also added a new line to cover this specific use case, should it help others: http://wiki.galaxyproject.org/Support#Tool_doesn.27t_recognize_dataset I see that you have another question about genomes and GATK; I will respond to that thread separately. Best, Jen Galaxy team

On 4/3/13 7:57 AM, Gema Sanz Santos wrote: Hi Peter, Thank you for your fast answer. I just want to know how I can use the output files from Barcode splitter in Bowtie for Illumina, because I can't see any tool to convert FASTA to FASTQ. How can I continue with the mapping using the files from Barcode splitter? Best, Gema

From: Peter Cock p.j.a.c...@googlemail.com Date: Wednesday, April 3, 2013 4:42 PM To: Gema Sanz Santos ge2sa...@gmail.com Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Change format with edit attributes

On Wed, Apr 3, 2013 at 3:40 PM, Gema Sanz Santos ge2sa...@gmail.com wrote: Hello, I'm trying to change the format of the output files from Barcode splitter from FASTA to FASTQ so I can use them in Bowtie for Illumina.
I've read that it can be done through the edit attributes: I go to datatype, select fastq, and save, then go to convert format and press convert, but the resulting file is 0 bytes and is not recognized by Bowtie. I've also tried uploading by copying the link and selecting fastq as the format, but in that case I got the file shown in the picture, and again it is not recognized by Bowtie. What can I do? I don't know how to continue because I'm not able to change the format to fastq! Thank you very much in advance for your help. Best, Gema

Hi Gema, There seem to be several factors confusing you here. The screenshot shows FASTA data wrongly labelled as FASTQ. The Galaxy edit attributes form does NOT actually edit the data. There are separate tools that convert from one format to another, each of which gives you a new entry in the history (another green box on the right). You can convert from FASTQ to FASTA, but doing the opposite is not possible without inventing quality scores (e.g. giving everything a score of 30). Does that help? Peter
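To make Peter's point concrete, here is a minimal sketch of what "inventing quality scores" means: every base is given the same made-up Phred score of 30, written in Sanger encoding (ASCII offset 33). The function names and example reads are illustrative assumptions; this is not a Galaxy tool, and when real quality values exist, Combine FASTA and QUAL is the better route.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA text, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(lines: Iterable[str], phred: int = 30) -> str:
    """Build FASTQ text in which every base gets the same invented quality score."""
    qual_char = chr(phred + 33)  # Sanger encoding: Phred score + 33 as an ASCII character
    records = []
    for name, seq in read_fasta(lines):
        records.append(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
    return "".join(records)


# Hypothetical reads; read1 is wrapped over two lines.
example = [">read1", "ACGT", "AC", ">read2", "GGA"]
fastq_text = fasta_to_fastq(example)
print(fastq_text)
```

The quality strings carry no real information, so any downstream step that weighs base qualities will treat all bases as equally reliable.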
Re: [galaxy-user] merging fastq files
Hi Andrew, Merging the data prior to upload would probably be simplest. Files in a Galaxy history are not in .tar format at this time. Loading forward and reverse reads separately will most likely be important for analysis from a scientific perspective. Once ready for upload, you can tar or gz the data, as long as each load is a single file, or leave it uncompressed; either is fine. Using FTP is required for larger data (>= 2G), and a client that lets you track progress and resume an interrupted load can be helpful. Each file can be up to 50G in size if you have an account. http://wiki.galaxyproject.org/FTPUpload Hopefully this helps, Jen Galaxy team

On 4/5/13 3:20 AM, Thompson, Andrew wrote: Hi, I have received Illumina paired-end genome sequence data as a .tar file. When unpacked, the data for each genome accession is split into about 100 fastq files, about 37 Gbp per genome in total. Can you recommend the best way to organise this data prior to mapping to a reference genome? I can concatenate the unpacked files from the DOS command line into forward and reverse files before uploading: is this the best approach? Is there a tool that will start with the .tar file? Andrew
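The merge-before-upload step discussed above could look roughly like the following sketch. It assumes the unpacked chunks are uncompressed and carry `_R1_` / `_R2_` in their file names to mark forward and reverse reads; that naming, the output file names, and the function names are assumptions for illustration, so adjust them to whatever the sequencing provider actually used.

```python
import shutil
from pathlib import Path
from typing import Iterable


def concatenate(parts: Iterable[Path], merged: Path) -> None:
    """Append each part to `merged` in the given order, streaming in chunks."""
    with merged.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)


def merge_pairs(folder: Path, forward_tag: str = "_R1_", reverse_tag: str = "_R2_") -> None:
    """Merge forward and reverse chunks separately, in the same sorted order."""
    for tag, name in ((forward_tag, "forward.fastq"), (reverse_tag, "reverse.fastq")):
        # Sorting by file name keeps the forward and reverse chunks aligned,
        # which matters because paired reads are matched by position.
        parts = sorted(p for p in folder.glob("*.fastq") if tag in p.name)
        concatenate(parts, folder / name)


# Example call on a hypothetical folder of unpacked chunks:
# merge_pairs(Path("unpacked_run"))
```

Streaming with `shutil.copyfileobj` avoids reading tens of gigabytes into memory. It is worth confirming afterwards that both merged files contain the same number of reads.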
Re: [galaxy-user] Regarding a cuffdiff output
Hi Yona, Yes, the GTF file is most likely the problem, because it lacks certain attributes that Cuffdiff requires to perform these calculations. You will also want to double check that the reference genome and the GTF file (wherever you source it next) are an exact match, in both genome build and identifier format. If either does not match, you will not get the expected or full results that Cuffdiff can produce. This wiki has some help: http://wiki.galaxyproject.org/Support#Interpreting_scientific_results See Tools on the Main server: Example - RNA-seq analysis tools. The links to the Cufflinks web site explain the attributes that Cuffdiff is looking for, link to the available iGenomes datasets (best to use if your genome is represented), and point to the tool's user group. Two iGenomes GTF files are also already available in Galaxy (hg19, mm9) in Shared Data - Data Libraries - iGenomes. The link to our tutorial and FAQ has help about how the GTF files are used, along with troubleshooting advice. Best, Jen Galaxy team

On 4/3/13 8:28 AM, Yona Kim wrote: Dear Galaxy users, Hello. I have a quick question about Cuffdiff analysis. I obtained two SRA files and converted them to fastq files, which were uploaded to Galaxy via the FTP server. My analysis then ran through FASTQ Groomer, Tophat, Cufflinks, Cuffcompare, and eventually Cuffdiff. (The gene annotation was downloaded from the UCSC Table Browser in GTF format.) I downloaded the gene differential expression testing file, one of the Cuffdiff outputs, and viewed it in Excel. However, I have only zeros recorded for value_1, value_2, log2, and test_stat, and only ones recorded for p_value and q_value. Is it likely that I obtained the wrong gene annotation file and that this caused the problem? Thank you, Yona Kim Department of Genetics Rutgers University - New Brunswick Campus
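The two checks Jen describes (which attributes the GTF carries, and whether its chromosome identifiers match the reference genome) can be approximated with a short script before rerunning Cuffdiff. This is a rough sketch, not a validator: the list of wanted attributes (`gene_id`, `transcript_id`, `tss_id`, `p_id`) is an assumption to be confirmed against the Cufflinks documentation, and the function names and example line are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Matches attribute pairs such as: gene_id "G1";
ATTRIBUTE = re.compile(r'(\w+)\s+"([^"]*)"')


def summarize_gtf(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (chromosome names, attribute keys) seen in GTF text."""
    chroms, keys = set(), set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a nine-column GTF feature line
        chroms.add(fields[0])
        keys.update(key for key, _ in ATTRIBUTE.findall(fields[8]))
    return chroms, keys


def check_gtf(
    lines: Iterable[str],
    genome_chroms: Iterable[str],
    wanted: Tuple[str, ...] = ("gene_id", "transcript_id", "tss_id", "p_id"),  # assumed list
) -> Dict[str, List[str]]:
    """Report wanted attributes that never appear and chromosomes absent from the genome."""
    chroms, keys = summarize_gtf(lines)
    return {
        "missing_attributes": [k for k in wanted if k not in keys],
        "chroms_not_in_genome": sorted(chroms - set(genome_chroms)),
    }


# Hypothetical one-line annotation that uses "1" while the genome uses "chr1".
gtf = ['1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n']
report = check_gtf(gtf, {"chr1"})
print(report)
```

A non-empty `chroms_not_in_genome` list points to the identifier mismatch case (for example `1` versus `chr1`), while `missing_attributes` points to an annotation source that lacks what the tool expects.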
[galaxy-user] Extracting sequences for transcripts from reference genome
Dear Galaxy community, I'm new to Galaxy and would like to ask the following. I have trimmed and QC'ed my paired-end and single-end data from an Illumina HiScan SQ, mapped it with Tophat, and run Cufflinks, Cuffmerge and Cuffdiff. I would like to analyze the gene_exp.diff file by extracting the significant transcripts. I used grep yes to extract only the significant transcripts. From this I have the locus start and end coordinates of each transcript, for example: XLOC_000544 XLOC_000544 - chr1:12763969-12765675 C0 C4 OK 3.16487 1628.25 9.00696 -4.57022 4.8722e-06 0.00905256 yes. How can I go about extracting this information, or the sequence, from the reference genome? Kind regards, Lizex
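As a possible first step toward the question above, the significant rows can be filtered and their locus strings split into chromosome, start and end with a small script. This sketch assumes the file is tab-separated with a header row naming the columns `gene_id`, `locus` and `significant`; the header shown is written from memory of Cuffdiff output rather than taken from this thread, and the data row uses the values quoted in the message. Whether the coordinates are 0-based or 1-based should be checked before they are used to fetch sequence.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def significant_loci(handle: TextIO) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (gene_id, chrom, start, end) for rows whose `significant` column is 'yes'."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["significant"] != "yes":
            continue
        # The locus looks like chr1:12763969-12765675; split on the last colon.
        chrom, span = row["locus"].rsplit(":", 1)
        start, end = span.split("-")
        yield row["gene_id"], chrom, int(start), int(end)


# Assumed header, followed by the example row from the message.
header = (
    "test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\t"
    "log2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n"
)
row = (
    "XLOC_000544\tXLOC_000544\t-\tchr1:12763969-12765675\tC0\tC4\tOK\t3.16487\t1628.25\t"
    "9.00696\t-4.57022\t4.8722e-06\t0.00905256\tyes\n"
)

loci = list(significant_loci(io.StringIO(header + row)))
print(loci)  # [('XLOC_000544', 'chr1', 12763969, 12765675)]
```

Filtering on the `significant` column is stricter than `grep yes`, which would also match any line containing "yes" in another field. The resulting tuples could then be written out as an interval file for a sequence-extraction step.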
[galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?
Hi All, I have a very basic question. I have RNA-seq datasets from several cell types and want to compare alternative splicing events between the cell types. The reads are 36 nt in length. Are these reads long enough to map to the splicing junctions accurately when I run Tophat with stringent parameters (no mismatches)? Thanks. Best, Jianguang Du