[galaxy-user] Cuffmerge not working with reference genome
Hello Mark,

Do you mean the GenCode reference _annotation_ GTF dataset? This is not a reference genome. If you want to use a _custom reference genome_, it must be provided as a FASTA file, following these instructions: https://wiki.galaxyproject.org/Support#Custom_reference_genome

My guess is that you have done the following (since a GTF file would not be accepted as a custom reference genome unless you changed the datatype to fasta):

- provided the GenCode GTF file as a reference annotation dataset, and for mapping used a built-in reference genome or mapped somewhere else.

When there are issues, the cause is most often a mismatch between the chromosome identifiers in the annotation and those in the reference genome. This wiki section explains the issue in more detail: https://wiki.galaxyproject.org/Support#Reference_genomes

Sometimes this can be corrected by altering the identifiers in the GTF file to match those in the reference genome (the genome your fastq data was mapped to; convert BAM to SAM to see the identifiers if necessary). This FAQ shows one method:
https://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_RNA-seq
https://usegalaxy.org/u/jeremy/p/transcriptome-analysis-faq#faq5

Hopefully the resources here help to resolve the issue!

Best,
Jen
Galaxy team

ps. This did not post to the mailing list because it was not addressed to the mailing list alone. Please post new questions that way, or much better (since this mailing list will be retired very soon), please activate your account at Galaxy Biostar and post your question there (this has replaced the galaxy-user mailing list). Here is how: https://wiki.galaxyproject.org/Support#Biostar

On 5/20/14 12:55 AM, Mark Lindsay wrote:

Dear All,

When I run Cuffmerge using the latest GenCode v19 GTF as a reference genome, it fails to run. I tried this using multiple approaches and datasets.

Of note, Cuffmerge works fine when the Gencode GTF is omitted as a reference genome. CuffCompare also works fine when the same Gencode GTF is used as a reference genome.

Presumably there is something wrong with the Cuffmerge set-up relating to the reference genome? Or am I doing something wrong?

Best wishes
Mark

--
Jennifer Hillman-Jackson
http://galaxyproject.org
___
The Galaxy User List is being replaced by the Galaxy Biostar User Support Forum at https://biostar.usegalaxy.org/
Posts to this list will be disabled in May 2014. In the meantime, you are encouraged to post all new questions to Galaxy Biostar.
For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
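The identifier fix suggested in the reply in this thread (altering the chromosome identifiers in the GTF so they match the genome the reads were mapped to) could be sketched roughly as below. This is only an illustration, not the method from the linked FAQ: the function names and the example mapping ("1" to "chr1", "MT" to "chrM") are assumptions, and the right mapping depends on your own annotation and genome build.

```python
def sam_header_seqnames(sam_lines):
    """Collect reference names from the @SQ header lines of a SAM file."""
    names = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the distinct values of column 1 (seqname) of a GTF file."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def rename_gtf_seqnames(gtf_lines, mapping):
    """Yield GTF lines with column 1 rewritten through `mapping`.

    Identifiers that are not in `mapping` are left unchanged.
    """
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.split("\t", 1)
        fields[0] = mapping.get(fields[0], fields[0])
        yield "\t".join(fields)


# Hypothetical example data: a GTF using "1"/"MT" and a SAM header using "chr1"/"chrM".
gtf = [
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'MT\tsrc\texon\t10\t50\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
]
sam_header = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:1000\n", "@SQ\tSN:chrM\tLN:500\n"]

mismatched = gtf_seqnames(gtf) - set(sam_header_seqnames(sam_header))
fixed = list(rename_gtf_seqnames(gtf, {"1": "chr1", "MT": "chrM"}))
```

Comparing the two sets of names first shows whether there is a mismatch at all; the renaming step only makes sense once you know which identifiers differ.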
Re: [galaxy-user] cuffmerge
Hello Janice,

Are you using the public Main Galaxy instance at http://usegalaxy.org? Would you be able to submit this as a bug report?

Thanks,
Jen
Galaxy team

On 2/10/14 11:07 AM, Janice Patterson wrote:

I am analyzing RNA-seq data and I ran Cufflinks with the genes.gtf as a reference annotation guide, with bias correction using the genome.fa as a reference. When I subsequently attempt to run cuffmerge on my assembled transcript files, however, I get the following errors, and cuffmerge fails.

Error: duplicate GFF ID 'CUFF.1.1' encountered!
[FAILED]
Error: could not execute gtf_to_sam

I am using the same genes.gtf file I used for cufflinks. It is the genes.gtf file obtained from the Data Libraries provided by Galaxy. I ran Cufflinks on the same data set without bias correction (so a genome.fa file was unnecessary) and without multi-read correction, and the subsequent cuffmerge run with the same genes.gtf file ran just fine. Why did turning on bias correction and providing cufflinks with a fasta reference file make cuffmerge fail?

Thanks.
Janice Patterson

--
Jennifer Hillman-Jackson
http://galaxyproject.org
___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client.
For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
[galaxy-user] cuffmerge
I am analyzing RNA-seq data and I ran Cufflinks with the genes.gtf as a reference annotation guide, with bias correction using the genome.fa as a reference. When I subsequently attempt to run cuffmerge on my assembled transcript files, however, I get the following errors, and cuffmerge fails.

Error: duplicate GFF ID 'CUFF.1.1' encountered!
[FAILED]
Error: could not execute gtf_to_sam

I am using the same genes.gtf file I used for cufflinks. It is the genes.gtf file obtained from the Data Libraries provided by Galaxy. I ran Cufflinks on the same data set without bias correction (so a genome.fa file was unnecessary) and without multi-read correction, and the subsequent cuffmerge run with the same genes.gtf file ran just fine. Why did turning on bias correction and providing cufflinks with a fasta reference file make cuffmerge fail?

Thanks.
Janice Patterson
[galaxy-user] Cuffmerge error: duplicate GFF ID encountered
Hello,

I was doing an RNA analysis and wanted to compare the transcription and expression of two samples using a reference annotation. However, this is the error message I got:

=Quote=
Error running cuffmerge.
[Thu Jul 4 07:32:59 2013] Beginning transcriptome assembly merge
---
[Thu Jul 4 07:32:59 2013] Preparing output location cm_output/
[Thu Jul 4 07:34:07 2013] Converting GTF files to SAM
[07:34:07] Loading reference annotation.
[07:34:07] Loading reference annotation.
[Thu Jul 4 07:34:08 2013] Quantitating transcripts
You are using Cufflinks v2.1.1, which is the most recent release.
Command line:
cufflinks -o cm_output/ -F 0.05 -g /galaxy/main_pool/pool7/files/006/446/dataset_6446730.dat -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 4 cm_output/tmp/mergeSam_fileIO17rb
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File cm_output/tmp/mergeSam_fileIO17rb doesn't appear to be a valid BAM file, trying SAM...
[07:34:08] Loading reference annotation.
[07:35:53] Inspecting reads and determining fragment length distribution.
Processed 33854 loci.
Map Properties:
Normalized Map Mass: 8719.00
Raw Map Mass: 8719.00
Fragment Length Distribution: Truncated Gaussian (default)
Default Mean: 200
Default Std Dev: 80
[07:35:53] Assembling transcripts and estimating abundances.
Processed 33854 loci.
[Thu Jul 4 07:39:29 2013] Comparing against reference file /galaxy/main_pool/pool7/files/006/446/dataset_6446730.dat
You are using Cufflinks v2.1.1, which is the most recent release.
Error: duplicate GFF ID 'ENST0361547.2' encountered!
[FAILED]
Error: could not execute cuffcompare
==End quote==

The job runs fine without the reference annotation. The annotation file I used can be downloaded here: ftp://ftp.sanger.ac.uk/pub/gencode/release_17/gencode.v17.annotation.gtf.gz

Can anyone help me please?

Thanks,
Delong
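As a starting point for diagnosing a "duplicate GFF ID" error like the one in this post, one could look for transcript IDs that are reused in more than one place in the annotation. The sketch below reports every transcript_id that appears on more than one (chromosome, strand) pair. That this kind of reuse is what triggers the error here is an assumption, not something established in the thread, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches both quoted (transcript_id "X";) and unquoted (transcript_id X;) values.
TRANSCRIPT_ID = re.compile(r'transcript_id\s+"?([^";\s]+)"?')


def reused_transcript_ids(gtf_lines):
    """Return {transcript_id: [(seqname, strand), ...]} for IDs seen in more than one place."""
    seen = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            seen[match.group(1)].add((cols[0], cols[6]))
    return {tid: sorted(places) for tid, places in seen.items() if len(places) > 1}


# Hypothetical records: transcript "T1" is annotated on both chrX and chrY.
example = [
    'chrX\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chrY\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t500\t900\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
]
reused = reused_transcript_ids(example)
```

If the result is non-empty for the annotation in question, the listed IDs are candidates to inspect by hand before deciding how to rename or filter them.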
[galaxy-user] Cuffmerge Error
Hello,

The problem is most likely with the SAM files and sorting. Please see "Why won't my SAM dataset work with Cufflinks?": http://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq#faq2

Best,
Jen
Galaxy team

On 9/26/12 12:54 PM, Kenneth Auerbach wrote:

Hi,

I'm also getting these errors when I run cuffmerge. It was run without a reference, so it shouldn't have anything to do with the reference. Can you please tell me how to fix this?

Thank you.

--
Jennifer Jackson
http://galaxyproject.org
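To check whether sorting is the issue before re-running the tools, a small script can scan the alignment records of a SAM file. The sketch below assumes that "sorted" means ordered by reference name (column 3, compared as plain text) and then by numeric position (column 4); the linked FAQ remains the authority on what the tools on the server actually expect, and the function name is only illustrative.

```python
def first_unsorted_record(sam_lines):
    """Return the 1-based index of the first alignment record that breaks
    (reference name, position) order, or None if all records are in order.

    Header lines (starting with "@") are skipped and not counted.
    """
    previous = None
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4:
            continue  # not a complete alignment record
        count += 1
        key = (cols[2], int(cols[3]))
        if previous is not None and key < previous:
            return count
        previous = key
    return None


# Hypothetical records; only the first four SAM columns matter for this check.
in_order = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t50\n",
    "r2\t0\tchr1\t100\n",
    "r3\t0\tchr2\t10\n",
]
out_of_order = ["r1\t0\tchr1\t100\n", "r2\t0\tchr1\t50\n"]

ordered_result = first_unsorted_record(in_order)
unordered_result = first_unsorted_record(out_of_order)
```

A return value of None means no record was found out of order under this assumption; any other value points at the record where the order first breaks.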
Re: [galaxy-user] cuffmerge loses p_id
Hello Christopher,

We did some testing here using your available data and are fairly certain that the problem is with the tool itself and not the Galaxy wrapper. Contacting the author and asking about the issue is probably the best way forward. If you have time, we would be glad to learn about the outcome.

Best,
Jen
Galaxy team

On 5/15/12 1:38 PM, Christopher M. Weber wrote:

Hello,

Problem: Cuffmerge loses p_id from reference genome in merged gtf file on Galaxy online server, resulting in blank cds cuffdiff files.

DATA
Input to Cuffmerge or Cuffcompare: Two fly cufflinks transcript assemblies created from bam files in Galaxy server using reference annotation and bias correction.

Options:
Use Reference Annotation: YES, UCSC DM3 genes gtf (D. melanogaster) or Ensembl 5.25 genes gtf
Use Sequence Data: YES

Result: tss_id found in cuffmerge output but no p_id with either reference annotation file. Examples included below.

Reference:
2L protein_coding stop_codon 8608 8610 . + 0 exon_number 2; gene_id FBgn0031208; gene_name CG11023; p_id P13746; transcript_id FBtr0300689; transcript_name CG11023-RB; tss_id TSS8369;

Cuffmerge:
Cufflinks exon 8193 9484 . + . gene_id XLOC_01; transcript_id TCONS_0001; exon_number 2; gene_name CG11023; oId FBtr0300689; nearest_ref FBtr0300689; class_code =; tss_id TSS1;
2L Cufflinks exon 66721 67003 . + . gene_id XLOC_02; transcript_id TCONS_0003; exon_number 1; gene_name dbr; oId CUFF.1.1; nearest_ref FBtr0078100; class_code j; tss_id TSS2;

Help is much appreciated,
Thanks!

--
Jennifer Jackson
http://galaxyproject.org
Re: [galaxy-user] cuffmerge loses p_id
Hello Christopher,

I see that this was also posted at SeqAnswers yesterday. I searched and didn't see anything in the manual or at SeqAnswers that specifically addresses this (but if someone else can find it, please add to this thread): http://seqanswers.com/forums/showthread.php?t=20084

Do you think this is a problem with the Cuffmerge program itself? If so, the next step is to contact the authors by emailing tophat.cuffli...@gmail.com
http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results
(See: Example: unexpected results with RNA-seq analysis tools.)

If you think the problem may be with the Galaxy wrapper around CuffMerge, we can take a closer look. Please share a link to your history containing this data. Use Options (top 'gear' icon in history panel) - Share or publish, generate a share link, copy the link, and email it back directly to me. Note the dataset #'s involved and leave all datasets unhidden/undeleted.

Thanks!
Jen
Galaxy team

On 5/15/12 1:38 PM, Christopher M. Weber wrote:

Hello,

Problem: Cuffmerge loses p_id from reference genome in merged gtf file on Galaxy online server, resulting in blank cds cuffdiff files.

DATA
Input to Cuffmerge or Cuffcompare: Two fly cufflinks transcript assemblies created from bam files in Galaxy server using reference annotation and bias correction.

Options:
Use Reference Annotation: YES, UCSC DM3 genes gtf (D. melanogaster) or Ensembl 5.25 genes gtf
Use Sequence Data: YES

Result: tss_id found in cuffmerge output but no p_id with either reference annotation file. Examples included below.

Reference:
2L protein_coding stop_codon 8608 8610 . + 0 exon_number 2; gene_id FBgn0031208; gene_name CG11023; p_id P13746; transcript_id FBtr0300689; transcript_name CG11023-RB; tss_id TSS8369;

Cuffmerge:
Cufflinks exon 8193 9484 . + . gene_id XLOC_01; transcript_id TCONS_0001; exon_number 2; gene_name CG11023; oId FBtr0300689; nearest_ref FBtr0300689; class_code =; tss_id TSS1;
2L Cufflinks exon 66721 67003 . + . gene_id XLOC_02; transcript_id TCONS_0003; exon_number 1; gene_name dbr; oId CUFF.1.1; nearest_ref FBtr0078100; class_code j; tss_id TSS2;

Help is much appreciated,
Thanks!

--
Jennifer Jackson
http://galaxyproject.org
[galaxy-user] cuffmerge loses p_id
Hello,

Problem: Cuffmerge loses p_id from reference genome in merged gtf file on Galaxy online server, resulting in blank cds cuffdiff files.

DATA
Input to Cuffmerge or Cuffcompare: Two fly cufflinks transcript assemblies created from bam files in Galaxy server using reference annotation and bias correction.

Options:
Use Reference Annotation: YES, UCSC DM3 genes gtf (D. melanogaster) or Ensembl 5.25 genes gtf
Use Sequence Data: YES

Result: tss_id found in cuffmerge output but no p_id with either reference annotation file. Examples included below.

Reference:
2L protein_coding stop_codon 8608 8610 . + 0 exon_number 2; gene_id FBgn0031208; gene_name CG11023; p_id P13746; transcript_id FBtr0300689; transcript_name CG11023-RB; tss_id TSS8369;

Cuffmerge:
Cufflinks exon 8193 9484 . + . gene_id XLOC_01; transcript_id TCONS_0001; exon_number 2; gene_name CG11023; oId FBtr0300689; nearest_ref FBtr0300689; class_code =; tss_id TSS1;
2L Cufflinks exon 66721 67003 . + . gene_id XLOC_02; transcript_id TCONS_0003; exon_number 1; gene_name dbr; oId CUFF.1.1; nearest_ref FBtr0078100; class_code j; tss_id TSS2;

Help is much appreciated,
Thanks!
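The comparison made in this post (tss_id present, p_id missing) can be checked across a whole file by counting how many records carry each attribute. The sketch below is a simple illustration with made-up function names; it splits the attribute column on ";" and so assumes that attribute values themselves contain no semicolons. It accepts both quoted and unquoted values, since the records quoted in the post appear without quotes.

```python
from collections import Counter


def parse_attributes(attribute_field):
    """Parse GTF column 9 ('key value; key value; ...') into a dict."""
    attrs = {}
    for part in attribute_field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def count_attribute_keys(gtf_lines, keys=("tss_id", "p_id")):
    """Count how many GTF records carry each of the given attribute keys."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        for key in keys:
            if key in attrs:
                counts[key] += 1
    return counts


# Records adapted from the post, with tabs restored between the nine GTF columns.
reference = [
    "2L\tprotein_coding\tstop_codon\t8608\t8610\t.\t+\t0\t"
    "exon_number 2; gene_id FBgn0031208; gene_name CG11023; p_id P13746; "
    "transcript_id FBtr0300689; transcript_name CG11023-RB; tss_id TSS8369;\n"
]
merged = [
    "2L\tCufflinks\texon\t66721\t67003\t.\t+\t.\t"
    "gene_id XLOC_02; transcript_id TCONS_0003; exon_number 1; gene_name dbr; "
    "oId CUFF.1.1; nearest_ref FBtr0078100; class_code j; tss_id TSS2;\n"
]

reference_counts = count_attribute_keys(reference)
merged_counts = count_attribute_keys(merged)
```

Running the same count on the reference annotation and on the merged file makes it easy to see whether p_id is absent everywhere or only for some transcripts.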
[galaxy-user] cuffmerge question
I'm new to RNA-Seq analysis and think this question must have been asked before, but I can't find an answer in the Galaxy-user or Seq-answers archives.

My question is, if I use the public Galaxy server interface to TopHat and Cufflinks, is there any access to cuffmerge?

Also, I'm trying to understand the difference between using cuffmerge and using cuffcompare (without a reference genome) to assemble the gtf transcript files produced by Cufflinks for each group of 3 Illumina paired-end reads corresponding to biological replicates, in order to use the resulting combined gtf file for comparing the TopHat alignments of two such groups using cuffdiff. Is there any difference in the output between cuffdiff and cuffcompare, used in this fashion? For example, do they form the union of transcripts by the same rules, and do their outputs contain (or lack) the same columns (strand, perhaps?)

I've read things on seq-answers indicating that I should be using cuffmerge, but I can't find it on the public server and apparently haven't installed it properly on my own computer so far.
Re: [galaxy-user] cuffmerge question
Carol,

My question is, if I use the public Galaxy server interface to TopHat and Cufflinks, is there any access to cuffmerge?

No, Cuffmerge is not available in Galaxy.

Also, I'm trying to understand the difference between using cuffmerge and using cuffcompare (without a reference genome) to assemble the gtf transcript files produced by Cufflinks for each group of 3 Illumina paired-end reads corresponding to biological replicates, in order to use the resulting combined gtf file for comparing the TopHat alignments of two such groups using cuffdiff. Is there any difference in the output between cuffdiff and cuffcompare, used in this fashion? For example, do they form the union of transcripts by the same rules, and do their outputs contain (or lack) the same columns (strand, perhaps?) I've read things on seq-answers indicating that I should be using cuffmerge, but I can't find it on the public server and apparently haven't installed it properly on my own computer so far.

From the Cufflinks/compare/merge/diff documentation ( http://cufflinks.cbcb.umd.edu/manual.html#cuffmerge ):

* Cuffmerge calls Cuffcompare and does some filtering of transfrags as well as merging of novel and known isoforms;
* The main purpose of this script is to make it easier to make an assembly GTF file suitable for use with Cuffdiff.

Hence, it appears that Cuffmerge and Cuffcompare are relatively similar and use the same basic union algorithm (whatever Cuffcompare uses). If you have more detailed questions, you might ask the Cufflinks authors: tophat.cuffli...@gmail.com

Good luck,
J.
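For anyone running Cuffmerge on a local install rather than through Galaxy, the input it takes is a text file listing one Cufflinks assembly GTF per line. The sketch below only writes that list and assembles the command line; it does not run anything. The helper names, file names, and the default output directory are illustrative assumptions, and the options shown (-o, -p, -g, -s) should be checked against the manual of the installed Cufflinks version.

```python
import os
import tempfile


def write_assembly_list(gtf_paths, list_path):
    """Write one assembly GTF path per line, the list format cuffmerge reads."""
    with open(list_path, "w") as handle:
        for path in gtf_paths:
            handle.write(path + "\n")


def build_cuffmerge_command(list_path, out_dir="merged_asm",
                            ref_gtf=None, ref_fasta=None, threads=1):
    """Build (but do not run) a cuffmerge command line as a list of arguments."""
    command = ["cuffmerge", "-o", out_dir, "-p", str(threads)]
    if ref_gtf:
        command += ["-g", ref_gtf]    # optional reference annotation (GTF)
    if ref_fasta:
        command += ["-s", ref_fasta]  # optional genome sequence (FASTA)
    command.append(list_path)
    return command


# Hypothetical file names for two Cufflinks assemblies.
list_path = os.path.join(tempfile.mkdtemp(), "assemblies.txt")
write_assembly_list(["sample1/transcripts.gtf", "sample2/transcripts.gtf"], list_path)
command = build_cuffmerge_command(list_path, ref_gtf="genes.gtf", threads=4)
```

The resulting list could be passed to `subprocess.run(command, check=True)` once cuffmerge is installed and on the PATH; the merged GTF written to the output directory is the file intended for use with Cuffdiff.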