Re: [galaxy-user] downstream analysis of cuffdiff out put
Vasu,

Please reply to the mailing list, as emails to individual Galaxy developers often get lost, and there are others on the list who might be able to help you or benefit from this discussion.

Now, to your question: you're using the wrong GFF filtering tool, which is an easy mistake to make as there are many of them. You want to use Filter and Sort -- GFF -- Filter GTF data by attribute values list. Using this tool, I was able to filter dataset 11--a GTF file produced by Cuffcompare--using a Cuffdiff isoform expression file (dataset 10) on transcript_id. I've shared the modified history with you.

Best,
J.

On Apr 18, 2011, at 3:29 PM, vasu punj wrote:

Hi Jeremy,

I have been trying to use the tool mentioned in this message. I am comparing two samples (6 and 5) and have run Cufflinks/Cuffcompare/Cuffdiff. I filtered the files on c12, i.e. for significance, and uploaded the result as file 10, B_A Cuffdiff isoform expr filtered.txt. The second file I uploaded, file 11, is B_A_Homo_sapiens.GRCh37.60.clean.combined, a combined GTF file generated by Cuffcompare.

When I tried to run the filter combined transcripts tool with the combined GTF (11) as the Cufflinks assembled transcripts and the filtered Cuffdiff isoform expression file as the Cuffcompare tracking file, using sample number 2, it returned an empty file (12).

Then, thinking that perhaps I have to use the tracking file instead of the combined GTF, I tried the B-A combined tracking file in place of the combined GTF file, but it only shows up as an option for the Cuffcompare tracking file input. It may not be right, but I then used file 13 as the tracking file with the combined GTF as the assembled transcripts; it still returned empty output.

I have also shared the history with you. Could you point out what is going on here?

Thanks.
Vasu

--- On Mon, 4/11/11, Jeremy Goecks jeremy.goe...@emory.edu wrote:

From: Jeremy Goecks jeremy.goe...@emory.edu
Subject: Re: [galaxy-user] downstream analysis of cuffdiff out put
To: shamsher jagat kanwar...@gmail.com
Cc: galaxy-user galaxy-user@lists.bx.psu.edu
Date: Monday, April 11, 2011, 9:04 AM

On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote:

Jagat,

Just like any mRNA-seq experiment, to achieve the following objectives:
1. Reconstruct all transcripts of a particular gene and the corresponding significantly expressed transcripts as called by Cuffdiff.
2. What are the different isoforms
3. Location of splicing

From the various output files, which unique ID can be matched from one file, say Cuffdiff.expr (transcript/isoform/splicing), to the other file: the transcript.gtf corresponding to each sample, or the combined GTF file?

I've got a script that does this for the Cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple of weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes.

Hello all,

I've added a tool called 'Filter GTF file by attribute values list' to the galaxy-central code repository. This tool is available on our test server ( http://test.g2.bx.psu.edu/ ) at Filter and Sort -- GFF -- Filter GTF data by attribute values list and will be available on our main server in the next few weeks.

As expected, this tool filters a GTF file based on a list of attribute values--or filters using a tabular file where the attribute values are in the first column, as is the case for Cuffdiff output files. Potential attributes that can be filtered on include transcript_id, gene_id, tss_id, and p_id; conveniently, these are the IDs that Cuffdiff uses in its output files.
Here's an example workflow:
(1) Run Cufflinks/compare/diff
(2) Filter the Cuffdiff isoform differential expression file for transcripts that are differentially expressed; in other words, filter for c12=='yes'
(3) Use 'Filter GTF data by attribute values list' to filter the Cuffcompare combined transcripts using the filtered file from step (2) as the attribute values list

and, voila, you have a GTF file of the differentially expressed transcripts that you can view in your favorite genome browser.

Hope this helps; feedback is always welcome.

Best,
J.

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:

http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu
Re: [galaxy-user] downstream analysis of cuffdiff out put
On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu wrote:

Jagat,

Just like any mRNA-seq experiment, to achieve the following objectives:
1. Reconstruct all transcripts of a particular gene and the corresponding significantly expressed transcripts as called by Cuffdiff.
2. What are the different isoforms
3. Location of splicing

From the various output files, which unique ID can be matched from one file, say Cuffdiff.expr (transcript/isoform/splicing), to the other file: the transcript.gtf corresponding to each sample, or the combined GTF file?

I've got a script that does this for the Cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple of weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes.

Hello all,

I've added a tool called 'Filter GTF file by attribute values list' to the galaxy-central code repository. This tool is available on our test server ( http://test.g2.bx.psu.edu/ ) at Filter and Sort -- GFF -- Filter GTF data by attribute values list and will be available on our main server in the next few weeks.

As expected, this tool filters a GTF file based on a list of attribute values--or filters using a tabular file where the attribute values are in the first column, as is the case for Cuffdiff output files. Potential attributes that can be filtered on include transcript_id, gene_id, tss_id, and p_id; conveniently, these are the IDs that Cuffdiff uses in its output files.
Here's an example workflow:
(1) Run Cufflinks/compare/diff
(2) Filter the Cuffdiff isoform differential expression file for transcripts that are differentially expressed; in other words, filter for c12=='yes'
(3) Use 'Filter GTF data by attribute values list' to filter the Cuffcompare combined transcripts using the filtered file from step (2) as the attribute values list

and, voila, you have a GTF file of the differentially expressed transcripts that you can view in your favorite genome browser.

Hope this helps; feedback is always welcome.

Best,
J.
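For readers who want to see what the example workflow in this message does outside of Galaxy, here is a minimal Python sketch of the filter steps. It is not the code behind the Galaxy tool. It assumes, as the message describes, that the Cuffdiff isoform file is tab-separated with the transcript ID in the first column and a yes/no significance flag in column 12 (c12), and that the GTF attribute column carries transcript_id "..." entries. The function names and the sample rows are made up for illustration.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def significant_ids(diff_lines, flag_column=12):
    """Collect first-column IDs from rows whose flag column (1-based) is 'yes'.

    The header row is skipped implicitly because its flag column is not 'yes'.
    """
    ids = set()
    for line in diff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= flag_column and fields[flag_column - 1] == "yes":
            ids.add(fields[0])
    return ids


def filter_gtf(gtf_lines, keep_ids):
    """Yield only the GTF lines whose transcript_id is in keep_ids."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match and match.group(1) in keep_ids:
            yield line


# Made-up sample data: a header plus two isoform rows with 12 columns each.
SAMPLE_DIFF = [
    "\t".join(["test_id"] + ["x"] * 10 + ["significant"]),
    "\t".join(["TCONS_00000001"] + ["x"] * 10 + ["yes"]),
    "\t".join(["TCONS_00000002"] + ["x"] * 10 + ["no"]),
]
SAMPLE_GTF = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\t'
    'gene_id "XLOC_000002"; transcript_id "TCONS_00000002";\n',
]

if __name__ == "__main__":
    keep = significant_ids(SAMPLE_DIFF)
    for record in filter_gtf(SAMPLE_GTF, keep):
        print(record, end="")
```

With real files, the two functions can be given open file handles instead of the sample lists. Matching on gene_id, tss_id or p_id would only need a different regular expression.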
Re: [galaxy-user] downstream analysis of cuffdiff out put
Jagat,

Please send queries such as these to the galaxy-user mailing list (cc'd); there are many users on the list who can contribute to this discussion, and there are many additional users who will benefit from it.

I was wondering if you can point me to documentation or a URL to guide how to perform the downstream analysis once we have Cuffdiff output.

In general, I agree that tools are needed to further process cufflinks/compare/diff outputs, but I'm not aware of any that are publicly available. Let's open this issue up for discussion and see if we can reach a consensus about what tools might be useful. Everyone, please feel free to contribute ideas/tools; note that the Galaxy Tool Shed is a nice place for sharing tools you've built for Galaxy:

http://community.g2.bx.psu.edu/

Just like any mRNA-seq experiment, to achieve the following objectives:
1. Reconstruct all transcripts of a particular gene and the corresponding significantly expressed transcripts as called by Cuffdiff.
2. What are the different isoforms
3. Location of splicing

From the various output files, which unique ID can be matched from one file, say Cuffdiff.expr (transcript/isoform/splicing), to the other file: the transcript.gtf corresponding to each sample, or the combined GTF file?

I've got a script that does this for the Cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple of weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes.

Best,
J.
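The last idea in this message, attaching FPKM values to their transcripts as GTF attributes, could look roughly like the Python sketch below. This is only an illustration and not an existing Galaxy tool. It assumes the FPKM values have already been collected into a dictionary keyed by transcript_id; which Cuffdiff file and column they come from is left open because the thread does not say. The function name, the attribute name "FPKM" and the sample record are hypothetical.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def annotate_gtf(gtf_lines, fpkm_by_transcript, attr_name="FPKM"):
    """Yield GTF lines, appending an FPKM attribute where a value is known.

    fpkm_by_transcript maps transcript_id -> value (hypothetical input).
    Lines without a matching transcript_id pass through unchanged in content.
    """
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            yield line  # comments and malformed lines are passed through
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match and match.group(1) in fpkm_by_transcript:
            value = fpkm_by_transcript[match.group(1)]
            attrs = fields[8].rstrip()
            if not attrs.endswith(";"):
                attrs += ";"
            # Append the new attribute in the usual key "value"; form.
            fields[8] = f'{attrs} {attr_name} "{value}";'
        yield "\t".join(fields) + "\n"


# Made-up sample record and value for illustration.
SAMPLE_GTF = [
    'chr1\tCufflinks\ttranscript\t100\t400\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n',
]
SAMPLE_FPKM = {"TCONS_00000001": 12.5}

if __name__ == "__main__":
    for record in annotate_gtf(SAMPLE_GTF, SAMPLE_FPKM):
        print(record, end="")
```

Keeping the values in a plain dictionary means the same function works whether they come from an isoform, gene, or other expression file, as long as the key matches the attribute being searched for.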
Re: [galaxy-user] downstream analysis of cuffdiff out put
Hi All,

I agree with this problem and solution. I have a lot of Cufflinks, Cuffcompare and Cuffdiff output, but I am struggling to relate what this means in terms of the real world! I have seen Partek software attempt to visualise some of the data it generates, which appears to be using the FMI data in the Cufflinks suite, but beyond that I struggle. I did have an email conversation with Cole Trapnell which eventually centred on the idea that you just have to trust the analysis and then go away and do the RT-PCR to check it all out!

So for tools I think:
1. A tool that shows you the layout of known isoforms for a gene and the FMI data for each isoform.

Er, that's it for now from me!

But I also struggle to understand what all the other outputs really mean! What does the CDS.diff output tell us? What does the promoters.diff output tell us? I know what the Cufflinks manual says, but I struggle to convert this in my head to what is happening to an actual gene. So if anyone has a PowerPoint example on a specific gene of what the data is saying, in terms of how this relates to changes in protein production, that would be great! I'm hoping someone out there has had to lecture on this to students, has done a PowerPoint presentation, and is willing to show it to the Galaxy community.

Another point about the analysis of Cufflinks data is the subject of the pseudoautosomal regions in X and Y. This will make a mess of gene expression analysis in some cases, especially because TopHat will assign a read to both sites and make it a multihit read (which you might then filter out), or it may double the true levels of reported expression. Anyone had thoughts on this?

Best Wishes,
David.

__
Dr David A. Matthews
Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol. BS8 1TD
U.K.
Tel. +44 117 3312058
Fax. +44 117 3312091
d.a.matth...@bristol.ac.uk

On 10 Mar 2011, at 15:55, Jeremy Goecks wrote:

Jagat,

Please send queries such as these to the galaxy-user mailing list (cc'd); there are many users on the list who can contribute to this discussion, and there are many additional users who will benefit from it.

I was wondering if you can point me to documentation or a URL to guide how to perform the downstream analysis once we have Cuffdiff output.

In general, I agree that tools are needed to further process cufflinks/compare/diff outputs, but I'm not aware of any that are publicly available. Let's open this issue up for discussion and see if we can reach a consensus about what tools might be useful. Everyone, please feel free to contribute ideas/tools; note that the Galaxy Tool Shed is a nice place for sharing tools you've built for Galaxy:

http://community.g2.bx.psu.edu/

Just like any mRNA-seq experiment, to achieve the following objectives:
1. Reconstruct all transcripts of a particular gene and the corresponding significantly expressed transcripts as called by Cuffdiff.
2. What are the different isoforms
3. Location of splicing

From the various output files, which unique ID can be matched from one file, say Cuffdiff.expr (transcript/isoform/splicing), to the other file: the transcript.gtf corresponding to each sample, or the combined GTF file?

I've got a script that does this for the Cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple of weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes.

Best,
J.