Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-04-18 Thread Jeremy Goecks

Vasu,

Please reply to the mailing list as emails to individual Galaxy  
developers often get lost, and there are others on the list that might  
be able to help you or benefit from this discussion.


Now, to your question: you're using the wrong GFF filtering tool,  
which is an easy mistake to make as there are many of them. You want  
to use Filter and Sort -- GFF -- Filter GTF data by attribute values  
list. Using this tool, I was able to filter dataset 11--a GTF file  
produced by Cuffcompare--using a Cuffdiff isoform expression file  
(dataset 10) on transcript_id. I've shared the modified history with  
you.


Best,
J.

On Apr 18, 2011, at 3:29 PM, vasu punj wrote:


Hi Jeremy,

I have been trying to use the tool mentioned in this message.
I am comparing two samples (6 and 5) and have run Cufflinks/
Cuffcompare/Cuffdiff. I filtered the Cuffdiff output on c12, i.e. for
significant results, and the result is file 10, uploaded as B_A
Cuffdiff isoform expr filtered.txt
I uploaded a second file, 11,
B_A_Homo_sapiens.GRCh37.60.clean.combined, which is the combined GTF
file generated by Cuffcompare.


I then tried to filter the combined transcripts file using:
the combined GTF (11) as the Cufflinks assembled transcripts, the
filtered Cuffdiff isoform expression file as the Cuffcompare tracking
file, and 2 as the sample number. It returned an empty file (12).
I then thought that perhaps I should use the tracking file instead of
the combined GTF.
I tried the B-A combined tracking file in place of the combined GTF
file, but it only shows up as a choice for the Cuffcompare tracking
file input. It may not be right, but I used file 13 as the tracking
file with the combined GTF as the assembled transcripts, and it still
returned empty output.

I have also shared the history with you.

Could you point out what is going on here?
Thanks.


Vasu
--- On Mon, 4/11/11, Jeremy Goecks jeremy.goe...@emory.edu wrote:

From: Jeremy Goecks jeremy.goe...@emory.edu
Subject: Re: [galaxy-user] downstream analysis of cuffdiff out put
To: shamsher jagat kanwar...@gmail.com
Cc: galaxy-user galaxy-user@lists.bx.psu.edu
Date: Monday, April 11, 2011, 9:04 AM

On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:

Jagat,
As in any mRNA-seq experiment, I want to achieve the following
objectives:
1.   Reconstruct all transcripts of a particular gene and identify
the corresponding transcripts that Cuffdiff calls significant.

2.   Determine what the different isoforms are.
3.   Locate the splicing events.

Among the various output files, which unique ID can be used to match
one file, say a Cuffdiff .expr file (transcript/isoform/splicing), to
another, such as the transcript.gtf for each sample or the combined
GTF file?
I've got a script that does this for the cuffdiff isoform  
expression testing file and a GTF file; I'll wrap it up and add it  
to Galaxy in the next couple weeks. It would probably be useful to  
have similar scripts for the other expression testing files as  
well. Also, it would be nice to be able to take the FPKM values  
generated by Cuffdiff and attach them to their respective  
transcripts as attributes.


Hello all,

I've added a tool called 'Filter GTF data by attribute values list'
to the galaxy-central code repository. This tool is available on our  
test server ( http://test.g2.bx.psu.edu/ ) at Filter and Sort --  
GFF -- Filter GTF data by attribute values list and will be  
available on our main server in the next few weeks.


As expected, this tool filters a GTF file based on a list of
attribute values. The list can also be a tabular file whose first
column holds the attribute values, as is the case for Cuffdiff output
files. Attributes that can be filtered on include transcript_id,
gene_id, tss_id, and p_id; conveniently, these are the IDs that
Cuffdiff uses in its output files.


Here's an example workflow:

(1) Run Cufflinks/Cuffcompare/Cuffdiff
(2) Filter the Cuffdiff isoform differential expression file for
transcripts that are differentially expressed; in other words,
filter for c12=='yes'
(3) Use 'Filter GTF data by attribute values list' to filter the
Cuffcompare combined transcripts, using the filtered file from step
(2) as the attribute values list and, voila, you have a GTF file of
the differentially expressed transcripts that you can view in your
favorite genome browser.


Hope this helps; feedback is always welcome.

Best,
J.


___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu

Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-04-11 Thread Jeremy Goecks
 On Thu, Mar 10, 2011 at 7:55 AM, Jeremy Goecks jeremy.goe...@emory.edu 
 wrote:
 Jagat,
 As in any mRNA-seq experiment, I want to achieve the following objectives:
 1.   Reconstruct all transcripts of a particular gene and identify the
 corresponding transcripts that Cuffdiff calls significant.
 2.   Determine what the different isoforms are.
 3.   Locate the splicing events.
 
 Among the various output files, which unique ID can be used to match one
 file, say a Cuffdiff .expr file (transcript/isoform/splicing), to another,
 such as the transcript.gtf for each sample or the combined GTF file?
 
 I've got a script that does this for the cuffdiff isoform expression testing 
 file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple 
 weeks. It would probably be useful to have similar scripts for the other 
 expression testing files as well. Also, it would be nice to be able to take 
 the FPKM values generated by Cuffdiff and attach them to their respective 
 transcripts as attributes.

Hello all,

I've added a tool called 'Filter GTF data by attribute values list' to the 
galaxy-central code repository. This tool is available on our test server ( 
http://test.g2.bx.psu.edu/ ) at Filter and Sort -- GFF -- Filter GTF data by 
attribute values list and will be available on our main server in the next few 
weeks.

As expected, this tool filters a GTF file based on a list of attribute
values. The list can also be a tabular file whose first column holds the
attribute values, as is the case for Cuffdiff output files. Attributes that can
be filtered on include transcript_id, gene_id, tss_id, and p_id; conveniently,
these are the IDs that Cuffdiff uses in its output files.
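
To make the attribute matching concrete, here is a minimal Python sketch of how the attributes column of a GTF line can be split into named IDs. It is an illustration only, not the code behind the Galaxy tool; the function name parse_gtf_attributes and the sample IDs are made up for the example.

```python
import re

# One key/value pair in GTF column 9, e.g. transcript_id "TCONS_00000001";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')

def parse_gtf_attributes(gtf_line):
    """Return the attributes (column 9) of one GTF line as a dict."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return {}  # not a complete GTF record
    return dict(ATTRIBUTE_RE.findall(fields[8]))

# Hypothetical line; the IDs are invented for illustration.
example = (
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; '
    'tss_id "TSS1"; p_id "P1";'
)
attributes = parse_gtf_attributes(example)
for name in ("transcript_id", "gene_id", "tss_id", "p_id"):
    print(name, attributes.get(name))
```

Any of the four IDs can then be compared against the first column of a tabular file.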

Here's an example workflow:

(1) Run Cufflinks/Cuffcompare/Cuffdiff
(2) Filter the Cuffdiff isoform differential expression file for transcripts
that are differentially expressed; in other words, filter for c12=='yes'
(3) Use 'Filter GTF data by attribute values list' to filter the Cuffcompare
combined transcripts, using the filtered file from step (2) as the attribute
values list and, voila, you have a GTF file of the differentially expressed
transcripts that you can view in your favorite genome browser.
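
For readers who want to reproduce the two filtering steps of this workflow outside Galaxy, the following Python sketch shows the idea. It is a rough approximation under stated assumptions, not the Galaxy tools themselves: the function names are hypothetical, the flag is assumed to sit in column 12 (c12) with the ID in column 1, and the sample rows are invented.

```python
import re

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')

def significant_ids(diff_lines, flag_column=12):
    """Collect first-column IDs from rows whose flag column equals 'yes'."""
    ids = set()
    for line in diff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= flag_column and fields[flag_column - 1] == "yes":
            ids.add(fields[0])
    return ids

def filter_gtf(gtf_lines, ids):
    """Keep only the GTF lines whose transcript_id is in `ids`."""
    kept = []
    for line in gtf_lines:
        match = TRANSCRIPT_ID_RE.search(line)
        if match and match.group(1) in ids:
            kept.append(line)
    return kept

# Invented 12-column rows standing in for a Cuffdiff isoform file.
diff_rows = [
    "\t".join(["TCONS_00000001"] + ["."] * 10 + ["yes"]),
    "\t".join(["TCONS_00000002"] + ["."] * 10 + ["no"]),
]
# Invented GTF lines standing in for the Cuffcompare combined transcripts.
gtf_rows = [
    'chr1\tCufflinks\texon\t1\t9\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr1\tCufflinks\texon\t20\t30\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";',
]
kept_rows = filter_gtf(gtf_rows, significant_ids(diff_rows))
print(len(kept_rows), "of", len(gtf_rows), "GTF lines kept")
```

If the result is empty, a first thing to check is whether the IDs in the list actually occur as transcript_id values in the GTF file being filtered.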

Hope this helps; feedback is always welcome.

Best,
J.

Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-03-10 Thread Jeremy Goecks
Jagat,

Please send queries such as these to the galaxy-user mailing list (cc'd); there 
are many users on the list who can contribute to this discussion, and there are 
many additional users that will benefit from this discussion.

 I was wondering if you could point me to documentation or a URL that explains
 how to perform downstream analysis once we have Cuffdiff output.

In general, I agree that tools are needed to further process 
cufflinks/compare/diff outputs, but I'm not aware of any that are publicly 
available. Let's open this issue up for discussion and see if we can reach a 
consensus about which tools might be useful. Everyone, please feel free to contribute 
ideas/tools; note that the Galaxy Tool Shed is a nice place for sharing tools 
you've built for Galaxy:

http://community.g2.bx.psu.edu/

 As in any mRNA-seq experiment, I want to achieve the following objectives:
 
 1.   Reconstruct all transcripts of a particular gene and identify the
 corresponding transcripts that Cuffdiff calls significant.
 2.   Determine what the different isoforms are.
 3.   Locate the splicing events.
 
 Among the various output files, which unique ID can be used to match one
 file, say a Cuffdiff .expr file (transcript/isoform/splicing), to another,
 such as the transcript.gtf for each sample or the combined GTF file?
 
I've got a script that does this for the cuffdiff isoform expression testing 
file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple 
weeks. It would probably be useful to have similar scripts for the other 
expression testing files as well. Also, it would be nice to be able to take the 
FPKM values generated by Cuffdiff and attach them to their respective 
transcripts as attributes.
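
As a rough illustration of the last idea, attaching FPKM values to transcripts as attributes, here is a minimal Python sketch. Everything specific in it is an assumption: the function names, the attribute names FPKM_1 and FPKM_2, and the 1-based column positions (ID in column 1, the two FPKM values in columns 7 and 8) should be checked against your own Cuffdiff file.

```python
import re

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')

def read_fpkm(diff_lines, id_column=1, fpkm_columns=(7, 8)):
    """Map each ID to its FPKM strings (column positions are assumptions)."""
    table = {}
    for line in diff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < max(fpkm_columns):
            continue  # skip rows too short to hold the FPKM columns
        table[fields[id_column - 1]] = [fields[c - 1] for c in fpkm_columns]
    return table

def annotate_gtf_line(line, fpkm_by_id):
    """Append FPKM_n attributes to a GTF line when its transcript is known."""
    match = TRANSCRIPT_ID_RE.search(line)
    if not match or match.group(1) not in fpkm_by_id:
        return line  # leave unmatched lines untouched
    body = line.rstrip()
    if not body.endswith(";"):
        body += ";"
    values = fpkm_by_id[match.group(1)]
    # FPKM_1, FPKM_2 are hypothetical attribute names, one per sample.
    extra = "".join(f' FPKM_{i} "{v}";' for i, v in enumerate(values, start=1))
    return body + extra + "\n"

# Invented data for illustration.
fpkm_table = read_fpkm(["TCONS_00000001\tg\tl\tA\tB\tOK\t10.5\t42.0\n"])
gtf_line = ('chr1\tCufflinks\ttranscript\t1\t9\t.\t+\t.\t'
            'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n')
print(annotate_gtf_line(gtf_line, fpkm_table), end="")
```

Keeping the values as strings avoids changing their formatting when they are written back into the GTF file.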

Best,
J. 


Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-03-10 Thread David Matthews
Hi All,

I agree with this problem and solution. I have a lot of cufflinks, cuffcompare 
and cuffdiff output but I am struggling to relate what this means in terms of 
the real world! I have seen Partek software attempt to visualise some of the 
data it generates which appears to be using the FMI data in the cufflinks suite 
but beyond that I struggle. I did have an email conversation with Cole Trapnell 
which eventually centred on the idea that you just have to trust the analysis 
and then go away and do the RT-PCR to check it all out!
So for tools I think:

1. A tool that shows you the layout of known isoforms for a gene and the FMI 
data for each isoform. 

Er, that's it for now from me!

But I also struggle to understand what all the other outputs really mean! What
does the CDS.diff output tell us? What does the promoters.diff output tell us?
I know what the Cufflinks manual says, but I struggle to convert this in my head
to what is happening to an actual gene. If anyone has a PowerPoint example
on a specific gene showing what the data says about changes in protein
production, that would be great! I'm hoping someone out there has had to lecture
on this to students, has made a PowerPoint presentation, and is willing to show
it to the Galaxy community.

Another point about the analysis of Cufflinks data concerns the pseudoautosomal
regions (PARs) of X and Y. These will make a mess of gene expression analysis in
some cases, especially because TopHat will assign a read to both sites, making
it a multi-hit read (which you might then filter out), or it may double the
reported expression level. Has anyone had thoughts on this?

Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk





