[galaxy-user] Transcriptome Hypericum perforatum

2013-11-25 Thread miroslav.sotak


To whom it may concern

I would like to kindly ask whether anyone has experience in de novo 
transcriptome analysis (no reference genome available) and could give 
us some advice.
Our main question is how to create the best set of cDNA contigs onto 
which we can map our RNA-seq reads for differential expression 
analysis. Currently, four large sets of RNA-seq reads are available 
from different genotypes, as well as a draft genome assembly for one of 
the genotypes. We worry that SNPs between the genotypes will affect the 
assembly if we combine all the RNA-seq datasets and use assemblers 
such as Trinity, Oases, or Velvet. Might it be better to use the draft 
genome assembly to obtain cDNA contigs with TopHat/Cufflinks, using 
either all available RNA-seq data or only the RNA-seq data from the same 
genotype as the draft genome?


Thank you in advance
Best wishes
Miro Sotak
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

 http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

 http://galaxyproject.org/search/mailinglists/


Re: [galaxy-user] Transcriptome Hypericum perforatum

2013-11-25 Thread Martin Čech
Hello Miro,

for this kind of general question I would recommend asking in the
bioinformatics forum at http://www.biostars.org/, as it is somewhat
unrelated to Galaxy.

Nevertheless, some of the tools you mentioned are installed and available on
the main instance (usegalaxy.org), and others you can install on your own
Galaxy via the Toolshed (http://toolshed.g2.bx.psu.edu/).

best

Martin, Galaxy Team


Re: [galaxy-user] Transcriptome Hypericum perforatum

2013-11-25 Thread Jennifer Jackson

Hello,

Interesting genome. I see that SRA has some public RNA-seq data, but 
there isn't much else going on. And your goal is to characterize 
expression for observed phenotypes (linked to known genotypes)? If you 
use the Tuxedo suite after assembly (Trinity or other), differential 
alternative splicing is one of the discovery outputs.


From my experience (and others are welcome to add comments), most SNP 
differences (_single_ base polymorphisms) do not, in general, affect the 
global assembly of whole-genome data. Larger insertions/deletions are 
where you will observe differences. But that is DNA.


For transcript assembly, including RNA-seq, per-sample novel isoforms 
and, in particular, rare events like SNPs can become diluted when 
multiple samples are combined and assembled together straight de novo. 
Still, obtaining full-length cDNAs is certainly possible, and it has 
been done in much the same way, with various types of RNA data, for a 
very long time (most of RefSeq started out that way). The downside is 
that the most common variant can overwhelm the others, but with a 
plant you might have that issue anyway, depending on ploidy. So test 
for yourself. Genomes vary and the tools are very interesting: "the 
same way" is a gross generalization on my part, and in their specifics 
the tools are very sophisticated.
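To make the dilution point concrete, here is a toy Python sketch (not 
part of the original reply) that pools hypothetical read counts at one 
transcript position across four genotypes and takes a simple 
majority-vote consensus. The genotype names, counts, and majority-vote 
rule are illustrative assumptions only; real assemblers such as Trinity 
or Oases can keep variants as separate paths or isoforms rather than 
collapsing to one consensus, so this only shows how a minority allele's 
share shrinks when samples are combined.

```python
from collections import Counter


def pooled_consensus(samples):
    """Pool per-sample allele counts at one site (toy model).

    Returns the majority allele and each allele's pooled frequency.
    """
    pooled = Counter()
    for counts in samples.values():
        pooled.update(counts)  # add this sample's allele counts
    total = sum(pooled.values())
    freqs = {allele: n / total for allele, n in pooled.items()}
    majority = pooled.most_common(1)[0][0]
    return majority, freqs


# Hypothetical read counts at one transcript position, equal depth per genotype
samples = {
    "genotype_A": {"T": 40},
    "genotype_B": {"C": 40},
    "genotype_C": {"C": 40},
    "genotype_D": {"C": 40},
}
majority, freqs = pooled_consensus(samples)
# Genotypes whose own majority allele is lost from the pooled consensus
lost = [name for name, counts in samples.items() if max(counts, key=counts.get) != majority]
print(majority, freqs, lost)
```

With equal depth per genotype, genotype_A's allele falls to 25% of the 
pooled reads and is absent from the majority consensus, which is one 
reason to compare a pooled assembly against per-genotype results.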


And, most importantly, since you do have a reference genome to use as a 
guide (and that is an invaluable tool not to be ignored), be sure to 
incorporate it unless it comes from a sample known to be significantly, 
unacceptably different from the wild type. It sounds like its quality 
has been judged unacceptable for direct use as a reference genome for 
some reason (correct? Or do you just want to build up the cDNA set? 
Great project!). Either way, the genome can still be used. Specifically, 
using it as an early-stage assembly guide will give you a huge 
advantage, in my opinion (some assemblers cluster the data first by 
mapping; you want this if possible).

But again, you could try it both ways and check a few genes to see how 
the transcript profile worked out (against any knowns; comparative data 
is OK, and I always used it when I did this type of work). Also use 
what are, to me, the true metrics of transcript assembly: how many 
singletons did you end up with (and what do they map to? Can they 
really be ignored?), and how many over-clustered genes did you get 
(interesting, sparser genes gobbled up by abundant housekeeping genes)? 
Under-clustered genes/transcripts or incomplete transcripts are other 
factors, but depending on how you set the parameters in Cufflinks, they 
may be less important, provided they are not a pathological problem.
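The assembly checks described above can be tallied with a small script 
once two summaries of your contigs are available. This Python sketch 
assumes hypothetical inputs: the number of reads supporting each contig 
(for example, summarized from mapping the reads back to the contigs) and 
the set of known genes each contig matches (for example, from a 
comparison against annotated genes of a related species). The function 
name, field names, and definitions (singleton = contig supported by one 
read; over-clustered = contig matching more than one known gene; 
under-clustered = known gene split across several contigs) are 
illustrative working assumptions, not a standard.

```python
from collections import defaultdict


def assembly_qc(reads_per_contig, contig_hits):
    """Tally simple transcript-assembly checks (hypothetical definitions).

    reads_per_contig: contig ID -> number of supporting reads
    contig_hits: contig ID -> set of known gene IDs the contig matches
    """
    # Singletons: contigs supported by a single read
    singletons = sorted(c for c, n in reads_per_contig.items() if n == 1)
    # Singletons that still match a known gene probably should not be ignored
    singletons_with_hits = [c for c in singletons if contig_hits.get(c)]
    # Over-clustered: one contig absorbing more than one known gene
    over_clustered = sorted(c for c, genes in contig_hits.items() if len(genes) > 1)
    # Under-clustered: one known gene split across several contigs
    contigs_per_gene = defaultdict(set)
    for contig, genes in contig_hits.items():
        for gene in genes:
            contigs_per_gene[gene].add(contig)
    under_clustered = sorted(g for g, cs in contigs_per_gene.items() if len(cs) > 1)
    return {
        "singletons": singletons,
        "singletons_with_hits": singletons_with_hits,
        "over_clustered": over_clustered,
        "under_clustered": under_clustered,
    }


# Hypothetical example data
reads_per_contig = {"contig1": 250, "contig2": 1, "contig3": 80, "contig4": 12, "contig5": 1}
contig_hits = {
    "contig1": {"geneA", "geneB"},  # abundant contig that also absorbed geneB
    "contig3": {"geneC"},
    "contig4": {"geneC"},  # geneC split across two contigs
    "contig5": {"geneD"},  # singleton with a real match
}
report = assembly_qc(reads_per_contig, contig_hits)
print(report)
```

Here contig2 is a singleton with no match, while contig5 is a singleton 
that does match a known gene and so deserves a closer look.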


Many people will have advice about this, so ask, but also test. Looking 
at the results will inform you if the path is right. I hope this helps a 
little bit!


Jen
Galaxy team


--
Jennifer Hillman-Jackson
http://galaxyproject.org
