Re: [galaxy-user] Assembly of Paired and Unpaired sequences

2013-12-16 Thread Jennifer Jackson

Hi Prash,

You have reached the galaxy-u...@bx.psu.edu mailing list that supports the
public Galaxy instance at http://usegalaxy.org. Sometimes we can help
with broader questions, but for general bioinformatics help I would
search, then ask, the communities at web sites such as biostars.org
and seqanswers.com. The original tool author and any web sites they
support are also good resources.


That said, to give some short help for your questions (but follow up 
with the above):
1 - Almost any short read dataset can be run with BLAST, so I am not
sure what you are asking. When you ask at the other sites, add more
details about your goal.
2 - Running a tool such as FastQC can give you an idea about sequence
quality (if that is what you mean by better). Some tools require
paired end data, so that could make it automatically better. If you are
wondering which set is contributing more to the assembly, then a good
place to start would be asking other users of the tool, ideally ones
working with a similar genome, how they determine this.
3 - To annotate assembly results with chromosome assignment: how to do
this depends on what other data is available for your genome (genomic or
transcripts/genes), or what related genomes may be available
(comparative). The basic idea would be to compare against known
sequences to make assignments.
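
As a rough illustration of point 3 (comparing against known sequences), the sketch below assumes the assembled contigs have already been searched against a reference with BLAST and saved in the standard 12-column tabular format (-outfmt 6). The contig names, chromosome names, and the function name best_hit_per_contig are made up for the example, and keeping only the highest bit score is a simple heuristic, not a recommended method from this thread.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6), the standard 12 fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_contig(handle):
    """Return {contig_id: (reference_id, bitscore)}, keeping the top-scoring hit."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        contig = hit["qseqid"]
        score = float(hit["bitscore"])
        # Keep this hit only if it beats the best one seen so far for the contig.
        if contig not in best or score > best[contig][1]:
            best[contig] = (hit["sseqid"], score)
    return best


# Hypothetical example rows; contig and chromosome names are invented.
example = io.StringIO(
    "contig1\tchrI\t99.1\t5000\t45\t0\t1\t5000\t100\t5099\t0.0\t9000\n"
    "contig1\tchrII\t85.0\t300\t45\t0\t1\t300\t10\t309\t1e-50\t200\n"
    "contig2\tchrII\t98.7\t4000\t52\t0\t1\t4000\t700\t4699\t0.0\t7200\n"
)
assignments = best_hit_per_contig(example)
for contig, (chrom, score) in sorted(assignments.items()):
    print(contig, chrom, score)
```

Contigs with no hit, or with similar scores against both chromosomes, would need a closer look by hand.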


There is a repository for this tool in the Galaxy Tool Shed, for use in
local or cloud instances, but it sounds like you already saw that:
http://usegalaxy.org/toolshed. If you had technical problems with that
tool, the tool author could be contacted. However, if the tool fails on
the command line, then there is likely a bigger issue as you suspect
(memory or otherwise), and the wrapper would be unlikely to change that.
You could also move to a cloud instance with more resources:
http://usegalaxy.org/cloud


Good luck!

Jen
Galaxy team

On 12/16/13 2:14 AM, Prash wrote:

Dear All
Greetings! I am analysing a genome of ca. 3.4 Mb for which I currently
have paired.fa and unpaired.fa files. The sequences have already been
trimmed. When I assemble these reads using 'ssake -f paired
-g unpaired ...', it takes a hell of a lot of time. Perhaps I am running
out of memory while analyzing the sequence reads. I could use the Galaxy
platform, but would like to stick with ssake.

A few questions:
If I concatenate these two files, would I be able to use the result
for blasting against my reference?
At this point, how do I know whether paired or single-end reads
are better?

How do I identify the two chromosomal sequences?
Help appreciated for stupid questions :)
Thank you in advance
Prash
Prashanth Suravajhala, PhD.
Homepage: http://www.bioinformatics.org/wiki/Prash
Linkedin: http://dk.linkedin.com/in/prashbio


What counts in life is not the mere fact that we have lived. It is 
what difference we have made to the lives of others that will 
determine the significance of the life we lead. --- Nelson Mandela




Re: [galaxy-user] Assembly of Paired and Unpaired sequences

2013-12-16 Thread Prash
Thank you Jennifer.  That was a big help :)

Regards
Prash


Prashanth Suravajhala, PhD.
Homepage: http://www.bioinformatics.org/wiki/Prash
Linkedin: http://dk.linkedin.com/in/prashbio

“What counts in life is not the mere fact that we have lived. It is what
difference we have made to the lives of others that will determine the
significance of the life we lead.” — Nelson Mandela


