Re: [galaxy-user] Simulating sequencing and removing redundant sequences

2011-09-20 Thread Kevin Lam
Hi Daniel,
After collapsing, each unique sequence would carry several names, which would
be quite hard to display. I am sure someone thought this through. Since the
sequence is the same, you can use the collapsed sequence to look up the
original read names in the FASTQ file, although I am not sure how that would
help you.
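
The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a Galaxy tool: the function name names_by_sequence and the example reads are made up, and the input is assumed to be plain 4-line FASTQ records (no wrapped sequence lines).

```python
import io
from collections import defaultdict


def names_by_sequence(fastq_handle):
    """Map each distinct read sequence to the list of read names that carry it.

    Hypothetical helper; assumes 4-line FASTQ records.
    """
    groups = defaultdict(list)
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        header = header.strip()
        if not header:
            continue  # skip stray blank lines between records
        sequence = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        # Drop the leading '@' and keep the ID up to the first whitespace.
        name = header[1:].split()[0]
        groups[sequence].append(name)
    return groups


# Made-up example reads: read1 and read3 share the same sequence.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
    "@read3\nACGT\n+\nIIII\n"
)
groups = names_by_sequence(example)

# Write a non-redundant FASTA whose headers keep every original name.
for sequence, names in groups.items():
    print(">" + "|".join(names))
    print(sequence)
```

Joining the names with "|" is just one possible convention; very abundant sequences would produce very long header lines, which is the display problem mentioned above.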

Cheers
Kevin


On 20 September 2011 13:43, Daniel Sher ds...@sci.haifa.ac.il wrote:

  Thanks Kevin.  However, the Collapse sequences tool replaces the original
 names of the sequences with a numerical code, and I need to keep the original
 names.  Any other suggestions?

 Thanks

 Daniel
  On 20/09/2011 05:32, Kevin Lam wrote:

 Hi Daniel
 for 2) you may use the tools under NGS QC and manipulation:

   FASTQ to FASTA converter
   http://main.g2.bx.psu.edu/tool_runner?tool_id=cshl_fastq_to_fasta

 followed by

   Collapse sequences
   http://main.g2.bx.psu.edu/tool_runner?tool_id=cshl_fastx_collapser


 On 19 September 2011 09:54, Kevin Lam ke...@aitbiotech.com wrote:

 For 1) you may refer to "Simulated Dataset of Solexa - SEQanswers":
 http://seqanswers.com/forums/showthread.php?t=806


  Has anyone replied to you about 2)?



  On 18 September 2011 21:12, Daniel Sher ds...@sci.haifa.ac.il wrote:

   Hello,

 I have two questions - I apologize if they are trivial..

 1) I want to simulate the amount of Illumina sequencing needed to
 sequence and assemble a known genome.  Is there a way to randomly pick
 sequences of a specific length from a genome (either one available online or
 one I upload)?  Something like "pick 100bp randomly (either strand), move
 400-500bp forward and pick another 100bp"?

 2) Is there a way to remove redundant sequences from a FASTA file without
 losing the original sequence names (as happens with "collapse")?

 Thanks

 Daniel


  --
 ~~~
 Daniel Sher, PhD
 Department of Marine Biology
 Leon H. Charney School of Marine Sciences
 University of Haifa, Mt. Carmel 31905, Haifa, Israel

 Office +972-4-8240731
 Lab+972-4-8288961
 email: ds...@sci.haifa.ac.il


  ___
 The Galaxy User list should be used for the discussion of
 Galaxy analysis and other features on the public server
 at usegalaxy.org.  Please keep all replies on the list by
 using reply all in your mail client.  For discussion of
 local Galaxy instances and the Galaxy source code, please
 use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

 To manage your subscriptions to this and other Galaxy lists,
 please use the interface at:

  http://lists.bx.psu.edu/





Re: [galaxy-user] Simulating sequencing and removing redundant sequences

2011-09-20 Thread Florent Angly
For read simulation, you may also want to give Grinder a try. I made a
Galaxy wrapper for it, available in the Tool Shed: http://toolshed.g2.bx.psu.edu/

Florent


[galaxy-user] Simulating sequencing and removing redundant sequences

2011-09-18 Thread Daniel Sher

  
  
Hello,
I have two questions - I apologize if they are trivial..

1) I want to simulate the amount of Illumina sequencing needed to
  sequence and assemble a known genome. Is there a way to randomly
  pick sequences of a specific length from a genome (either one
  available online or one I upload)? Something like "pick 100bp
  randomly (either strand), move 400-500bp forward and pick another
  100bp?" 
2) Is there a way to remove redundant sequences from a FASTA file
  without losing the original sequence names (as happens with
  "collapse")?
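
The sampling scheme in question 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not an existing Galaxy tool: the names simulate_pairs, reverse_complement and pairs_for_coverage are hypothetical, the genome is a random stand-in string, start positions are uniform, no sequencing errors are modelled, and both reads of a pair are reported in the orientation of the chosen strand.

```python
import math
import random

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(genome, n_pairs, read_len=100, gap_min=400, gap_max=500, seed=None):
    """Yield (read1, read2) pairs sampled uniformly from `genome`.

    read2 starts gap_min..gap_max bases after the end of read1. With
    probability 0.5 the whole fragment is taken from the reverse strand.
    """
    rng = random.Random(seed)
    if len(genome) < 2 * read_len + gap_max:
        raise ValueError("genome is shorter than the largest possible fragment")
    for _ in range(n_pairs):
        gap = rng.randint(gap_min, gap_max)        # inclusive on both ends
        span = 2 * read_len + gap                  # read1 + gap + read2
        start = rng.randint(0, len(genome) - span)
        fragment = genome[start:start + span]
        if rng.random() < 0.5:
            fragment = reverse_complement(fragment)
        yield fragment[:read_len], fragment[-read_len:]


def pairs_for_coverage(genome_len, coverage, read_len=100):
    """Number of read pairs needed for a target average coverage."""
    return math.ceil(coverage * genome_len / (2 * read_len))


# Stand-in genome: 10,000 random bases (replace with a real sequence).
base_rng = random.Random(0)
genome = "".join(base_rng.choice("ACGT") for _ in range(10_000))

n_pairs = pairs_for_coverage(len(genome), coverage=20)
pairs = list(simulate_pairs(genome, n_pairs, seed=1))
print(n_pairs, "pairs of", len(pairs[0][0]), "bp reads")
```

The simulated reads could then be written to FASTA or FASTQ and assembled at different coverage values to see how much sequencing is needed.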
Thanks
Daniel



  


Re: [galaxy-user] Simulating sequencing and removing redundant sequences

2011-09-18 Thread Kevin Lam
For 1) you may refer to "Simulated Dataset of Solexa - SEQanswers":
http://seqanswers.com/forums/showthread.php?t=806


Has anyone replied to you about 2)?



