Re: [galaxy-user] extension of read length

2013-09-12 Thread Jennifer Jackson

Hi Tobias,

In general, you can use 'NGS: Picard (beta) -> SAM to FASTQ' to
extract sequences (convert BAM -> SAM first), but this tool does not add
in extra sequence based on the reference genome (or pad the associated
quality scores, etc.). I don't know of a Galaxy-wrapped tool that does
this, but you might check the Tool Shed, or other public Galaxy servers. 
Others reading this post may also have advice.
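For reference, the SAM -> FASTQ conversion itself is simple. Below is a minimal Python sketch, assuming plain-text SAM input (header lines start with '@'). It illustrates the idea and is not Picard's implementation. Reads on the (-) strand (flag 0x10) are stored reverse-complemented in SAM, so they are flipped back to the orientation in which they were sequenced:

```python
# Minimal SAM -> FASTQ sketch (illustrative; not Picard's SamToFastq code).
from typing import Iterable, Iterator

# Complement table for DNA bases; 'N' maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one 4-line FASTQ record per primary alignment in the SAM input."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary: the same read is emitted elsewhere
        if flag & 0x10:
            # Reverse-strand reads are stored reverse-complemented in SAM;
            # undo that to recover the read as it was sequenced.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        if qual == "*":
            qual = "I" * len(seq)  # assumption: placeholder when qualities are absent
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Like the Galaxy tool, this only recovers the read itself; nothing from the reference genome is added.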


Now, going from BAM -> coordinates (BED/interval) -> FASTA
sequence is possible in a few ways. The general idea is that the
coordinates are manipulated to extend the mapped footprint and then the 
sequence is extracted from the reference genome. Any content novel in 
the original sequence is lost, but maybe this still has some utility for 
you. The two methods below show how to do this, with the 2nd being 
simpler, if the genome is at UCSC. There are other ways to get flanking 
sequence, merge/cluster, etc. (see tools in group 'Operate on Genomic 
Intervals'), but the methods below are the most direct way to extend
each sequence individually.
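The coordinate manipulation can be sketched as follows. This is a minimal Python example, assuming 6-column BED lines as written by 'BEDTools -> Convert from BAM to BED' (chrom, start, end, name, score, strand), 0-based half-open coordinates, and a hypothetical fragment_len parameter (150 bp for the question below). Each footprint is resized to that length measured from the read's 5' end, so (+) reads grow toward larger coordinates and (-) reads toward smaller ones:

```python
# Strand-aware resizing of BED footprints (illustrative sketch).
from typing import Optional, Tuple


def extend_interval(
    start: int, end: int, strand: str, fragment_len: int,
    chrom_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Resize a 0-based, half-open interval to fragment_len bases from the read's 5' end."""
    if strand == "+":
        # The 5' end of a (+) read is `start`; extend toward larger coordinates.
        new_start, new_end = start, start + fragment_len
    elif strand == "-":
        # BED lists (-) reads smallest -> largest, so the 5' end is `end`;
        # extend toward smaller coordinates.
        new_start, new_end = end - fragment_len, end
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    new_start = max(0, new_start)  # clip at the chromosome start
    if chrom_size is not None:
        new_end = min(chrom_size, new_end)  # clip at the chromosome end, if known
    return new_start, new_end


def extend_bed_line(line: str, fragment_len: int) -> str:
    """Rewrite columns 2-3 of a 6-column BED line with the extended footprint."""
    fields = line.rstrip("\n").split("\t")
    start, end = extend_interval(int(fields[1]), int(fields[2]), fields[5], fragment_len)
    fields[1], fields[2] = str(start), str(end)
    return "\t".join(fields)
```

Resizing from the 5' end, rather than padding both sides, is one common choice for single-end data: the read start is taken as one end of the original fragment.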


And if you need to filter out multi-mapped reads, use the tool 'NGS:
SAM Tools -> Filter SAM' (converting between BAM and SAM as needed).
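Such filtering usually works on the FLAG and MAPQ columns. Here is a minimal Python sketch of that idea, assuming plain-text SAM. The MAPQ cutoff of 10 is a hypothetical default, since aligners differ in how they score multi-mapped reads:

```python
# Keep mapped, primary alignments with adequate MAPQ (illustrative sketch).
from typing import Iterable, Iterator

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def filter_unique(sam_lines: Iterable[str], min_mapq: int = 10) -> Iterator[str]:
    """Yield header lines plus mapped primary alignments with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # drop unmapped and non-primary records
        if mapq < min_mapq:
            continue  # low MAPQ usually signals an ambiguous (multi-mapped) placement
        yield line
```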


*1st method, works for any genome, including a custom reference genome:*

1 - convert with 'NGS: SAM Tools -> BAM-to-SAM'
2 - convert SAM to interval with 'NGS: SAM Tools -> Convert SAM' or 
convert to bed with 'BEDTools -> Convert from BAM to BED'
3 - split the file into two, one with the (+) strand alignments and
one with the (-) strand alignments, using the tool 'Filter and Sort -> Filter'
4 - adjust the start or end coordinate to extend the alignment footprint
as wanted using the tool 'Text Manipulation -> Compute'. Remember that
for negative-stranded coordinates, the "start" is really where the end
of the sequence aligned and "end" is where the start of the sequence
aligned, because interval files report coordinates with respect to the
(+) strand, smallest -> largest. See:
http://wiki.galaxyproject.org/Learn/Datatypes#Interval
5 - cut out the columns to create a standard interval file again,
swapping in the new coordinates. Click on the pencil icon to assign
column attributes and, as needed, a reference genome; the next tool
requires this information.
6 - get the fasta sequence by using the tool 'Fetch Sequences -> Extract 
Genomic DNA'
7 - merge all fasta results together with the tool 'Text Manipulation -> 
Concatenate datasets'
8 - if you need fastq format, you can pad out quality scores and create 
that with the tool 'NGS: QC and manipulation -> Combine FASTA and QUAL'
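Steps 6 to 8 can be sketched in Python as below, assuming the reference genome fits in memory as a dict of chromosome name -> sequence (a simplification; real tools read an indexed FASTA from disk). In this sketch, (-) strand intervals are reverse-complemented so each sequence reads 5' -> 3' like the original read, and the FASTQ step pads every base with one fixed quality character ('I' here, an arbitrary choice):

```python
# Extract extended intervals from a reference and emit FASTA/FASTQ (sketch).
from typing import Dict, Iterable, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
Interval = Tuple[str, str, int, int, str]  # (name, chrom, start, end, strand)


def fetch_sequence(reference: Dict[str, str], chrom: str, start: int, end: int, strand: str) -> str:
    """Slice a 0-based, half-open interval; reverse-complement (-) strand hits."""
    seq = reference[chrom][start:end]
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


def intervals_to_fasta(reference: Dict[str, str], intervals: Iterable[Interval]) -> str:
    """Concatenate one FASTA record per interval (steps 6 and 7)."""
    return "".join(
        f">{name}\n{fetch_sequence(reference, chrom, start, end, strand)}\n"
        for name, chrom, start, end, strand in intervals
    )


def to_fastq(name: str, seq: str, qual_char: str = "I") -> str:
    """Build a FASTQ record with a constant, padded quality string (step 8)."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"
```

Because the quality values are synthetic, downstream tools that weigh base qualities will treat every base as equally reliable.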



*2nd method, if the reference genome is at UCSC:*

1 - convert 'BEDTools -> Convert from BAM to BED'
2 - click on the "view at UCSC main" link for the dataset
3 - once at UCSC Browser, the data will show up as a custom track, by 
default named "User Track" in the top track group. Click on the track 
name - it will take you to the track controls and focus the browser on 
this track.
4 - in the top blue menu bar, click on "Tools -> Table Browser". This
track will now be pre-loaded in the form with all options probably set
as you want them (this user track is selected and "region" is "genome"),
except for one: change "output format" from "BED" to "sequence".
5 - confirm that the "Galaxy" box is checked, and click on "get output"
6 - the next form has options for extending the sequence at 5' and/or 3' 
ends, all in one go, adjust as you want
7 - click on "Send query to Galaxy" and the dataset will load back into 
the working history
8 - the fasta can be converted to fastq as in the 1st method, step #8
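The 5'/3' padding option can be pictured with the small Python function below. It assumes padding is applied relative to each feature's own strand, so the 5' end of a (-) strand feature is its larger coordinate. That is an assumption about the option's behavior, so please compare against the Table Browser's own help. pad5 and pad3 are hypothetical parameter names:

```python
# Strand-aware 5'/3' padding of a BED interval (illustrative sketch).
from typing import Optional, Tuple


def pad_interval(
    start: int, end: int, strand: str, pad5: int, pad3: int,
    chrom_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Add pad5 bases upstream (5') and pad3 bases downstream (3') of a feature."""
    if strand == "+":
        new_start, new_end = start - pad5, end + pad3
    elif strand == "-":
        # On the (-) strand, upstream (5') lies toward larger coordinates.
        new_start, new_end = start - pad3, end + pad5
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    new_start = max(0, new_start)  # clip at the chromosome start
    if chrom_size is not None:
        new_end = min(chrom_size, new_end)  # clip at the chromosome end, if known
    return new_start, new_end
```

For the 50 bp reads in the question below, pad5=0 and pad3=100 would give 150 bp footprints: pad_interval(1000, 1050, "-", 0, 100) returns (900, 1050).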

Hopefully some of this is helpful!

Jen
Galaxy team


On 9/11/13 1:56 AM, Tobias Hohenauer wrote:

Dear all,

I am working on an MNase-seq experiment with 50 bp single-end reads. To
identify nucleosome positions, I read that one needs to extend the
single reads to approximately the length of nucleosome-protected DNA,
which is approximately 150 bp.


Is there a way in Galaxy to extend 50 bp reads to 150 bp length, let's
say from a .BAM file with mapped reads?

Of course any other comment on this topic is much appreciated!

Thank you very much,

Tobias



--
Jennifer Hillman-Jackson
http://galaxyproject.org

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

[galaxy-user] extension of read length

2013-09-11 Thread Tobias Hohenauer

Dear all,

I am working on an MNase-seq experiment with 50 bp single-end reads. To
identify nucleosome positions, I read that one needs to extend the
single reads to approximately the length of nucleosome-protected DNA,
which is approximately 150 bp.


Is there a way in Galaxy to extend 50 bp reads to 150 bp length, let's
say from a .BAM file with mapped reads?

Of course any other comment on this topic is much appreciated!

Thank you very much,

Tobias

--
Tobias Hohenauer, PhD
GCNA, Disease Mechanism Research Core
RIKEN Brain Science Institute
2-1 Hirosawa, Wako-shi
351-0198 Japan
