On Jul 15, 2009, at 10:37 PM, C. Titus Brown wrote:
> One of the things I wanted to do with my code was to provide a
> 'translation' on sequences that would let you translate a DNA sequence
> into a protein sequence in any specified frame. What do you think?
If I'm understanding your question right, TranslationDB does this
pretty directly. Say you have a nucleotide sequence database object
db. Then you can get the translation of any desired frame or subslice
of a sequence in db as easily as:
tdb = TranslationDB(db)
# 100 AA beginning at nucleotide i on positive strand
orf100 = tdb[seqID][i:i+300]
# 100 AA for negative strand (reverse-comp) of same nt interval
orf100rc = (-(tdb[seqID]))[-i - 300: -i]
Note that in the last case you have to take the negative strand BEFORE
slicing (rather than slicing first and then negating). This is because
slicing yields a translation (specifically a TranslationAnnotSlice),
which of course you cannot negate: a protein sequence has no reverse
complement.
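
To make the coordinate arithmetic concrete, here is a minimal plain-Python sketch that does not use pygr at all. The helpers reverse_complement and translate are hypothetical names written for illustration, and the toy sequence is made up. It only shows that a 300 nt interval yields 100 amino acids, and that the pygr slice [-i - 300:-i] on the negative strand picks out the reverse complement of the positive-strand interval [i:i+300].

```python
# Plain-Python illustration only; none of these helpers are part of pygr.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(nt):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(nt))


def translate(nt):
    """Translate codon by codon, ignoring a trailing partial codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[p:p + 3]] for p in range(0, usable, 3))


seq = "ATGGCCAAGTTTGGGCCCTAA" * 20   # made-up 420 nt toy sequence
i = 7
n = len(seq)

# Forward strand: 300 nt starting at i gives 100 amino acids.
orf100 = translate(seq[i:i + 300])

# Reverse strand of the same nucleotide interval: negate first, then slice.
# pygr writes this slice as [-i - 300:-i]; with a plain string the same
# positions run from n - i - 300 to n - i.
minus = reverse_complement(seq)
orf100rc = translate(minus[n - i - 300:n - i])
```

The plain-string version uses offsets from the start of the reversed string because an ordinary Python slice ending at -0 would be empty when i is 0.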
Alternatively, you can directly request one of the six frame
translations of the entire sequence, using the TranslationDB's annodb
attribute (which is just an annotation DB yielding TranslationAnnot
annotations for the six frames). I had to choose a naming convention
for the six frame annotations, so I appended to the sequence ID a
colon and a digit giving the nucleotide offset at which the frame
begins (0, 1, or 2 for the positive strand; -0, -1, or -2 for the
negative strand):
frame0 = tdb.annodb[seqID + ':0'] # + strand
frame1 = tdb.annodb[seqID + ':1']
frame2 = tdb.annodb[seqID + ':2']
frame0rc = tdb.annodb[seqID + ':-0'] # - strand
frame1rc = tdb.annodb[seqID + ':-1']
frame2rc = tdb.annodb[seqID + ':-2']
The negative frames translate the negative strand interval (reverse
complement) of the corresponding positive frame interval. That is,
comparing their underlying nucleotide interval objects,
frame0rc.sequence == -(frame0.sequence)
evaluates to True.
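
The naming convention and frame intervals can be sketched in plain Python as follows. This is not pygr code: six_frame_intervals is a hypothetical helper, and the rule it uses (each frame starts at its offset and spans as many whole codons as fit) is an assumption inferred from the intervals printed in the example session below, not taken from the pygr source.

```python
def six_frame_intervals(seq_id, length):
    """Map frame IDs such as 'id:1' or 'id:-1' to (strand, start, stop).

    Assumption: each frame starts at its offset and runs over as many whole
    codons as fit, and a negative frame covers the same interval as the
    matching positive frame, on the reverse strand.
    """
    frames = {}
    for offset in range(3):
        n_codons = (length - offset) // 3
        start, stop = offset, offset + 3 * n_codons
        frames["%s:%d" % (seq_id, offset)] = ("+", start, stop)
        frames["%s:-%d" % (seq_id, offset)] = ("-", start, stop)
    return frames


frames = six_frame_intervals("seqID", 444)
for frame_id in sorted(frames):
    print(frame_id, frames[frame_id])
```

For a 444 nt sequence this gives [0:444], [1:442] and [2:443] for frames 0, 1 and 2, with the negative frames sharing those intervals on the opposite strand.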
Does this address what you wanted?
-- Chris
Example usage:
>>> from pygr import seqdb, translationDB
>>> dna = seqdb.SequenceFileDB('data/hbb1_mouse.fa')
>>> dna.keys()
['gi|171854975|dbj|AB364477.1|']
>>> nt = dna['gi|171854975|dbj|AB364477.1|']
>>> len(nt)
444
>>> tdb = translationDB.TranslationDB(dna)
>>> frame0 = tdb.annodb['gi|171854975|dbj|AB364477.1|:0']
>>> frame0.sequence
gi|171854975|dbj|AB364477.1|[0:444]
>>> frame1 = tdb.annodb['gi|171854975|dbj|AB364477.1|:1']
>>> frame1.sequence
gi|171854975|dbj|AB364477.1|[1:442]
>>> frame2 = tdb.annodb['gi|171854975|dbj|AB364477.1|:2']
>>> frame2.sequence
gi|171854975|dbj|AB364477.1|[2:443]
>>> frame0rc = tdb.annodb['gi|171854975|dbj|AB364477.1|:-0']
>>> frame0rc.sequence
-gi|171854975|dbj|AB364477.1|[0:444]
>>> frame1rc = tdb.annodb['gi|171854975|dbj|AB364477.1|:-1']
>>> frame1rc.sequence
-gi|171854975|dbj|AB364477.1|[1:442]