[pygr] Re: blast issues to decide

Christopher Lee Thu, 16 Jul 2009 11:50:24 -0700

On Jul 15, 2009, at 10:37 PM, C. Titus Brown wrote:
> One of the things I wanted to do with my code was to provide a
> 'translation' on sequences that would let you translate a DNA sequence
> into a protein sequence in any specified frame.  What do you think?


If I'm understanding your question right, TranslationDB does this  
pretty directly.  Say you have a nucleotide sequence database object  
db.  Then you can get the translation of any desired frame or subslice  
of a sequence in db as easily as:

from pygr.translationDB import TranslationDB
tdb = TranslationDB(db)
# 100 AA beginning at nucleotide i on positive strand
orf100 = tdb[seqID][i:i+300]
# 100 AA for negative strand (reverse-comp) of same nt interval
orf100rc = (-(tdb[seqID]))[-i - 300: -i]

Note that in the last case you have to get the negative strand BEFORE
slicing it (rather than slicing first and then negating).  This is
because the slicing operation yields a translation (specifically a
TranslationAnnotSlice), which of course you cannot negate (protein
sequence has no reverse complement).
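
To make the coordinate arithmetic concrete, here is a minimal plain-Python sketch that does not use pygr at all.  It assumes the convention implied by the example above: coordinates on the negative strand run from -len to 0, so the negative-strand slice [-i - 300:-i] covers the same nucleotides as the positive-strand slice [i:i + 300].  The names revcomp and neg_strand_slice are hypothetical helpers for illustration only, not pygr functions.

```python
# Plain-Python illustration of negative-strand coordinates (no pygr needed).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(nt):
    """Return the reverse complement of a nucleotide string."""
    return nt.translate(COMPLEMENT)[::-1]


def neg_strand_slice(nt, start, stop):
    """Slice the negative strand with negative coordinates.

    Assumes negative-strand coordinates run from -len(nt) to 0,
    so coordinate c maps to index len(nt) + c of the reverse complement.
    """
    if not (-len(nt) <= start < stop <= 0):
        raise ValueError("expected -len(nt) <= start < stop <= 0")
    rc = revcomp(nt)
    n = len(nt)
    return rc[n + start:n + stop]


nt = "ATGGCCAAGTTT"
i, width = 3, 6
# Negative-strand slice [-i - width:-i] vs. reverse complement of [i:i + width]
same = neg_strand_slice(nt, -i - width, -i) == revcomp(nt[i:i + width])
print(same)  # True
```

The two expressions agree because negating the strand flips both the order and the sign of the coordinates, which is why the slice bounds swap places.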

Alternatively, you can directly request one of the six frame  
translations of the entire sequence, using the TranslationDB's annodb  
attribute (which is just an annotation DB yielding TranslationAnnot  
annotations for the six frames).  I had to choose a naming convention  
for the six frame annotations, so I just appended to the sequence ID a  
colon and digit indicating what nucleotide the frame begins at (0, 1,  
or 2 for positive strand, -0, -1, or -2 for the negative strand).

frame0 = tdb.annodb[seqID + ':0'] # + strand
frame1 = tdb.annodb[seqID + ':1']
frame2 = tdb.annodb[seqID + ':2']

frame0rc = tdb.annodb[seqID + ':-0'] # - strand
frame1rc = tdb.annodb[seqID + ':-1']
frame2rc = tdb.annodb[seqID + ':-2']
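
One detail of this naming convention is worth spelling out: '-0' and '0' name different frames, so the frame suffix cannot simply be converted with int() (which maps both to 0).  The sketch below shows one way to build and parse such IDs in plain Python.  parse_frame_id and six_frame_ids are hypothetical helpers written for this illustration; they are not part of pygr, and pygr's own parsing may differ.

```python
# Hypothetical helpers for the "seqID:frame" naming convention described above.
VALID_FRAMES = ("0", "1", "2", "-0", "-1", "-2")


def six_frame_ids(seq_id):
    """Return the six frame annotation IDs for one sequence ID."""
    return [seq_id + ":" + frame for frame in VALID_FRAMES]


def parse_frame_id(frame_id):
    """Split a frame ID into (seq_id, strand, offset).

    Splits on the LAST colon, since sequence IDs may themselves
    contain punctuation.  strand is +1 or -1; offset is 0, 1 or 2.
    """
    seq_id, sep, frame = frame_id.rpartition(":")
    if not sep or frame not in VALID_FRAMES:
        raise ValueError("not a frame ID: %r" % (frame_id,))
    # Check the sign on the string: int('-0') == 0 would lose the strand.
    strand = -1 if frame.startswith("-") else 1
    offset = int(frame.lstrip("-"))
    return seq_id, strand, offset


print(parse_frame_id("gi|171854975|dbj|AB364477.1|:-0"))
```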

Each negative frame translates the negative-strand interval (reverse
complement) of the corresponding positive frame's interval.  That is,
comparing their underlying nucleotide interval objects,

frame0rc.sequence == -(frame0.sequence)

is True.

Does this address what you wanted?

-- Chris

Example usage:
 >>> from pygr import seqdb, translationDB
 >>> dna = seqdb.SequenceFileDB('data/hbb1_mouse.fa')
 >>> dna.keys()
['gi|171854975|dbj|AB364477.1|']
 >>> nt = dna['gi|171854975|dbj|AB364477.1|']
 >>> len(nt)
444
 >>> tdb = translationDB.TranslationDB(dna)
 >>> frame0 = tdb.annodb['gi|171854975|dbj|AB364477.1|:0']
 >>> frame0.sequence
gi|171854975|dbj|AB364477.1|[0:444]
 >>> frame1 = tdb.annodb['gi|171854975|dbj|AB364477.1|:1']
 >>> frame1.sequence
gi|171854975|dbj|AB364477.1|[1:442]
 >>> frame2 = tdb.annodb['gi|171854975|dbj|AB364477.1|:2']
 >>> frame2.sequence
gi|171854975|dbj|AB364477.1|[2:443]
 >>> frame0rc = tdb.annodb['gi|171854975|dbj|AB364477.1|:-0']
 >>> frame0rc.sequence
-gi|171854975|dbj|AB364477.1|[0:444]
 >>> frame1rc = tdb.annodb['gi|171854975|dbj|AB364477.1|:-1']
 >>> frame1rc.sequence
-gi|171854975|dbj|AB364477.1|[1:442]
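
The intervals printed in the session above are consistent with each frame being trimmed to a whole number of codons.  Here is a minimal sketch of that arithmetic in plain Python.  frame_interval is a hypothetical helper; that pygr computes the intervals exactly this way is an inference from the output shown, not something confirmed here.

```python
def frame_interval(seq_len, offset):
    """Return (start, stop) of the longest whole-codon interval
    that begins at `offset` in a sequence of length `seq_len`."""
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1 or 2")
    n_codons = (seq_len - offset) // 3
    return offset, offset + 3 * n_codons


for offset in range(3):
    print(offset, frame_interval(444, offset))
# 0 (0, 444)
# 1 (1, 442)
# 2 (2, 443)
```

For the 444 nt sequence, frame 0 holds 148 codons and frames 1 and 2 hold 147 each, which matches the [0:444], [1:442] and [2:443] intervals reported by frame0.sequence, frame1.sequence and frame2.sequence.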

