[pygr] Re: seq.translation() (was: blast issues to decide)

Christopher Lee Fri, 17 Jul 2009 11:21:27 -0700


On Jul 16, 2009, at 7:47 PM, C. Titus Brown wrote:


> I'm so confused about frame calculations that I can't think straight
> about this any more -- does that match the BLAST frame calculations?
> Perhaps I was too mired in the BLAST code thought process to separate
> concerns, but I feel like we should adhere to NCBI's standards.  (My
> tests of 'seq.translation()' do so.)

Yes, standards are great, to the extent that everyone agrees on one
standard and follows it.  I'll try to summarize what TranslationDB
follows, then suggest a process for making decisions about the
external interface:

- TranslationDB follows BLAST's way of specifying coordinates of
translations (namely, BLAST reports the (positive) nucleotide
coordinates of the ORF; if they are backwards (i.e. start > stop) that
indicates a negative-strand ORF).  It therefore works as a sequence
database that will process blastx / tblastn / tblastx results
correctly with no extra code or coordinate transformations in the
BLAST parser / alignment processor.  If you consider BLAST to be a
standard, then in this sense it follows that external interface standard.

i.e. tdb[seqID][blastStart:blastStop] --> the appropriate translation  
slice

- I consider TranslationDB's frame calculation (0, 1, 2 and the  
negative versions of those) to be an internal (private)  
representation.  I recommend we keep this as the internal  
representation of frame, because it's simple and mathematically  
solid.  However, we're not obligated to use this as the external  
(public) interface for frame information, so we can have a separate  
discussion about what that public interface should be.

- TranslationDB was written to make our blastx alignment processing
clean and simple.  I think it does that.  That is all that was
requested for the 0.8 release.

- I suspect that the broader discussion of public standards for  
representing frame may take longer (we should probably look at how  
other people do this, besides NCBI), and the 0.8 release should not be  
held up for this.  No such feature was ever discussed for 0.8, and
adding an open-ended requirement for an as-yet-undefined feature
strikes me as a bad case of feature creep.  Suggestion: the
discussion of the public interface (standards) for frame is important,
and should occur as a separate process after the 0.8 release.  There are
some challenging questions here.  For example, Pygr universally  
follows the zero-based indexing convention (just like C, Python, Perl  
etc., Pygr indexing starts at 0 rather than 1).  We should probably  
weigh the advantages of any frame standard that is not zero-based  
carefully against the potential confusion caused by its inconsistency  
with the rest of Pygr / Python indexing.
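
To make the two conventions above concrete, here is a rough sketch in plain Python.  It is not Pygr or TranslationDB code: the helper names (blast_interval, ncbi_frame, zero_based_frame) are hypothetical, and the (offset, strand) pair is only an illustrative zero-based form, not necessarily how TranslationDB stores frame internally.  It assumes BLAST's usual 1-based, inclusive coordinates, with start > stop marking a negative-strand hit, and NCBI-style frames +1..+3 / -1..-3.

```python
def blast_interval(blast_start, blast_stop):
    """Convert 1-based inclusive BLAST coordinates into a zero-based,
    half-open (start, stop, strand) triple.  Hypothetical helper."""
    if blast_start <= blast_stop:
        return blast_start - 1, blast_stop, 1
    # Reversed coordinates (start > stop) indicate the negative strand.
    return blast_stop - 1, blast_start, -1


def ncbi_frame(blast_start, blast_stop, seq_length):
    """One-based, signed frame in the NCBI style (+1..+3, -1..-3).
    Negative frames are counted from the end of the nucleotide sequence."""
    if blast_start <= blast_stop:
        return (blast_start - 1) % 3 + 1
    return -((seq_length - blast_start) % 3 + 1)


def zero_based_frame(frame):
    """Split a one-based signed frame into (offset, strand), where offset
    is 0, 1 or 2.  Illustrative representation only (an assumption)."""
    return abs(frame) - 1, (1 if frame > 0 else -1)


if __name__ == "__main__":
    # A forward hit and a reverse hit on a 30-nt sequence.
    print(blast_interval(4, 12), ncbi_frame(4, 12, 30))
    print(blast_interval(28, 20), ncbi_frame(28, 20, 30))
    print(zero_based_frame(ncbi_frame(28, 20, 30)))
```

The sketch keeps strand separate from the zero-based offset because a signed integer cannot distinguish "frame 0, forward" from "frame 0, reverse"; that is one of the trade-offs a public frame convention would have to settle.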

What do you think?

-- Chris

