On Fri, Jul 17, 2009 at 11:20:54AM -0700, Christopher Lee wrote:
-> On Jul 16, 2009, at 7:47 PM, C. Titus Brown wrote:
-> 
-> > I'm so confused about frame calculations that I can't think straight
-> > about this any more -- does that match the BLAST frame calculations?
-> > Perhaps I was too mired in the BLAST code thought process to separate
-> > concerns, but I feel like we should adhere to NCBI's standards.  (My
-> > tests of 'seq.translation()' do so.)
-> 
-> Yes, standards are great, to the extent that everyone agrees on one  
-> standard and follows it.  I'll try to summarize what TranslationDB  
-> follows, then suggest a process for making decisions about external  
-> interface...

[ ... ]

-> - I consider TranslationDB's frame calculation (0, 1, 2 and the  
-> negative versions of those) to be an internal (private)  
-> representation.  I recommend we keep this as the internal  
-> representation of frame, because it's simple and mathematically  
-> solid.  However, we're not obligated to use this as the external  
-> (public) interface for frame information, so we can have a separate  
-> discussion about what that public interface should be.
-> 
-> - TranslationDB was written for making our blastx alignment processing  
-> clean and simple.  I think it does that.  That is all that was  
-> requested for the 0.8 release.
-> 
-> - I suspect that the broader discussion of public standards for  
-> representing frame may take longer (we should probably look at how  
-> other people do this, besides NCBI), and the 0.8 release should not be  
-> held up for this.  No such feature was ever discussed for 0.8, and  
-> adding an open-ended requirement for an as-yet undefined feature  
-> strikes me as a bad example of feature-creep.  Suggestion: the  
-> discussion of public interface (standards) for frame is important, and  
-> should occur as a separate process after the 0.8 release.  There are  
-> some challenging questions here.  For example, Pygr universally  
-> follows the zero-based indexing convention (just like C, Python, Perl  
-> etc., Pygr indexing starts at 0 rather than 1).  We should probably  
-> weigh the advantages of any frame standard that is not zero-based  
-> carefully against the potential confusion caused by its inconsistency  
-> with the rest of Pygr / Python indexing.
-> 
-> What do you think?

The 'translation()' method on sequences grew out of my desire to be
able to query the results of translated BLAST searches nicely, like so:

  frame2 = dna_seq.translation(2)
  frame2_matches = blast_results[frame2]

Right now I don't see a simple (one-line) way to do this; I don't think
this:

-> # 100 AA for negative strand (reverse-comp) of same nt interval      
-> orf100rc = (-(tdb[seqID]))[-i - 300: -i]

counts as simple ;)  In any case, you could not manually inspect the
BLAST output and easily correlate the frame of matches there with the
frame of matches in pygr.
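
To make the correlation problem concrete, here is a minimal sketch of
converting between the two conventions.  It assumes NCBI frames run
+1..+3 and -1..-3 (negative frames counted from the start of the
reverse complement), and it represents a zero-based frame as an
(offset, orientation) pair.  Both function names and the pair
representation are hypothetical; they are not part of pygr and may not
match how TranslationDB stores negative frames internally.

```python
def ncbi_frame_to_offset(ncbi_frame):
    """Map an NCBI-style frame (+1..+3, -1..-3) to (offset, orientation).

    offset is zero-based (0, 1, 2); orientation is +1 for the forward
    strand and -1 for the reverse complement.  Hypothetical helper.
    """
    if ncbi_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("NCBI frame must be one of +/-1, +/-2, +/-3")
    orientation = 1 if ncbi_frame > 0 else -1
    return abs(ncbi_frame) - 1, orientation


def offset_to_ncbi_frame(offset, orientation):
    """Inverse mapping: zero-based (offset, orientation) to NCBI frame."""
    if offset not in (0, 1, 2) or orientation not in (1, -1):
        raise ValueError("offset must be 0..2 and orientation +1 or -1")
    return orientation * (offset + 1)
```

The point of the pair is that "-0" is not a distinct integer, so a
zero-based frame on the reverse strand needs the strand carried
separately; the one-based NCBI numbering avoids that at the cost of
being inconsistent with Python indexing.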

I agree it's important to get good internal code for BLAST handling, but
my original objective was to make BLASTX more usable at an API level --
the internal code cleanup was just a nice byproduct.

I also agree that it's opening a small can of worms to add the translation
function, but (for better or for worse) BLAST holds a special place in
the toolkit of many bioinformaticians...

If we don't want to add a general translation function, how about adding
an NCBI-specific convenience function to TranslationDB?  e.g.

def translation_ncbi(seq, ncbi_frame):
   """
   Translates the DNA sequence 'seq' in the given frame, using NCBI's
   frame convention.
   """
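
For illustration only, here is a rough sketch of what such a function
might do, written against plain Python strings rather than pygr
sequence objects.  The name translation_ncbi_str, the use of the
standard genetic code, and the choice of 'X' for codons containing
ambiguous bases are all assumptions made for this sketch, not existing
pygr behaviour.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G at each position.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(dna))


def translation_ncbi_str(seq, ncbi_frame):
    """Translate the DNA string 'seq' in the given NCBI-style frame.

    Frames +1..+3 start at offsets 0..2 of the forward strand; frames
    -1..-3 start at offsets 0..2 of the reverse complement.  Trailing
    bases that do not fill a codon are ignored, and codons containing
    ambiguous bases translate to 'X' (an assumption of this sketch).
    """
    if ncbi_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("NCBI frame must be one of +/-1, +/-2, +/-3")
    dna = seq.upper()
    if ncbi_frame < 0:
        dna = reverse_complement(dna)
    start = abs(ncbi_frame) - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

With this sketch, translation_ncbi_str("ATGGCCTAA", 1) gives "MA*" and
frame -1 gives "LGH", so the frame numbers line up with the ones
printed in BLAST output.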

cheers,
--titus
-- 
C. Titus Brown, [email protected]
