On Jul 16, 2009, at 7:47 PM, C. Titus Brown wrote:
> I'm so confused about frame calculations that I can't think straight
> about this any more -- does that match the BLAST frame calculations?
> Perhaps I was too mired in the BLAST code thought process to separate
> concerns, but I feel like we should adhere to NCBI's standards. (My
> tests of 'seq.translation()' do so.)

Yes, standards are great, to the extent that everyone agrees on one
standard and follows it. I'll try to summarize what TranslationDB
follows, then suggest a process for making decisions about external
interface...

- TranslationDB follows blast's way of specifying coordinates of
  translations (namely, blast reports the (positive) nucleotide
  coordinates of the ORF; if they are backwards (i.e. start > stop)
  that indicates a negative strand ORF). It therefore works as a
  sequence database that will process blastx / tblastn / tblastx
  results correctly with no extra code / coordinate transformations in
  the blast parser / alignment processor. If you consider BLAST to be a
  standard, then in this sense it follows that external interface
  standard. i.e.

      tdb[seqID][blastStart:blastStop] --> the appropriate translation slice

- I consider TranslationDB's frame calculation (0, 1, 2 and the
  negative versions of those) to be an internal (private)
  representation. I recommend we keep this as the internal
  representation of frame, because it's simple and mathematically
  solid. However, we're not obligated to use this as the external
  (public) interface for frame information, so we can have a separate
  discussion about what that public interface should be.

- TranslationDB was written for making our blastx alignment processing
  clean and simple. I think it does that. That is all that was
  requested for the 0.8 release.

- I suspect that the broader discussion of public standards for
  representing frame may take longer (we should probably look at how
  other people do this, besides NCBI), and the 0.8 release should not
  be held up for this.
No such feature was ever discussed for 0.8, and adding an open-ended
requirement for an as-yet undefined feature strikes me as a bad example
of feature-creep.

Suggestion: the discussion of public interface (standards) for frame is
important, and should occur as a separate process after the 0.8
release. There are some challenging questions here. For example, Pygr
universally follows the zero-based indexing convention (just like C,
Python, Perl etc., Pygr indexing starts at 0 rather than 1). We should
probably weigh the advantages of any frame standard that is not
zero-based carefully against the potential confusion caused by its
inconsistency with the rest of Pygr / Python indexing.

What do you think?

-- Chris
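
To make the two conventions in this thread concrete, here is a minimal sketch of how BLAST-style coordinates (start > stop meaning a negative-strand ORF) might be mapped to a zero-based frame offset, and how that offset relates to one-based NCBI-style frame labels (+1..+3, -1..-3). The function names, the `seq_len` parameter, and the choice to keep orientation and offset as separate values are assumptions made for illustration only; this is not TranslationDB's code or its internal representation, and it assumes the input coordinates are 1-based and inclusive.

```python
def blast_orf_to_zero_based(start, stop, seq_len):
    """Hypothetical helper, not part of Pygr.

    Takes 1-based inclusive nucleotide coordinates as BLAST reports them
    and returns (orientation, begin, end, frame), where begin:end is a
    zero-based half-open interval on the forward strand and frame is a
    zero-based offset (0, 1 or 2) counted from the 5' end of the strand
    being translated.
    """
    if start <= stop:
        # Forward strand: the offset is counted from the start of the sequence.
        orientation = 1
        begin, end = start - 1, stop
        frame = begin % 3
    else:
        # start > stop signals a negative-strand ORF; the offset is
        # counted from the end of the sequence instead.
        orientation = -1
        begin, end = stop - 1, start
        frame = (seq_len - end) % 3
    return orientation, begin, end, frame


def ncbi_style_frame(orientation, frame):
    """Map a zero-based offset to a one-based signed frame label."""
    return orientation * (frame + 1)


if __name__ == "__main__":
    for start, stop in [(1, 9), (2, 10), (30, 22), (29, 21)]:
        orientation, begin, end, frame = blast_orf_to_zero_based(start, stop, 30)
        print(start, stop, (begin, end), frame, ncbi_style_frame(orientation, frame))
```

The point of the sketch is only that the two labelings differ by a fixed shift of one, so a public interface could expose either without changing what is stored internally.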
