On Fri, Jul 17, 2009 at 11:20:54AM -0700, Christopher Lee wrote:
-> On Jul 16, 2009, at 7:47 PM, C. Titus Brown wrote:
->
-> > I'm so confused about frame calculations that I can't think straight
-> > about this any more -- does that match the BLAST frame calculations?
-> > Perhaps I was too mired in the BLAST code thought process to separate
-> > concerns, but I feel like we should adhere to NCBI's standards. (My
-> > tests of 'seq.translation()' do so.)
->
-> Yes, standards are great, to the extent that everyone agrees on one
-> standard and follows it. I'll try to summarize what TranslationDB
-> follows, then suggest a process for making decisions about external
-> interface...

[ ... ]

-> - I consider TranslationDB's frame calculation (0, 1, 2 and the
->   negative versions of those) to be an internal (private)
->   representation. I recommend we keep this as the internal
->   representation of frame, because it's simple and mathematically
->   solid. However, we're not obligated to use this as the external
->   (public) interface for frame information, so we can have a separate
->   discussion about what that public interface should be.
->
-> - TranslationDB was written for making our blastx alignment processing
->   clean and simple. I think it does that. That is all that was
->   requested for the 0.8 release.
->
-> - I suspect that the broader discussion of public standards for
->   representing frame may take longer (we should probably look at how
->   other people do this, besides NCBI), and the 0.8 release should not be
->   held up for this. No such feature was ever discussed for 0.8, and
->   adding an open-ended requirement for an as-yet undefined feature
->   strikes me as a bad example of feature-creep. Suggestion: the
->   discussion of public interface (standards) for frame is important, and
->   should occur as a separate process after the 0.8 release. There are
->   some challenging questions here. For example, Pygr universally
->   follows the zero-based indexing convention (just like C, Python, Perl
->   etc., Pygr indexing starts at 0 rather than 1). We should probably
->   weigh the advantages of any frame standard that is not zero-based
->   carefully against the potential confusion caused by its inconsistency
->   with the rest of Pygr / Python indexing.
->
-> What do you think?
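
To make the zero-based versus NCBI-style question above concrete, here is a minimal sketch of converting between the two conventions. The function names and the (offset, orientation) pair are hypothetical and are not TranslationDB's internal representation, which the thread does not spell out. The sketch assumes NCBI-style frames are the signed integers +1..+3 and -1..-3.

```python
VALID_NCBI_FRAMES = (1, 2, 3, -1, -2, -3)


def ncbi_to_zero_based(ncbi_frame):
    """Convert an NCBI-style frame to a (zero-based offset, orientation) pair.

    Hypothetical helper: orientation is +1 for the forward strand and -1
    for the reverse strand; offset is 0, 1 or 2.
    """
    if ncbi_frame not in VALID_NCBI_FRAMES:
        raise ValueError("NCBI frame must be one of +1, +2, +3, -1, -2, -3")
    orientation = 1 if ncbi_frame > 0 else -1
    return abs(ncbi_frame) - 1, orientation


def zero_based_to_ncbi(offset, orientation):
    """Inverse of ncbi_to_zero_based()."""
    if offset not in (0, 1, 2) or orientation not in (1, -1):
        raise ValueError("offset must be 0, 1 or 2; orientation must be +1 or -1")
    return orientation * (offset + 1)
```

One wrinkle this sketch sidesteps: a single signed zero-based integer cannot tell "forward, offset 0" from "reverse, offset 0", because 0 and -0 are the same integer in Python. Keeping orientation as a separate value avoids that ambiguity.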

The mental origin of the 'translation()' method on sequences was my
desire to be able to query the results of translated BLAST queries
nicely, like so:

    frame2 = dna_seq.translation(2)
    frame2_matches = blast_results[frame2]

Right now I don't see a simple (one-line) way to do this; I don't think
this:

-> # 100 AA for negative strand (reverse-comp) of same nt interval
-> orf100rc = (-(tdb[seqID]))[-i - 300: -i]

counts as simple ;) and in any case you could not manually inspect the
BLAST output and easily correlate the frame of matches there with the
frame of matches in pygr.

I agree it's important to get good internal code for BLAST handling, but
my original objective was to make BLASTX more usable at an API level --
the internal code cleanup was just a nice byproduct. I also agree that
it's opening a small can of worms to add the translation function, but
(for better or for worse) BLAST holds a special place in the toolkit of
many bioinformaticians...

If we don't want to add a general translation function, how about adding
an NCBI-specific convenience function to translationDB? e.g.

    def translation_ncbi(seq, ncbi_frame):
        """
        Translates the DNA sequence 'seq' in the given frame, using
        NCBI's frame convention.
        """

cheers,
--titus
-- 
C. Titus Brown
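
For illustration only, here is one way the body of the proposed translation_ncbi() stub might be filled in. It is a sketch under stated assumptions, not Pygr code: it works on plain strings rather than Pygr sequence objects, uses the standard genetic code, and assumes the usual BLAST reading of frames, in which +1..+3 start at the first, second or third base of the forward strand and -1..-3 start at the first, second or third base of the reverse complement. The helper reverse_complement() and the "X" placeholder for unrecognized codons are choices made for this example.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translation_ncbi(seq, ncbi_frame):
    """Translate 'seq' in an NCBI-style frame (+1..+3 or -1..-3).

    Assumption: negative frames are read from the start of the reverse
    complement. Codons not in the table (e.g. containing N) become 'X'.
    """
    if ncbi_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("NCBI frame must be one of +1, +2, +3, -1, -2, -3")
    dna = str(seq).upper()
    if ncbi_frame < 0:
        dna = reverse_complement(dna)
    start = abs(ncbi_frame) - 1  # one-based frame -> zero-based offset
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

With a helper of this shape, a call such as translation_ncbi(dna, -2) could be matched directly against the frame printed in BLAST output, which is the usability point raised above.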
