Hi Titus, a few thoughts regarding all the work and discussion we've been having about BLAST.
First, I've noticed with increasing alarm that BLAST is like a black hole that will suck in more and more development effort seemingly without limit. That is, while BLAST may seem simple, it actually has all sorts of complications (e.g. all the questions about how to handle blastx, tblastn, tblastx etc.); its results can be inconsistent from version to version; and it has some really annoying bugs (e.g. mishandling of paths containing whitespace) that are hell to work around.

Pygr generally tries to protect the user from complications and misbehaviors of external tools, but in the case of BLAST this turns into the proverbial road to hell (paved with good intentions), because it never ends. And it's not clear that this is effort well spent. Certainly there is no unique added value here -- everybody already has scripts for running BLAST.

I would rather achieve "80% of the value for 20% of the effort" by drawing a clear line separating what we will vs will not do for BLAST, e.g. we will raise a clear error message if BLAST fails because a database path had whitespace, but we won't try all sorts of kluges to work around that BLAST bug.

To draw this line, I propose the following "skeptical philosophy" for decision making:

- for a given problem, we only take action if a simple solution emerges that really provides closure, i.e. it doesn't lead to any *more* problems to solve.
- otherwise, "masterful inaction": wait to see if the problem is really worth solving, if a simple solution emerges etc.

With that in mind, we have a few loose ends to decide about:

1. the BLAST path whitespace bug. The BLASTDB env variable only works for blastall, not for formatdb, so it doesn't really solve the problem for us. So I think we should just raise a clear error message telling the user that BLAST can't handle whitespace in file paths.

2. handling of self-alignment in pairwise mode. We proposed changing pairwise mode to report self-alignments, rendering obsolete the current annoying warning message.

3. blastx 6 frame ORFs. We proposed changing this so each sequence would have just 6 full-length ORFs associated with it, and therefore blastx could return a single NLMSA (containing all the results), just as the regular blastn/blastp mode does.

I think both 1 and 2 are simple and don't lead to new "problems to solve". I think #3 needs a detailed design proposal before we can really make this decision.

What do you think we should do on these issues?

-- Chris
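
P.S. To make item 1 concrete, here is a rough sketch of the kind of check I have in mind. The names BlastPathError and check_blast_path are hypothetical (nothing like this exists in pygr yet), and exactly where it would be called, e.g. before running formatdb or blastall, is still open:

```python
class BlastPathError(ValueError):
    """Hypothetical exception type; not an existing pygr class."""


def check_blast_path(path):
    """Return path unchanged, or raise a clear error if it contains whitespace.

    Deliberately no workaround: we only tell the user what went wrong.
    """
    if any(c.isspace() for c in path):
        raise BlastPathError(
            "BLAST cannot handle whitespace in file paths: %r. "
            "Please move or rename the database so that its path "
            "contains no whitespace." % path)
    return path
```

The point is that the check fails early with an explanation, and that's all it does: no temp copies, no symlinks, no quoting tricks.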
