[pygr] blast issues to decide

Christopher Lee Sat, 13 Jun 2009 20:55:20 -0700

Hi Titus,
a few thoughts regarding all the work and discussion we've been having  
about BLAST.


First, I've noticed with increasing alarm that BLAST is like a black
hole that will suck in more and more development effort, seemingly
without limit.  That is, while BLAST may seem simple, it actually has
all sorts of complications (e.g. all the questions about how to handle
blastx, tblastn, tblastx etc.); its results can be inconsistent from
version to version; and it has some really annoying bugs (e.g.
mishandling of paths containing whitespace) that are hell to work
around.  Pygr generally tries to protect the user from the
complications and misbehaviors of external tools, but in the case of
BLAST this turns into the proverbial road to hell (paved with good
intentions), because it never ends.  And it's not clear that this is
effort well spent.  Certainly there is no unique added value here --
everybody already has scripts for running BLAST.

I would rather achieve "80% of the value for 20% of the effort" by  
drawing a clear line separating what we will vs will not do for BLAST,  
e.g. we will raise a clear error message if BLAST fails because a  
database path had whitespace, but we won't try all sorts of kluges to  
work around that BLAST bug.  To draw this line, I propose the  
following "skeptical philosophy" for decision making:
- for a given problem, we only take action if a simple solution  
emerges that really provides closure, i.e. it doesn't lead to any  
*more* problems to solve.

- otherwise, "masterful inaction": wait to see if the problem is  
really worth solving, if a simple solution emerges etc.

With that in mind, we have a few loose ends to decide about:

1. the BLAST path whitespace bug.  The BLASTDB environment variable
only works for blastall, not for formatdb, so it doesn't really solve
the problem for us.  So I think we should just raise a clear error
message telling the user that BLAST can't handle whitespace in file
paths.

2. handling of self-alignment in pairwise mode.  We proposed changing  
pairwise mode to report self-alignments, rendering obsolete the  
current annoying warning message.

3. blastx 6 frame ORFs.  We proposed changing this so each sequence  
would have just 6 full-length ORFs associated with it, and therefore  
blastx could return a single NLMSA (containing all the results), just  
as the regular blastn/blastp mode does.
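
For item 1, the "clear error message" could be as small as the sketch below. The names check_blast_path and BlastPathError are hypothetical (they are not existing Pygr API), and the sketch assumes the check is applied to the path string exactly as it would be handed to formatdb or blastall.

```python
class BlastPathError(ValueError):
    """Hypothetical exception: this path cannot be passed safely to BLAST."""


def check_blast_path(path):
    """Return path unchanged, or raise a clear error if it contains whitespace.

    Assumption: `path` is the exact string that would be given to
    formatdb / blastall.  No workaround is attempted.
    """
    if any(ch.isspace() for ch in path):
        raise BlastPathError(
            "BLAST cannot handle whitespace in file paths: %r.  "
            "Please move or rename the file so that its path "
            "contains no whitespace." % path)
    return path
```

Calling this once, before any external BLAST program is launched, keeps the behavior on the "raise an error, no kluges" side of the line drawn above.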

I think both #1 and #2 are simple and don't lead to new "problems to
solve".  I think #3 needs a detailed design proposal before we can
really make this decision.
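
As one possible starting point for such a proposal, here is a rough sketch of what "6 full-length ORFs per sequence" might mean in coordinates. It rests on an assumption not settled above: that each frame simply spans the whole sequence, trimmed to a whole number of codons. The function name six_frame_intervals and the (frame, start, stop) tuple layout are hypothetical, not existing Pygr API.

```python
def six_frame_intervals(seq_length):
    """List (frame, start, stop) for the six reading frames of a sequence.

    Assumptions (illustrative only):
    - frames are labeled +1, +2, +3 (forward) and -1, -2, -3 (reverse);
    - start/stop are 0-based, half-open, in forward-strand coordinates;
    - each frame covers as many whole codons as fit in the sequence.
    """
    frames = []
    # Forward frames: skip `offset` bases at the 5' end of the forward strand.
    for offset in range(3):
        n_codons = (seq_length - offset) // 3
        if n_codons > 0:
            frames.append((offset + 1, offset, offset + 3 * n_codons))
    # Reverse frames: skip `offset` bases at the 5' end of the reverse
    # strand, which is the high-coordinate end of the forward strand.
    for offset in range(3):
        n_codons = (seq_length - offset) // 3
        if n_codons > 0:
            stop = seq_length - offset
            frames.append((-(offset + 1), stop - 3 * n_codons, stop))
    return frames
```

With a fixed set of six intervals per sequence like this, blastx hits could in principle be stored against those intervals in a single alignment, but whether that really gives closure is exactly what the design proposal would need to show.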

What do you think we should do on these issues?

-- Chris

