OK, I have a better idea.  We can simply restrict this reindexing  
behavior to the specific operation of looking up IDs during a BLAST  
search.  We only implemented this behavior to deal with BLAST's buggy  
mangling of sequence IDs, so there's no need to apply it in other  
situations.  If it isn't applied at any other time, looking up an  
ID that isn't in the database will simply fail (KeyError), with no  
delay.

I renamed the reindexing class from BlastDB to BlastIDIndex.  It is  
now only used for looking up IDs while processing BLAST results in  
process_blast().  I renamed BlastDBbase to be the new BlastDB.   
Reindexing will never happen in normal usage, only when actually  
processing BLAST results.  This resolves Issue 49.
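
To make the split concrete, here is a rough sketch of the idea, not the actual Pygr code.  PlainDB and IDIndex are hypothetical stand-ins for the renamed classes, and the mangling rule (BLAST reporting only one '|'-separated field of the original ID) is an assumption made purely for illustration.

```python
class PlainDB(object):
    """Hypothetical stand-in for the plain database: exact-ID lookup only."""

    def __init__(self, seqs):
        self.seqs = dict(seqs)

    def __getitem__(self, seq_id):
        # An unknown ID raises KeyError immediately; no reindexing here.
        return self.seqs[seq_id]


class IDIndex(object):
    """Hypothetical stand-in for the reindexing class, used only on BLAST results."""

    def __init__(self, db):
        self.db = db
        self._alias = None  # built lazily, on the first failed BLAST lookup

    def _build(self):
        # ASSUMPTION: a mangled ID is one '|'-separated field of the real ID.
        self._alias = {}
        for full_id in self.db.seqs:
            for part in full_id.split('|'):
                if part:
                    self._alias.setdefault(part, full_id)

    def __getitem__(self, reported_id):
        try:
            return self.db[reported_id]  # fast path: ID was not mangled
        except KeyError:
            if self._alias is None:
                self._build()  # the slow reindexing step happens only here
            return self.db[self._alias[reported_id]]
```

With something like this, db['ABC1.1'] on a PlainDB fails right away, while IDIndex(db)['ABC1.1'] pays the reindexing cost once and then resolves to the full ID.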

Questions:
- Should we do the initial reindexing at the same time as the formatdb  
step?  This might reduce user annoyance, since users expect formatdb  
to take some time to reindex the database.

- Should we print out a warning message explaining that we're  
reindexing the BLAST database?  This might also reduce user  
annoyance / confusion, by clearing up the mystery of "why is Pygr so  
slow?".

- Should we allow the user to turn off reindexing (which means that  
BLAST will not work on NCBI databases with "mangled blob" IDs)?

- Can we auto-detect whether reindexing is needed (i.e. detect whether  
the sequence IDs are blobs that blastall will mangle)?  Then we could  
dispense with it completely on non-NCBI databases (or more  
specifically, databases whose IDs blastall won't mangle).
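
For the warning and auto-detection questions, something along these lines might work.  This is only a sketch: the function names are made up, and the test for "IDs blastall will mangle" (any ID containing a '|' separator) is an assumed heuristic that would need checking against what blastall really does.

```python
import sys


def looks_mangleable(seq_id):
    # ASSUMPTION: NCBI-style compound IDs are recognizable by '|' separators.
    return '|' in seq_id


def needs_reindexing(seq_ids, warn=True, stream=sys.stderr):
    """Return True if any ID looks like one blastall might mangle.

    Optionally writes a one-line notice so the user knows why there is a delay.
    """
    for seq_id in seq_ids:
        if looks_mangleable(seq_id):
            if warn:
                stream.write("Reindexing BLAST database IDs; "
                             "this may take a while...\n")
            return True
    return False
```

Passing warn=False would silence the message, and a database whose IDs all pass the check would skip reindexing entirely.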

