Re: cache in seqdb.BlastDB

Christopher Lee Wed, 10 Dec 2008 13:43:25 -0800

Solution to the additional BlastDB problem that Namshin reported as issue 49:

BlastDB is deprecated.  Use SequenceFileDB instead as the standard  
sequence database class.

The delay that Namshin reported is due to BlastDB's support for NCBI  
ID mangling (i.e. the fact that NCBI blastall reports back "fake ID"s  
that do not match the original ID in the FASTA file).  BlastDB handles  
this by creating a lookup table for translating the fake IDs to the  
correct IDs.  The first request for an ID that doesn't match triggers
construction of this table, hence the delay.  The table also takes up
memory once it is built.

The solution is simple: switch to using the base class  
(SequenceFileDB, or BlastDBbase which adds blast() etc. methods)  
unless you really need the NCBI ID mangling support -- in which case  
this delay is unavoidable.
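
To make the behavior described above concrete, here is a toy sketch of
the lazy lookup-table pattern.  It is not pygr code: PlainDB,
ManglingDB, the table_builds counter and the "fake|N" ID format are
hypothetical names made up for illustration, and the real classes
differ in detail.  It only shows why a key that is present returns
immediately while the first missing key pays for building the whole
table.

```python
# Toy model only: hypothetical stand-ins, not the pygr implementation.

class PlainDB(object):
    """Stand-in for a base class without ID-mangling support."""
    def __init__(self, seqs):
        self.seqs = dict(seqs)

    def has_key(self, seq_id):
        # A plain dictionary test: a miss is as cheap as a hit.
        return seq_id in self.seqs


class ManglingDB(PlainDB):
    """Stand-in for a subclass that also accepts 'fake' IDs."""
    def __init__(self, seqs):
        PlainDB.__init__(self, seqs)
        self.fake_to_real = None   # lookup table, not built yet
        self.table_builds = 0      # counts how often the table is built

    def _build_table(self):
        # One pass over every ID: slow and memory-hungry on a large database.
        # The "fake|N" format is an assumption made for this sketch.
        self.fake_to_real = dict(
            ("fake|%d" % i, real) for i, real in enumerate(sorted(self.seqs)))
        self.table_builds += 1

    def has_key(self, seq_id):
        if PlainDB.has_key(self, seq_id):
            return True            # direct hit, table never touched
        if self.fake_to_real is None:
            self._build_table()    # the first miss pays the full cost
        return seq_id in self.fake_to_real


db = ManglingDB({"1": "ACGT", "2": "GGCC"})
hit = db.has_key("1")                  # present: no table is built
builds_after_hit = db.table_builds
miss = db.has_key("1A")                # absent: triggers the table build
builds_after_miss = db.table_builds
db.has_key("2B")                       # later misses reuse the table
builds_after_second_miss = db.table_builds
```

In this sketch the plain class never builds a table at all, which is
the point of switching to the base class when the fake-ID translation
is not needed.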

On Nov 17, 2008, at 3:45 PM, Namshin Kim wrote:

> Hi Chris,
>
> Now, I got another problem.
>
> >>> from pygr import seqdb
> >>> R1 = seqdb.BlastDB('R1')
> >>> R1.has_key('1')
> True
> >>> R1.has_key('1A') # EXTREMELY SLOW
>
> You can use the same BlastDB as previous test. Python does not show  
> any increase of memory usage.
>
> Yours,
> Namshin Kim
>
