Re: cache in seqdb.BlastDB

Istvan Albert Tue, 21 Oct 2008 05:56:51 -0700


On Oct 20, 10:39 pm, Christopher Lee <[EMAIL PROTECTED]> wrote:


> OK.  I now understand the problem.  The bsddb module btree index is  
> screwing us over: when you simply ask for an iterator, it apparently  
> loads the entire index into memory.  

Is this really true? The bsddb module is heavily used to store
dictionaries that do not fit into memory, and I have never heard
anyone mention this before.

Out of curiosity I wrote a small test script (see below) that first
creates 10 million entries (a 536 MB file), then reopens the file and
starts iterating over them. I see no slowdown and no noticeable memory
use when iterating over elements; I get millisecond-level access to keys.

---------------

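import bsddb
import time

# Path of the test database (placeholder name; the original post
# did not show how `filename` was defined)
filename = 'test.db'

def create():
    # 'n' always creates a new, empty B-tree file
    db = bsddb.btopen(filename, 'n')
    # xrange avoids building a 10-million-element list in memory first
    for key in xrange(10**7):
        key = str(key)
        db[key] = key
    db.close()

def read():
    db = bsddb.btopen(filename, 'r')
    start = time.time()

    # iterating with the database's own cursor methods
    # (each call returns a (key, value) pair, in sorted key order)
    print db.first(), db.next(), db.next(), db.last()

    # with a regular Python iterator over the keys
    it = iter(db)
    print it.next(), it.next(), it.next()

    end = time.time()
    print 'Elapsed: %s' % (end-start)
    db.close()

# run create() once to build the file, then read()
#create()
read()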
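To illustrate the distinction this test is probing, walking a cursor one record at a time versus materializing every key up front, here is a toy model. It is not bsddb itself (the bsddb module was dropped from the standard library in Python 3, so this sketch is plain Python 3), and the `CountingBTree` class is purely hypothetical: it keeps its keys sorted, as a btopen() file does, and counts every simulated record read. Asking an iterator for three keys touches three records; building a full key list touches all of them.

```python
class CountingBTree:
    """Toy stand-in for a sorted on-disk B-tree (hypothetical, not bsddb).

    Keys are kept in sorted order and every simulated record read is
    counted, so we can see how much of the "file" an access touches.
    """

    def __init__(self, mapping):
        self._keys = sorted(mapping)   # string keys, sorted like a btopen() file
        self._values = dict(mapping)
        self.reads = 0                 # number of simulated record reads

    def _read(self, pos):
        # One simulated disk read of the record at cursor position `pos`.
        self.reads += 1
        key = self._keys[pos]
        return key, self._values[key]

    def __iter__(self):
        # Cursor-style iteration: exactly one record is read per next() call.
        for pos in range(len(self._keys)):
            yield self._read(pos)[0]

    def keys(self):
        # Eager: reads every record before returning a full list.
        return [self._read(pos)[0] for pos in range(len(self._keys))]


if __name__ == "__main__":
    store = CountingBTree({str(i): str(i) for i in range(100000)})

    it = iter(store)
    print(next(it), next(it), next(it))  # prints: 0 1 10
    print(store.reads)                    # prints: 3

    store.keys()
    print(store.reads)                    # prints: 100003
```

The read counter stands in for disk I/O; in the real test the corresponding signals are the elapsed time and the process's memory use while read() runs.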
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---
