On Aug 2, 1:42 pm, Ian Clark <[EMAIL PROTECTED]> wrote:
> lazy wrote:
> > I have a Berkeley DB and I'm using the bsddb module to access it. The
> > DB is quite huge (anywhere from 2-30GB). I want to iterate over the
> > keys serially.
> >
> > I tried using something basic like
> >
> >     for key in db.keys()
> >
> > but this takes a lot of time. I guess Python is trying to get the
> > list of all keys first and probably keeping it in memory. Is there a
> > way to avoid this, since I just want to access the keys serially? I
> > mean, is there a way I can tell Python not to load all the keys, but
> > to access them as the loop progresses (like in a linked list)? I
> > couldn't find any accessor methods on bsddb to do this in my initial
> > search.
> >
> > I am guessing a BTree might be a good choice here, but since the DBs
> > were opened with hashopen when they were written, I'm not able to use
> > btopen when I want to iterate over them.
>
> db.iterkeys()
>
> Looking at the docs for bsddb objects[1], it mentions that "Once
> instantiated, hash, btree and record objects support the same methods
> as dictionaries." Then, looking at the dict documentation[2], you'll
> find the dict.iterkeys() method, which should do what you're asking.
>
> Ian
>
> [1] http://docs.python.org/lib/bsddb-objects.html
> [2] http://docs.python.org/lib/typesmapping.html
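For reference, a minimal sketch of the iterkeys() approach, assuming
Python 2 (the bsddb module was removed in Python 3), a Python version
where bsddb objects implement the dict iteration protocol as the quoted
docs suggest, and a hypothetical hash-format database file 'data.db':

    import bsddb

    # Open the existing hash-format database read-only.
    db = bsddb.hashopen('data.db', 'r')

    # db.keys() materializes every key in memory before the loop
    # starts; db.iterkeys() walks the database with a cursor, yielding
    # one key at a time, so memory use stays flat even for a huge file.
    for key in db.iterkeys():
        pass  # process each key here

    db.close()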
Thanks. I tried using db.first() and then db.next() for subsequent keys;
it seems to be faster. Thanks for the pointers.
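A sketch of that cursor-style loop, under the same assumptions as above
(Python 2 and a hypothetical 'data.db'): first() positions the cursor at
the first record and next() advances it, so only one key/value pair is
held in memory at a time. My assumption here is that the legacy
interface signals exhaustion by raising DBNotFoundError from the
underlying bsddb.db module:

    import bsddb

    db = bsddb.hashopen('data.db', 'r')

    try:
        key, value = db.first()      # position cursor at the first record
        while True:
            # process (key, value) here
            key, value = db.next()   # advance to the following record
    except bsddb.db.DBNotFoundError:
        pass  # assumed to be raised once the cursor passes the last record

    db.close()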