Rafael Darder Calvo wrote:
>> > > Please recommend a module that allows persistent set/dict storage +
>> > > fast query that best fits my problem,
>> >
>> > What is the problem you are trying to solve? How many keys do you have?
>>
>> Corpus processing. There are in the order of billions to tens of
>> billions keys (64bit integers).
>>
> I would recommend you to use a database since it meets your
> requirements (off-memory, fast, persistent). The bsddb module
> (berkeley db) even gives you a dictionary like interface.
> http://www.python.org/doc/lib/module-bsddb.html

Standard SQL databases can work for this, but generally your
recommendation of using bsddb works very well for int -> int mappings.
In particular, I would suggest using a btree, if only because I have had
trouble in the past with colliding keys in bsddb.hash. (recno is just a
flat file: to write record number i, it will attempt to create a file of
size i * (record size), which is unworkable for sparse 64-bit keys.)

As an alternative, there are many methods known from search engines for
mapping int -> [int, int, ...], which can be implemented as int -> int,
where the second int is a pointer to an address on disk. Looking into a
few of the open source search implementations may be worthwhile.

 - Josiah
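
For illustration, here is a minimal sketch of the int -> [int, int, ...]
idea described above, where each key maps to a byte offset in a data file.
The file layout (a 64-bit count followed by that many 64-bit values) and
the names write_postings and read_postings are assumptions made up for
this example, not taken from any particular search engine. A plain
in-memory dict stands in for the on-disk int -> int mapping (for example
a btree), purely to keep the sketch self-contained.

```python
import os
import struct
import tempfile

# Assumed record layout: <count: uint64><value: uint64> * count, little-endian.
U64 = struct.Struct("<Q")


def write_postings(path, postings):
    """Write {key: [int, ...]} to `path` and return {key: byte offset}.

    The returned dict is the int -> int mapping; in practice it would
    live in a persistent store rather than in memory.
    """
    index = {}
    with open(path, "wb") as f:
        for key in sorted(postings):
            values = postings[key]
            index[key] = f.tell()  # the "pointer to an address on disk"
            f.write(U64.pack(len(values)))
            f.write(struct.pack("<%dQ" % len(values), *values))
    return index


def read_postings(path, offset):
    """Seek to `offset` and read back the list of ints stored there."""
    with open(path, "rb") as f:
        f.seek(offset)
        (count,) = U64.unpack(f.read(U64.size))
        return list(struct.unpack("<%dQ" % count, f.read(count * U64.size)))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "postings.bin")
    index = write_postings(path, {42: [3, 1, 4], 2**40: [7]})
    print(read_postings(path, index[42]))
```

Each lookup then costs one query against the int -> int mapping plus one
seek and read in the data file, regardless of how long the list is.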