On Fri, Feb 23, 2007, Alan Gauld wrote:
>"Bill Campbell" <[EMAIL PROTECTED]> wrote
>
>>>It seems that an SQL database would probably be the way to go, but I
>>>am a bit concerned about speed issues (even though running time is
>> ...
>> You would probably be better off using one of the hash databases,
>> Berkeley, gdbm, etc. (see the anydbm documentation). These can
>> be treated exactly like dictionaries in python, and are probably
>> orders of magnitude faster than using an SQL database.
>
>I'm glad Bill suggested this because I'd forgotten about them
>entirely!
>But while they wont literally be "orders of magnitude" faster - the
>disk I/O subsystem is usually the main limiter here - they will be
>several factors faster, in fact many SQL databases use the dbm
>database under the hood.
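
For anyone who hasn't used these, here is a minimal sketch of the dictionary-style access mentioned in the quoted text. It assumes Python 3, where the anydbm module is named dbm; the helper names (store_records, lookup), the file name, and the sample records are made up for illustration. Which backend you actually get (gdbm, ndbm, or a fallback) depends on what is installed.

```python
import dbm
import os
import tempfile


def store_records(path, records):
    """Write a dict of bytes -> bytes into a dbm file at `path`."""
    # "c" opens the database read/write, creating it if it does not exist.
    with dbm.open(path, "c") as db:
        for key, value in records.items():
            db[key] = value  # same syntax as a dictionary assignment


def lookup(path, key):
    """Return the value stored under `key`, or None if it is absent."""
    # "r" opens the existing database read-only.
    with dbm.open(path, "r") as db:
        return db[key] if key in db else None


# Hypothetical sample data. Keys and values are bytes here; dbm files
# hold byte strings, not arbitrary Python objects.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "people")
store_records(db_path, {b"alan": b"Alan Gauld", b"bill": b"Bill Campbell"})

print(lookup(db_path, b"bill"))
print(lookup(db_path, b"nobody"))
```

Each lookup goes straight from the key to the record in a local file; there is no server round trip and no query to parse. If you need to store richer objects than byte strings, the shelve module layers pickling on top of the same kind of file.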

While the disk subsystem is going to be a factor, the overhead of
communicating with the SQL server, parsing the queries, etc. will be far
greater than calculating the location of the record using the hashed key.

FWIW: I've found that Berkeley DB btree files can be significantly
smaller than the Berkeley hash files.

I would really like to see somebody come up with a good alternative to
the Berkeley DB stuff from Sleepcat.  The source code is the most
godawful mess of #ifn*defs I've ever seen, with frequent API changes even
in minor release levels.  Take a look at the bdb source in python or perl
if you want to see what I'm talking about.

Bill
--
INTERNET:   [EMAIL PROTECTED]  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
FAX:            (206) 232-9186  Mercer Island, WA 98040-0820; (206) 236-1676

``the purpose of government is to reign in the rights of the people''
    -Bill Clinton during an interview on MTV in 1993
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor