On Thursday, 31 May 2012 16:25:10 UTC+1, duncan smith wrote: > On 31/05/12 06:15, John Nagle wrote: > > On 5/30/2012 6:57 PM, duncan smith wrote: > >> Hello, > >> I have been attempting to speed up some code by using an sqlite > >> database, but I'm not getting the performance gains I expected. > > > > SQLite is a "lite" database. It's good for data that's read a > > lot and not changed much. It's good for small data files. It's > > so-so for large database loads. It's terrible for a heavy load of > > simultaneous updates from multiple processes. > > > > Once the table is created the data will not be changed at all. > Corresponding integer codes will have to be generated for columns. (I > want to do this lazily because some columns might never be needed for > output files, and processing all columns was relatively expensive for my > initial solution.) After that it's a series of 'SELECT a, b, ... FROM > table WHERE f="g" ORDER by a, b, ...' style queries dumped to space > separated text files. > > > However, wrapping the inserts into a transaction with BEGIN > > and COMMIT may help. > > > > Unfortunately there's no discernible difference. > > > If you have 67 columns in a table, you may be approaching the > > problem incorrectly. > > > > Quite possibly. I have defined start and end points. The data are > contained in text files. I need to do the mapping to integer codes and > generate output files for subsets of variables conditional on the levels > of other variables. (I was doing the subsequent sorting separately, but > if I'm using SQL I guess I might as well include that in the query.) The > output files are inputs for other (C++) code that I have no control over. > > Any approach that doesn't consume large amounts of memory will do. Cheers. > > Duncan
It might be worth checking out https://sdm.lbl.gov/fastbit/ which has Python bindings (nb: the library itself takes a while to compile), but I'm not I00% sure it would meet all your requirements. Jon -- http://mail.python.org/mailman/listinfo/python-list