Jim Callahan jim.callahan.orlando at gmail.com wrote:

> 1. My condolences; with those dimensions you are heading for "big
> data"/hadoop land.

Heheh thanks, I am so determined not to go there! SQLite has been such a
nice, simple database to use; I will do anything to avoid all the complexity
of those map-reduce solutions.

> 2. Worry about the number of rows; that's what feeds into the big-oh: O(n).
> Assuming your 150 columns translate into a 1.5k to 2k record length, that
> means your 300 gigabyte file must have 150 (300 gig / 2k) to 200
> (300 gig / 1.5k) million records. That's a lot of n for O(n).

I've got about 200 million rows of 1,100 bytes each, so at least it's linear.
I would love to partition the rows into 4 databases and query them in
parallel (roughly along the lines of the sketch below), but that's just a
fantasy; I can live with it for now.
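For what it's worth, the "query them in parallel" fantasy would look roughly
like the sketch below. It is untested; the partition file names, the "wide"
table, its columns and the aggregate query are all invented for illustration,
and the merge step at the end is the part that would need real care for
anything beyond simple counts and sums.

    import sqlite3
    from concurrent.futures import ProcessPoolExecutor

    # Hypothetical partition files: the 200 million rows split into four
    # SQLite databases with identical schemas.
    PARTITIONS = ["rows_part0.db", "rows_part1.db",
                  "rows_part2.db", "rows_part3.db"]

    # Made-up full-scan aggregate; stands in for whatever query I'm running.
    QUERY = "SELECT category, COUNT(*), SUM(amount) FROM wide GROUP BY category"

    def scan_partition(path):
        # Each worker process opens its own connection; SQLite connections
        # should not be shared across processes.
        con = sqlite3.connect(path)
        try:
            return con.execute(QUERY).fetchall()
        finally:
            con.close()

    def parallel_scan():
        # One worker process per partition file.
        with ProcessPoolExecutor(max_workers=len(PARTITIONS)) as pool:
            partials = list(pool.map(scan_partition, PARTITIONS))
        # Re-merge the per-partition aggregates so the answer matches what
        # a single combined database would return.
        merged = {}
        for rows in partials:
            for category, cnt, total in rows:
                c, t = merged.get(category, (0, 0))
                merged[category] = (c + cnt, t + (total or 0))
        return merged

    if __name__ == "__main__":
        for category, (cnt, total) in sorted(parallel_scan().items()):
            print(category, cnt, total)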
> So, I would recommend double checking the specs of your SSD and not
> necessarily making the reasonable, but not yet true, assumption that
> solid-state has to be faster than mechanical.

I use an SSD for other reasons. However, I have tested sustained sequential
reads and I am getting 550 MB/s, and since I am doing full table scans the
access pattern is sequential. This is on a laptop, and I am fairly sure there
are no mechanical 2.5" HDDs that can transfer 200 GB at 550 MB/s.

> One strategy that might work is to have an entirely separate (not joined)
> 8 column table; develop queries (targets) on that database and then write
> out the primary key of the rows you are interested in to a separate table
> (CREATE TABLE AS SELECT primarykey FROM 8columntable WHERE yourquery;) and
> then JOIN the row-reduced table to the main table. If your row-reduced
> table has millions fewer records (primary keys), that's millions of rows
> (in the main 150 column table) where the precompiled SQL query doesn't
> have to be executed.

Cheers for that, I'll do some testing along those lines (rough sketch below)
and see how I go!
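Just so I am sure I have understood the idea, something like the following is
what I will try first. It is an untested sketch with made-up names: "wide"
stands in for my 150-column table, "pk" for its primary key, and the filter
columns and WHERE clause are only there for illustration.

    import sqlite3

    con = sqlite3.connect("bigdata.db")

    # One-off: a narrow copy holding just the primary key plus the handful
    # of columns the queries actually filter on (8 instead of 150).
    con.execute("""
        CREATE TABLE IF NOT EXISTS narrow AS
        SELECT pk, col_a, col_b, col_c, col_d, col_e, col_f, col_g
        FROM wide
    """)

    # Per query: write out only the primary keys that survive the filter...
    con.execute("DROP TABLE IF EXISTS hits")
    con.execute("""
        CREATE TABLE hits AS
        SELECT pk FROM narrow
        WHERE col_a = 'X' AND col_b > 100.0
    """)
    con.commit()

    # ...then JOIN the reduced key list back to the wide table, so the
    # 150-column rows are only read for the keys that matched.
    for row in con.execute("SELECT wide.* FROM hits JOIN wide ON wide.pk = hits.pk"):
        pass  # process the full-width row here

    con.close()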
Thanks,
Paul
