On 26 May 2011, at 3:04am, Frank Chang wrote:

> In the second phase, we read the sqlite WAL database and try to find
> out the duplicates in our input records. Here, we are only reading the sqlite
> WAL database. We would like to find out how to optimize the read performance
> of the sqlite WAL database during the second phase of deduping? Please let
> us know if you have any suggestions.
Can you tell us how you would spot a duplicate? Is it as simple as two rows having identical values in one or more columns? If so, it will be faster not to insert the duplicates in the first place.

Create a unique index on those columns (unless they are already your PRIMARY KEY):

    CREATE UNIQUE INDEX uniqueIndex ON myTable (uniqueColumn)

and use the special form

    INSERT OR IGNORE INTO myTable ...

This makes SQLite ignore any INSERT command that would violate the UNIQUE constraint. Such inserts won't raise an error; they are simply ignored. This is extremely fast. And since duplicate rows never make it into the table, you don't have to do anything else: once you've done all the INSERTs, just read the entire table back out again.

Simon.

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
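For what it's worth, the approach above can be sketched end-to-end in a few lines of Python using the standard-library sqlite3 module. The table and column names (myTable, uniqueColumn) are taken from the example; the sample records are made up for illustration:

```python
import sqlite3

# In-memory database for illustration; a real deduplication run would
# open the on-disk database file instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (uniqueColumn TEXT)")
conn.execute("CREATE UNIQUE INDEX uniqueIndex ON myTable (uniqueColumn)")

# Input records containing duplicates.
records = ["alpha", "beta", "alpha", "gamma", "beta"]

# INSERT OR IGNORE silently drops any row that would violate the
# unique index, so duplicates never make it into the table.
conn.executemany(
    "INSERT OR IGNORE INTO myTable (uniqueColumn) VALUES (?)",
    [(r,) for r in records],
)
conn.commit()

# No second deduplication pass needed: just read the table back out.
deduped = [row[0] for row in
           conn.execute("SELECT uniqueColumn FROM myTable ORDER BY rowid")]
print(deduped)  # ['alpha', 'beta', 'gamma']
```

Note that only the first occurrence of each value survives; later duplicates are dropped without any error being reported to the caller.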

