On 26 May 2011, at 3:04am, Frank Chang wrote:

>       In the second phase, we read the sqlite WAL database and try to find 
> out the duplicates in our input records. Here, we are only reading the sqlite 
> WAL database. We would like to find out how to optimize the read performance 
> of the sqlite WAL database during the second phase of deduping? Please  let 
> us know if you have any suggestions.

Can you tell us how you would spot a duplicate?  Is it as simple as having one 
or more columns identical in two different rows?  If so, then it will be 
faster not to insert the duplicates in the first place.  Create a unique index 
on those columns (unless they are already your PRIMARY KEY):

CREATE UNIQUE INDEX uniqueIndex ON myTable (uniqueColumn)

and use the special form

INSERT OR IGNORE INTO myTable ...

This will make SQLite ignore any INSERT command that would break the 'UNIQUE' 
requirement.  Such commands won't raise an error; they will simply be ignored.  
This is extremely fast.  And since duplicate rows never make it into the table, 
you don't have to do anything else: once you've done all the INSERTs, just read 
the entire table back out again.
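To illustrate, here is a minimal sketch of this approach using Python's built-in sqlite3 module.  The table and column names (myTable, uniqueColumn) are just the placeholders from the statements above, and the sample rows are made up:

```python
import sqlite3

# In-memory database standing in for the WAL-mode file in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (uniqueColumn TEXT)")
conn.execute("CREATE UNIQUE INDEX uniqueIndex ON myTable (uniqueColumn)")

# Input records containing duplicates ('a' and 'b' appear twice).
rows = [("a",), ("b",), ("a",), ("c",), ("b",)]

# INSERT OR IGNORE silently skips any row that would violate the
# unique index -- no error is raised for the duplicates.
conn.executemany(
    "INSERT OR IGNORE INTO myTable (uniqueColumn) VALUES (?)", rows
)
conn.commit()

# Reading the table back yields only the distinct values.
deduped = [r[0] for r in conn.execute(
    "SELECT uniqueColumn FROM myTable ORDER BY uniqueColumn"
)]
print(deduped)  # ['a', 'b', 'c']
```

The second "read and find duplicates" phase disappears entirely: the table itself is the deduplicated result.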

Simon.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users