On Fri, 27 Mar 2009 15:53:18 +0100, Jonas Sandman <jonas.sand...@gmail.com> wrote:
>Hello, > >I have a database with about 137000 * 2 rows with four columns; >fileid, filename, filepath and istarget. >It's used to determine if two scanned directories are equal so I run a >simply query to get the rows that are missing on the target directory >but do exists in the source directory. > >I use this query: > >SELECT f.filepath, f.filename FROM files f >WHERE f.istarget=0 >AND NOT EXISTS (SELECT * FROM files WHERE filepath=f.filepath AND >filename=f.filename AND istarget=1) In the sub-SELECT you use * so you get all columns. That's not necessary. SELECT f.filepath, f.filename FROM files f WHERE f.istarget=0 AND NOT EXISTS ( SELECT ROWID FROM files WHERE filepath=f.filepath AND filename=f.filename AND istarget=1 ); It might be better to rewrite as a self-join using LEFT OUTER JOIN on filename and filepath, using an NULL istarget from one or the other alias as an indication that target instance of the file is missing. Something like: SELECT source.filepath, source.filename FROM files AS source LEFT OUTER JOIN files AS target USING (filepath,filename) WHERE source.istarget = 0 AND target.istarget IS NULL ORDER BY whatever you like; (untested) >and I have experimented with some index to improve the speed, both >index (filename, filepath, istarget), (filename), (filepath) etc... The SQLite optimizer can only use one index at a time. In general, the index should be as selective as possible. This is known as cardinality. In your case, (filename, filepath) should do, unless all files in all directories have the same series of names, in which (filepath, filename) could be better. Adding istarget doesn't hurt, it would make the index usable as a primary key. If you defined more than one index, you can help the optimizer to choose the best index by running ANALYZE; on a database filled with representative data. >I am still not quite satisfied with the speed (a few seconds to check this). >Perhaps the table schema itself is the problem? You could normalise some more, by creating a second table CREATE TABLE Pathnames ( pathid INTEGER PRIMARY KEY, pathname UNIQUE ); and referring to its with a foreign key in the file table. CREATE TABLE Files ( pathid INTEGER CONSTRAINT fk_path REFERENCES Pathnames (pathid) ON INSERT RESTRICT ON UPDATE RESTRICT, filename TEXT, istarget INTEGER, PRIMARY KEY (filepathid,filename,istarget) ); This reduces the overall database size, improves the amount of unique data in a database page, and above that comparing integers is faster than strings. I can't help noticing that comparing two directories in the same system would cause the source and target paths to be different by definition (assuming the filepath column represents the absolute path), but I guess you are aware of that :) >Can someone help me? > >Regards, >Jonas -- ( Kees Nuyt ) c[_] _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users