> In the sub-SELECT you use * so you get all columns. That's
> not necessary.
>

Ah, thanks!

> It might be better to rewrite as a self-join using LEFT
> OUTER JOIN on filename and filepath, using an NULL istarget
> from one or the other alias as an indication that target
> instance of the file is missing.
>
> Something like:
>
> SELECT source.filepath, source.filename
>  FROM files AS source
>  LEFT OUTER JOIN files AS target
>  USING (filepath,filename)
>  WHERE source.istarget = 0
>    AND target.istarget IS NULL
>  ORDER BY whatever you like;
>
> (untested)
>

But with this solution I have to iterate through the whole resultset,
and will not this iteration of 137 000 files be kinda slow?

> You could normalise some more, by creating a second table
>
> CREATE TABLE Pathnames (
>        pathid    INTEGER PRIMARY KEY,
>        pathname  UNIQUE
> );
> and referring to its with a foreign key in the file table.
>
> CREATE TABLE Files (
>        pathid INTEGER
>                CONSTRAINT fk_path REFERENCES Pathnames (pathid)
>                ON INSERT RESTRICT
>                ON UPDATE RESTRICT,
>        filename TEXT,
>        istarget INTEGER,
>   PRIMARY KEY (filepathid,filename,istarget)
> );
>
> This reduces the overall database size, improves the amount
> of unique data in a database page, and above that comparing
> integers is faster than strings.
>

Yeah, I know I could normalize, but I thought it added more complexity
than was required but perhaps not...

> I can't help noticing that comparing two directories in the
> same system would cause the source and target paths to be
> different by definition (assuming the filepath column
> represents the absolute path), but I guess you are aware of
> that :)

Heh, yes. The paths are stored as relative so c:\my
documents\folder1\file1 and d:\backup\folder1\file1 are both stored as
folder1\file1 in the database.

In any case, I'll try your suggestions with primary key and
normalization and see how it goes.
Thanks!
/Jonas
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to