Re: [sqlite] self joins efficiency question
Kurt Welgehausen wrote: select user_id from person p1, person p2 where p1.user_id<>p2.user_id and p1.name = p2.name Your query is fine. It's slow because it's doing a full scan of p1, and for each row in p1 it's doing a full scan of p2. That makes the time O(n^2). It should go much faster if you add an index on name. You may also need to reverse the order of the predicates so SQLite will pick the right index -- the optimizer may handle it correctly anyway, but I'm not sure. Regards I added an index on the name column and the query ran instantaneously. I didn't need to change predicate order either. Thx!
Re: [sqlite] self joins efficiency question
> select user_id from person p1, person p2 where p1.user_id<>p2.user_id > and p1.name = p2.name Your query is fine. It's slow because it's doing a full scan of p1, and for each row in p1 it's doing a full scan of p2. That makes the time O(n^2). It should go much faster if you add an index on name. You may also need to reverse the order of the predicates so SQLite will pick the right index -- the optimizer may handle it correctly anyway, but I'm not sure. Regards
Re: [sqlite] self joins efficiency question
Marvin K. Bellamy ha scritto: I'm still a noob to SQL and I *think* I just found my first real-world need to perform a self join. Given the table: person (user_id integer primary key, name text) I want to find all the entries with identical names (before adding a unique index). I executed this query which took an extremely long time (over 30 seconds) to complete on a table with 8,000+ rows. select user_id from person p1, person p2 where p1.user_id<>p2.user_id and p1.name = p2.name This is more an administrative task, so performance isn't a concern. But, I was curious about a more efficient option for this query. You can try this. SELECT user_id FROM person GROUP BY name HAVING count(*) > 1 This will return 1 id for any group; to get a resultset similar to the one you are receiving now you have to do: SELECT user_id FROM person WHERE name IN (SELECT name FROM person GROUP BY name HAVING count(*) > 1) Paolo
Re: [sqlite] self joins efficiency question
On 2/2/06, Marvin K. Bellamy <[EMAIL PROTECTED]> wrote: > I'm still a noob to SQL and I *think* I just found my first real-world > need to perform a self join. Given the table: > > person (user_id integer primary key, name text) > > I want to find all the entries with identical names (before adding a > unique index). I executed this query which took an extremely long time > (over 30 seconds) to complete on a table with 8,000+ rows. > > select user_id from person p1, person p2 where p1.user_id<>p2.user_id > and p1.name = p2.name > > This is more an administrative task, so performance isn't a concern. > But, I was curious about a more efficient option for this query. You might want to look at this: You could do insert or replace person into p1 There's more about that option here: http://sqlite.org/lang_conflict.html
[sqlite] self joins efficiency question
I'm still a noob to SQL and I *think* I just found my first real-world need to perform a self join. Given the table: person (user_id integer primary key, name text) I want to find all the entries with identical names (before adding a unique index). I executed this query which took an extremely long time (over 30 seconds) to complete on a table with 8,000+ rows. select user_id from person p1, person p2 where p1.user_id<>p2.user_id and p1.name = p2.name This is more an administrative task, so performance isn't a concern. But, I was curious about a more efficient option for this query.