Re: [sqlite] self joins efficiency question

2006-02-03 Thread Marvin K. Bellamy

Kurt Welgehausen wrote:
select user_id from person p1, person p2 where p1.user_id<>p2.user_id 
and p1.name = p2.name



Your query is fine. It's slow because it's doing a full
scan of p1, and for each row in p1 it's doing a full
scan of p2. That makes the time O(n^2).

It should go much faster if you add an index on name.
You may also need to reverse the order of the predicates
so SQLite will pick the right index -- the optimizer may
handle it correctly anyway, but I'm not sure.

Regards

  


I added an index on the name column and the query ran instantaneously.  
I didn't need to change predicate order either.  Thx!


Re: [sqlite] self joins efficiency question

2006-02-02 Thread Kurt Welgehausen
> select user_id from person p1, person p2 where p1.user_id<>p2.user_id 
> and p1.name = p2.name

Your query is fine. It's slow because it's doing a full
scan of p1, and for each row in p1 it's doing a full
scan of p2. That makes the time O(n^2).

It should go much faster if you add an index on name.
You may also need to reverse the order of the predicates
so SQLite will pick the right index -- the optimizer may
handle it correctly anyway, but I'm not sure.

Regards


Re: [sqlite] self joins efficiency question

2006-02-02 Thread Paolo Vernazza

Marvin K. Bellamy ha scritto:
I'm still a noob to SQL and I *think* I just found my first real-world 
need to perform a self join.  Given the table:


person (user_id integer primary key, name text)

I want to find all the entries with identical names (before adding a 
unique index).  I executed this query which took an extremely long 
time (over 30 seconds) to complete on a table with 8,000+ rows.


select user_id from person p1, person p2 where p1.user_id<>p2.user_id 
and p1.name = p2.name


This is more an administrative task, so performance isn't a concern.  
But, I was curious about a more efficient option for this query.



You can try this.
SELECT user_id FROM person GROUP BY name HAVING count(*) > 1

This will return 1 id for any group; to get a resultset similar to the 
one you are receiving now you have to do:


SELECT user_id FROM person WHERE name IN (SELECT name FROM person GROUP 
BY name HAVING count(*) > 1)


Paolo


Re: [sqlite] self joins efficiency question

2006-02-02 Thread Jay Sprenkle
On 2/2/06, Marvin K. Bellamy <[EMAIL PROTECTED]> wrote:
> I'm still a noob to SQL and I *think* I just found my first real-world
> need to perform a self join.  Given the table:
>
> person (user_id integer primary key, name text)
>
> I want to find all the entries with identical names (before adding a
> unique index).  I executed this query which took an extremely long time
> (over 30 seconds) to complete on a table with 8,000+ rows.
>
> select user_id from person p1, person p2 where p1.user_id<>p2.user_id
> and p1.name = p2.name
>
> This is more an administrative task, so performance isn't a concern.
> But, I was curious about a more efficient option for this query.

You might want to look at this:

You could do
insert or replace person into p1
There's more about that option here:
http://sqlite.org/lang_conflict.html


[sqlite] self joins efficiency question

2006-02-02 Thread Marvin K. Bellamy
I'm still a noob to SQL and I *think* I just found my first real-world 
need to perform a self join.  Given the table:


person (user_id integer primary key, name text)

I want to find all the entries with identical names (before adding a 
unique index).  I executed this query which took an extremely long time 
(over 30 seconds) to complete on a table with 8,000+ rows.


select user_id from person p1, person p2 where p1.user_id<>p2.user_id 
and p1.name = p2.name


This is more an administrative task, so performance isn't a concern.  
But, I was curious about a more efficient option for this query.