Thanks for the tip and will have a look at that.
I have in the meantime made this a lot faster by making the data in both
tables upper case and making the small table smaller by taking out invalid
records. I can then run a simpler join with glob, although I noticed it
still doesn't use the index. Sounds like your solution will be faster still.

RBS


-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of marbex
Sent: 12 September 2009 12:25
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Any way to speed up this SQL?


I had a similar issue. I wanted to find strings that started as another
string in the same table and field. My solution was to create a temp table
that consisted of the id, the string and the first word of the string which
I then indexed. The table had 30 000 records and the total processing time
went down from 15-20 minutes to 15 seconds!

Applying that solution to your case, not knowing the nature of your data (I
had names), I guess you can do something like this:

- Get the length of the shortest string in table2.term. Lets say it's 3.

- Create a tmptable of table1
Create temp tmptable1 as
select term, lower(substr(term,1,3)) shortest
from table1

- Create a tmptable of table2
Create temp tmptable2 as
select term, lower(substr(term,1,3)) shortest
from table2

- Index the shortest fields
Create index idx_tmptable1_shortest on tmptable1(shortest)
Create index idx_tmptable2_shortest on tmptable2(shortest)

-Run this sql
select
count(a.rowid)
from tmptable1 a inner join tmptable2 b on a.shortest=b.shortest
where (lower(b.term) = lower(substr(a.term,1,length(b.term))))



RB Smissaert wrote:
> 
> Have 2 tables with both one text field called term and need to run a SQL
> like this, to count the records in table1 where the start of term in
> table1
> equals a term in table2:
> 
> select
> count(a.rowid)
> from table1 a inner join table2 b on
> (lower(b.term) = lower(substr(a.term,1,length(b.term))))
> 
> term is indexed in both tables, but not surprisingly, this query runs very
> slow, taking some 40 minutes. Table1 is large, maybe some 1 million rows
> and
> table2 is small, maybe some 30.000 rows. All rows in table2 are unique,
> but
> table1 has many duplicates.
> 
> Any suggestions to speed this up?
> I could also tackle this in code rather than in SQL.
> 
> RBS
> 
> 
> 
> 
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> 

-- 
View this message in context:
http://www.nabble.com/Any-way-to-speed-up-this-SQL--tp25412299p25413614.html
Sent from the SQLite mailing list archive at Nabble.com.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to