Hello,

I have a 135 MB, 1,256,132-line, '@'-separated text file containing various words and text fields (like a dictionary).
Example record:
[EMAIL PROTECTED]@[EMAIL PROTECTED],[EMAIL PROTECTED] altana : μικρός ανθόκηπος - εξώστης, ταράτσα@@@@@@@@@@@
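
For context, the import was done roughly along these lines (a sketch; the real table has more columns than shown, and names other than 'table1' and 'word' are illustrative):

>sqlite3 dicts.db
sqlite> CREATE TABLE table1 (word TEXT, definition TEXT);
sqlite> .separator "@"
sqlite> .import meta.txt table1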


I imported the data into sqlite 3.3.6, but when querying with the LIKE operator, the performance is way too slow (about 1.5-2 seconds per query):

>time sqlite3 dicts.db "select * from table1 where word like 'asdf%';"
1.156u 0.491s 0:01.64 100.0%    0+0k 0+0io 0pf+0w

Creating an index did not help at all. Using COLLATE NOCASE is of no help either, since the text encoding is not ISO-8859-1 (though it is still 8-bit).
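
For reference, the index looks like this (the index name is illustrative):

sqlite> CREATE INDEX word_idx ON table1(word);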

FYI, using egrep takes only 0.14s to get results in the worst-case scenario:
>time egrep -i "[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@asdf" meta.txt
0.077u 0.069s 0:00.14 92.8%     0+0k 0+0io 0pf+0w

1) I know egrep is not a DB, but does sqlite really use such an inefficient search algorithm for content that cannot be indexed? Why not revert to simple 'grep-like' methods? Or am I missing something trivial here?

2) Why doesn't an index raise performance at all in this case? Is it because non-Latin characters are used?
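
In case it helps frame question (2): my understanding is that an index on 'word' could in principle serve a prefix query rewritten as a range scan, e.g.

sqlite> SELECT * FROM table1 WHERE word >= 'asdf' AND word < 'asdg';

but the LIKE form above does not seem to touch the index at all.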


Please clarify; I'm really confused about this.
Thank you.

Spiros Ioannou

P.S. If answers to (1) and (2) don't help, I'm going to write a php-egrep plugin to avoid popen-ing :-)


--
Image Video & Multimedia Systems Lab.
Department of Electrical & Computer Eng.
National Technical University of Athens
http://www.image.ece.ntua.gr
