On Mar 31, 2006, at 1:06 AM, Marco Bambini wrote:

Unfortunately that does not make any difference if the search is made
using 'LIKE' because SQLite will ignore the index and will perform a
full table scan. See 5.5.1 at:
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

The document refers to version 2.8.6 (10 September 2003), a lot of changes occurs in sqlite since that version.

Anyway, sqlite doesn't officially supports full text index, but a guy wrote a small full text search engine for sqlite3 on sourceforge.net. It is C/C++ code and I am not sure you can do the same things with the REALSQLDatabase, even if I am quite sure you can recreate the same algorithms with RB and SQLite3ProfessionalPlugin.

This is the description of his work and a link to the project:

Basic ideas are as follow:
--------------------------
* use another sqlite3 file for storing full text index

  information
* store everything into sqlite3 (this is not the
  fastest strategy in particular for inverted index)
* don't care too much about disk space constrains
  (ft3 use roughly 8x more space than initial datas)
* be resonnably efficient (index the 500k small
  documents in the english part of dmoz.org in 2H on
  my desktop PC, a search with frequent words take
  around 3/5 seconds on 1 IDE disk whitout cache, and
  less than 0,5 second if data in cache)

Support basic things:
---------------------
* for words: stemming, metaphone, stopwords,
  dictionnary, some statistics
* for scores: TFIDF, proximity
* special parsing for urls
* special parsing for topics (If you have some)
* web classical syntax for queries (support ™±/""/:)
* make it easy to search from a php module
* configuration stored inside sqlite3
* language detection (currently unplugged)

<snip>

You could certainly do this in REALbasic. A useful, free resource for the dictionary etc would be WordNet <http://wordnet.princeton.edu/>.

But you're getting into some moderately complex stuff here. See, for example: <http://citeseer.ist.psu.edu/context/20836/0> to see that this is an area of active research for computer scientists. There are standard approaches, though. Typical is to index all the substrings of a particular length of each word, along with other information such as in what order they occur. As you can imagine, the index can wind up being many times the size of the original data.

Depending on your project, a compelling alternative might be Lucene: <http://lucene.apache.org/java/docs/>. Lucene is very, very good at this kind of thing.

OTOH, I'd love to see someone put together an all-REALbasic solution for this. Extra bonus points if you implement an index that lets me do grep searches. :-)

Regards,

Guyren G Howe
guyren-at-relevantlogic.com
http://relevantlogic.com

REALbasic, PHP, Python programming
PostgreSQL, MySQL database design and consulting
Technical writing and training


_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>

Reply via email to