Re: RealSQLDatabase text index

Marco Bambini Thu, 30 Mar 2006 23:06:19 -0800

Unfortunately that does not make any difference if the search is made
using 'LIKE' because SQLite will ignore the index and will perform a
full table scan. See 5.5.1 at:
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

The document refers to version 2.8.6 (10 September 2003), a lot ofchanges occurs in sqlite since that version.

Anyway, sqlite doesn't officially supports full text index, but a guywrote a small full text search engine for sqlite3 on sourceforge.net.It is C/C++ code and I am not sure you can do the same things withthe REALSQLDatabase, even if I am quite sure you can recreate thesame algorithms with RB and SQLite3ProfessionalPlugin.


This is the description of his work and a link to the project:

Basic ideas are as follow:
--------------------------
* use another sqlite3 file for storing full text index

  information
* store everything into sqlite3 (this is not the
  fastest strategy in particular for inverted index)
* don't care too much about disk space constrains
  (ft3 use roughly 8x more space than initial datas)
* be resonnably efficient (index the 500k small
  documents in the english part of dmoz.org in 2H on
  my desktop PC, a search with frequent words take
  around 3/5 seconds on 1 IDE disk whitout cache, and
  less than 0,5 second if data in cache)

Support basic things:
---------------------
* for words: stemming, metaphone, stopwords,
  dictionnary, some statistics
* for scores: TFIDF, proximity
* special parsing for urls
* special parsing for topics (If you have some)
* web classical syntax for queries (support ™±/""/:)
* make it easy to search from a php module
* configuration stored inside sqlite3
* language detection (currently unplugged)

Important Missing things:
-------------------------
* Trigger support (currently not an incremental
  process, ft3_indexer reindex the full table)
* ICU integration
* language detection to be put in place and debug
* declare functions via sqlite3_func
* SQL for computing co-occurence table is slow/buggy
* It currently lacks a decent documentation which is
  the next thing on my todo list.
* It is not integrated within sqlite3 but it may be in

  the future if I understand enough of sqlite3
internal.

Example:
--------
from dmoz.org rdf dump,
extract every documents not under Top/World
dmozen.db3 =>  197MO 578k docs
dmozen.ft3 => 1704MO 646k words, 20,000k positions



Code is in C++ with some external dependancies
(google sparsehash, libtextcat, and of course sqlite3.
It currently works under Linux 32bits and Cygwin, tested
with g++-3.4.4. Quality is Beta.

It is under a BSD license.

http://sourceforge.net/projects/ft3

---
Marco Bambini
http://www.sqlabs.net
http://www.sqlabs.net/blog/


_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>

Re: RealSQLDatabase text index

Reply via email to