On Mar 31, 2006, at 1:06 AM, Marco Bambini wrote:
Unfortunately that does not make any difference if the search is made
using 'LIKE' because SQLite will ignore the index and will perform a
full table scan. See 5.5.1 at:
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
The document refers to version 2.8.6 (10 September 2003), a lot of
changes occurs in sqlite since that version.
Anyway, sqlite doesn't officially supports full text index, but a
guy wrote a small full text search engine for sqlite3 on
sourceforge.net.
It is C/C++ code and I am not sure you can do the same things with
the REALSQLDatabase, even if I am quite sure you can recreate the
same algorithms with RB and SQLite3ProfessionalPlugin.
This is the description of his work and a link to the project:
Basic ideas are as follow:
--------------------------
* use another sqlite3 file for storing full text index
information
* store everything into sqlite3 (this is not the
fastest strategy in particular for inverted index)
* don't care too much about disk space constrains
(ft3 use roughly 8x more space than initial datas)
* be resonnably efficient (index the 500k small
documents in the english part of dmoz.org in 2H on
my desktop PC, a search with frequent words take
around 3/5 seconds on 1 IDE disk whitout cache, and
less than 0,5 second if data in cache)
Support basic things:
---------------------
* for words: stemming, metaphone, stopwords,
dictionnary, some statistics
* for scores: TFIDF, proximity
* special parsing for urls
* special parsing for topics (If you have some)
* web classical syntax for queries (support ™±/""/:)
* make it easy to search from a php module
* configuration stored inside sqlite3
* language detection (currently unplugged)
<snip>
You could certainly do this in REALbasic. A useful, free resource for
the dictionary etc would be WordNet <http://wordnet.princeton.edu/>.
But you're getting into some moderately complex stuff here. See, for
example: <http://citeseer.ist.psu.edu/context/20836/0> to see that
this is an area of active research for computer scientists. There are
standard approaches, though. Typical is to index all the substrings
of a particular length of each word, along with other information
such as in what order they occur. As you can imagine, the index can
wind up being many times the size of the original data.
Depending on your project, a compelling alternative might be Lucene:
<http://lucene.apache.org/java/docs/>. Lucene is very, very good at
this kind of thing.
OTOH, I'd love to see someone put together an all-REALbasic solution
for this. Extra bonus points if you implement an index that lets me
do grep searches. :-)
Regards,
Guyren G Howe
guyren-at-relevantlogic.com
http://relevantlogic.com
REALbasic, PHP, Python programming
PostgreSQL, MySQL database design and consulting
Technical writing and training
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>
Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>