On 23/11/2006, at 5:21 AM, Dr Gerard Hammond wrote:

At 2:16 PM -0700 22/11/06, Norman Palardy wrote:
On Nov 22, 2006, at 1:42 PM, Dr Gerard Hammond wrote:
When I do a SQLselect and search for a fragment of a string in a text field, will the SQLselect actually search the entire contents of my TEXT and BLOB columns.


ie is full-text search builtin?

you can get it to but it does not seem to be indexed in any way that makes it run faster

if you do

        like '%something%'

on a big blob or text it can still take a while



Thanks. Now I understand. full-text search = indexed full text search.

Looking at the SQLlite pages
http://www.sqlite.org/cvstrac/wiki?p=FtsOne shows the example:

sqlite> select name, snippet(poem, '[', ']', '%%') from poem where text match 'land'; ozymandias|i met a traveller from an antique [land] who said: two vast and trunkless legs of %%

This seems to be what I would call keyword-indexed searching, or what I used to refer to as "FileMaker searches" because it was my observation back in the early nineties that FM users expected to be able to type any word into a search field and have it be found.

eg: given a name field containing Andy Dent, users expect to be able to type in Dent and find the same entry as they do when they type in Andy, unless we have spent a lot of hours educating them out of this.

I implemented this in 4D for various systems, one enterprise system of which has been in use continuously for about 15 years now, and it was something people loved. Things they especially appreciated were being able to do a "word starting with" and "word sounds like" search (NOT Soundex).

I made sure the first version of our OOFILE C++ database engine had support for these features!

I'm not sure but I *think* Valentina may have something smarter to speedup fragment searches. I have had a few conversations with Ruslan over the years.

Something which people tend to dismiss is implementation differences affecting sheer speed of reading records from a database and doing whole text searches. SQL engines vary widely in the cost of scanning all records in a table.

The classic text on indexing and searching text is "Managing Gigabytes". See http://www.cs.mu.oz.au/mg/ for more information and source code.

KEYWORD vs FRAGMENT searching
A keyword search for 'dent' will find '"Andy Dent" but not "The page indent".
It may match "the car was dented".

A wildcard search for '%dent%' will match all the above.

TIP
One simple optimisation I built into OOFILE but you can code in any application on top of your database is to factor out single wildcard searches, if you have an indexed way to do "starts with" searches.

Say someone is searching for "den%d"

Break that down into a search for startsWith("den') and a sub-search on the results for those matching "den%d'.

Given the speed of modern computers (vs the kinds of Macs I was delivering on in say 1994) this is probably only worth worrying about for collections in the hundreds of thousands of records, or certainly to be considered if similar searches are taking more than a second.

Andy Dent
databaser of yore

_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>

Reply via email to