* Thus wrote Cesar Cordovez ([EMAIL PROTECTED]):
> 
> 2. save in the keyword table the non repeating words in the array with a 
> reference to the original document, for example the document id.
> 
> 3.  Then, if you want to search for, let's say, people, you will do:
> 
> select distinct(docid) from keywords where word='people'
> 
> and you will have a list (cursor) with all the documents that have the 
> word "people".

To take it a step further, get a count of how many matches are
made so there is some sort of relevancy:

select count(*) as qty, docid from keywords where word='people'
group by docid order by qty desc
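
A minimal sketch of that counting query, run from Python against an
in-memory SQLite database purely for illustration. It assumes the
keywords table holds one row per occurrence of a word (otherwise every
count would be 1); the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per occurrence of a word in a document.
conn.execute("CREATE TABLE keywords (word TEXT, docid INTEGER)")
rows = [("people", 1), ("people", 1), ("people", 2), ("php", 2), ("people", 1)]
conn.executemany("INSERT INTO keywords (word, docid) VALUES (?, ?)", rows)

# Count occurrences per document, most matches first.
ranked = conn.execute(
    "SELECT COUNT(*) AS qty, docid FROM keywords "
    "WHERE word = ? GROUP BY docid ORDER BY qty DESC",
    ("people",),
).fetchall()
print(ranked)
```

Each result row is a (qty, docid) pair, so the first row is the
document with the most occurrences of the word.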

Doing this, however, makes the db sort and group on every query,
possibly spilling to a temp file on disk, which gets slow as the
table grows. So I'll take it a bit further :)

Instead of just adding a word to the table of words, you add a field
that holds the number of times it appears in the document.  So now
the sql looks something like:

select qty, docid from keywords where word='people' order by qty desc
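
One possible way to fill in that qty field at indexing time, sketched
in Python with an in-memory SQLite database. The tokenizing regex, the
sample documents and the index name are assumptions for illustration,
and the index here is a composite on (word, qty) rather than on qty
alone.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample documents keyed by document id.
documents = {1: "People helping people", 2: "PHP for people"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (word TEXT, docid INTEGER, qty INTEGER)")
# Assumed composite index so the lookup by word can also use qty.
conn.execute("CREATE INDEX idx_keywords_word_qty ON keywords (word, qty)")

for docid, text in documents.items():
    # Count each distinct lowercase word once per document.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    conn.executemany(
        "INSERT INTO keywords (word, docid, qty) VALUES (?, ?, ?)",
        [(word, docid, qty) for word, qty in counts.items()],
    )

# No grouping needed at query time: the count is already stored.
hits = conn.execute(
    "SELECT qty, docid FROM keywords WHERE word = ? ORDER BY qty DESC",
    ("people",),
).fetchall()
print(hits)
```

The counting cost moves from every search to the one time a document
is indexed, which is the trade the stored qty field makes.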

With an index on qty (ideally a combined index on word and qty), the
query should be rather fast. And now you can join the keyword table
and the main document table together so you can display the results:

select k.qty, doc.* from keywords k, documents doc
  where k.docid = doc.id and k.word='people'
  order by k.qty desc
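
A small runnable sketch of the join, again using Python and an
in-memory SQLite database only as an illustration. The title column
and the sample rows are made up; the real documents table will have
its own columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE keywords (word TEXT, docid INTEGER, qty INTEGER);
INSERT INTO documents VALUES (1, 'Helping people'), (2, 'PHP notes');
INSERT INTO keywords VALUES ('people', 1, 2), ('people', 2, 1), ('php', 2, 4);
""")

# Join each keyword hit to its document, highest count first.
results = conn.execute(
    "SELECT k.qty, doc.id, doc.title "
    "FROM keywords k JOIN documents doc ON k.docid = doc.id "
    "WHERE k.word = ? ORDER BY k.qty DESC",
    ("people",),
).fetchall()

for qty, doc_id, title in results:
    print(qty, doc_id, title)
```

The explicit JOIN ... ON form is equivalent to listing both tables
and putting the join condition in the where clause.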


I do exactly this for my graphs and reports listed below.

Curt
-- 
"My PHP key is worn out"

  PHP List stats since 1997: 
          http://zirzow.dyndns.org/html/mlists/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
