I am investigating whether it is practical to directly query a database
containing a fairly large text corpus (on the order of 100k to 1M
newspaper articles, so around 100 million words), or whether I should
use a third-party text indexing service. I want to answer questions such as:
how often is a certain word (or pattern) mentioned in an article, and how
often is it mentioned when another word is nearby
(in the same article, or within n words of it)?

You really want to use the contrib/tsearch2 module that ships with PostgreSQL.


cd contrib/tsearch2                # from the top of the PostgreSQL source tree
gmake install
psql mydb < tsearch2.sql           # replace mydb with your database name
more README.tsearch2

Chris

