Oleg, the data I have right now was generated using a random paragraph generator. The words are real words, but with default settings tsearch2 picks up only 508 distinct keywords across the 3,000,000 records. I built this data set to test tsearch2's capabilities, so it's not real-world data. If you still want it, let me know where to send it and I will send you a dump of the DB.

Kris



Oleg Bartunov wrote:

Kris,

we're working on a prototype of tsearchd, a full text search daemon which
maintains a static inverted index outside of postgresql using the same
parser and dictionaries that tsearch2 uses.  This approach could scale up
fts capability while preserving access to metadata, so you may have an
"archive" part of your collection (tsearchd) and an "online" part, which
could be searchable with tsearch2.

Here is what we have right now:

pages (tid integer, fts_index tsvector)

1. Create index
   select count(tdindex(tid, fts_index)) from pages;
2. Flush index
   select tdflush();
3. Search
   select pages.tid, rank(fts_index, to_tsquery('supernovae & magellan')) as rank
   from pages, tdsearch(to_tsquery('supernovae & magellan')) as idx
   where tid = idx
   order by rank desc;
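
For readers unfamiliar with the idea, the three steps above (index, flush, search) can be pictured with a toy in-memory inverted index. This is only an illustrative sketch in Python, not tsearchd's code: the class ToyInvertedIndex, its method names, and the sample lexemes are all hypothetical, and real lexemes would come from the tsearch2 parser and dictionaries rather than from hand-written lists.

```python
# Illustrative sketch only: a toy inverted index, NOT the tsearchd implementation.
# All names here (ToyInvertedIndex, add, flush, search_all) are hypothetical.
from collections import defaultdict


class ToyInvertedIndex:
    def __init__(self):
        self._pending = defaultdict(set)  # lexeme -> tids not yet flushed
        self._postings = {}               # lexeme -> sorted tuple of tids

    def add(self, tid, lexemes):
        """Step 1 (create index): record which lexemes occur in document tid."""
        for lexeme in lexemes:
            self._pending[lexeme].add(tid)

    def flush(self):
        """Step 2 (flush index): merge pending entries into the static postings."""
        for lexeme, tids in self._pending.items():
            merged = set(self._postings.get(lexeme, ())) | tids
            self._postings[lexeme] = tuple(sorted(merged))
        self._pending.clear()

    def search_all(self, lexemes):
        """Step 3 (search): return tids containing every lexeme (an AND query)."""
        lists = [set(self._postings.get(lexeme, ())) for lexeme in lexemes]
        if not lists:
            return []
        return sorted(set.intersection(*lists))


index = ToyInvertedIndex()
index.add(1, ["supernovae", "magellan", "cloud"])
index.add(2, ["supernovae", "remnant"])
index.add(3, ["magellan", "strait"])
index.flush()

# Only document 1 contains both lexemes.
matches = index.search_all(["supernovae", "magellan"])
```

The list of matching tids plays the role of the set returned by tdsearch() in the query above: it is joined back to the pages table, so ranking and access to metadata stay inside the database.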

If possible, could you share your data so we can test our
prototype on real data?


Oleg



