Hi,
 
I have tables with millions of sentences. Each row contains one sentence. The 
text is natural language and can be in any language, but all sentences in a 
given table are in the same language.
I need to run a similarity search on them. It has to be very fast, because I 
have to search for a few hundred sentences many times.
The search doesn't need to be context-based. It should just return sentences 
with similar words (possibly stemmed).
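To illustrate what I mean by "similar words", here is a minimal sketch in 
Python. It is only an illustration of the idea, not my actual implementation: 
the function names are made up, it uses the Jaccard similarity of the word 
sets as one possible measure, and it leaves out stemming entirely.

```python
import re

def word_set(sentence):
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))

def word_similarity(a, b):
    """Jaccard similarity of the two word sets: shared words / all words."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a and not words_b:
        return 1.0  # two empty sentences are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)
```

With this measure, two sentences that share most of their words score close 
to 1 regardless of word order, and sentences with no words in common score 0.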
 
I have already tried GiST/GIN-index-based trigram search (pg_trgm 
extension), full-text search (tsearch2 extension) and pivot-based indexing 
(Fixed Query Array), but they are all either too slow or not suitable.
Soundex and Metaphone aren't suitable either.
 
I have been working on this project for a long time, but without any success.
Does anyone have an idea?
 
I would be very thankful for help.
 
Janek Sendrowski


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
