Hi, I am trying to design a large text-search database. It will hold upwards of 6 million documents, along with metadata on each.
I am currently looking at tsearch2 to provide fast text searching, and I am also experimenting with different hardware configurations.

1. With tsearch2 I get very good query times until I insert more records. For example, with 100,000 records tsearch2 returns in around 6 seconds, but with 200,000 records it takes just under a minute. Is this because the indexes fit entirely in memory at 100,000 records?

2. As well as whole-word matching, I also need to be able to do substring matching. Is the FTI module the way to approach this?

3. I have just begun to look into distributed queries. Is there an existing solution for distributing a PostgreSQL database among multiple servers, so that each server has the same schema but holds only a subset of the total data?

Any other helpful comments or suggestions on how to improve query times using different hardware or software techniques would be appreciated.

Thanks,
Mat

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
       joining column's datatypes do not match
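For context, here is a minimal sketch of the kind of tsearch2 setup I am testing with; the table and column names are illustrative only, not my actual schema:

```sql
-- Illustrative only: simplified version of the schema under test.
CREATE TABLE documents (
    id     serial PRIMARY KEY,
    title  text,
    body   text,
    idxfti tsvector              -- tsearch2 full-text index column
);

-- GiST index over the tsvector column, as the tsearch2 docs suggest.
CREATE INDEX documents_fti_idx ON documents USING gist (idxfti);

-- tsearch2's trigger keeps idxfti up to date on INSERT/UPDATE.
CREATE TRIGGER documents_fti_update
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxfti, title, body);

-- A whole-word query; EXPLAIN ANALYZE shows whether the index is used,
-- which may help diagnose the 100,000 -> 200,000 record slowdown.
EXPLAIN ANALYZE
SELECT id FROM documents WHERE idxfti @@ to_tsquery('search & term');
```

If the plan switches from an index scan to a sequential scan between the two data sizes, that would point at the planner or memory settings rather than the index itself.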