On Fri, Oct 10, 2014 at 9:53 AM, Tomas Vondra <t...@fuzzy.cz> wrote: > > On 10.10.2014 16:21, Craig James wrote: > > Our index is for chemical structures. Chemicals are indexed on > > chemical fragments > > <http://emolecules.com/info/molecular-informatics>. A search > > typically starts with 50-200 indexed "columns" (chemical fragments). > > The query is always flat, "A and B and ... and Z". The indexed > > fragments are both correlated (the existence of one strongly raises > > the chances of another) and anti-correlated (certain combinations are > > very rare). > > Maybe I don't understand the problem well enough, but isn't this a > perfect match for GIN indexes? I mean, you essentially need to do > queries like "WHERE substance @@ ('A & B & !C')" etc. Which is exactly > what GIN does, because it keeps pointers to tuples for each fragment. >
On the day our web site opened we were using tsearch. Before the end of the day we realized it was a bad idea, for the very reasons discussed here. The early-abort/late-start problem ("offset N limit M") could take minutes to return the requested page. With the external dynamically-optimized index, we can almost always get answers in less than a couple seconds, often in 0.1 seconds. Craig