On 3-Jul-08, at 5:13 PM, Chris Harris wrote:

That's pretty much impossible (way too small). Double check those numbers.

I don't know where I got the above numbers. Sorry. Here are the real numbers:

.tis file: 730 MB
.frq files: 10.1 GB
.prx file: 43.2 GB

Now keeping all *that* in RAM, that sounds like a challenge.

It doesn't have to be *all* in RAM... the OS will figure out what parts are needed.

One alternative you might consider is a flash-based drive (SSD). Another is to index bigrams as terms, and run each phrase query as a conjunction (AND) of the phrase's bigrams. This should make phrase queries only a few times slower than term queries, and probably inflate your .frq to "only" 25 GB (.prx could be ignored).
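
To make the bigram idea concrete, here is a rough sketch in plain Python. It is not Lucene code; the helper names (bigrams, build_bigram_index, phrase_candidates) and the in-memory dict-of-sets index are made up for illustration. It also assumes simple whitespace tokenization. The point is only that a phrase lookup becomes an intersection of doc-id sets, with no position data involved.

```python
from collections import defaultdict


def bigrams(tokens):
    """Join each pair of adjacent tokens into a single term."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_bigram_index(docs):
    """Map each bigram term to the set of doc ids that contain it.

    `docs` is a hypothetical {doc_id: text} mapping; only doc ids are
    stored, so nothing like a positions (.prx) file is needed.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in bigrams(text.lower().split()):
            index[term].add(doc_id)
    return index


def phrase_candidates(index, phrase):
    """Return doc ids that contain every bigram of `phrase`."""
    terms = bigrams(phrase.lower().split())
    if not terms:
        # A one-word phrase has no bigrams; this sketch returns nothing.
        return set()
    # Intersect the smallest postings first to keep the working set small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result


docs = {
    1: "the quick brown fox",
    2: "quick brown dogs and a brown fox",
    3: "brown quick fox",
}
index = build_bigram_index(docs)
print(sorted(phrase_candidates(index, "quick brown fox")))
```

This prints [1, 2]. Note that doc 2 matches even though it never contains the exact phrase: it has "quick brown" and "brown fox" in different places. For two-word phrases the conjunction is exact, but for longer phrases it can return false positives like this one, so you would either accept that or verify the hits afterwards.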

Some other tricks, like stop word removal, also speed up phrase queries.

-Mike
