Matt Hamilton wrote:
> I would like to help if I had time :)  I think the most efficient way of
> doing what you want is to construct an index based on a 'suffix trie'.  This
> essentially allows matching of arbitrary substrings very quickly; the only
> problem is that it takes up a fair amount of space.  The upside is that it
> can be updated and incrementally added to quite easily (unlike many
> inverted list implementations).
> I confess I have not had the chance to look at the pluggable index types
> in 2.4, but I would really like to, as I want to port over some
> indexing code I was working on for another project that allows compressed
> storage of inverted lists for indexes.  On average you can store a 32-bit
> document id/ref in around 4 bits, which means you save a lot of space and
> can keep stopwords in the lexicon (as an example try searching for 'to be
> or not to be' in an index that removes stopwords :).  Not only do you save
> space, but due to the way the inverted list is read and decompressed you
> save time on disk access for large indexes as there is less to physically
> read.
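
For anyone following along, here is a rough sketch of how a compressed
inverted list can get a document id down to a few bits.  Matt does not say
which scheme his code uses, so this is only an assumption: it stores the gaps
between sorted document ids and writes each gap as an Elias gamma code.  The
function names are made up for illustration, and the "bits" are kept as a
string of '0'/'1' characters purely for readability.

```python
def gamma_encode(n):
    """Elias gamma code for an integer n >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are only defined for n >= 1")
    binary = bin(n)[2:]                      # e.g. 5 -> '101'
    # Prefix with (length - 1) zeros so the decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def encode_postings(doc_ids):
    """Encode sorted, distinct, positive doc ids as gamma-coded gaps."""
    bits = []
    previous = 0
    for doc_id in doc_ids:
        bits.append(gamma_encode(doc_id - previous))   # store the gap only
        previous = doc_id
    return "".join(bits)


def decode_postings(bits):
    """Inverse of encode_postings: rebuild the doc ids from the bit string."""
    doc_ids = []
    previous = 0
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":              # count the unary length prefix
            zeros += 1
            pos += 1
        gap = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        previous += gap
        doc_ids.append(previous)
    return doc_ids


postings = [3, 4, 7, 8, 9, 12]
encoded = encode_postings(postings)
assert decode_postings(encoded) == postings
print(len(encoded), "bits for", len(postings), "doc ids")
```

Frequent words such as stopwords appear in most documents, so their gaps are
small and their codes are short, which is why keeping them in the lexicon
stays affordable.  A real index would pack the bits into bytes rather than a
Python string.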

Wow Matt, you seem to know what you're talking about :-)

If you get a chance to implement the index I asked about, please gimme a shout,
I'd love to try it out...
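
In the meantime, this is roughly how I understand the suffix trie idea.  It is
only a toy sketch under my own assumptions, not Matt's code and not a Zope
index: the class and method names are invented, and it keeps everything in
nested dicts in memory.  Every suffix of every document is inserted
character by character, and each node remembers which documents passed
through it.

```python
_DOCS = object()   # sentinel key for the doc-id set; cannot clash with a character


class SuffixTrie:
    """Toy substring index: maps any substring to the ids of documents containing it."""

    def __init__(self):
        self.root = {}

    def add(self, doc_id, text):
        """Incrementally index one document by inserting all of its suffixes."""
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.setdefault(ch, {})
                node.setdefault(_DOCS, set()).add(doc_id)

    def search(self, substring):
        """Return the set of doc ids whose text contains the given substring."""
        node = self.root
        for ch in substring:
            if ch not in node:
                return set()
            node = node[ch]
        return set(node.get(_DOCS, ()))


trie = SuffixTrie()
trie.add(1, "zope catalog")
trie.add(2, "catalogue")
assert trie.search("atal") == {1, 2}
assert trie.search("pe c") == {1}
assert trie.search("xyz") == set()
```

A lookup only walks as many nodes as the query has characters, and adding a
document never touches existing entries, which matches the "quick" and
"easy to update" points above.  The cost is the space Matt mentions: the
number of character steps inserted grows with the square of the document
length.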



PS: Whereabouts in the UK are you?

Zope-Dev maillist  -  [EMAIL PROTECTED]
**  No cross posts or HTML encoding!  **