Matt Hamilton wrote:
> I would like to help if I had time :) I think the most efficient way of
> doing what you want is to construct an index based on a 'suffix trie'. This
> essentially allows matching of arbitrary substrings very quickly; the only
> problem is that it takes up a fair amount of space. The upside is that it
> can be updated and incrementally added to quite easily (unlike many
> inverted list implementations).
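For anyone following along, here is a rough sketch of the suffix trie idea as I understand it. It is not Matt's code and not a Zope pluggable index; the names SuffixTrie, add and search are made up for illustration. It inserts every suffix of each text and tags each node with the document ids that pass through it, so it shows both the fast substring lookup and the space cost (roughly quadratic in the text length for this naive version).

```python
class _Node:
    """One trie node: child nodes by character, plus matching document ids."""
    __slots__ = ("children", "docs")

    def __init__(self):
        self.children = {}
        self.docs = set()


class SuffixTrie:
    """Naive suffix trie (hypothetical sketch, not an existing Zope index)."""

    def __init__(self):
        self.root = _Node()

    def add(self, doc_id, text):
        # Insert every suffix of the text. Each node on the path records the
        # doc_id, so a substring lookup only has to walk the substring itself.
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                child = node.children.get(ch)
                if child is None:
                    child = _Node()
                    node.children[ch] = child
                child.docs.add(doc_id)
                node = child

    def search(self, substring):
        # Walk one node per character; a missing edge means no document
        # contains the substring.
        node = self.root
        for ch in substring:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.docs)


idx = SuffixTrie()
idx.add(1, "zope catalog")
idx.add(2, "catalogue")
print(idx.search("atal"))
print(idx.search("pe c"))
```

New documents can be added at any time with add(), which is the incremental update property mentioned above. A real implementation would presumably use a compacted suffix tree rather than one node per character to keep the size down.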
> I confess I have not had the chance to look at the pluggable index types
> in 2.4, but would really like to as I would like to port over some
> indexing code I was working on for another project that allows compressed
> storage of inverted lists for indexes. On average you can store a 32-bit
> document id/ref in around 4 bits, which means you save a lot of space and
> can keep stopwords in the lexicon (as an example try searching for 'to be
> or not to be' in an index that removes stopwords :). Not only do you save
> space, but due to the way the inverted list is read and decompressed you
> save time on disk access for large indexes as there is less to physically
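The message does not say which compression scheme is used, so the following is only a guess at the general approach: store the gaps between sorted document ids instead of the ids themselves, and write each gap with a variable-length bit code. I have picked Elias gamma coding for the sketch; the function names are made up, and a string of '0'/'1' characters stands in for a real packed bit stream.

```python
def gamma_encode(n):
    """Elias gamma code for an integer >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are only defined for integers >= 1")
    binary = bin(n)[2:]
    # (length - 1) zeros announce how many bits follow the leading 1.
    return "0" * (len(binary) - 1) + binary


def compress_postings(doc_ids):
    """Encode strictly increasing, positive document ids as gamma-coded gaps."""
    bits = []
    previous = 0
    for doc_id in doc_ids:
        bits.append(gamma_encode(doc_id - previous))
        previous = doc_id
    return "".join(bits)


def decompress_postings(bits):
    """Invert compress_postings: read the gaps and rebuild the document ids."""
    doc_ids = []
    previous = 0
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":
            zeros += 1
            pos += 1
        # The gap occupies zeros + 1 bits, starting at the first '1'.
        gap = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        previous += gap
        doc_ids.append(previous)
    return doc_ids


postings = [3, 4, 7, 8, 9, 15]
encoded = compress_postings(postings)
print(len(encoded), "bits for", len(postings), "ids")
assert decompress_postings(encoded) == postings
```

A word that occurs in nearly every document has gaps of 1 or 2, so each entry costs only a bit or a few bits, which would explain how stopwords can stay in the lexicon cheaply. Rare words have larger gaps and longer codes, but also far fewer entries.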
Wow Matt, you seem to know what you're talking about :-)
If you get a chance to implement the index I asked about, please gimme a shout,
I'd love to try it out...
PS: Whereabouts in the UK are you?
Zope-Dev maillist - [EMAIL PROTECTED]