----- Original Message -----
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Matt Hamilton" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; "Steve Alexander"
Sent: Wednesday, November 28, 2001 09:27
Subject: Re: [Zope-dev] Catalog improvements

> Matt Hamilton wrote:
> >
> > I would like in on that too :)  About a year or so ago I was working on a
> > full-text indexing system for indexing several gigabytes of text (mailing
> > list archives).  Most of it was written in C and uses quite a lot of
> > algorithms from various information retrieval papers and books.  I have
> > been hoping to have the time to take parts of it and work it into the
> > PluginIndex architecture.  The existing code uses BerkeleyDB files to store
> > the index structures, but I would like to use ZODB instead to give it a
> > bit more modularity.
> Hi Matt,
> Are any of these algorithms publicly available? I'd be _very_ interested
> in them :-)

I think the software "MG" from the book "Managing Gigabytes" is GPLed and
released as mg-1.21. Walking through the TOC of the book, it seems to be a
very detailed source on text processing and gives a lot of information
about different index types.
But I miss an explanation of more recent data structures like suffix
arrays or suffix trees, which have several advantages for text processing
compared to B-Trees.
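
To make the suffix array idea a bit more concrete, here is a minimal
sketch in Python. It is not taken from MG or from the Zope catalog; the
function names are made up for illustration, and the construction is the
naive "sort all suffixes" approach, which is far too slow for gigabytes
of text. It only shows one of the advantages: once the suffixes are
sorted, any substring (not just whole words) can be located with a
binary search.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text in sorted order.

    Naive construction (hypothetical helper, for illustration only):
    sorting the full suffixes is simple but slow on large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start offsets of every occurrence of pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [first, lo) begin with pattern.
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

A word-keyed B-Tree index answers "which documents contain this word";
the structure above answers "where does this character sequence occur",
at the price of one entry per text position.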


   -    Andreas Jung                            Zope Corporation       -
  -   EMail: [EMAIL PROTECTED]                http://www.zope.com      -
 -  "Python Powered"                       http://www.python.org     -
  -   "Makers of Zope"                       http://www.zope.org      -
   -                  "Life is a fulltime occupation"                  -

Zope-Dev maillist  -  [EMAIL PROTECTED]
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope )
