----- Original Message -----
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Matt Hamilton" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; "Steve Alexander" <[EMAIL PROTECTED]>; "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 09:27
Subject: Re: [Zope-dev] Catalog improvements

> Matt Hamilton wrote:
> >
> > I would like in on that too :) About a year or so ago I was working on a
> > full-text indexing system for indexing several gigabytes of text (mailing
> > list archives). Most of it was written in C and uses quite a lot of cool
> > algorithms from various information retrieval papers and books. I have
> > been hoping to have the time to take parts of it and work it into the new
> > PluginIndex architecture. The existing code uses BerkeleyDB files to hold
> > the index structures, but I would like to use ZODB instead to give it a
> > bit more modularity.
>
> Hi Matt,
>
> Are any of these algorithms publicly available? I'd be _very_ interested in them
> :-)
>

I think the "MG" software from the book "Managing Gigabytes" is GPLed;
the current release is mg-1.21. Going by the book's table of contents,
it looks like a very detailed source on text processing, with a lot of
information about different index types. What I miss, though, is some
explanation of more recent data structures such as suffix arrays or
suffix trees, which have several advantages over B-Trees for text
processing.

Andreas

---------------------------------------------------------------------
 -  Andreas Jung                       Zope Corporation             -
 -  EMail: [EMAIL PROTECTED]           http://www.zope.com          -
 -  "Python Powered"                   http://www.python.org        -
 -  "Makers of Zope"                   http://www.zope.org          -
 -                "Life is a fulltime occupation"                   -
---------------------------------------------------------------------

_______________________________________________
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )
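
For anyone not familiar with suffix arrays, here is a rough Python
sketch of the idea. It is not taken from MG or from the book, and the
names build_suffix_array and find_all are made up for this illustration.
The construction is the naive one (sort all suffixes), so it is only
suitable for small texts; real implementations use much faster
construction algorithms. The point is that once the suffixes are sorted,
any substring can be located with a binary search, not only whole words.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted order.

    Naive construction: sorts full suffix strings, so it is slow and
    memory-hungry on large inputs. Illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return the sorted offsets at which pattern occurs in text.

    All suffixes starting with pattern form one contiguous run in the
    suffix array, so two binary searches find both ends of that run.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    text = "managing gigabytes"
    sa = build_suffix_array(text)
    print(find_all(text, sa, "gi"))
```

A word-based index keyed in a B-Tree can only answer lookups for the
terms that were extracted at indexing time, whereas the structure above
answers arbitrary substring queries. Whether that trade-off is worth the
extra space for a Zope index is a separate question.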