Re: [Zope-dev] Catalog improvements
Andreas Jung wrote:
>
> I think the software "MG" from the book "Managing Gigabytes" is GPLed and currently released as mg-1.21. Judging from the book's table of contents, it seems to be a very detailed source on text processing and gives a lot of information about different index types.
> But I miss some explanation of current data structures like suffix arrays or suffix trees, which have several advantages for text processing compared to B-Trees.

Hmmm... looks like it's time to go buy a book :-)

cheers,

Chris

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Catalog improvements
Yes, this looks promising. Thanks!

- Original Message -
From: "Andreas Jung" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Wolfram Kerber" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 3:18 PM
Subject: Re: [Zope-dev] Catalog improvements

> TopicIndexes are currently available in the 'ajung-topicindex' branch and are not yet part of the Zope core.
>
> Andreas
Re: [Zope-dev] Catalog improvements
- Original Message -
From: "Matt Hamilton" <[EMAIL PROTECTED]>
To: "Andreas Jung" <[EMAIL PROTECTED]>
Cc: "Chris Withers" <[EMAIL PROTECTED]>; "Casey Duncan" <[EMAIL PROTECTED]>; "Steve Alexander" <[EMAIL PROTECTED]>; "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 09:55
Subject: Re: [Zope-dev] Catalog improvements

> On Wed, 28 Nov 2001, Andreas Jung wrote:
>
> > I think the software "MG" from the book "Managing Gigabytes" is GPLed and currently released as mg-1.21. Judging from the book's table of contents, it seems to be a very detailed source on text processing and gives a lot of information about different index types.
> > But I miss some explanation of current data structures like suffix arrays or suffix trees, which have several advantages for text processing compared to B-Trees.
>
> Suffix Trees/Tries take up a *lot* of space. But they are very fast, and useful for searching for substrings.

Usually four times the amount of the data to be indexed ;-)

Andreas
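The substring-search use of suffix structures mentioned above can be illustrated with a suffix array, which stores one integer per character position of the text. The following is only a minimal sketch: the function names are made up for this example, and the construction shown here (sorting whole suffixes) is the naive one, far slower than the algorithms a real indexer would use.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order.

    Naive construction for illustration only; it compares whole suffixes.
    """
    return sorted(range(len(text)), key=lambda start: text[start:])


def find_occurrences(text, suffix_array, pattern):
    """Return every position where pattern occurs in text, in ascending order."""
    width = len(pattern)

    def prefix(rank):
        # The first len(pattern) characters of the suffix at this rank.
        start = suffix_array[rank]
        return text[start:start + width]

    # Binary search for the first suffix whose prefix is >= pattern.
    low, high = 0, len(suffix_array)
    while low < high:
        mid = (low + high) // 2
        if prefix(mid) < pattern:
            low = mid + 1
        else:
            high = mid
    first = low

    # Binary search for the first suffix whose prefix is > pattern.
    high = len(suffix_array)
    while low < high:
        mid = (low + high) // 2
        if prefix(mid) <= pattern:
            low = mid + 1
        else:
            high = mid

    return sorted(suffix_array[first:low])
```

Each lookup costs two binary searches over the array, regardless of how often the pattern occurs, which is where the speed comes from. The space cost comes from keeping an integer for every character of the indexed text.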
Re: [Zope-dev] Catalog improvements
On Wed, 28 Nov 2001, Andreas Jung wrote:

> I think the software "MG" from the book "Managing Gigabytes" is GPLed and currently released as mg-1.21. Judging from the book's table of contents, it seems to be a very detailed source on text processing and gives a lot of information about different index types.
> But I miss some explanation of current data structures like suffix arrays or suffix trees, which have several advantages for text processing compared to B-Trees.

Suffix Trees/Tries take up a *lot* of space. But they are very fast, and useful for searching for substrings.

The main gist of the stuff in 'Managing Gigabytes' is that it is possible to store an ascending list of integers in a compressed form, such that on average each integer requires only 4 bits to represent it. This is obviously much more compact than a straight list of 32- or 64-bit integers/longs (plus any overhead Python adds to its built-in list type). The other point is that you can read and decode the lists very quickly (you don't need to decompress the entire list before reading it). Also, consecutive numbers take only 1 bit of storage each, which means that 'stopwords' that are normally omitted from indexes due to their very high frequency (and hence bloat of the index) can be stored very efficiently.

One problem is that all of the research done in MG is based on much older hardware than is currently available, and they try to make certain optimisations which nowadays don't save much time.

-Matt

--
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.
Business Vision on the Internet
http://www.netsight.co.uk +44 (0)117 9090901
Web Hosting | Web Design | Domain Names | Co-location | DB Integration
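The compressed ascending-integer list described above can be sketched by storing the gaps between successive document ids and writing each gap with a variable-length bit code. The sketch below assumes Elias gamma coding, one of several codes of this kind; it is not the MG source or Matt's C code, the function names are invented for the example, and the bits are kept in a Python string purely for readability rather than packed into bytes.

```python
def gamma_encode(number):
    """Elias gamma code for an integer >= 1, as a string of '0'/'1' characters."""
    if number < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    binary = bin(number)[2:]                    # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary     # length prefix in unary, then the value


def encode_postings(doc_ids):
    """Encode a strictly ascending list of positive document ids as gamma-coded gaps."""
    pieces = []
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        if gap < 1:
            raise ValueError("doc_ids must be strictly ascending and >= 1")
        pieces.append(gamma_encode(gap))
        previous = doc_id
    return "".join(pieces)


def decode_postings(bits):
    """Yield document ids one at a time; the list is never expanded as a whole."""
    position = 0
    current = 0
    while position < len(bits):
        zeros = 0
        while bits[position] == "0":            # count the unary length prefix
            zeros += 1
            position += 1
        gap = int(bits[position:position + zeros + 1], 2)
        position += zeros + 1
        current += gap
        yield current
```

A gap of 1 is written as the single bit `1`, so a run of consecutive document ids, as produced by a very frequent word, costs one bit per entry. Because `decode_postings` is a generator, a reader can stop early without decoding the rest of the list.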
Re: [Zope-dev] Catalog improvements
- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Matt Hamilton" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; "Steve Alexander" <[EMAIL PROTECTED]>; "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 09:27
Subject: Re: [Zope-dev] Catalog improvements

> Matt Hamilton wrote:
> >
> > I would like in on that too :) About a year or so ago I was working on a full-text indexing system for indexing several gigabytes of text (mailing list archives). Most of it was written in C and uses quite a lot of cool algorithms from various information retrieval papers and books. I have been hoping to have the time to take parts of it and work it into the new PluginIndex architecture. The existing code uses BerkeleyDB files to hold the index structures, but I would like to use ZODB instead to give it a bit more modularity.
>
> Hi Matt,
>
> Are any of these algorithms publicly available? I'd be _very_ interested in them :-)

I think the software "MG" from the book "Managing Gigabytes" is GPLed and currently released as mg-1.21. Judging from the book's table of contents, it seems to be a very detailed source on text processing and gives a lot of information about different index types.
But I miss some explanation of current data structures like suffix arrays or suffix trees, which have several advantages for text processing compared to B-Trees.

Andreas

Andreas Jung, Zope Corporation
EMail: [EMAIL PROTECTED]  http://www.zope.com
"Python Powered"  http://www.python.org
"Makers of Zope"  http://www.zope.org
"Life is a fulltime occupation"
Re: [Zope-dev] Catalog improvements
Matt Hamilton wrote:
>
> I would like in on that too :) About a year or so ago I was working on a full-text indexing system for indexing several gigabytes of text (mailing list archives). Most of it was written in C and uses quite a lot of cool algorithms from various information retrieval papers and books. I have been hoping to have the time to take parts of it and work it into the new PluginIndex architecture. The existing code uses BerkeleyDB files to hold the index structures, but I would like to use ZODB instead to give it a bit more modularity.

Hi Matt,

Are any of these algorithms publicly available? I'd be _very_ interested in them :-)

Chris
Re: [Zope-dev] Catalog improvements
TopicIndexes are currently available in the 'ajung-topicindex' branch and are not yet part of the Zope core.

Andreas

- Original Message -
From: "Chris McDonough" <[EMAIL PROTECTED]>
To: "Wolfram Kerber" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 10:06
Subject: Re: [Zope-dev] Catalog improvements

> Note that one way to get the effect of "cached queries" is to use a TopicIndex, which I believe either Andreas or Tres has implemented. See http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes. I can't find the actual source code, though. Maybe either Tres or Andreas knows where it is?
Re: [Zope-dev] Catalog improvements
Note that one way to get the effect of "cached queries" is to use a TopicIndex, which I believe either Andreas or Tres has implemented. See http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes. I can't find the actual source code, though. Maybe either Tres or Andreas knows where it is?

Wolfram Kerber wrote:
> Hi
>
> No, I wasn't aware of your product :-( ; the only one I found was ZOQL by Stephan Richter, but that didn't help much. Well, now I have written an implementation that reuses some of the code in TextIndex (for parenthesis parsing and insertion of a default operator) and then saves the query in RPN format (so the Catalog doesn't need to think that hard when being queried).
> I have taken a look at your product, and I'd say a 'new' Catalog should have some sort of QueryParser plugins that know how to turn string queries (such as yours) or SQL into native Catalog queries ...
> I've also contacted the authors of the two proposals; I just wasn't sure whether I should start this off, since I have no experience of how the fishbowl works and I'm expected to finish my current project sometime soon.
>
> Wolfram
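The "cached queries" effect mentioned in this message can be pictured as an index that keeps one precomputed result set per named filter and updates those sets whenever an object is indexed. The class below is a hypothetical stand-in written for illustration; it is not the Zope TopicIndex API, and every name in it is an assumption.

```python
class PrecomputedFilterIndex:
    """Hypothetical sketch of a topic-style index; not the Zope TopicIndex API."""

    def __init__(self):
        self._predicates = {}   # filter name -> callable(obj) returning True/False
        self._matches = {}      # filter name -> set of matching document ids

    def add_filter(self, name, predicate):
        # A new filter starts empty; objects must be (re)indexed to populate it.
        self._predicates[name] = predicate
        self._matches[name] = set()

    def index_object(self, doc_id, obj):
        # The filtering work happens here, at indexing time, not at query time.
        for name, predicate in self._predicates.items():
            if predicate(obj):
                self._matches[name].add(doc_id)
            else:
                self._matches[name].discard(doc_id)

    def unindex_object(self, doc_id):
        for matches in self._matches.values():
            matches.discard(doc_id)

    def query(self, name):
        # Answering a query is a dictionary lookup; nothing is parsed or evaluated.
        return frozenset(self._matches[name])
```

For example, after `add_filter("published", lambda obj: obj["state"] == "published")`, every call to `index_object` keeps the "published" set current, and `query("published")` returns it directly.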
Re: [Zope-dev] Catalog improvements
Hi

No, I wasn't aware of your product :-( ; the only one I found was ZOQL by Stephan Richter, but that didn't help much. Well, now I have written an implementation that reuses some of the code in TextIndex (for parenthesis parsing and insertion of a default operator) and then saves the query in RPN format (so the Catalog doesn't need to think that hard when being queried).
I have taken a look at your product, and I'd say a 'new' Catalog should have some sort of QueryParser plugins that know how to turn string queries (such as yours) or SQL into native Catalog queries ...
I've also contacted the authors of the two proposals; I just wasn't sure whether I should start this off, since I have no experience of how the fishbowl works and I'm expected to finish my current project sometime soon.

Wolfram

- Original Message -
From: "Casey Duncan" <[EMAIL PROTECTED]>
To: "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, November 27, 2001 2:48 PM
Subject: Re: [Zope-dev] Catalog improvements

> On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote:
> > Hi,
> >
> > I'm currently working on a product that allows relational information to be attached to Zope objects. It works quite well so far, but to further enhance it I need to make some changes to the Catalog. I could perhaps implement it as a separate product, but I strongly feel that those changes are best applied to the Catalog itself, as they are of general use (I think) and involve a lot of changes to the inner workings of the Catalog.
> > In particular I need the following:
> >
> > - named/stored queries
> > these are precompiled queries, so they can be executed without parsing and are easily cacheable
> > i.e. similar to what is implemented in CMFTopic, but stored in the Catalog and a bit smarter
> >
> > - caching support
> >
> > - unions and intersections
> > sub-queries (i.e. queries that are directed at a certain index) should be more flexibly combinable
>
> I have some code that implements this in my CatalogQuery product. It creates a query object from a string. Presently these are not persistent, but they could easily be made persistent to create precompiled queries.
>
> code at: http://www.zope.org/Members/Kaivo/CatalogQuery
>
> > I searched this mailing list as well as zope.org to get an idea about what has already been discussed and requested, and there seems to be some interest in improving the Catalog. Some people even seem to have worked on this; perhaps they could give an update on this? Possibly I don't have to write everything from scratch...
>
> I would be willing to help both in coding and getting the code put into the Zope core.
>
> > I would have put this into a proposal, but there already are two proposals that deal with the features I want: one is dedicated to unions/intersections, the other (TopicIndexes) to performance issues (I don't know what the status of these is though, especially as the first one is rather old), and I don't want to hijack them without asking. As so often, I will need to complete my current project first, but would then like to help in improving the Catalog for more general use.
>
> Possibly we need to rekindle discussion. I would suggest contacting the authors of those proposals to see how compatible your concepts are with theirs. Perhaps a new proposal should be drafted with the new ideas and tied back to the previous ones. If there is redundancy, that can be worked out.
>
> > So, if there is interest, I would propose to collect some ideas and comments about what a better Catalog should look like, how it could best be implemented and how to organize this effort (with respect to the already existing proposals).
>
> I am very interested in such a discussion. Let me know what I can do to help.
>
> /---\
> Casey Duncan, Sr. Web Developer
> National Legal Aid and Defender Association
> [EMAIL PROTECTED]
> \---/
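The approach described at the top of this message, parsing a parenthesized query once and storing it in RPN so that later evaluation needs no parsing, can be sketched as follows. This is a minimal illustration, not Wolfram's implementation and not the TextIndex parser: the operator names `AND` and `OR`, the function names, and the `results_by_term` mapping (term to set of document ids) are all assumptions, and default-operator insertion is left out.

```python
PRECEDENCE = {"OR": 1, "AND": 2}   # assumed operators; AND binds tighter than OR


def to_rpn(query):
    """Convert an infix query such as 'a AND (b OR c)' to a list of RPN tokens."""
    tokens = query.replace("(", " ( ").replace(")", " ) ").split()
    output, stack = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing this one.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()                        # discard the matching "("
        else:
            output.append(token)               # a search term
    while stack:
        operator = stack.pop()
        if operator == "(":
            raise ValueError("unbalanced parentheses")
        output.append(operator)
    return output


def evaluate_rpn(rpn, results_by_term):
    """Evaluate stored RPN tokens against per-term sets of document ids."""
    stack = []
    for token in rpn:
        if token in PRECEDENCE:
            right = stack.pop()
            left = stack.pop()
            # AND is set intersection, OR is set union.
            stack.append(left & right if token == "AND" else left | right)
        else:
            stack.append(set(results_by_term.get(token, ())))
    if len(stack) != 1:
        raise ValueError("malformed query")
    return stack[0]
```

The list returned by `to_rpn` is plain data, so it could be stored under a name and re-evaluated later, which is the "named/stored queries" idea quoted in this message.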
Re: [Zope-dev] Catalog improvements
Casey Duncan wrote:
>
> No, unfortunately I think it got lost in the shuffle around the time of my cross-country move. Any chance of sending it over again? I am revamping some of my "old" products; perhaps this will give me an excuse to release a new version of catquery.

I'll look them up and send them again soon.

> Yes, I second, third and fourth that motion. I have a bunch of ideas kicking around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an "indexing and catalog" chat on #zope.

That sounds like a good idea.

I'm writing an academic paper/presentation that I need to present on 6/7 December. Some time after that would be best for me.

If other good folk can collate the background information, make some sense of the different ideas, and put that on a wiki page, I can contribute to that as I have time, and then we'll have some sort of framework for a discussion on IRC.

--
Steve Alexander
Software Engineer
Cat-Box limited
Re: [Zope-dev] Catalog improvements
On Tue, 27 Nov 2001, Andreas Jung wrote:

> Is this code publicly available?

Sort of :) It used to be around, but the server with it on is currently offline and in need of a new disk controller, so it is not to hand. It is also poorly commented :( and written in very highly optimised (read: illegible) C.

The main bits needed from it are the routines to store and retrieve compressed lists of ascending integers (i.e. as used in indexes). I want to write a Python wrapper around them and release a list-like Python data structure that will allow efficient storage of indexes.

The other bit is the code for doing the cosine-similarity comparison used to rank the documents in order of relevance to a query.

Most of the code is taken from the book/code 'Managing Gigabytes' by Witten, Moffat & Bell (http://www.cs.mu.OZ.AU/mg/). The code is quite old now (1999) and designed for quite large systems, or relatively static text (i.e. it doesn't do incremental indexing very well). I worked on developing a 'forward' index which could be easily updated, and then inverted quite quickly on a regular basis (since it didn't need to parse the source text again).

-Matt

--
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.
http://www.netsight.co.uk
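The two ideas in this message, a forward index that is periodically inverted and cosine ranking of documents against a query, can be sketched together. This is a minimal illustration and not the MG code or Matt's system: the function names are invented, the forward index is assumed to map each document id to its term counts, and the tf-idf weighting used here is one common variant that may differ from the formulas in 'Managing Gigabytes'.

```python
import math
from collections import defaultdict


def invert(forward_index):
    """Turn {doc_id: {term: count}} into {term: {doc_id: count}} without re-parsing text."""
    inverted = defaultdict(dict)
    for doc_id, term_counts in forward_index.items():
        for term, count in term_counts.items():
            inverted[term][doc_id] = count
    return dict(inverted)


def tf_idf(count, document_frequency, total_docs):
    """Assumed weighting: log-damped term frequency times inverse document frequency."""
    return (1.0 + math.log(count)) * math.log(1.0 + total_docs / document_frequency)


def rank(query_terms, forward_index, inverted_index):
    """Return (doc_id, cosine score) pairs, best match first."""
    total_docs = len(forward_index)

    def weight(term, count):
        return tf_idf(count, len(inverted_index[term]), total_docs)

    # Build the query vector from the terms the index actually knows.
    query_counts = defaultdict(int)
    for term in query_terms:
        if term in inverted_index:
            query_counts[term] += 1
    query_vector = {term: weight(term, count) for term, count in query_counts.items()}
    query_norm = math.sqrt(sum(w * w for w in query_vector.values()))

    # Accumulate dot products by walking only the postings of the query terms.
    dot_products = defaultdict(float)
    for term, query_weight in query_vector.items():
        for doc_id, count in inverted_index[term].items():
            dot_products[doc_id] += query_weight * weight(term, count)

    ranked = []
    for doc_id, dot in dot_products.items():
        doc_norm = math.sqrt(sum(weight(term, count) ** 2
                                 for term, count in forward_index[doc_id].items()))
        ranked.append((doc_id, dot / (query_norm * doc_norm)))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Updating a document only touches its own entry in the forward index, which is why that structure is easy to keep current. The inverted form needed for ranking is rebuilt from it in a single pass.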
Re: [Zope-dev] Catalog improvements
Is this code publicly available?

Andreas

- Original Message -
From: "Matt Hamilton" <[EMAIL PROTECTED]>
To: "Casey Duncan" <[EMAIL PROTECTED]>
Cc: "Steve Alexander" <[EMAIL PROTECTED]>; "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, November 27, 2001 10:06
Subject: Re: [Zope-dev] Catalog improvements

> On Tue, 27 Nov 2001, Casey Duncan wrote:
>
> > > I'm interested in this too, and I'm keen to get a solution that will work with just the ZODB, without needing all of Zope.
> >
> > Yes, I second, third and fourth that motion. I have a bunch of ideas kicking around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an "indexing and catalog" chat on #zope.
>
> I would like in on that too :) About a year or so ago I was working on a full-text indexing system for indexing several gigabytes of text (mailing list archives). Most of it was written in C and uses quite a lot of cool algorithms from various information retrieval papers and books. I have been hoping to have the time to take parts of it and work it into the new PluginIndex architecture. The existing code uses BerkeleyDB files to hold the index structures, but I would like to use ZODB instead to give it a bit more modularity.
Re: [Zope-dev] Catalog improvements
On Tue, 27 Nov 2001, Casey Duncan wrote:

> > I'm interested in this too, and I'm keen to get a solution that will work with just the ZODB, without needing all of Zope.
>
> Yes, I second, third and fourth that motion. I have a bunch of ideas kicking around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an "indexing and catalog" chat on #zope.

I would like in on that too :) About a year or so ago I was working on a full-text indexing system for indexing several gigabytes of text (mailing list archives). Most of it was written in C and uses quite a lot of cool algorithms from various information retrieval papers and books. I have been hoping to have the time to take parts of it and work it into the new PluginIndex architecture. The existing code uses BerkeleyDB files to hold the index structures, but I would like to use ZODB instead to give it a bit more modularity.

-Matt

--
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.
http://www.netsight.co.uk
Re: [Zope-dev] Catalog improvements
On Tuesday 27 November 2001 09:49 am, Steve Alexander allegedly wrote:
> Casey Duncan wrote:
> > I have some code that implements this in my CatalogQuery product. It creates a query object from a string. Presently these are not persistent, but they could easily be made persistent to create precompiled queries.
> >
> > code at: http://www.zope.org/Members/Kaivo/CatalogQuery
>
> Casey, did you get a chance to look at my patches for adding an extended uniqueValues method to CatalogQuery?

No, unfortunately I think it got lost in the shuffle around the time of my cross-country move. Any chance of sending it over again? I am revamping some of my "old" products; perhaps this will give me an excuse to release a new version of catquery.

> > I would be willing to help both in coding and getting the code put into the Zope core.
>
> Me too!
>
> >>So, if there is interest, I would propose to collect some ideas and comments about what a better Catalog should look like, how it could best be implemented and how to organize this effort (with respect to the already existing proposals).
> >
> > I am very interested in such a discussion. Let me know what I can do to help.
>
> I'm interested in this too, and I'm keen to get a solution that will work with just the ZODB, without needing all of Zope.

Yes, I second, third and fourth that motion. I have a bunch of ideas kicking around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an "indexing and catalog" chat on #zope.

/---\
Casey Duncan, Sr. Web Developer
National Legal Aid and Defender Association
[EMAIL PROTECTED]
\---/
Re: [Zope-dev] Catalog improvements
Casey Duncan wrote:
>
> I have some code that implements this in my CatalogQuery product. It creates a query object from a string. Presently these are not persistent, but they could easily be made persistent to create precompiled queries.
>
> code at: http://www.zope.org/Members/Kaivo/CatalogQuery

Casey, did you get a chance to look at my patches for adding an extended uniqueValues method to CatalogQuery?

> I would be willing to help both in coding and getting the code put into the Zope core.

Me too!

>>So, if there is interest, I would propose to collect some ideas and comments about what a better Catalog should look like, how it could best be implemented and how to organize this effort (with respect to the already existing proposals).
>
> I am very interested in such a discussion. Let me know what I can do to help.

I'm interested in this too, and I'm keen to get a solution that will work with just the ZODB, without needing all of Zope.

--
Steve Alexander
Software Engineer
Cat-Box limited
Re: [Zope-dev] Catalog improvements
On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote: > Hi, > > i'm currently working on a product that allows to attach relational > information to zope-objects. It works quite well so far, but to further > enhance it i need to make some changes to the Catalog. I could perhaps > implement it as a separate product, but i strongly feel that those changes > are best applied to the Catalog itself, as they are of general use (i > think) and involve a lot of changes to the inner workings of the Catalog. > In particular i need the following: > > - named/stored queries > these are precompiled queries, so they can be executed without parsing and > are easily cacheable > i.e. similar to what is implemented in CMFTopic, but stored in the Catalog > and a bit smarter > > - caching support > > - unions and intersections > sub-queries (i.e. queries that are directed at a certain index) should be > more flexibly combinable I have some code that implements this in my CatalogQuery product. It creates a query object from a string. Presently these are not persistent, but they could easily be made persistent to create precompiled queries. code at: http://www.zope.org/Members/Kaivo/CatalogQuery > > I searched this mailing-list as well as zope.org to get an idea about what > has already been discussed and requested, and there seems to be some > interest in improving the Catalog. Some people even seem to have worked on > this, perhaps they could give an update on this? Possibly i don't have to > write everything from scratch... I would be willing to help both in coding and getting the code put into the Zope core. > I would have put this into a proposal, but there already are two proposals > that deal with the features i want, one is dedicated to > unions/intersections, the other (TopicIndexes) to performance issues (i > don't know what's the status of these though, especially the first one is > rather old), and i don't want to hijack them without asking. 
As so often i > will need to complete my current project first, but would then like to help > in improving the Catalog for a more general use. Possibly we need to rekindle discussion. I would suggest contacting the authors of those proposals to see how compatible your concepts are with theirs. Perhaps a new proposal should be drafted with the new ideas, tying them back to the previous ones. If there is redundancy, that can be worked out. > > So, if there is interest, i would propose to collect some ideas and > comments about how a better Catalog should look like, how it could be best > implemented and how to organize this effort (with respect to the already > existing proposals). I am very interested in such a discussion. Let me know what I can do to help. /---\ Casey Duncan, Sr. Web Developer National Legal Aid and Defender Association [EMAIL PROTECTED] \---/
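The "named/stored queries" and "unions and intersections" items discussed above can be illustrated with a small plain-Python sketch. It is not the CatalogQuery or ZCatalog API: `compile_query`, `run_query`, the nested-tuple query form, and the dictionary-of-sets index layout are all hypothetical names chosen for illustration. The idea is only that a query is validated once, kept as a data structure, and later evaluated with set operations without any string parsing.

```python
# Hypothetical sketch: none of these names come from Zope or CatalogQuery.

def compile_query(expr):
    """Validate a nested-tuple query once so it can be stored and reused."""
    op = expr[0]
    if op == "term":
        _, index_name, value = expr
        return ("term", index_name, value)
    if op in ("and", "or"):
        if len(expr) < 2:
            raise ValueError("%s needs at least one sub-query" % op)
        return (op,) + tuple(compile_query(sub) for sub in expr[1:])
    raise ValueError("unknown operator: %r" % (op,))


def run_query(compiled, indexes):
    """Evaluate a compiled query against {index_name: {value: set_of_ids}}."""
    op = compiled[0]
    if op == "term":
        _, index_name, value = compiled
        return set(indexes.get(index_name, {}).get(value, ()))
    results = [run_query(sub, indexes) for sub in compiled[1:]]
    if op == "and":
        return set.intersection(*results)  # ids present in every sub-result
    return set.union(*results)             # ids present in any sub-result


# Toy indexes: each maps an indexed value to the ids of matching objects.
indexes = {
    "meta_type": {"Document": {1, 2, 3}, "Image": {4}},
    "author": {"casey": {2, 3, 4}, "steve": {1}},
}

# (meta_type == Document AND author == casey) OR meta_type == Image
stored = compile_query(
    ("or",
     ("and", ("term", "meta_type", "Document"), ("term", "author", "casey")),
     ("term", "meta_type", "Image"))
)
result = run_query(stored, indexes)
```

Because `stored` is an ordinary tuple, it could in principle be kept under a name and re-run later, which is the "precompiled" part; whether and how such an object would be made persistent in the ZODB is left open here.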
Re: [Zope-dev] Catalog improvements
- Original Message - From: "Jeffrey P Shell" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, November 21, 2001 7:38 PM Subject: Re: [Zope-dev] Catalog improvements > > On Tuesday, November 20, 2001, at 03:35 PM, Wolfram Kerber wrote: > > > Hi, > > > > i'm currently working on a product that allows to attach relational > > information to zope-objects. It works quite well so far, but to further > > enhance it i need to make some changes to the Catalog. I could perhaps > > implement it as a separate product, but i strongly feel that those > > changes > > are best applied to the Catalog itself, as they are of general use > > (i think) > > and involve a lot of changes to the inner workings of the Catalog. In > > particular i need the following: > > > > - named/stored queries > > these are precompiled queries, so they can be executed without > > parsing and > > are easily cacheable > > i.e. similar to what is implemented in CMFTopic, but stored in the > > Catalog > > and a bit smarter > > There used to be something like this in ZTables/Tabula (a Zope 1.x > product that was sort of the genesis of the Catalog, for better or > worse) called 'Hierarchies'. Hierarchies were actually indexes (I > think the current Keyword index is descended from the Keyword > Hierarchy). > > I don't know what happened to that code. If it's not available, > you could probably achieve the effect that you're looking for here > with PluginIndexes I think you're right. Indexes also have a management interface that could be used to define the query. It could result in a nesting problem, however, if 'QueryIndexes' rely on each other's results (which they should be able to do). I would possibly need a management view that shows the hierarchical structure of the Indexes, but it can be merely that, a view. I'll try this out... >, which wouldn't require changing the Catalog at all. 
I'd say that if I did _not_ store the result of the query and just delegated to other indexes, this would be true. Otherwise I would need some notification mechanism to tell whether my result is affected by an indexing call, and/or at least be notified when the call is over so I can update the result by issuing a query. But the latter would mean to 'take the big hit', as you mentioned, which I think isn't acceptable. > Just write a "Query Index" that indexes objects that match > its pre-cooked Query. This would speed up searching tremendously, > but you could take a big hit at indexing time if you have many of > them. > > Jeffrey P Shell, [EMAIL PROTECTED] thanks, Wolfram
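One way to picture the notification mechanism mentioned above is a stored query that caches its result and is merely marked stale when an indexing call happens, so the query is re-issued lazily on the next read rather than after every indexing call. This is a sketch under assumptions, not something proposed in the thread: `TinyCatalog`, `CachedQuery`, `subscribe`, and `notify_changed` are hypothetical names, and the "catalog" is just a dictionary of objects.

```python
# Hypothetical sketch of lazy invalidation; not the ZCatalog API.

class TinyCatalog:
    """Minimal stand-in for a catalog that notifies listeners on indexing."""

    def __init__(self):
        self.objects = {}    # doc_id -> object (plain dicts here)
        self.listeners = []  # cached queries interested in changes

    def subscribe(self, listener):
        self.listeners.append(listener)

    def index_object(self, doc_id, obj):
        self.objects[doc_id] = obj
        for listener in self.listeners:
            listener.notify_changed(doc_id)


class CachedQuery:
    """Stores a query result and drops it when the catalog reports a change."""

    def __init__(self, catalog, predicate):
        self.catalog = catalog
        self.predicate = predicate
        self._result = None   # None means "stale, recompute on next read"
        self.evaluations = 0  # counts how often the full query really ran
        catalog.subscribe(self)

    def notify_changed(self, doc_id):
        # Cheap at indexing time: only the cache is discarded.
        self._result = None

    def result(self):
        if self._result is None:
            self.evaluations += 1
            self._result = {
                doc_id for doc_id, obj in self.catalog.objects.items()
                if self.predicate(obj)
            }
        return self._result


catalog = TinyCatalog()
documents = CachedQuery(catalog, lambda obj: obj["type"] == "Document")

catalog.index_object(1, {"type": "Document"})
catalog.index_object(2, {"type": "Image"})
catalog.index_object(3, {"type": "Document"})

first = set(documents.result())
second = set(documents.result())   # served from the cache
runs_before_change = documents.evaluations

catalog.index_object(4, {"type": "Document"})  # invalidates the cache
third = set(documents.result())
runs_after_change = documents.evaluations
```

The trade-off in this sketch is that the cost moves from indexing time to the first read after a change. A finer-grained variant would inspect `doc_id` in `notify_changed` and update the cached set incrementally instead of discarding it.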
Re: [Zope-dev] Catalog improvements
On Tuesday, November 20, 2001, at 03:35 PM, Wolfram Kerber wrote: > Hi, > > i'm currently working on a product that allows to attach relational > information to zope-objects. It works quite well so far, but to further > enhance it i need to make some changes to the Catalog. I could perhaps > implement it as a separate product, but i strongly feel that those > changes > are best applied to the Catalog itself, as they are of general use > (i think) > and involve a lot of changes to the inner workings of the Catalog. In > particular i need the following: > > - named/stored queries > these are precompiled queries, so they can be executed without > parsing and > are easily cacheable > i.e. similar to what is implemented in CMFTopic, but stored in the > Catalog > and a bit smarter There used to be something like this in ZTables/Tabula (a Zope 1.x product that was sort of the genesis of the Catalog, for better or worse) called 'Hierarchies'. Hierarchies were actually indexes (I think the current Keyword index is descended from the Keyword Hierarchy). I don't know what happened to that code. If it's not available, you could probably achieve the effect that you're looking for here with PluginIndexes, which wouldn't require changing the Catalog at all. Just write a "Query Index" that indexes objects that match its pre-cooked Query. This would speed up searching tremendously, but you could take a big hit at indexing time if you have many of them. Jeffrey P Shell, [EMAIL PROTECTED]
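The "Query Index" idea described above can be sketched in a few lines of plain Python. This is an illustration only and does not use the real PluginIndexes interface: the class name `QueryIndex`, its methods, and the use of a Python callable as the "pre-cooked query" are all assumptions. It shows where the cost lands: the predicate runs once per query index on every indexing call, while a search is just a lookup of the stored id set.

```python
# Hypothetical sketch of a "query index"; not the PluginIndexes API.

class QueryIndex:
    """Keeps the ids of all objects that match one fixed, pre-cooked query."""

    def __init__(self, predicate):
        self.predicate = predicate  # stands in for the pre-cooked query
        self.matching_ids = set()

    def index_object(self, doc_id, obj):
        # Paid at indexing time, once per query index per indexed object.
        if self.predicate(obj):
            self.matching_ids.add(doc_id)
        else:
            # The object may have matched before it was changed.
            self.matching_ids.discard(doc_id)

    def unindex_object(self, doc_id):
        self.matching_ids.discard(doc_id)

    def search(self):
        # No query evaluation here: the answer is already stored.
        return set(self.matching_ids)


recent_docs = QueryIndex(
    lambda obj: obj["type"] == "Document" and obj["year"] >= 2001
)

recent_docs.index_object(1, {"type": "Document", "year": 2001})
recent_docs.index_object(2, {"type": "Document", "year": 1999})
recent_docs.index_object(3, {"type": "Image", "year": 2001})
after_indexing = recent_docs.search()

# Reindexing an object that no longer matches removes it from the result.
recent_docs.index_object(1, {"type": "Document", "year": 2000})
after_reindex = recent_docs.search()
```

With many such indexes attached to one catalog, every indexing call would evaluate every predicate, which is the "big hit at indexing time" the message warns about.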