Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
Matt Hamilton wrote:
>
> On Mon, 11 Jun 2001, Chris Withers wrote:
>
> > Wow Matt, you seem to know what you're talking about :-)
>
> My final year University project was to create an Open Source mailing list
> archive :) I did quite a bit of reading into information retrieval and
> assorted algorithms and data structures.

Ah, okay :-)

> Once I get a spare minute I am going to try and re-implement it in Python
> and using ZODB (with BerkeleyDB storage) I might try and port some of the
> code over to work as a PluggableIndex too.

Cool...

> One of the main tasks is to write a python wrapper around my compression
> code. I will have to look more closely at how to write Python modules in
> C, as it does lots of bit twiddling which is in a very tight loop. The
> object will basically be a compressed list to which you can append
> ascending integers and will allow various fast union/intersection
> operations with other similar objects. This in itself may be sufficient to
> use in a PluggableIndex.

Yeah, I'd love to see it...

> Unfortunately I don't have the time. Unless I can use it myself directly
> in a project we have funding for (or unless anyone wants to fund my time
> to develop it) I will have to wait until I have some more time on my
> hands.

No worries...

cheers,

Chris

> > PS: Whereabouts in the UK are you?
>
> Bristol.

hehe... will be out celebrating my birthday there this Wednesday evening :-)
If you see me lying in a gutter on Thursday morning, please don't kick me too
hard ;-)

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
On Mon, 11 Jun 2001, Chris Withers wrote:

> Wow Matt, you seem to know what you're talking about :-)

My final year University project was to create an Open Source mailing list
archive :) I did quite a bit of reading into information retrieval and
assorted algorithms and data structures. I had a prototype running for quite
some time, but it is currently down as I am wiping the machine to start again
in Python :) The original system was a mix of C/Perl/Python and returned
results in XML, which were then formatted via XSLT.

Once I get a spare minute I am going to try and re-implement it in Python
using ZODB (with BerkeleyDB storage). I might try and port some of the
code over to work as a PluggableIndex too.

One of the main tasks is to write a Python wrapper around my compression
code. I will have to look more closely at how to write Python modules in
C, as it does lots of bit twiddling in a very tight loop. The object will
basically be a compressed list to which you can append ascending integers,
and it will allow various fast union/intersection operations with other
similar objects. This in itself may be sufficient to use in a PluggableIndex.

> If you get a chance to implement the index I asked about, please gimme a shout,
> I'd love to try it out...

Unfortunately I don't have the time. Unless I can use it myself directly
in a project we have funding for (or unless anyone wants to fund my time
to develop it) I will have to wait until I have some more time on my
hands.

> PS: Whereabouts in the UK are you?

Bristol.

-Matt

--
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk  +44 (0)117 9090901
Web Hosting | Web Design | Domain Names | Co-location | DB Integration
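The "compressed list to which you can append ascending integers" described in this message can be illustrated with a small Python sketch. This is not Matt's code, which was written in C and is not shown in the thread. As a stand-in it stores the gaps between successive ids as variable-length bytes; the class name `CompressedPostings` and its methods are hypothetical. A byte-aligned encoding like this spends at least 8 bits per entry, so it cannot reach the "around 4 bits" figure quoted elsewhere in the thread; that would need a bit-level code.

```python
import heapq


class CompressedPostings:
    """Append-only list of strictly ascending, non-negative integers.

    Hypothetical illustration: gaps between successive values are stored
    as variable-length bytes (7 payload bits per byte).
    """

    def __init__(self, values=()):
        self._data = bytearray()
        self._last = -1      # last value appended; -1 means "empty"
        self._count = 0
        for value in values:
            self.append(value)

    def append(self, value):
        if value <= self._last:
            raise ValueError("values must be non-negative and strictly ascending")
        gap = value - self._last          # always >= 1
        self._last = value
        self._count += 1
        # Low 7 bits first; the high bit marks "more bytes follow".
        while gap >= 0x80:
            self._data.append((gap & 0x7F) | 0x80)
            gap >>= 7
        self._data.append(gap)

    def __iter__(self):
        value, gap, shift = -1, 0, 0
        for byte in self._data:
            gap |= (byte & 0x7F) << shift
            if byte & 0x80:
                shift += 7                # gap continues in the next byte
            else:
                value += gap              # gap complete: emit the next id
                yield value
                gap, shift = 0, 0

    def __len__(self):
        return self._count

    def nbytes(self):
        return len(self._data)

    def intersection(self, other):
        """Merge-style intersection of two ascending streams."""
        result = CompressedPostings()
        a, b = iter(self), iter(other)
        x, y = next(a, None), next(b, None)
        while x is not None and y is not None:
            if x == y:
                result.append(x)
                x, y = next(a, None), next(b, None)
            elif x < y:
                x = next(a, None)
            else:
                y = next(b, None)
        return result

    def union(self, other):
        """Merge two ascending streams, dropping duplicates."""
        result = CompressedPostings()
        for value in heapq.merge(self, other):
            if value != result._last:
                result.append(value)
        return result
```

Because both lists are read in ascending order, union and intersection work directly on the decoded streams without expanding either list into memory first.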
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
There is a new How-To for PluginIndexes:

http://www.zope.org/Members/ajung/howto/PluginIndexes/index_html

Andreas

- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Matt Hamilton" <[EMAIL PROTECTED]>
Cc: "Andreas Jung" <[EMAIL PROTECTED]>; "zope-dev" <[EMAIL PROTECTED]>
Sent: Monday, June 11, 2001 9:10 AM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

> Matt Hamilton wrote:
> >
> > I would like to help if I had time :) I think the most efficient way of
> > doing what you want is to construct an index based on a 'Suffix Trie' this
> > essentially allows matching of arbitrary substrings very quickly, the only
> > problem is that it takes up a fair amount of space. The upside is that it
> > can be updated and incrementally added to quite easily (unlike many
> > inverted list implementations).
> >
> > I confess I have not had the chance to look at the pluggable index types
> > in 2.4, but would really like to as I would like to port over some
> > indexing code I was working on for another project that allows compressed
> > storage of inverted lists for indexes. On average you can store a 32-bit
> > document id/ref in around 4 bits, which means you save a lot of space and
> > can keep stopwords in the lexicon (as an example try searching for 'to be
> > or not to be' in an index that removes stopwords :). Not only do you save
> > space, but due to the way the inverted list is read and decompressed you
> > save time on disk access for large indexes as there is less to physically
> > read.
>
> Wow Matt, you seem to know what you're talking about :-)
>
> If you get a chance to implement the index I asked about, please gimme a shout,
> I'd love to try it out...
>
> cheers,
>
> Chris
>
> PS: Whereabouts in the UK are you?
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
Matt Hamilton wrote:
>
> I would like to help if I had time :) I think the most efficient way of
> doing what you want is to construct an index based on a 'Suffix Trie' this
> essentially allows matching of arbitrary substrings very quickly, the only
> problem is that it takes up a fair amount of space. The upside is that it
> can be updated and incrementally added to quite easily (unlike many
> inverted list implementations).
>
> I confess I have not had the chance to look at the pluggable index types
> in 2.4, but would really like to as I would like to port over some
> indexing code I was working on for another project that allows compressed
> storage of inverted lists for indexes. On average you can store a 32-bit
> document id/ref in around 4 bits, which means you save a lot of space and
> can keep stopwords in the lexicon (as an example try searching for 'to be
> or not to be' in an index that removes stopwords :). Not only do you save
> space, but due to the way the inverted list is read and decompressed you
> save time on disk access for large indexes as there is less to physically
> read.

Wow Matt, you seem to know what you're talking about :-)

If you get a chance to implement the index I asked about, please gimme a shout,
I'd love to try it out...

cheers,

Chris

PS: Whereabouts in the UK are you?
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
- Original Message -
From: "ender" <[EMAIL PROTECTED]>
To: "Andreas Jung" <[EMAIL PROTECTED]>
Cc: "zope-dev" <[EMAIL PROTECTED]>
Sent: Wednesday, June 06, 2001 5:30 PM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

> On Monday 04 June 2001 16:55, Andreas Jung wrote:
> >>Looks like you should write your own index type. Zope 2.4
> >>comes with a PluggableIndex interface to allow third-party
> >>indexes to be integrated into the Catalog.
>
> this brings up an interesting question of what is the best way to register a
> new plugindex that's distributed with a product. Glancing over the cvs logs it
> looks as though plugin indexes are arranged to be the first product installed
> in Application.py. Given that, what is the suggested method for registering a
> new plugin index?

I think this should be the subject of a small How-To. Anyway... to register a
plugin index you have to call "context.registerClass(...)". Take a look at
PluginIndexes/__init__.py to see how Zope's indexes are registered. Other
indexes should do it in the same way.

The reason why PluginIndexes is installed as the first product is that there
are some dependencies between PluginIndexes and other Zope Products. Products
are usually initialized in alphabetical order, but in this case we made an
exception.

Andreas
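A rough sketch of what such a registration might look like in a product's own __init__.py follows. Only the call "context.registerClass(...)" comes from the message above; the `NameIndex` class, the two `manage_add...` constructors, the permission string and the `visibility` value are assumptions for illustration and should be checked against PluginIndexes/__init__.py. `RecordingContext` is a hypothetical stand-in so that the sketch runs without Zope installed.

```python
# __init__.py of a hypothetical product that ships one extra index type.


class NameIndex:
    """Placeholder for a real index class (hypothetical)."""

    meta_type = "NameIndex"

    def __init__(self, id):
        self.id = id


def manage_addNameIndexForm(dispatcher):
    """Hypothetical add form; a real product would render a page here."""
    return "add form for NameIndex"


def manage_addNameIndex(dispatcher, id, REQUEST=None):
    """Hypothetical constructor that creates the index object."""
    return NameIndex(id)


def initialize(context):
    # Zope calls initialize(context) for each product at startup.
    context.registerClass(
        NameIndex,
        permission="Add Pluggable Index",   # assumed permission name
        constructors=(manage_addNameIndexForm, manage_addNameIndex),
        visibility=None,                    # assumed: hide from the generic Add list
    )


class RecordingContext:
    """Stand-in for Zope's product context, for trying the sketch outside Zope."""

    def __init__(self):
        self.registered = []

    def registerClass(self, instance_class, **options):
        self.registered.append((instance_class, options))
```

Calling `initialize(RecordingContext())` records one registration; inside Zope the real product context would be passed in instead.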
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
On Monday 04 June 2001 16:55, Andreas Jung wrote:
>>Looks like you should write your own index type. Zope 2.4
>>comes with a PluggableIndex interface to allow third-party
>>indexes to be integrated into the Catalog.

This brings up an interesting question of what is the best way to register a
new plugindex that's distributed with a product. Glancing over the cvs logs it
looks as though plugin indexes are arranged to be the first product installed
in Application.py. Given that, what is the suggested method for registering a
new plugin index?

Kapil

>>Andreas
>>- Original Message -
>>From: "Chris Withers" <[EMAIL PROTECTED]>
>>To: <[EMAIL PROTECTED]>
>>Sent: Monday, June 04, 2001 4:05 PM
>>Subject: [Zope-dev] Request for a Pluggin Index (NameIndex)
>>
>>> Hi,
>>>
>>> If anyone's got the time or fancies a challenge, could they write an
>>> index that behaves as follows:
>>>
>>> Indexed values:
>>> 1) C.J.Withers
>>> 2) Chris Withers
>>> 3) C Petrilli
>>> 4) Christopher McDonough
>>>
>>> search              result
>>> C                   1,2,3,4
>>> C.J.Withers         1
>>> c.j.Withers         1
>>> withers mcdonough   1,2,4
>>> Chris               2,4
>>> Christo             4
>>>
>>> I think the basic rules are:
>>> - split on whitespace and punctuation (not accented characters and the
>>>   like ;-)
>>> - index each remaining name part
>>> - when searching, return all records where any of the name parts match
>>>   something like:
>>>   string.find(name_part,search_expression)
>>>
>>> ...oh yeah, and do it blindingly quickly ;-)
>>>
>>> This would be really useful for the Creator dublin core field and
>>> anywhere you're searching for someone's name. The CMF could benefit from
>>> it and would eliminate the phrase next to the Creator field which has
>>> haunted me from Squishdot:
>>>
>>> " Note that you must enter their username exactly. "
>>>
>>> cheers,
>>>
>>> Chris
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Andreas Jung" <[EMAIL PROTECTED]>
Cc: "zope-dev" <[EMAIL PROTECTED]>
Sent: Tuesday, June 05, 2001 11:30 AM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

> > Looks like you should write your own index type. Zope 2.4
> > comes with a PluggableIndex interface to allow third-party
> > indexes to be integrated into the Catalog.
>
> Yeah, I know all that, and I'm very much looking forward to playing with
> this. :-)
> However, the email was an invitation for anyone who's interested and
> currently has time on their hands (yeah, I know, there's lots of us like
> that ;-) to have a go at writing the index type for me...

I think it should not be a large problem to write such an index, because it
looks like you can subclass the TextIndex class and replace/extend the needed
functionality.

Andreas
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
On Tue, 5 Jun 2001, Chris Withers wrote:

> > Looks like you should write your own index type. Zope 2.4
> > comes with a PluggableIndex interface to allow third-party
> > indexes to be integrated into the Catalog.
>
> Yeah, I know all that, and I'm very much looking forward to playing with
> this. :-)
> However, the email was an invitation for anyone who's interested and
> currently has time on their hands (yeah, I know, there's lots of us like
> that ;-) to have a go at writing the index type for me...

I would like to help if I had time :) I think the most efficient way of
doing what you want is to construct an index based on a 'Suffix Trie'. This
essentially allows matching of arbitrary substrings very quickly; the only
problem is that it takes up a fair amount of space. The upside is that it
can be updated and incrementally added to quite easily (unlike many
inverted list implementations).

I confess I have not had the chance to look at the pluggable index types
in 2.4, but would really like to, as I would like to port over some
indexing code I was working on for another project that allows compressed
storage of inverted lists for indexes. On average you can store a 32-bit
document id/ref in around 4 bits, which means you save a lot of space and
can keep stopwords in the lexicon (as an example, try searching for 'to be
or not to be' in an index that removes stopwords :). Not only do you save
space, but due to the way the inverted list is read and decompressed you
save time on disk access for large indexes, as there is less to physically
read.

-Matt

--
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk  +44 (0)117 9090901
Web Hosting | Web Design | Domain Names | Co-location | DB Integration
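To make the 'Suffix Trie' idea in this message concrete, here is a minimal Python sketch, assuming the simplest possible layout: every suffix of every indexed word is inserted character by character, and each node remembers which document ids passed through it. The names `SuffixTrie`, `add` and `find` are hypothetical and nothing here is taken from Matt's code. It shows both properties mentioned above: a substring lookup only walks as many nodes as the query has characters, new words can be added at any time, and the node count grows roughly with the square of each word's length.

```python
class _Node:
    __slots__ = ("children", "doc_ids")

    def __init__(self):
        self.children = {}     # next character -> _Node
        self.doc_ids = set()   # ids of documents containing the path so far


class SuffixTrie:
    """Naive suffix trie over lower-cased words (illustrative only)."""

    def __init__(self):
        self._root = _Node()

    def add(self, doc_id, word):
        """Insert every suffix of word, tagging each node with doc_id."""
        word = word.lower()
        for start in range(len(word)):
            node = self._root
            for ch in word[start:]:
                node = node.children.setdefault(ch, _Node())
                node.doc_ids.add(doc_id)

    def find(self, substring):
        """Return ids of documents with a word containing substring."""
        node = self._root
        for ch in substring.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.doc_ids)


trie = SuffixTrie()
trie.add(1, "Withers")
trie.add(2, "Chris")
trie.add(4, "Christopher")
print(sorted(trie.find("hris")))   # words containing "hris"
```

An empty query returns an empty set in this sketch, because the root node carries no ids.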
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
> Looks like you should write your own index type. Zope 2.4
> comes with a PluggableIndex interface to allow third-party
> indexes to be integrated into the Catalog.

Yeah, I know all that, and I'm very much looking forward to playing with
this. :-)
However, the email was an invitation for anyone who's interested and
currently has time on their hands (yeah, I know, there's lots of us like
that ;-) to have a go at writing the index type for me...

cheers,

Chris
Re: [Zope-dev] Request for a Pluggin Index (NameIndex)
Looks like you should write your own index type. Zope 2.4
comes with a PluggableIndex interface to allow third-party
indexes to be integrated into the Catalog.

Andreas

- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, June 04, 2001 4:05 PM
Subject: [Zope-dev] Request for a Pluggin Index (NameIndex)

> Hi,
>
> If anyone's got the time or fancies a challenge, could they write an index
> that behaves as follows:
>
> Indexed values:
> 1) C.J.Withers
> 2) Chris Withers
> 3) C Petrilli
> 4) Christopher McDonough
>
> search              result
> C                   1,2,3,4
> C.J.Withers         1
> c.j.Withers         1
> withers mcdonough   1,2,4
> Chris               2,4
> Christo             4
>
> I think the basic rules are:
> - split on whitespace and punctuation (not accented characters and the
>   like ;-)
> - index each remaining name part
> - when searching, return all records where any of the name parts match
>   something like:
>   string.find(name_part,search_expression)
>
> ...oh yeah, and do it blindingly quickly ;-)
>
> This would be really useful for the Creator dublin core field and anywhere
> you're searching for someone's name. The CMF could benefit from it and would
> eliminate the phrase next to the Creator field which has haunted me from
> Squishdot:
>
> " Note that you must enter their username exactly. "
>
> cheers,
>
> Chris
[Zope-dev] Request for a Pluggin Index (NameIndex)
Hi,

If anyone's got the time or fancies a challenge, could they write an index
that behaves as follows:

Indexed values:
1) C.J.Withers
2) Chris Withers
3) C Petrilli
4) Christopher McDonough

search              result
C                   1,2,3,4
C.J.Withers         1
c.j.Withers         1
withers mcdonough   1,2,4
Chris               2,4
Christo             4

I think the basic rules are:
- split on whitespace and punctuation (not accented characters and the
  like ;-)
- index each remaining name part
- when searching, return all records where any of the name parts match
  something like:
  string.find(name_part,search_expression)

...oh yeah, and do it blindingly quickly ;-)

This would be really useful for the Creator Dublin Core field and anywhere
you're searching for someone's name. The CMF could benefit from it and would
eliminate the phrase next to the Creator field which has haunted me from
Squishdot:

" Note that you must enter their username exactly. "

cheers,

Chris
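The rules in this request can be sketched in plain Python, outside Zope, to check them against the example table. The sketch rests on one interpretation that the message does not spell out: whitespace-separated query terms are OR'ed together, while the punctuation-separated pieces inside a single term (as in "C.J.Withers") must all match the same record. The names `NameIndexSketch`, `index` and `search` are hypothetical, and the lookup is a linear scan, so it makes no attempt at the "blindingly quickly" part.

```python
import re


def name_parts(text):
    # Split on whitespace and punctuation and lower-case each piece.
    # For str patterns \W is Unicode-aware, so accented letters stay
    # inside a name part instead of splitting it.
    return [part.lower() for part in re.split(r"[\W_]+", text) if part]


class NameIndexSketch:
    """Illustrative in-memory index; not a Zope PluggableIndex."""

    def __init__(self):
        self._parts = {}   # record id -> set of lower-cased name parts

    def index(self, record_id, name):
        self._parts[record_id] = set(name_parts(name))

    def search(self, query):
        hits = set()
        for term in query.split():          # terms are OR'ed (assumption)
            wanted = name_parts(term)
            if not wanted:
                continue
            for record_id, parts in self._parts.items():
                # Every piece of the term must occur inside some name part.
                if all(any(w in p for p in parts) for w in wanted):
                    hits.add(record_id)
        return sorted(hits)


idx = NameIndexSketch()
idx.index(1, "C.J.Withers")
idx.index(2, "Chris Withers")
idx.index(3, "C Petrilli")
idx.index(4, "Christopher McDonough")

for query in ["C", "C.J.Withers", "c.j.Withers",
              "withers mcdonough", "Chris", "Christo"]:
    print(query, idx.search(query))
```

Under that interpretation the six queries give the same record ids as the table in the request. A real index would replace the linear scan with a lookup structure over the name parts.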