Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-12 Thread Chris Withers

Matt Hamilton wrote:
 
 On Mon, 11 Jun 2001, Chris Withers wrote:
 
  Wow Matt, you seem to know what you're talking about :-)
 
 My final year University project was to create an Open Source mailing list
 archive :)  I did quite a bit of reading into information retrieval and
 assorted algorithms and data structures.  

Ah, okay :-)

 Once I get a spare minute I am going to try to re-implement it in Python
 using ZODB (with BerkeleyDB storage).  I might try to port some of the
 code over to work as a PluggableIndex too.

Cool...

 One of the main tasks is to write a Python wrapper around my compression
 code.  I will have to look more closely at how to write Python modules in
 C, as it does lots of bit twiddling in a very tight loop.  The
 object will basically be a compressed list to which you can append
 ascending integers and which will allow various fast union/intersection
 operations with other similar objects.  This in itself may be sufficient to
 use in a PluggableIndex.

Yeah, I'd love to see it...

 Unfortunately I don't have the time.  Unless I can use it myself directly
 in a project we have funding for (or unless anyone wants to fund my time
 to develop it) I will have to wait until I have some more time on my
 hands.

No worries...

cheers,

Chris

  PS: Whereabouts in the UK are you?
 
 Bristol.

hehe... will be out celebrating my birthday there this Wednesday evening :-) If
you see me lying in a gutter on Thursday morning, please don't kick me too hard
;-)

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-11 Thread Chris Withers

Matt Hamilton wrote:
 
 I would like to help if I had time :)  I think the most efficient way of
 doing what you want is to construct an index based on a 'Suffix Trie'.  This
 essentially allows matching of arbitrary substrings very quickly; the only
 problem is that it takes up a fair amount of space.  The upside is that it
 can be updated and incrementally added to quite easily (unlike many
 inverted list implementations).
 
 I confess I have not had the chance to look at the pluggable index types
 in 2.4, but would really like to as I would like to port over some
 indexing code I was working on for another project that allows compressed
 storage of inverted lists for indexes.  On average you can store a 32-bit
 document id/ref in around 4 bits, which means you save a lot of space and
 can keep stopwords in the lexicon (as an example try searching for 'to be
 or not to be' in an index that removes stopwords :).  Not only do you save
 space, but due to the way the inverted list is read and decompressed you
 save time on disk access for large indexes as there is less to physically
 read.

Wow Matt, you seem to know what you're talking about :-)

If you get a chance to implement the index I asked about, please gimme a shout,
I'd love to try it out...

cheers,

Chris

PS: Whereabouts in the UK are you?




Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-11 Thread Andreas Jung

There is a new How-To for PluginIndexes:

http://www.zope.org/Members/ajung/howto/PluginIndexes/index_html

Andreas
- Original Message -
From: Chris Withers [EMAIL PROTECTED]
To: Matt Hamilton [EMAIL PROTECTED]
Cc: Andreas Jung [EMAIL PROTECTED]; zope-dev
[EMAIL PROTECTED]
Sent: Monday, June 11, 2001 9:10 AM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)


 Matt Hamilton wrote:
 
  I would like to help if I had time :)  I think the most efficient way of
  doing what you want is to construct an index based on a 'Suffix Trie'.  This
  essentially allows matching of arbitrary substrings very quickly; the only
  problem is that it takes up a fair amount of space.  The upside is that it
  can be updated and incrementally added to quite easily (unlike many
  inverted list implementations).
 
  I confess I have not had the chance to look at the pluggable index types
  in 2.4, but would really like to as I would like to port over some
  indexing code I was working on for another project that allows compressed
  storage of inverted lists for indexes.  On average you can store a 32-bit
  document id/ref in around 4 bits, which means you save a lot of space and
  can keep stopwords in the lexicon (as an example try searching for 'to be
  or not to be' in an index that removes stopwords :).  Not only do you save
  space, but due to the way the inverted list is read and decompressed you
  save time on disk access for large indexes as there is less to physically
  read.

 Wow Matt, you seem to know what you're talking about :-)

 If you get a chance to implement the index I asked about, please gimme a shout,
 I'd love to try it out...

 cheers,

 Chris

 PS: Whereabouts in the UK are you?






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-11 Thread Matt Hamilton

On Mon, 11 Jun 2001, Chris Withers wrote:

 Wow Matt, you seem to know what you're talking about :-)

My final year University project was to create an Open Source mailing list
archive :)  I did quite a bit of reading into information retrieval and
assorted algorithms and data structures.  I had a prototype running for
quite some time, but it is currently down as I am wiping the machine to start
again in Python :)  The original system was a mix of C/Perl/Python and
returned results in XML, which were then formatted via XSLT.

Once I get a spare minute I am going to try to re-implement it in Python
using ZODB (with BerkeleyDB storage).  I might try to port some of the
code over to work as a PluggableIndex too.

One of the main tasks is to write a Python wrapper around my compression
code.  I will have to look more closely at how to write Python modules in
C, as it does lots of bit twiddling in a very tight loop.  The
object will basically be a compressed list to which you can append
ascending integers and which will allow various fast union/intersection
operations with other similar objects.  This in itself may be sufficient to
use in a PluggableIndex.
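
The compressed list described above could be prototyped in pure Python before the inner loop is moved to C. The sketch below is only an illustration: the class name CompressedIntList and its methods are hypothetical, and variable-byte gap encoding is an assumed stand-in, since the message does not say which compression scheme the original code uses.

```python
import heapq


class CompressedIntList:
    """Append-only list of strictly ascending non-negative integers.

    Hypothetical sketch: stores the gap between consecutive values as a
    variable-byte number (7 data bits per byte, high bit = "more follows").
    """

    def __init__(self, values=()):
        self._data = bytearray()
        self._last = 0
        self._count = 0
        for value in values:
            self.append(value)

    def append(self, value):
        if value < 0 or (self._count and value <= self._last):
            raise ValueError("values must be non-negative and ascending")
        gap = value - self._last
        while gap >= 0x80:
            self._data.append((gap & 0x7F) | 0x80)  # low 7 bits, continue
            gap >>= 7
        self._data.append(gap)  # final byte has the high bit clear
        self._last = value
        self._count += 1

    def __len__(self):
        return self._count

    def __iter__(self):
        value = gap = shift = 0
        for byte in self._data:
            gap |= (byte & 0x7F) << shift
            if byte & 0x80:
                shift += 7
            else:
                value += gap
                yield value
                gap = shift = 0

    def intersection(self, other):
        """Merge-style intersection; both inputs are read sequentially."""
        result = CompressedIntList()
        a, b = iter(self), iter(other)
        x, y = next(a, None), next(b, None)
        while x is not None and y is not None:
            if x == y:
                result.append(x)
                x, y = next(a, None), next(b, None)
            elif x < y:
                x = next(a, None)
            else:
                y = next(b, None)
        return result

    def union(self, other):
        """Merge both sorted streams, dropping duplicates."""
        result = CompressedIntList()
        previous = None
        for value in heapq.merge(self, other):
            if value != previous:
                result.append(value)
                previous = value
        return result
```

Because values only ever arrive in ascending order, both set operations can stream through the encoded bytes once without decompressing the whole list first.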

 If you get a chance to implement the index I asked about, please gimme a shout,
 I'd love to try it out...

Unfortunately I don't have the time.  Unless I can use it myself directly
in a project we have funding for (or unless anyone wants to fund my time
to develop it) I will have to wait until I have some more time on my
hands.

 PS: Whereabouts in the UK are you?

Bristol.

-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-07 Thread Andreas Jung


- Original Message -
From: ender [EMAIL PROTECTED]
To: Andreas Jung [EMAIL PROTECTED]
Cc: zope-dev [EMAIL PROTECTED]
Sent: Wednesday, June 06, 2001 5:30 PM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)


 On Monday 04 June 2001 16:55, Andreas Jung wrote:
 Looks like you should write your own index type. Zope 2.4
 comes with a PluggableIndex interface to allow third-party
 indexes to be integrated into the Catalog.

 This brings up an interesting question: what is the best way to register a
 new plugin index that's distributed with a product? Glancing over the CVS
 logs, it looks as though plugin indexes are arranged to be the first product
 installed in Application.py. Given that, what is the suggested method for
 registering a new plugin index?

I think this should be the subject of a small How-To. Anyway, to register
a plugin index you have to call context.registerClass(...). Take
a look at PluginIndexes/__init__.py to see how Zope's indexes are
registered. Other indexes should do it the same way.
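
A rough sketch of what such a registration might look like in a product's __init__.py follows. Everything named NameIndex here is hypothetical, the form and constructor are bare stand-ins (in Zope the form would normally be a DTML file), and the keyword values are modelled on how the bundled indexes appear to be registered, so they should be checked against PluginIndexes/__init__.py in the Zope release at hand.

```python
class NameIndex:
    """Hypothetical index class; a real one would implement the
    PluggableIndex interface that ships with Zope 2.4."""

    meta_type = "NameIndex"

    def __init__(self, id):
        self.id = id


def manage_addNameIndexForm(dispatcher, REQUEST=None):
    """Hypothetical stand-in for the add form."""
    return "add form placeholder"


def manage_addNameIndex(dispatcher, id, REQUEST=None):
    """Hypothetical constructor that creates the index object."""
    return NameIndex(id)


def initialize(context):
    """Called by Zope at startup with the product context."""
    context.registerClass(
        NameIndex,
        permission="Add Pluggable Index",  # assumed permission name
        constructors=(manage_addNameIndexForm, manage_addNameIndex),
        icon="www/index.gif",              # assumed icon path
        visibility=None,                   # assumed, as for bundled indexes
    )
```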

The reason PluginIndexes is installed as the first product is that there
are some dependencies between PluginIndexes and other Zope Products.
Products are usually initialized in alphabetical order, but in this case
we made an exception.

Andreas






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-06 Thread ender

On Monday 04 June 2001 16:55, Andreas Jung wrote:
Looks like you should write your own index type. Zope 2.4
comes with a PluggableIndex interface to allow third-party
indexes to be integrated into the Catalog.

This brings up an interesting question: what is the best way to register a
new plugin index that's distributed with a product? Glancing over the CVS
logs, it looks as though plugin indexes are arranged to be the first product
installed in Application.py. Given that, what is the suggested method for
registering a new plugin index?

Kapil


Andreas
- Original Message -
From: Chris Withers [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, June 04, 2001 4:05 PM
Subject: [Zope-dev] Request for a Pluggin Index (NameIndex)

 Hi,

 If anyone's got the time or fancies a challenge, could they write an
 index that behaves as follows:

 Indexed values:
 1) C.J.Withers
 2) Chris Withers
 3) C Petrilli
 4) Christopher McDonough

 search             result
 C                  1,2,3,4
 C.J.Withers        1
 c.j.Withers        1
 withers mcdonough  1,2,4
 Chris              2,4
 Christo            4

 I think the basic rules are:
 - split on whitespace and punctuation (not accented characters and the
 like ;-)
 - index each remaining name part
 - when searching, return all records where any of the name parts match
 something like:
 string.find(name_part,search_expression)

 ...oh yeah, and do it blindingly quickly ;-)

 This would be really useful for the Creator dublin core field and
 anywhere you're searching for someone's name. The CMF could benefit from
 it and would eliminate the phrase next to the Creator field which has
 haunted me from Squishdot:

  Note that you must enter their username exactly. 

 cheers,

 Chris






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-05 Thread Chris Withers

 Looks like you should write your own index type. Zope 2.4
 comes with a PluggableIndex interface to allow third-party
 indexes to be integrated into the Catalog.

Yeah, I know all that, and I'm very much looking forward to playing with
this. :-)
However, the email was an invitation for anyone who's interested and
currently has time on their hands (yeah, I know, there's lots of us like
that ;-) to have a go at writing the index type for me...

cheers,

Chris





Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-05 Thread Matt Hamilton

On Tue, 5 Jun 2001, Chris Withers wrote:

  Looks like you should write your own index type. Zope 2.4
  comes with a PluggableIndex interface to allow third-party
  indexes to be integrated into the Catalog.
 
 Yeah, I know all that, and I'm very much looking forward to playing with
 this. :-)
 However, the email was an invitation for anyone who's interested and
 currently has time on their hands (yeah, I know, there's lots of us like
 that ;-) to have a go at writing the index type for me...

I would like to help if I had time :)  I think the most efficient way of
doing what you want is to construct an index based on a 'Suffix Trie'.  This
essentially allows matching of arbitrary substrings very quickly; the only
problem is that it takes up a fair amount of space.  The upside is that it
can be updated and incrementally added to quite easily (unlike many
inverted list implementations).
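
To make the idea concrete, here is a deliberately naive suffix trie in Python. It is a sketch, not the structure Matt has in mind: the class and method names are hypothetical, every suffix of every word is inserted character by character, and each node keeps the set of record ids that pass through it. That is what makes lookups fast, incremental updates easy, and the memory cost high.

```python
class SuffixTrie:
    """Naive suffix trie mapping substrings to record ids (illustrative only)."""

    def __init__(self):
        self.children = {}  # character -> SuffixTrie node
        self.ids = set()    # ids of records containing the path to this node

    def add(self, word, record_id):
        """Insert every suffix of `word`; can be called at any time."""
        for start in range(len(word)):
            node = self
            for ch in word[start:]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = SuffixTrie()
                child.ids.add(record_id)
                node = child

    def find(self, fragment):
        """Return ids of records whose indexed word contains `fragment`."""
        node = self
        for ch in fragment:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.ids)


trie = SuffixTrie()
trie.add("withers", 1)
trie.add("chris", 2)
trie.add("christopher", 4)
print(sorted(trie.find("ris")))  # ids of words containing "ris"
```

A lookup costs one dictionary step per character of the search fragment, regardless of how many words are indexed; the price is a number of nodes that can grow quadratically with word length.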

I confess I have not had the chance to look at the pluggable index types
in 2.4, but would really like to as I would like to port over some
indexing code I was working on for another project that allows compressed
storage of inverted lists for indexes.  On average you can store a 32-bit
document id/ref in around 4 bits, which means you save a lot of space and
can keep stopwords in the lexicon (as an example try searching for 'to be
or not to be' in an index that removes stopwords :).  Not only do you save
space, but due to the way the inverted list is read and decompressed you
save time on disk access for large indexes as there is less to physically
read.
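
How a 32-bit document id can shrink to a few bits is easiest to see with gap encoding: store the difference between consecutive ids and write small differences with short codes. The sketch below uses Elias gamma codes, which is an assumption made for illustration; the message does not name the coding actually used, and the function names are hypothetical. Bits are kept in a Python string purely for readability.

```python
def gamma_encode(n):
    """Elias gamma code of an integer n >= 1 as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_postings(doc_ids):
    """Encode strictly ascending ids (first id >= 1) as gamma-coded gaps."""
    bits = []
    previous = 0
    for doc_id in doc_ids:
        bits.append(gamma_encode(doc_id - previous))
        previous = doc_id
    return "".join(bits)


def decode_postings(bits):
    """Inverse of encode_postings: read gaps and sum them back into ids."""
    ids = []
    pos = 0
    previous = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":   # unary length prefix
            zeros += 1
            pos += 1
        gap = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        previous += gap
        ids.append(previous)
    return ids


postings = [3, 4, 7, 8, 9, 12]
encoded = encode_postings(postings)
print(len(encoded), "bits for", len(postings), "ids")  # 12 bits for 6 ids
assert decode_postings(encoded) == postings
```

The six ids above take 12 bits, or 2 bits each, only because the gaps are tiny. Frequent terms such as stopwords have small gaps and compress best, while rare terms cost more bits per id but have short lists anyway.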

-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-05 Thread Andreas Jung


- Original Message - 
From: Chris Withers [EMAIL PROTECTED]
To: Andreas Jung [EMAIL PROTECTED]
Cc: zope-dev [EMAIL PROTECTED]
Sent: Tuesday, June 05, 2001 11:30 AM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)


  Looks like you should write your own index type. Zope 2.4
  comes with a PluggableIndex interface to allow third-party
  indexes to be integrated into the Catalog.
 
 Yeah, I know all that, and I'm very much looking forward to playing with
 this. :-)
 However, the email was an invitation for anyone who's interested and
 currently has time on their hands (yeah, I know, there's lots of us like
 that ;-) to have a go at writing the index type for me...
  

I think it should not be a big problem to write such an index, because
it looks like you can subclass the TextIndex class and replace or extend
the needed functionality.

Andreas





Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-04 Thread Andreas Jung

Looks like you should write your own index type. Zope 2.4
comes with a PluggableIndex interface to allow third-party
indexes to be integrated into the Catalog.

Andreas
- Original Message -
From: Chris Withers [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, June 04, 2001 4:05 PM
Subject: [Zope-dev] Request for a Pluggin Index (NameIndex)


 Hi,

 If anyone's got the time or fancies a challenge, could they write an index
 that behaves as follows:

 Indexed values:
 1) C.J.Withers
 2) Chris Withers
 3) C Petrilli
 4) Christopher McDonough

 search             result
 C                  1,2,3,4
 C.J.Withers        1
 c.j.Withers        1
 withers mcdonough  1,2,4
 Chris              2,4
 Christo            4

 I think the basic rules are:
 - split on whitespace and punctuation (not accented characters and the
 like ;-)
 - index each remaining name part
 - when searching, return all records where any of the name parts match
 something like:
 string.find(name_part,search_expression)

 ...oh yeah, and do it blindingly quickly ;-)
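
The rules above can be prototyped in a few lines of Python to check them against the example table. This is a sketch under stated assumptions, not a Zope index: the NameIndex class and its methods are hypothetical, matching is case-insensitive, and a query is interpreted as "any whitespace-separated term may match, but every punctuation-separated piece inside one term must match". That interpretation is one reading that reproduces all six example rows. It also scans every indexed name part, so it is nowhere near "blindingly quick".

```python
import re


def split_name(text):
    """Lower-case and split on whitespace/punctuation, keeping accented letters."""
    return [part for part in re.split(r"[\W_]+", text.lower()) if part]


class NameIndex:
    """Hypothetical in-memory prototype of the requested behaviour."""

    def __init__(self):
        self._parts = {}  # name part -> set of record ids

    def index(self, record_id, name):
        for part in split_name(name):
            self._parts.setdefault(part, set()).add(record_id)

    def _records_containing(self, fragment):
        hits = set()
        for part, record_ids in self._parts.items():
            if fragment in part:  # same test as string.find(...) >= 0
                hits |= record_ids
        return hits

    def search(self, query):
        result = set()
        for term in query.split():
            fragments = split_name(term)
            if not fragments:
                continue
            hits = self._records_containing(fragments[0])
            for fragment in fragments[1:]:
                hits &= self._records_containing(fragment)
            result |= hits
        return sorted(result)


idx = NameIndex()
idx.index(1, "C.J.Withers")
idx.index(2, "Chris Withers")
idx.index(3, "C Petrilli")
idx.index(4, "Christopher McDonough")
print(idx.search("withers mcdonough"))  # [1, 2, 4]
```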

 This would be really useful for the Creator dublin core field and anywhere
 you're searching for someone's name. The CMF could benefit from it and would
 eliminate the phrase next to the Creator field which has haunted me from
 Squishdot:

  Note that you must enter their username exactly. 

 cheers,

 Chris


