Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-12 Thread Chris Withers

Matt Hamilton wrote:
> 
> On Mon, 11 Jun 2001, Chris Withers wrote:
> 
> > Wow Matt, you seem to know what you're talking about :-)
> 
> My final year University project was to create an Open Source mailing list
> archive :)  I did quite a bit of reading into information retrieval and
> assorted algorithms and data structures.  

Ah, okay :-)

> Once I get a spare minute I am going to try and re-implement it in Python
> and using ZODB (with BerkeleyDB storage) I might try and port some of the
> code over to work as a PluggableIndex too.

Cool...

> One of the main tasks is to write a python wrapper around my compression
> code.  I will have to look more closely at how to write Python modules in
> C, as it does lots of bit twiddling which is in a very tight loop.  The
> object will basically be a compressed list to which you can append
> ascending integers and will allow various fast union/intersection
> operations with other similar objects.  This in itself may be sufficient to
> use in a PluggableIndex.

Yeah, I'd love to see it...

> Unfortunately I don't have the time.  Unless I can use it myself directly
> in a project we have funding for (or unless anyone wants to fund my time
> to develop it) I will have to wait until I have some more time on my
> hands.

No worries...

cheers,

Chris

> > PS: Whereabouts in the UK are you?
> 
> Bristol.

hehe... will be out celebrating my birthday there this Wednesday evening :-) If
you see me lying in a gutter on Thursday morning, please don't kick me too hard
;-)

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-11 Thread Matt Hamilton

On Mon, 11 Jun 2001, Chris Withers wrote:

> Wow Matt, you seem to know what you're talking about :-)

My final year University project was to create an Open Source mailing list
archive :)  I did quite a bit of reading into information retrieval and
assorted algorithms and data structures.  I had a prototype running for
quite some time, but it is currently down as I am wiping the machine to start
again in Python :)  The original system was a mix of C/Perl/Python and
returned results in XML, which were then formatted via XSLT.

Once I get a spare minute I am going to try to re-implement it in Python
using ZODB (with BerkeleyDB storage).  I might try to port some of the
code over to work as a PluggableIndex too.

One of the main tasks is to write a python wrapper around my compression
code.  I will have to look more closely at how to write Python modules in
C, as it does lots of bit twiddling which is in a very tight loop.  The
object will basically be a compressed list to which you can append
ascending integers and will allow various fast union/intersection
operations with other similar objects.  This in itself may be sufficient to
use in a PluggableIndex.
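
A minimal sketch of such an object in pure Python, for illustration only. It is not Matt's compression code: the class name `CompressedPostings`, the functions `intersect` and `union`, and the variable-byte gap encoding are all assumptions chosen because they are simple to show. A real version would do the bit twiddling in C, as described above.

```python
import heapq


class CompressedPostings:
    """Append-only list of ascending integers, stored as variable-byte gaps.

    Hypothetical illustration; not the code discussed in this thread.
    """

    def __init__(self, values=()):
        self._data = bytearray()
        self._last = -1  # so the first gap is value + 1, which is always >= 1
        for value in values:
            self.append(value)

    def append(self, doc_id):
        if doc_id <= self._last:
            raise ValueError("values must be appended in ascending order")
        gap = doc_id - self._last
        self._last = doc_id
        # Emit 7 bits per byte, low bits first; the high bit marks "more follows".
        while True:
            byte = gap & 0x7F
            gap >>= 7
            if gap:
                self._data.append(byte | 0x80)
            else:
                self._data.append(byte)
                break

    def __iter__(self):
        current, gap, shift = -1, 0, 0
        for byte in self._data:
            gap |= (byte & 0x7F) << shift
            if byte & 0x80:
                shift += 7
            else:
                current += gap
                yield current
                gap, shift = 0, 0

    def nbytes(self):
        return len(self._data)


def intersect(a, b):
    """Merge-style intersection of two ascending sequences."""
    out = CompressedPostings()
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None:
        if x == y:
            out.append(x)
            x, y = next(it_a, None), next(it_b, None)
        elif x < y:
            x = next(it_a, None)
        else:
            y = next(it_b, None)
    return out


def union(a, b):
    """Merge-style union of two ascending sequences, dropping duplicates."""
    out = CompressedPostings()
    previous = None
    for value in heapq.merge(a, b):
        if value != previous:
            out.append(value)
            previous = value
    return out
```

Both operations decode the lists lazily and never materialise the full uncompressed list, which is the property that matters for large indexes.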

> If you get a chance to implement the index I asked about, please gimme a shout,
> I'd love to try it out...

Unfortunately I don't have the time.  Unless I can use it myself directly
in a project we have funding for (or unless anyone wants to fund my time
to develop it) I will have to wait until I have some more time on my
hands.

> PS: Whereabouts in the UK are you?

Bristol.

-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-11 Thread Andreas Jung

There is a new How-To for PluginIndexes:

http://www.zope.org/Members/ajung/howto/PluginIndexes/index_html

Andreas
- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Matt Hamilton" <[EMAIL PROTECTED]>
Cc: "Andreas Jung" <[EMAIL PROTECTED]>; "zope-dev"
<[EMAIL PROTECTED]>
Sent: Monday, June 11, 2001 9:10 AM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)


> Matt Hamilton wrote:
> >
> > I would like to help if I had time :)  I think the most efficient way of
> > doing what you want is to construct an index based on a 'Suffix Trie' this
> > essentially allows matching of arbitrary substrings very quickly, the only
> > problem is that it takes up a fair amount of space.  The upside is that it
> > can be updated and incrementally added to quite easily (unlike many
> > inverted list implementations).
> >
> > I confess I have not had the chance to look at the pluggable index types
> > in 2.4, but would really like to as I would like to port over some
> > indexing code I was working on for another project that allows compressed
> > storage of inverted lists for indexes.  On average you can store a 32-bit
> > document id/ref in around 4 bits, which means you save a lot of space and
> > can keep stopwords in the lexicon (as an example try searching for 'to be
> > or not to be' in an index that removes stopwords :).  Not only do you save
> > space, but due to the way the inverted list is read and decompressed you
> > save time on disk access for large indexes as there is less to physically
> > read.
>
> Wow Matt, you seem to know what you're talking about :-)
>
> If you get a chance to implement the index I asked about, please gimme a
> shout, I'd love to try it out...
>
> cheers,
>
> Chris
>
> PS: Whereabouts in the UK are you?
>





Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-11 Thread Chris Withers

Matt Hamilton wrote:
> 
> I would like to help if I had time :)  I think the most efficient way of
> doing what you want is to construct an index based on a 'Suffix Trie' this
> essentially allows matching of arbitrary substrings very quickly, the only
> problem is that it takes up a fair amount of space.  The upside is that it
> can be updated and incrementally added to quite easily (unlike many
> inverted list implementations).
> 
> I confess I have not had the chance to look at the pluggable index types
> in 2.4, but would really like to as I would like to port over some
> indexing code I was working on for another project that allows compressed
> storage of inverted lists for indexes.  On average you can store a 32-bit
> document id/ref in around 4 bits, which means you save a lot of space and
> can keep stopwords in the lexicon (as an example try searching for 'to be
> or not to be' in an index that removes stopwords :).  Not only do you save
> space, but due to the way the inverted list is read and decompressed you
> save time on disk access for large indexes as there is less to physically
> read.

Wow Matt, you seem to know what you're talking about :-)

If you get a chance to implement the index I asked about, please gimme a shout,
I'd love to try it out...

cheers,

Chris

PS: Whereabouts in the UK are you?




Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-07 Thread Andreas Jung


- Original Message -
From: "ender" <[EMAIL PROTECTED]>
To: "Andreas Jung" <[EMAIL PROTECTED]>
Cc: "zope-dev" <[EMAIL PROTECTED]>
Sent: Wednesday, June 06, 2001 5:30 PM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)


> On Monday 04 June 2001 16:55, Andreas Jung wrote:
> >>Looks like you should write your own index type. Zope 2.4
> >>comes with an PlugableIndex interface to allow third-party
> >>indexes to be integrated into the Catalog.
>
> this brings up an interesting question of what is the best way to register a
> new plugindex thats distributed with a product. Glancing over the cvs logs it
> looks as though plugin indexes are arranged to be the first product installed
> in Application.py. Given that what is the suggested method for registering a
> new plugin index?

I think this should be the subject of a small How-To. Anyway... to register
a plugin index you have to call "context.registerClass(...)". Take
a look at PluginIndexes/__init__.py to see how Zope's indexes are
registered. Other indexes should do it in the same way.
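
As a rough sketch of what such a registration might look like in a product's `__init__.py`: the `NameIndex` class, the two `manage_add...` constructors and the `FakeContext` stand-in below are all hypothetical, and the keyword arguments are an assumption modelled on how Zope's own PluginIndexes/__init__.py is believed to call `registerClass`. Check them against the Zope version you are running.

```python
class NameIndex:
    """Hypothetical index class; a real one would implement the
    PluggableIndex interface."""
    meta_type = "NameIndex"


def manage_addNameIndexForm(dispatcher):
    """Hypothetical add form (in Zope this would usually be a DTML file)."""
    return "<form>...</form>"


def manage_addNameIndex(dispatcher, id):
    """Hypothetical constructor called when the add form is submitted."""
    return NameIndex()


def initialize(context):
    # Zope calls initialize(context) once for each product at startup.
    context.registerClass(
        NameIndex,
        permission="Add Pluggable Index",  # assumed permission name
        constructors=(manage_addNameIndexForm, manage_addNameIndex),
        icon="www/index.gif",              # assumed icon path
        visibility=None,                   # assumed: keep out of the Add list
    )


class FakeContext:
    """Stand-in for Zope's product context so the sketch runs without Zope."""

    def __init__(self):
        self.calls = []

    def registerClass(self, instance_class=None, **kw):
        self.calls.append((instance_class, kw))


context = FakeContext()
initialize(context)
```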

The reason why PluginIndexes is installed as the first product is that there
are some dependencies between PluginIndexes and other Zope Products.
Products are usually initialized in alphabetical order, but in this case
we made an exception.

Andreas






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-06 Thread ender

On Monday 04 June 2001 16:55, Andreas Jung wrote:
>>Looks like you should write your own index type. Zope 2.4
>>comes with an PlugableIndex interface to allow third-party
>>indexes to be integrated into the Catalog.

This brings up an interesting question of what is the best way to register a
new plugin index that's distributed with a product. Glancing over the CVS logs,
it looks as though plugin indexes are arranged to be the first product installed
in Application.py. Given that, what is the suggested method for registering a
new plugin index?

Kapil


>>Andreas
>>- Original Message -
>>From: "Chris Withers" <[EMAIL PROTECTED]>
>>To: <[EMAIL PROTECTED]>
>>Sent: Monday, June 04, 2001 4:05 PM
>>Subject: [Zope-dev] Request for a Pluggin Index (NameIndex)
>>
>>> Hi,
>>>
>>> If anyone's got the time or fancies a challenge, could they write an
>>> index that behaves as follows:
>>>
>>> Indexed values:
>>> 1) C.J.Withers
>>> 2) Chris Withers
>>> 3) C Petrilli
>>> 4) Christopher McDonough
>>>
>>> search             result
>>> C                  1,2,3,4
>>> C.J.Withers        1
>>> c.j.Withers        1
>>> withers mcdonough  1,2,4
>>> Chris              2,4
>>> Christo            4
>>>
>>> I think the basic rules are:
>>> - split on whitespace and punctuation (not accentuated characters and the
>>> like ;-)
>>> - index each remaining name part
>>> - when searching, return all records where any of the name parts match
>>> something like:
>>> string.find(name_part,search_expression)
>>>
>>> ...oh yeah, and do it blindingly quickly ;-)
>>>
>>> This would be really useful for the Creator dublin core field and
>>> anywhere you're searching for someone's name. The CMF could benefit from
>>> it and would eliminate the phrase next to the Creator field which has
>>> haunted me from Squishdot:
>>>
>>> " Note that you must enter their username exactly. "
>>>
>>> cheers,
>>>
>>> Chris
>>>
>>>
>>>



Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-05 Thread Andreas Jung


- Original Message - 
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Andreas Jung" <[EMAIL PROTECTED]>
Cc: "zope-dev" <[EMAIL PROTECTED]>
Sent: Tuesday, June 05, 2001 11:30 AM
Subject: Re: [Zope-dev] Request for a Pluggin Index (NameIndex)


> > Looks like you should write your own index type. Zope 2.4
> > comes with an PlugableIndex interface to allow third-party
> > indexes to be integrated into the Catalog.
> 
> Yeah, I know all that, and I'm very much looking forward to playing with
> this. :-)
> However, the email was an invitation for anyone who's interested and
> currently has time on their hands (yeah, I know, there's lots of us like
> that ;-) to have a go at writing the index type for me...
>  

 I think it should not be a large problem to write such an index because 
it looks like you can subclass the TextIndex class and replace/extend
the needed functionality.

Andreas





Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-05 Thread Matt Hamilton

On Tue, 5 Jun 2001, Chris Withers wrote:

> > Looks like you should write your own index type. Zope 2.4
> > comes with an PlugableIndex interface to allow third-party
> > indexes to be integrated into the Catalog.
> 
> Yeah, I know all that, and I'm very much looking forward to playing with
> this. :-)
> However, the email was an invitation for anyone who's interested and
> currently has time on their hands (yeah, I know, there's lots of us like
> that ;-) to have a go at writing the index type for me...

I would like to help if I had time :)  I think the most efficient way of
doing what you want is to construct an index based on a 'Suffix Trie'. This
essentially allows matching of arbitrary substrings very quickly; the only
problem is that it takes up a fair amount of space.  The upside is that it
can be updated and incrementally added to quite easily (unlike many
inverted list implementations).
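
To make the idea concrete, here is a small sketch of a suffix trie over name parts. It is only an illustration under simple assumptions (the class names `SuffixTrie` and `_Node` are made up, and every node keeps a set of document ids): each suffix of each word is inserted, so any substring of a word is a path from the root.

```python
class _Node:
    __slots__ = ("children", "docs")

    def __init__(self):
        self.children = {}  # next character -> _Node
        self.docs = set()   # ids of documents containing the path to this node


class SuffixTrie:
    """Hypothetical suffix trie mapping substrings to document ids."""

    def __init__(self):
        self._root = _Node()

    def add(self, word, doc_id):
        # Insert every suffix of the word. This costs O(len(word)**2) nodes
        # in the worst case, which is the space problem mentioned above.
        for start in range(len(word)):
            node = self._root
            for ch in word[start:]:
                node = node.children.setdefault(ch, _Node())
                node.docs.add(doc_id)

    def find(self, substring):
        """Return the ids of documents with a word containing `substring`."""
        if not substring:
            return set()
        node = self._root
        for ch in substring:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.docs)


trie = SuffixTrie()
trie.add("withers", 1)
trie.add("chris", 2)
trie.add("christopher", 4)
```

A lookup walks one node per query character, regardless of how many words are indexed, and adding a word touches only the paths for its own suffixes, which is why incremental updates are cheap.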

I confess I have not had the chance to look at the pluggable index types
in 2.4, but would really like to as I would like to port over some
indexing code I was working on for another project that allows compressed
storage of inverted lists for indexes.  On average you can store a 32-bit
document id/ref in around 4 bits, which means you save a lot of space and
can keep stopwords in the lexicon (as an example try searching for 'to be
or not to be' in an index that removes stopwords :).  Not only do you save
space, but due to the way the inverted list is read and decompressed you
save time on disk access for large indexes as there is less to physically
read.
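
The message does not say which compression scheme is used, so the following is only a guess at the general technique: store the gaps between ascending document ids rather than the ids themselves, and write each gap with a code that is short for small numbers. The sketch uses Elias gamma coding as a stand-in and builds strings of '0' and '1' for readability instead of packing real bits; the function names are made up.

```python
def gamma_encode(n):
    """Elias gamma code of an integer n >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the length, then the binary digits follow.
    return "0" * (len(binary) - 1) + binary


def encode_postings(doc_ids):
    """Encode strictly ascending positive ids as gamma-coded gaps."""
    bits = []
    previous = 0
    for doc_id in doc_ids:
        if doc_id <= previous:
            raise ValueError("ids must be positive and strictly ascending")
        bits.append(gamma_encode(doc_id - previous))
        previous = doc_id
    return "".join(bits)


def decode_postings(bits):
    """Inverse of encode_postings."""
    doc_ids = []
    pos = 0
    current = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":
            zeros += 1
            pos += 1
        gap = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        current += gap
        doc_ids.append(current)
    return doc_ids


ids = [3, 4, 7, 8, 9, 12]
encoded = encode_postings(ids)
bits_per_id = len(encoded) / len(ids)
```

For this dense toy list the encoding needs 12 bits in total, against 192 bits for six raw 32-bit ids. How many bits per id you get in practice depends entirely on how the gaps are distributed: frequent words (including stopwords) have small gaps and compress best.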

-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-05 Thread Chris Withers

> Looks like you should write your own index type. Zope 2.4
> comes with an PlugableIndex interface to allow third-party
> indexes to be integrated into the Catalog.

Yeah, I know all that, and I'm very much looking forward to playing with
this. :-)
However, the email was an invitation for anyone who's interested and
currently has time on their hands (yeah, I know, there's lots of us like
that ;-) to have a go at writing the index type for me...

cheers,

Chris





Re: [Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-04 Thread Andreas Jung

Looks like you should write your own index type. Zope 2.4
comes with a PluggableIndex interface to allow third-party
indexes to be integrated into the Catalog.

Andreas
- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, June 04, 2001 4:05 PM
Subject: [Zope-dev] Request for a Pluggin Index (NameIndex)


> Hi,
>
> If anyone's got the time or fancies a challenge, could they write an index
> that behaves as follows:
>
> Indexed values:
> 1) C.J.Withers
> 2) Chris Withers
> 3) C Petrilli
> 4) Christopher McDonough
>
> search             result
> C                  1,2,3,4
> C.J.Withers        1
> c.j.Withers        1
> withers mcdonough  1,2,4
> Chris              2,4
> Christo            4
>
> I think the basic rules are:
> - split on whitespace and punctuation (not accentuated characters and the
> like ;-)
> - index each remaining name part
> - when searching, return all records where any of the name parts match
> something like:
> string.find(name_part,search_expression)
>
> ...oh yeah, and do it blindingly quickly ;-)
>
> This would be really useful for the Creator dublin core field and anywhere
> you're searching for someone's name. The CMF could benefit from it and would
> eliminate the phrase next to the Creator field which has haunted me from
> Squishdot:
>
> " Note that you must enter their username exactly. "
>
> cheers,
>
> Chris
>
>
>



[Zope-dev] Request for a Pluggin Index (NameIndex)

2001-06-04 Thread Chris Withers

Hi,

If anyone's got the time or fancies a challenge, could they write an index
that behaves as follows:

Indexed values:
1) C.J.Withers
2) Chris Withers
3) C Petrilli
4) Christopher McDonough

search             result
C                  1,2,3,4
C.J.Withers        1
c.j.Withers        1
withers mcdonough  1,2,4
Chris              2,4
Christo            4

I think the basic rules are:
- split on whitespace and punctuation (but not on accented characters and the
like ;-)
- index each remaining name part
- when searching, return all records where any of the name parts match
something like:
string.find(name_part,search_expression)
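
The rules above can be sketched in a few lines of Python. This is one possible reading of the example table, not a specification: it assumes matching is case-insensitive, that whitespace-separated search terms are ORed, and that the punctuation-separated parts inside one term (as in "C.J.Withers") must all match the same record. The names `NameIndex` and `name_parts` are made up, and the linear scan is nowhere near "blindingly quick".

```python
import re

# Split on anything that is not a letter or digit. On Python 3 strings
# \W is Unicode-aware, so accented characters stay inside a name part.
_SEPARATORS = re.compile(r"[\W_]+")


def name_parts(value):
    """Lower-case a name and split it on whitespace and punctuation."""
    return [part for part in _SEPARATORS.split(value.lower()) if part]


class NameIndex:
    """Hypothetical, unoptimised model of the requested behaviour."""

    def __init__(self):
        self._parts = {}  # record id -> set of name parts

    def index(self, record_id, value):
        self._parts[record_id] = set(name_parts(value))

    def search(self, expression):
        hits = set()
        for term in expression.split():
            wanted = name_parts(term)
            if not wanted:
                continue
            for record_id, parts in self._parts.items():
                # Every piece of the term must be a substring of some part.
                if all(any(w in part for part in parts) for w in wanted):
                    hits.add(record_id)
        return sorted(hits)


names = NameIndex()
names.index(1, "C.J.Withers")
names.index(2, "Chris Withers")
names.index(3, "C Petrilli")
names.index(4, "Christopher McDonough")
```

With these four records indexed, `names.search(...)` reproduces each row of the table above.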

...oh yeah, and do it blindingly quickly ;-)

This would be really useful for the Creator dublin core field and anywhere
you're searching for someone's name. The CMF could benefit from it and would
eliminate the phrase next to the Creator field which has haunted me from
Squishdot:

" Note that you must enter their username exactly. "

cheers,

Chris


