Re: [Zope-dev] Catalog improvements

2001-11-29 Thread Chris Withers

Andreas Jung wrote:
> 
> I think the software "MG" from the book "Managing Gigabytes" is GPLed and
> currently released as mg-1.21. Walking through the TOC of the book, it seems
> to be a very detailed source on text processing and gives a lot of
> information about different index types. But I miss some explanation of
> current data structures like suffix arrays or suffix trees, which have
> several advantages for text processing compared to B-Trees.

Hmmm... looks like it's time to go buy a book :-)

cheers,

Chris

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Wolfram Kerber

Yes, this looks promising. Thanks!


- Original Message -
From: "Andreas Jung" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Wolfram Kerber" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 3:18 PM
Subject: Re: [Zope-dev] Catalog improvements


> TopicIndexes are currently available in the 'ajung-topicindex' branch and
> are not yet part of the Zope core.
>
> Andreas
>
> - Original Message -
> From: "Chris McDonough" <[EMAIL PROTECTED]>
> To: "Wolfram Kerber" <[EMAIL PROTECTED]>
> Cc: "Casey Duncan" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Wednesday, November 28, 2001 10:06
> Subject: Re: [Zope-dev] Catalog improvements
>
>
> > Note that one way to get the effect of "cached queries" is to use a
> > TopicIndex, which I believe either Andreas or Tres has implemented.  See
> > http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes.  I can't find
> > the actual source code, though.  Maybe either Tres or Andreas knows
> > where it is?
> >
> > Wolfram Kerber wrote:
> > > Hi
> > >
> > > No, i wasn't aware of your product :-( , the only one i found was ZOQL
> > > by Stephan Richter, but that didn't help much. Well, now i have written
> > > an implementation that reuses some of the code in TextIndex (for
> > > parenthesis parsing and insertion of a default operator) and then saves
> > > the query in RPN format (so the Catalog doesn't need to think that hard
> > > when being queried).
> > > I have taken a look at your product, and i'd say a 'new' Catalog should
> > > have some sort of QueryParser plugins that know how to turn string
> > > queries (such as yours) or SQL into native Catalog queries ...
> > > I've also contacted the authors of the two proposals, i just wasn't sure
> > > whether i should start this off, since i have no experience of how the
> > > fishbowl works and i'm expected to finish my current project sometime
> > > soon.
> > >
> > >
> > > Wolfram
> > >
> > > - Original Message -
> > > From: "Casey Duncan" <[EMAIL PROTECTED]>
> > > To: "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> > > Sent: Tuesday, November 27, 2001 2:48 PM
> > > Subject: Re: [Zope-dev] Catalog improvements
> > >
> > >
> > >
> > >>On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote:
> > >>
> > >>>Hi,
> > >>>
> > >>>i'm currently working on a product that allows to attach relational
> > >>>information to zope-objects. It works quite well so far, but to
further
> > >>>enhance it i need to make some changes to the Catalog. I could
perhaps
> > >>>implement it as a separate product, but i strongly feel that those
> > >>>
> > > changes
> > >
> > >>>are best applied to the Catalog itself, as they are of general use (i
> > >>>think) and involve a lot of changes to the inner workings of the
> > >>>
> > > Catalog.
> > >
> > >>>In particular i need the following:
> > >>>
> > >>>- named/stored queries
> > >>>these are precompiled queries, so they can be executed without
parsing
> > >>>
> > > and
> > >
> > >>>are easily cacheable
> > >>>i.e. similar to what is implemented in CMFTopic, but stored in the
> > >>>
> > > Catalog
> > >
> > >>>and a bit smarter
> > >>>
> > >>>- caching support
> > >>>
> > >>>- unions and intersections
> > >>>sub-queries (i.e. queries that are directed at a certain index)
should
> > >>>
> > > be
> > >
> > >>>more flexibly combineable
> > >>>
> > >>I have some code that implements this in my CatalogQuery product. It
> > >>
> > > creates
> > >
> > >>a query object from a string. Presently these are not persistent, but
> they
> > >>could easily be made to be to create precompiled queries.
> > >>
> > >>code at: http://www.zope.org/Members/Kaivo/CatalogQuery
> > >>
> > >>
> > >>>I searched this mailing-list as well as zop

Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Andreas Jung


- Original Message -
From: "Matt Hamilton" <[EMAIL PROTECTED]>
To: "Andreas Jung" <[EMAIL PROTECTED]>
Cc: "Chris Withers" <[EMAIL PROTECTED]>; "Casey Duncan"
<[EMAIL PROTECTED]>; "Steve Alexander" <[EMAIL PROTECTED]>; "Wolfram
Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 09:55
Subject: Re: [Zope-dev] Catalog improvements


> On Wed, 28 Nov 2001, Andreas Jung wrote:
>
> > I think the software "MG" from the book "Managing Gigabytes" is GPLed and
> > currently released as mg-1.21. Walking through the TOC of the book, it
> > seems to be a very detailed source on text processing and gives a lot of
> > information about different index types. But I miss some explanation of
> > current data structures like suffix arrays or suffix trees, which have
> > several advantages for text processing compared to B-Trees.
>
> Suffix Trees/Tries take up a *lot* of space.  But they are very fast, and
> useful for searching for substrings.

Usually four times the amount of the data to be indexed ;-)

Andreas





Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Matt Hamilton

On Wed, 28 Nov 2001, Andreas Jung wrote:

> I think the software "MG" from the book "Managing Gigabytes" is GPLed and
> currently released as mg-1.21. Walking through the TOC of the book, it seems
> to be a very detailed source on text processing and gives a lot of
> information about different index types. But I miss some explanation of
> current data structures like suffix arrays or suffix trees, which have
> several advantages for text processing compared to B-Trees.

Suffix Trees/Tries take up a *lot* of space.  But they are very fast, and
useful for searching for substrings.  The main gist of the stuff in
'Managing Gigabytes' is that it is possible to store an ascending list of
integers in a compressed form, such that on average each integer requires
only 4 bits to represent it.  This is obviously much more compact than a
straight list of 32- or 64-bit integers/longs (plus any overhead Python
adds to its built-in list type).  The other point is that you can read and
decode the lists very quickly (you don't need to decompress the entire
list before reading it).  Also, consecutive numbers only take 1 bit
of storage, which means that 'stopwords' that are normally omitted from
indexes due to their very high frequency (and hence bloat of the index)
can be stored very efficiently.
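
To make the idea concrete, here is a minimal sketch of one such scheme:
store the gaps between consecutive ids and write each gap as an Elias
gamma code. This is an illustration only, not the MG code; the function
names are made up, and the bits are kept in a Python string of '0'/'1'
for readability where a real implementation would pack them into bytes.
Ids are assumed to be positive and strictly ascending.

```python
def gamma_encode(n):
    """Elias gamma code for an integer n >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]                      # e.g. 5 -> '101'
    # Unary length prefix (len - 1 zeros), then the binary value itself.
    return "0" * (len(binary) - 1) + binary


def encode_postings(doc_ids):
    """Encode a strictly ascending list of positive ints as gamma-coded gaps."""
    bits, previous = [], 0
    for doc_id in doc_ids:
        bits.append(gamma_encode(doc_id - previous))
        previous = doc_id
    return "".join(bits)


def decode_postings(bits):
    """Yield ids one at a time, without decoding the whole list first."""
    pos, previous = 0, 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":              # count the unary length prefix
            zeros += 1
            pos += 1
        gap = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        previous += gap
        yield previous


encoded = encode_postings([3, 4, 5, 6, 10])
decoded = list(decode_postings(encoded))
```

A gap of 1 is encoded as the single bit '1', which is why runs of
consecutive ids are so cheap: the five ids above take 11 bits in total.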

One problem is that all of the research done in MG is based on much older
hardware than is currently available, and they try to make certain
optimisations which nowadays don't save much time.

-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Andreas Jung


- Original Message -
From: "Chris Withers" <[EMAIL PROTECTED]>
To: "Matt Hamilton" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; "Steve Alexander"
<[EMAIL PROTECTED]>; "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 09:27
Subject: Re: [Zope-dev] Catalog improvements


> Matt Hamilton wrote:
> >
> > I would like in on that too :)  About a year or so ago I was working on a
> > full-text indexing system for indexing several gigabytes of text (mailing
> > list archives).  Most of it was written in C and uses quite a lot of cool
> > algorithms from various information retrieval papers and books.  I have
> > been hoping to have the time to take parts of it and work it into the new
> > PluginIndex architecture.  The existing code uses BerkeleyDB files to hold
> > the index structures, but I would like to use ZODB instead to give it a
> > bit more modularity.
>
> Hi Matt,
>
> Are any of these algorithms publicly available? I'd be _very_ interested
> in them :-)
>

I think the software "MG" from the book "Managing Gigabytes" is GPLed and
currently released as mg-1.21. Walking through the TOC of the book, it seems
to be a very detailed source on text processing and gives a lot of
information about different index types. But I miss some explanation of
current data structures like suffix arrays or suffix trees, which have
several advantages for text processing compared to B-Trees.
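
For readers who have not met them, a suffix array can be sketched in a
few lines. This is a naive illustration (the construction simply sorts
all suffixes, which is far too slow for gigabytes of text) and the
function names are invented for the example. It does show the main
attraction: any substring can be found by binary search.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction: compares whole suffixes, fine for small inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return the sorted start positions where pattern occurs in text."""
    m = len(pattern)

    def prefix(k):
        start = suffix_array[k]
        return text[start:start + m]

    lo, hi = 0, len(suffix_array)
    while lo < hi:                     # first suffix whose prefix >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(suffix_array)
    while lo < hi:                     # first suffix whose prefix > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


text = "banana"
sa = build_suffix_array(text)
hits = find_all(text, sa, "ana")
```

The array holds one integer per character of the indexed text, which is
where the space cost of this family of structures comes from.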

Andreas

-- 
Andreas Jung, Zope Corporation
EMail: [EMAIL PROTECTED]   http://www.zope.com
"Python Powered"   http://www.python.org
"Makers of Zope"   http://www.zope.org
"Life is a fulltime occupation"







Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Chris Withers

Matt Hamilton wrote:
> 
> I would like in on that too :)  About a year or so ago I was working on a
> full-text indexing system for indexing several gigabytes of text (mailing
> list archives).  Most of it was written in C and uses quite a lot of cool
> algorithms from various information retrieval papers and books.  I have
> been hoping to have the time to take parts of it and work it into the new
> PluginIndex architecture.  The existing code uses BerkeleyDB files to hold
> the index structures, but I would like to use ZODB instead to give it a
> bit more modularity.

Hi Matt,

Are any of these algorithms publicly available? I'd be _very_ interested in them
:-)

Chris




Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Andreas Jung

TopicIndexes are currently available in the 'ajung-topicindex' branch and
are not yet part of the Zope core.

Andreas

- Original Message -
From: "Chris McDonough" <[EMAIL PROTECTED]>
To: "Wolfram Kerber" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 28, 2001 10:06
Subject: Re: [Zope-dev] Catalog improvements


> Note that one way to get the effect of "cached queries" is to use a
> TopicIndex, which I believe either Andreas or Tres has implemented.  See
> http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes.  I can't find
> the actual source code, though.  Maybe either Tres or Andreas knows
> where it is?
>
> Wolfram Kerber wrote:
> > Hi
> >
> > No, i wasn't aware of your product :-( , the only one i found was ZOQL by
> > Stephan Richter, but that didn't help much. Well, now i have written an
> > implementation that reuses some of the code in TextIndex (for parenthesis
> > parsing and insertion of a default operator) and then saves the query in
> > RPN format (so the Catalog doesn't need to think that hard when being
> > queried).
> > I have taken a look at your product, and i'd say a 'new' Catalog should
> > have some sort of QueryParser plugins that know how to turn string queries
> > (such as yours) or SQL into native Catalog queries ...
> > I've also contacted the authors of the two proposals, i just wasn't sure
> > whether i should start this off, since i have no experience of how the
> > fishbowl works and i'm expected to finish my current project sometime
> > soon.
> >
> >
> > Wolfram
> >
> > - Original Message -
> > From: "Casey Duncan" <[EMAIL PROTECTED]>
> > To: "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> > Sent: Tuesday, November 27, 2001 2:48 PM
> > Subject: Re: [Zope-dev] Catalog improvements
> >
> >
> >
> >>On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote:
> >>
> >>>Hi,
> >>>
> >>>i'm currently working on a product that allows to attach relational
> >>>information to zope-objects. It works quite well so far, but to further
> >>>enhance it i need to make some changes to the Catalog. I could perhaps
> >>>implement it as a separate product, but i strongly feel that those
> >>>
> > changes
> >
> >>>are best applied to the Catalog itself, as they are of general use (i
> >>>think) and involve a lot of changes to the inner workings of the
> >>>
> > Catalog.
> >
> >>>In particular i need the following:
> >>>
> >>>- named/stored queries
> >>>these are precompiled queries, so they can be executed without parsing
> >>>
> > and
> >
> >>>are easily cacheable
> >>>i.e. similar to what is implemented in CMFTopic, but stored in the
> >>>
> > Catalog
> >
> >>>and a bit smarter
> >>>
> >>>- caching support
> >>>
> >>>- unions and intersections
> >>>sub-queries (i.e. queries that are directed at a certain index) should
> >>>
> > be
> >
> >>>more flexibly combineable
> >>>
> >>I have some code that implements this in my CatalogQuery product. It
> >>
> > creates
> >
> >>a query object from a string. Presently these are not persistent, but
they
> >>could easily be made to be to create precompiled queries.
> >>
> >>code at: http://www.zope.org/Members/Kaivo/CatalogQuery
> >>
> >>
> >>>I searched this mailing-list as well as zope.org to get an idea about
> >>>
> > what
> >
> >>>has already been discussed and requested, and there seems to be some
> >>>interest in improving the Catalog. Some people even seem to have worked
> >>>
> > on
> >
> >>>this, perhaps they could give an update on this? Possibly i don't have
> >>>
> > to
> >
> >>>write everything from scratch...
> >>>
> >>I would be willing to help both in coding and getting the code put into
> >>
> > the
> >
> >>Zope core.
> >>
> >>
> >>>I would have put this into a proposal, but there already are two
> >>>
> > proposals
> >
> >>>that deal with the features i want, one is dedicated to
> >>>union

Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Chris McDonough

Note that one way to get the effect of "cached queries" is to use a 
TopicIndex, which I believe either Andreas or Tres has implemented.  See 
http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes.  I can't find 
the actual source code, though.  Maybe either Tres or Andreas knows 
where it is?
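
Roughly, the idea is to evaluate each "query" once per object at
indexing time and keep the resulting set of ids, so that asking for it
later is just a set lookup. The sketch below is only an illustration of
that idea; it is not the code from the proposal or from any branch, and
the class and method names are made up.

```python
class TopicIndexSketch:
    """Keeps one precomputed set of document ids per named filter."""

    def __init__(self):
        self._filters = {}   # name -> predicate(obj) returning True/False
        self._sets = {}      # name -> set of ids of matching objects

    def add_filter(self, name, predicate):
        # Objects indexed before this call are not re-evaluated here.
        self._filters[name] = predicate
        self._sets[name] = set()

    def index_object(self, doc_id, obj):
        # The filters run at indexing time, not at query time.
        for name, predicate in self._filters.items():
            if predicate(obj):
                self._sets[name].add(doc_id)
            else:
                self._sets[name].discard(doc_id)

    def unindex_object(self, doc_id):
        for ids in self._sets.values():
            ids.discard(doc_id)

    def query(self, names, operator="or"):
        """Combine the stored sets for the given filter names."""
        sets = [self._sets[name] for name in names]
        if not sets:
            return set()
        result = set(sets[0])
        for ids in sets[1:]:
            result = result & ids if operator == "and" else result | ids
        return result


index = TopicIndexSketch()
index.add_filter("published", lambda o: o["state"] == "published")
index.add_filter("news", lambda o: o["type"] == "News Item")
index.index_object(1, {"state": "published", "type": "News Item"})
index.index_object(2, {"state": "private", "type": "News Item"})
index.index_object(3, {"state": "published", "type": "Document"})
published_news = index.query(["published", "news"], operator="and")
```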

Wolfram Kerber wrote:
> Hi
> 
> No, i wasn't aware of your product :-( , the only one i found was ZOQL by
> Stephan Richter, but that didn't help much. Well, now i have written an
> implementation that reuses some of the code in TextIndex (for parenthesis
> parsing and insertion of a default operator) and then saves the query in RPN
> format (so the Catalog doesn't need to think that hard when being queried).
> I have taken a look at your product, and i'd say a 'new' Catalog should have
> some sort of QueryParser plugins that know how to turn string queries (such
> as yours) or SQL into native Catalog queries ...
> I've also contacted the authors of the two proposals, i just wasn't sure
> whether i should start this off, since i have no experience of how the
> fishbowl works and i'm expected to finish my current project sometime soon.
> 
> 
> Wolfram
> 
> - Original Message -
> From: "Casey Duncan" <[EMAIL PROTECTED]>
> To: "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Tuesday, November 27, 2001 2:48 PM
> Subject: Re: [Zope-dev] Catalog improvements
> 
> 
> 
>>On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote:
>>
>>>Hi,
>>>
>>>i'm currently working on a product that allows to attach relational
>>>information to zope-objects. It works quite well so far, but to further
>>>enhance it i need to make some changes to the Catalog. I could perhaps
>>>implement it as a separate product, but i strongly feel that those
>>>
> changes
> 
>>>are best applied to the Catalog itself, as they are of general use (i
>>>think) and involve a lot of changes to the inner workings of the
>>>
> Catalog.
> 
>>>In particular i need the following:
>>>
>>>- named/stored queries
>>>these are precompiled queries, so they can be executed without parsing
>>>
> and
> 
>>>are easily cacheable
>>>i.e. similar to what is implemented in CMFTopic, but stored in the
>>>
> Catalog
> 
>>>and a bit smarter
>>>
>>>- caching support
>>>
>>>- unions and intersections
>>>sub-queries (i.e. queries that are directed at a certain index) should
>>>
> be
> 
>>>more flexibly combineable
>>>
>>I have some code that implements this in my CatalogQuery product. It
>>
> creates
> 
>>a query object from a string. Presently these are not persistent, but they
>>could easily be made to be to create precompiled queries.
>>
>>code at: http://www.zope.org/Members/Kaivo/CatalogQuery
>>
>>
>>>I searched this mailing-list as well as zope.org to get an idea about
>>>
> what
> 
>>>has already been discussed and requested, and there seems to be some
>>>interest in improving the Catalog. Some people even seem to have worked
>>>
> on
> 
>>>this, perhaps they could give an update on this? Possibly i don't have
>>>
> to
> 
>>>write everything from scratch...
>>>
>>I would be willing to help both in coding and getting the code put into
>>
> the
> 
>>Zope core.
>>
>>
>>>I would have put this into a proposal, but there already are two
>>>
> proposals
> 
>>>that deal with the features i want, one is dedicated to
>>>unions/intersections, the other (TopicIndexes) to performance issues (i
>>>dont't know what's the status of these though, especially the first one
>>>
> is
> 
>>>rather old), and i don't want to hijack them without asking. As so often
>>>
> i
> 
>>>will need to complete my current project first, but would then like to
>>>
> help
> 
>>>in improving the Catalog for a more general use.
>>>
>>Possibly we need to rekindle discussion. I would suggest contacting the
>>authors of those proposals to see how compatible your concepts are wth
>>theirs. Perhaps a new proposal should be drafted with the new ideas and ty
>>them back to the previous ones. If there is redundancy, that can be worked
>>out.
>>
>>
>>>So, if there is interest, i would propose to collect some ideas and
>>>comments about how a better C

Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Wolfram Kerber

Hi

No, i wasn't aware of your product :-( , the only one i found was ZOQL by
Stephan Richter, but that didn't help much. Well, now i have written an
implementation that reuses some of the code in TextIndex (for parenthesis
parsing and insertion of a default operator) and then saves the query in RPN
format (so the Catalog doesn't need to think that hard when being queried).
I have taken a look at your product, and i'd say a 'new' Catalog should have
some sort of QueryParser plugins that know how to turn string queries (such
as yours) or SQL into native Catalog queries ...
I've also contacted the authors of the two proposals, i just wasn't sure
whether i should start this off, since i have no experience of how the
fishbowl works and i'm expected to finish my current project sometime soon.
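
To illustrate what is meant by parsing once and storing the query in
RPN, here is a rough standalone sketch. It is not the actual
implementation and does not use the TextIndex code; the names are made
up, and the "index" is just a dict mapping a term to a set of ids.

```python
import re

OPERATORS = {"or": 1, "and": 2}        # a higher number binds tighter


def tokenize(query, default_op="and"):
    """Split a query string and insert the default operator where omitted."""
    raw = re.findall(r"\(|\)|[^\s()]+", query.lower())
    tokens = []
    for tok in raw:
        starts_operand = tok not in OPERATORS and tok != ")"
        if tokens and starts_operand:
            prev = tokens[-1]
            if prev not in OPERATORS and prev != "(":
                tokens.append(default_op)   # e.g. 'zope catalog' -> 'zope and catalog'
        tokens.append(tok)
    return tokens


def to_rpn(tokens):
    """Shunting-yard conversion from infix tokens to an RPN list."""
    output, stack = [], []
    for tok in tokens:
        if tok in OPERATORS:
            while (stack and stack[-1] in OPERATORS
                   and OPERATORS[stack[-1]] >= OPERATORS[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()                     # discard the '('
        else:
            output.append(tok)
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


def evaluate(rpn, index):
    """Run a stored RPN query: 'and' intersects, 'or' unions the id sets."""
    stack = []
    for tok in rpn:
        if tok in OPERATORS:
            right, left = stack.pop(), stack.pop()
            stack.append(left & right if tok == "and" else left | right)
        else:
            stack.append(set(index.get(tok, ())))
    return stack.pop()


index = {"zope": {1, 2, 3}, "catalog": {2, 4}, "index": {3, 5}}
stored = to_rpn(tokenize("zope (catalog or index)"))   # parse once, keep the list
result = evaluate(stored, index)
```

The stored list is plain data, so it can be kept under a name and
re-run without touching the parser again.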


Wolfram

- Original Message -
From: "Casey Duncan" <[EMAIL PROTECTED]>
To: "Wolfram Kerber" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, November 27, 2001 2:48 PM
Subject: Re: [Zope-dev] Catalog improvements


> On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote:
> > Hi,
> >
> > i'm currently working on a product that allows attaching relational
> > information to zope-objects. It works quite well so far, but to further
> > enhance it i need to make some changes to the Catalog. I could perhaps
> > implement it as a separate product, but i strongly feel that those changes
> > are best applied to the Catalog itself, as they are of general use (i
> > think) and involve a lot of changes to the inner workings of the Catalog.
> > In particular i need the following:
> >
> > - named/stored queries
> > these are precompiled queries, so they can be executed without parsing and
> > are easily cacheable
> > i.e. similar to what is implemented in CMFTopic, but stored in the Catalog
> > and a bit smarter
> >
> > - caching support
> >
> > - unions and intersections
> > sub-queries (i.e. queries that are directed at a certain index) should be
> > more flexibly combinable
>
> I have some code that implements this in my CatalogQuery product. It creates
> a query object from a string. Presently these are not persistent, but they
> could easily be made persistent to create precompiled queries.
>
> code at: http://www.zope.org/Members/Kaivo/CatalogQuery
>
> >
> > I searched this mailing-list as well as zope.org to get an idea about what
> > has already been discussed and requested, and there seems to be some
> > interest in improving the Catalog. Some people even seem to have worked on
> > this, perhaps they could give an update on this? Possibly i don't have to
> > write everything from scratch...
>
> I would be willing to help both in coding and getting the code put into the
> Zope core.
>
> > I would have put this into a proposal, but there already are two proposals
> > that deal with the features i want, one is dedicated to
> > unions/intersections, the other (TopicIndexes) to performance issues (i
> > don't know what's the status of these though, especially the first one is
> > rather old), and i don't want to hijack them without asking. As so often i
> > will need to complete my current project first, but would then like to help
> > in improving the Catalog for a more general use.
>
> Possibly we need to rekindle discussion. I would suggest contacting the
> authors of those proposals to see how compatible your concepts are with
> theirs. Perhaps a new proposal should be drafted with the new ideas and tie
> them back to the previous ones. If there is redundancy, that can be worked
> out.
>
> >
> > So, if there is interest, i would propose to collect some ideas and
> > comments about how a better Catalog should look, how it could be best
> > implemented and how to organize this effort (with respect to the already
> > existing proposals).
>
> I am very interested in such a discussion. Let me know what I can do to help.
> /---\
>   Casey Duncan, Sr. Web Developer
>   National Legal Aid and Defender Association
>   [EMAIL PROTECTED]
> \---/





Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Steve Alexander

Casey Duncan wrote:

>
> No unfortunately I think it got lost in the shuffle around the time of my 
> cross-country move. Any chance of sending it over again? I am revamping some 
> of my "old" products, perhaps this will give me an excuse to release a new 
> version of catquery.


I'll look them up and send them again soon.


> Yes, I second, third and fourth that motion. I have a bunch of ideas kicking
> around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an 
> "indexing and catalog" chat on #zope.


That sounds like a good idea.


I'm writing an academic paper/presentation that I need to present on the 
6/7 December. Some time after that would be best for me.

If other good folk can collate the background information and make some 
sense of the different ideas, and put that on a wiki page, I can 
contribute to that as I have time, and then we'll have some sort of
framework for a discussion on IRC.

--
Steve Alexander
Software Engineer
Cat-Box limited






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Matt Hamilton

On Tue, 27 Nov 2001, Andreas Jung wrote:

> Is this code publicly available?

Sort of :)  It used to be around, but the server it was on is currently
offline and in need of a new disk controller, so it is not to hand.  It is
also poorly commented :( and written in very highly optimised (read:
illegible) C.

The main bits needed from it are the routines to store and retrieve
compressed lists of ascending integers (i.e. as used in indexes).  I want to
write a Python wrapper around them and release a list-like Python data
structure that will allow efficient storage of indexes.  The other bit is
the code for doing the cosine similarity comparison in order to
rank the documents in order of relevance to a query.
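
As a rough picture of the ranking part: treat the query and each
document as a vector of term weights and sort by the cosine of the angle
between them. The sketch below is an illustration only, not the C code
described here; the tf-idf weighting is one common variant chosen for
the example, and the function names are made up.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_words, docs):
    """docs maps doc id -> list of words; returns matching ids, best first."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for words in docs.values() for t in set(words))

    def weigh(words):
        # Assumed weighting: (1 + log tf) * log(1 + N / df); unknown terms dropped.
        return {t: (1 + math.log(c)) * math.log(1 + n / df[t])
                for t, c in Counter(words).items() if t in df}

    q = weigh(query_words)
    scores = {doc_id: cosine(q, weigh(words)) for doc_id, words in docs.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: -scores[d])


docs = {
    1: "zope catalog index".split(),
    2: "python list type".split(),
    3: "catalog catalog query".split(),
}
ranking = rank(["catalog"], docs)
```

Document 3 comes out ahead of document 1 because "catalog" makes up more
of its vector; document 2 does not match and is left out.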

Most of the code is taken from the book/code 'Managing Gigabytes'
by Witten, Moffat & Bell (http://www.cs.mu.OZ.AU/mg/).  The code is quite
old now (1999) and designed for quite large systems, or relatively static
text (i.e. it doesn't do incremental indexing very well).  I worked on
developing a 'forward' index which could be easily updated, and then
inverted quite quickly on a regular basis (since it didn't need to parse
the source text again).
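
In outline, the forward index maps each document to its terms, and the
inverted index is rebuilt from that mapping alone. A small sketch of the
idea, using plain dicts and sets rather than the real storage, with
invented names:

```python
from collections import defaultdict


def invert(forward_index):
    """Build term -> sorted list of doc ids from doc id -> set of terms."""
    inverted = defaultdict(set)
    for doc_id, terms in forward_index.items():
        for term in terms:
            inverted[term].add(doc_id)
    return {term: sorted(ids) for term, ids in inverted.items()}


forward = {}
forward[1] = {"zope", "catalog"}     # adding a document is one assignment
forward[2] = {"catalog", "index"}
forward[1] = {"zope", "zodb"}        # so is re-indexing a changed document
inverted = invert(forward)           # periodic rebuild, source text not needed
```

Updates only touch the entry for the changed document; the cost of
keeping term lists sorted is paid once per rebuild instead of once per
update.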


-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Andreas Jung

Is this code publicly available?

Andreas
- Original Message -
From: "Matt Hamilton" <[EMAIL PROTECTED]>
To: "Casey Duncan" <[EMAIL PROTECTED]>
Cc: "Steve Alexander" <[EMAIL PROTECTED]>; "Wolfram Kerber"
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, November 27, 2001 10:06
Subject: Re: [Zope-dev] Catalog improvements


> On Tue, 27 Nov 2001, Casey Duncan wrote:
>
> > > I'm interested in this too, and I'm keen to get a solution that will
> > > work with just the ZODB, without needing all of Zope.
> >
> > Yes, I second, third and fourth that motion. I have a bunch of ideas kicking
> > around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an
> > "indexing and catalog" chat on #zope.
>
> I would like in on that too :)  About a year or so ago I was working on a
> full-text indexing system for indexing several gigabytes of text (mailing
> list archives).  Most of it was written in C and uses quite a lot of cool
> algorithms from various information retrieval papers and books.  I have
> been hoping to have the time to take parts of it and work it into the new
> PluginIndex architecture.  The existing code uses BerkeleyDB files to hold
> the index structures, but I would like to use ZODB instead to give it a
> bit more modularity.






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Matt Hamilton

On Tue, 27 Nov 2001, Casey Duncan wrote:

> > I'm interested in this too, and I'm keen to get a solution that will
> > work with just the ZODB, without needing all of Zope.
>
> Yes, I second, third and fourth that motion. I have a bunch of ideas kicking
> around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an
> "indexing and catalog" chat on #zope.

I would like in on that too :)  About a year or so ago I was working on a
full-text indexing system for indexing several gigabytes of text (mailing
list archives).  Most of it was written in C and uses quite a lot of cool
algorithms from various information retrieval papers and books.  I have
been hoping to have the time to take parts of it and work it into the new
PluginIndex architecture.  The existing code uses BerkeleyDB files to hold
the index structures, but I would like to use ZODB instead to give it a
bit more modularity.

-Matt


-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Casey Duncan

On Tuesday 27 November 2001 09:49 am, Steve Alexander allegedly wrote:
> Casey Duncan wrote:
> > I have some code that implements this in my CatalogQuery product. It
> > creates a query object from a string. Presently these are not persistent,
> > but they could easily be made to be to create precompiled queries.
> >
> > code at: http://www.zope.org/Members/Kaivo/CatalogQuery
>
> Casey, did you get a chance to look at my patches for adding an extended
> uniqueValues method to CatalogQuery?

No, unfortunately I think it got lost in the shuffle around the time of my
cross-country move. Any chance of sending it over again? I am revamping some 
of my "old" products, perhaps this will give me an excuse to release a new 
version of catquery.

>
> > I would be willing to help both in coding and getting the code put into
> > the Zope core.
>
>  me too!
>
> >>So, if there is interest, i would propose to collect some ideas and
> >>comments about how a better Catalog should look like, how it could be
> >> best implemented and how to organize this effort (with respect to the
> >> already existing proposals).
> >
> > I am very interested in such a discussion. Let me know what I can do to
> > help.
>
> I'm interested in this too, and I'm keen to get a solution that will
> work with just the ZODB, without needing all of Zope.

Yes, I second, third and fourth that motion. I have a bunch of ideas kicking
around for ZODB-level indexing. Let's talk more. Perhaps we should arrange an 
"indexing and catalog" chat on #zope.

/---\
  Casey Duncan, Sr. Web Developer
  National Legal Aid and Defender Association
  [EMAIL PROTECTED]
\---/




Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Steve Alexander

Casey Duncan wrote:

> 
> I have some code that implements this in my CatalogQuery product. It creates 
> a query object from a string. Presently these are not persistent, but they 
> could easily be made to be to create precompiled queries.
> 
> code at: http://www.zope.org/Members/Kaivo/CatalogQuery


Casey, did you get a chance to look at my patches for adding an extended 
uniqueValues method to CatalogQuery?

 
> I would be willing to help both in coding and getting the code put into the 
> Zope core.


 me too!


 
>>So, if there is interest, i would propose to collect some ideas and
>>comments about how a better Catalog should look like, how it could be best
>>implemented and how to organize this effort (with respect to the already
>>existing proposals).
> 
> I am very interested in such a discussion. Let me know what I can do to help.


I'm interested in this too, and I'm keen to get a solution that will 
work with just the ZODB, without needing all of Zope.


--
Steve Alexander
Software Engineer
Cat-Box limited






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Casey Duncan

On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote:
> Hi,
>
> i'm currently working on a product that allows to attach relational
> information to zope-objects. It works quite well so far, but to further
> enhance it i need to make some changes to the Catalog. I could perhaps
> implement it as a separate product, but i strongly feel that those changes
> are best applied to the Catalog itself, as they are of general use (i
> think) and involve a lot of changes to the inner workings of the Catalog.
> In particular i need the following:
>
> - named/stored queries
> these are precompiled queries, so they can be executed without parsing and
> are easily cacheable
> i.e. similar to what is implemented in CMFTopic, but stored in the Catalog
> and a bit smarter
>
> - caching support
>
> - unions and intersections
> sub-queries (i.e. queries that are directed at a certain index) should be
> more flexibly combineable

I have some code that implements this in my CatalogQuery product. It creates 
a query object from a string. Presently these objects are not persistent, but 
they could easily be made persistent to serve as precompiled queries.

code at: http://www.zope.org/Members/Kaivo/CatalogQuery
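
To make the "parse once, run many times" idea concrete, here is a minimal
sketch in plain Python. It is not the CatalogQuery code: the class name
CompiledQuery, the tiny "field op 'value'" syntax and the dictionary records
are all assumptions made up for illustration, and values containing " and "
are not handled.

```python
import operator

# Hypothetical operator table; the real CatalogQuery syntax may differ.
_OPS = {"==": operator.eq, "!=": operator.ne}


class CompiledQuery:
    """A query string parsed once, then reusable without re-parsing."""

    def __init__(self, text):
        # Each clause has the form: field op 'value'; clauses are joined by " and ".
        self.clauses = []
        for clause in text.split(" and "):
            field, op, value = clause.split(None, 2)
            if op not in _OPS:
                raise ValueError(f"unsupported operator: {op}")
            self.clauses.append((field, op, value.strip("'\"")))

    def matches(self, record):
        # A record matches when every clause holds.
        return all(_OPS[op](record.get(field), value)
                   for field, op, value in self.clauses)

    def __call__(self, records):
        return [r for r in records if self.matches(r)]


published_docs = CompiledQuery("type == 'Document' and state != 'private'")
records = [
    {"type": "Document", "state": "published"},
    {"type": "Document", "state": "private"},
    {"type": "Folder", "state": "published"},
]
hits = published_docs(records)
```

Since the parsed clauses are plain tuples of strings, an object like this
could in principle be stored and reused as a named query; making it a real
persistent ZODB object is left out of the sketch.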

>
> I searched this mailing-list as well as zope.org to get an idea about what
> has already been discussed and requested, and there seems to be some
> interest in improving the Catalog. Some people even seem to have worked on
> this, perhaps they could give an update on this? Possibly i don't have to
> write everything from scratch...

I would be willing to help both in coding and getting the code put into the 
Zope core.

> I would have put this into a proposal, but there already are two proposals
> that deal with the features i want, one is dedicated to
> unions/intersections, the other (TopicIndexes) to performance issues (i
> dont't know what's the status of these though, especially the first one is
> rather old), and i don't want to hijack them without asking. As so often i
> will need to complete my current project first, but would then like to help
> in improving the Catalog for a more general use.

Possibly we need to rekindle discussion. I would suggest contacting the 
authors of those proposals to see how compatible your concepts are with 
theirs. Perhaps a new proposal should be drafted with the new ideas, tying 
them back to the previous ones. If there is redundancy, that can be worked 
out.

>
> So, if there is interest, i would propose to collect some ideas and
> comments about how a better Catalog should look like, how it could be best
> implemented and how to organize this effort (with respect to the already
> existing proposals).

I am very interested in such a discussion. Let me know what I can do to help.

/---\
  Casey Duncan, Sr. Web Developer
  National Legal Aid and Defender Association
  [EMAIL PROTECTED]
\---/




Re: [Zope-dev] Catalog improvements

2001-11-21 Thread Wolfram Kerber


- Original Message -
From: "Jeffrey P Shell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, November 21, 2001 7:38 PM
Subject: Re: [Zope-dev] Catalog improvements


>
> On Tuesday, November 20, 2001, at 03:35  PM, Wolfram Kerber wrote:
>
> > Hi,
> >
> > i'm currently working on a product that allows to attach relational
> > information to zope-objects. It works quite well so far, but to further
> > enhance it i need to make some changes to the Catalog. I could perhaps
> > implement it as a separate product, but i strongly feel that those
> > changes
> > are best applied to the Catalog itself, as they are of general use
> > (i think)
> > and involve a lot of changes to the inner workings of the Catalog. In
> > particular i need the following:
> >
> > - named/stored queries
> > these are precompiled queries, so they can be executed without
> > parsing and
> > are easily cacheable
> > i.e. similar to what is implemented in CMFTopic, but stored in the
> > Catalog
> > and a bit smarter
>
> There used to be something like this in ZTables/Tabula (a Zope 1.x
> product that was sort of the genesis of the Catalog, for better or
> worse) called 'Hierarchies'.  Hierarchies were actually indexes (I
> think the current Keyword index is descended from the Keyword
> Hierarchy).
>
> I don't know what happened to that code.  If it's not available,
> you could probably achieve the effect that you're looking for here
> with PluginIndexes

I think you're right. Indexes also have a management interface that could be
used to define the query. It could result in a nesting problem, however, if
'QueryIndexes' rely on each other's results (which they should be able to
do). I would possibly need a management view that shows the hierarchical
structure of the indexes, but it can be merely that, a view.
I'll try this out...
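
One way to picture the nesting problem: if each query index declares which
other indexes it reads, the indexes can be evaluated in dependency order and
a circular definition can be rejected up front. The following is only a
sketch under that assumption; the function name and the index names are
invented, and it uses the standard-library graphlib module (Python 3.9 or
later), not anything from Zope.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(depends_on):
    """Order index names so that each one comes after the indexes it reads.

    depends_on maps an index name to the set of index names it relies on.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"query indexes form a cycle: {exc.args[1]}") from None


# Hypothetical index names, for illustration only.
dependencies = {
    "docs": set(),
    "recent": set(),
    "recent_docs": {"docs", "recent"},
    "front_page": {"recent_docs"},
}
order = evaluation_order(dependencies)
```

The same dependency mapping would also be enough to draw the hierarchical
management view.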

>, which wouldn't require changing the Catalog at all.

I'd say this would be true if I did _not_ store the result of the query and
just delegated to other indexes. Otherwise I would need some notification
mechanism to tell me whether my result is affected by an indexing call,
and/or at least be notified when the call is over so I can update the result
by issuing a query. The latter would mean 'taking the big hit' you
mentioned, which I think isn't acceptable.

> Just write a "Query Index" that indexes objects that match
> its pre-cooked Query.  This would speed up searching tremendously,
> but you could take a big hit at indexing time if you have many of
> them.
>
> Jeffrey P Shell, [EMAIL PROTECTED]

thanks,

Wolfram





Re: [Zope-dev] Catalog improvements

2001-11-21 Thread Jeffrey P Shell


On Tuesday, November 20, 2001, at 03:35  PM, Wolfram Kerber wrote:

> Hi,
>
> i'm currently working on a product that allows to attach relational
> information to zope-objects. It works quite well so far, but to further
> enhance it i need to make some changes to the Catalog. I could perhaps
> implement it as a separate product, but i strongly feel that those 
> changes
> are best applied to the Catalog itself, as they are of general use 
> (i think)
> and involve a lot of changes to the inner workings of the Catalog. In
> particular i need the following:
>
> - named/stored queries
> these are precompiled queries, so they can be executed without 
> parsing and
> are easily cacheable
> i.e. similar to what is implemented in CMFTopic, but stored in the 
> Catalog
> and a bit smarter

There used to be something like this in ZTables/Tabula (a Zope 1.x 
product that was sort of the genesis of the Catalog, for better or 
worse) called 'Hierarchies'.  Hierarchies were actually indexes (I 
think the current Keyword index is descended from the Keyword 
Hierarchy).

I don't know what happened to that code.  If it's not available, 
you could probably achieve the effect that you're looking for here 
with PluginIndexes, which wouldn't require changing the Catalog at 
all.  Just write a "Query Index" that indexes objects that match 
its pre-cooked Query.  This would speed up searching tremendously, 
but you could take a big hit at indexing time if you have many of 
them.
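
A rough sketch of the "Query Index" idea, in plain Python rather than against
the real PluginIndexes interface: the class and method names below are
chosen for illustration and are not claimed to match Zope's API. The
pre-cooked query is represented as an ordinary predicate function.

```python
class QueryIndex:
    """Remembers which document ids satisfy one fixed, pre-cooked query."""

    def __init__(self, predicate):
        self.predicate = predicate  # callable: obj -> bool
        self.matching = set()       # ids of the objects that currently match

    def index_object(self, doc_id, obj):
        # The query is evaluated here, at indexing time. With many such
        # indexes, every (re)index call pays this cost once per index.
        if self.predicate(obj):
            self.matching.add(doc_id)
        else:
            self.matching.discard(doc_id)

    def unindex_object(self, doc_id):
        self.matching.discard(doc_id)

    def search(self):
        # Searching is just a lookup of the stored result set.
        return set(self.matching)


published = QueryIndex(lambda obj: obj.get("state") == "published")
published.index_object(1, {"state": "published"})
published.index_object(2, {"state": "private"})
first_result = published.search()
published.index_object(1, {"state": "private"})  # object 1 was edited
second_result = published.search()
```

The trade-off is visible in the two methods: search() does no query work at
all, while index_object() runs the predicate for every object that is
indexed.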

Jeffrey P Shell, [EMAIL PROTECTED]





[Zope-dev] Catalog improvements

2001-11-20 Thread Wolfram Kerber

Hi,

I'm currently working on a product that allows relational information to be
attached to Zope objects. It works quite well so far, but to enhance it
further I need to make some changes to the Catalog. I could perhaps
implement it as a separate product, but I strongly feel that those changes
are best applied to the Catalog itself, as they are of general use (I think)
and involve a lot of changes to the inner workings of the Catalog. In
particular I need the following:

- named/stored queries
These are precompiled queries, so they can be executed without parsing and
are easily cacheable, i.e. similar to what is implemented in CMFTopic, but
stored in the Catalog and a bit smarter.

- caching support

- unions and intersections
Sub-queries (i.e. queries that are directed at a certain index) should be
more flexibly combinable.
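
To illustrate the kind of combination meant here, a minimal sketch: assume
each sub-query against a single index returns a set of document ids. The
helper names and the sample data are made up for illustration and are not
part of the Catalog.

```python
def intersect(*results):
    """Document ids present in every sub-query result."""
    return set.intersection(*map(set, results)) if results else set()


def union(*results):
    """Document ids present in at least one sub-query result."""
    return set().union(*results)


# Hypothetical per-index results (sets of document ids).
by_type = {1, 2, 3, 4}   # e.g. type == 'Document'
by_state = {1, 2, 3}     # e.g. state == 'published'
by_alice = {1, 5}        # e.g. author == 'alice'
by_bob = {2, 6}          # e.g. author == 'bob'

# published Documents written by alice OR bob
hits = intersect(by_type, by_state, union(by_alice, by_bob))
```

Because both helpers take and return plain sets, they nest freely, which is
the flexibility asked for above.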

I searched this mailing list as well as zope.org to get an idea of what
has already been discussed and requested, and there seems to be some
interest in improving the Catalog. Some people even seem to have worked on
this; perhaps they could give an update? Possibly I don't have to
write everything from scratch...

I would have put this into a proposal, but there already are two proposals
that deal with the features I want: one is dedicated to
unions/intersections, the other (TopicIndexes) to performance issues. I
don't know the status of these, though (the first one in particular is
rather old), and I don't want to hijack them without asking. As so often, I
will need to complete my current project first, but would then like to help
improve the Catalog for more general use.

So, if there is interest, I propose collecting ideas and comments
about what a better Catalog should look like, how it could best be
implemented, and how to organize this effort (with respect to the already
existing proposals).


--
Wolfram Kerber
Gallileus GmbH   http://www.gallileus.info/


