Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-20 Thread Chris Withers

 On Mon, 18 Jun 2001, Andreas Jung wrote:
 
  These are good ideas to improve the TextIndex. I already encouraged
  Erik to put it all together into a Fishbowl proposal.
 
 Which I would do, if I had time.  Which I will have, but not for another
 two weeks. :-)

I'm guessing this is the point at which your problems become mine? ;-)

*grinz*

Chris


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-20 Thread Erik Enge

On Tue, 19 Jun 2001, Chris Withers wrote:

 I'm guessing this is the point at which your problems become mine? ;-)

*evil laughter*  Yes :-)

We should write about it and publish it to the community...





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-19 Thread Rik Hoekstra

 
 Rik Hoekstra writes:
   This raises the question how dependent the splitter is on the particularities of the
   document source - I do not really see how different splitters could be useful
   for one single document. This is perhaps less obvious than it appears, as you
   may want to use different splitters for documents in different languages. Taken
   as a whole I would say choosing a splitter would be a decision that had to be
   taken at indexing time anyway. But perhaps it's just my imagination that is
   lacking.

 
 Of course, the search must follow the same splitting rules
 as the indexing did. Changing the rules (the splitter
 or its configuration) after indexing will make the index
 inconsistent.
 

I agree; in fact I think we're saying the same thing. What is more interesting is
how (rather than when) you decide which splitter to use. With heterogeneous documents
I'd think it would be difficult to decide automagically...

Rik




Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-19 Thread Erik Enge

On Mon, 18 Jun 2001, Andreas Jung wrote:

 These are good ideas to improve the TextIndex. I already encouraged
 Erik to put it all together into a Fishbowl proposal.

Which I would do, if I had time.  Which I will have, but not for another
two weeks. :-)





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Rik Hoekstra



Chris McDonough wrote:
 
 It just occurred to me that depending on the splitter to do
 positions makes it impossible to alter the splitter without
 reindexing the whole text index... but I think this is a
 reasonable tradeoff.  Other opinions welcome.
 

This raises the question how dependent the splitter is on the particularities of the
document source - I do not really see how different splitters could be useful
for one single document. This is perhaps less obvious than it appears, as you
may want to use different splitters for documents in different languages. Taken
as a whole I would say choosing a splitter would be a decision that had to be
taken at indexing time anyway. But perhaps it's just my imagination that is
lacking. 

There is a much greater dependence on the lexicon here. And indeed several
different lexicons could be applied to a set of documents depending on what is
wanted. 

my 2 cents

Rik




Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Rik Hoekstra

 
  Once you're satisfied with the implementation, would you be willing
  to submit the module to the collector?
 
 Do you think you (or someone else for that matter) could have a look at
 [1] the method that returns the position in the document - positionInDoc()
 - to see how that could be made to run much faster?  Maybe it is how it is
 used...  It is too slow to be very useful when indexing large amounts of
 data.
 
 Anyway, I suck at making Python fast (or using it the right way,
 whichever I've fallen prey to this time ;-), and any hints would be greatly
 appreciated.
 
 I've been indexing and searching a lot this weekend, and bar that problem
 with the indexing-speed it seems ok and I have no issues submitting it to
 the Collector.
 
Doing something similar (in fact what I needed was citations of word usage) I
took a two step approach, with the idea that most of the actual returning of
results would have to be done on a much smaller subset of documents than if
you'd have to index all documents with word indexes and positions.

I use a normal textindex for querying. Then if a document is returned by the
query I start processing the documents. This requires parsing the query in a
slightly different way (throw out the NOTs). The two step approach has the
advantage that you can postpone processing actual documents until you return the
results for the specific documents. 
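A minimal sketch of the two-step idea described above, not the actual implementation. It assumes a hypothetical `text_index` dictionary (word to set of document ids) standing in for the TextIndex query, and a hypothetical `documents` dictionary holding the raw text; neither is a ZCatalog API.

```python
def find_offsets(text, word):
    """Return every character offset at which word occurs in text."""
    offsets = []
    start = text.find(word)
    while start != -1:
        offsets.append(start)
        start = text.find(word, start + 1)
    return offsets


def two_step_search(words, text_index, documents, limit=10):
    """Step 1: intersect the index postings. Step 2: scan only the hits."""
    # Step 1: cheap lookup; only documents containing every word survive.
    postings = [text_index.get(word, set()) for word in words]
    hits = set.intersection(*postings) if postings else set()
    results = {}
    # Step 2: the expensive scan runs on at most `limit` matching documents.
    for doc_id in sorted(hits)[:limit]:
        text = documents[doc_id]
        results[doc_id] = {word: find_offsets(text, word) for word in words}
    return results


# Hypothetical sample data, for illustration only.
documents = {1: "zope catalog index", 2: "catalog of catalog entries", 3: "plain text"}
text_index = {"zope": {1}, "catalog": {1, 2}, "index": {1}, "text": {3}}
found = two_step_search(["catalog"], text_index, documents)
```

The `limit` parameter reflects the point that position processing can be postponed until results are actually shown.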

Using your positionInDoc will require a _lot_ of processing (why does it use
string.split btw and not Splitter?; why split on " " and not on
string.whitespace?). I have used string.find for finding word positions, which
is probably faster than looping over a list of words. BTW, I'd rather use Splitter,
but its word positions appeared not to be reliable (a bug, or something I didn't
understand; anyhow, string.find works for me and is fast).

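def splitit(txt, word):
    """Return the character offsets of every occurrence of word in txt."""
    positions = []
    start = 0
    while 1:
        # txt.find is the method form of string.find; it returns -1 on a miss
        res = txt.find(word, start)
        if res == -1:
            break
        # continue the search one character past the last hit
        start = res + 1
        positions.append(res)
    return positions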


<sidenote>Using re would perhaps also be an option, but allowing regular
expressions will complicate searching a lot, so I use the globbing lexicon for
expanding and then do the matching on the expanded items (if necessary - not if
using [wordpart]*)</sidenote>

Advantages of using this approach:
- it's faster. 
- it splits up the query processing into different subparts, which also
contributes to speeding things up. 
- it's also more flexible, as you can divide searching and parsing over
different web requests, and even make them dependent on the number of results.
For example: why return text fragments from all documents if your users will not
be able to see all the results anyway? Or why return all fragments containing
word combinations from one single document when returning a few occurrences
from different documents is more useful for your users? Note that this will
mainly affect returning text fragments, which may or may not be useful.

There are also a couple of disadvantages (as I see them, but there may be more):
- it only works with exact positions of words (character offsets) and not with
word numbers in a text. A "within two words" search may be remedied by using
string.split on substrings, however, if really needed. Depending on your
purposes, an even rougher approach is to assume some default length for words
(this is a bit faster). These are not very elegant solutions, though.
- because the approach is not so tightly coupled with (Z)Catalog, integration
strategies are less obvious (at least for me)
- the positionIndex might be used for further processing as is; in my approach
this is less obvious.
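The remedy mentioned in the first disadvantage could look roughly like this: a character offset is turned into a word number by splitting the text that precedes it. This is only a sketch; the function names are made up for illustration, and it assumes whitespace-separated words.

```python
def word_number(text, offset):
    """Return the 0-based number of the word that starts at character offset."""
    # Splitting the text before the offset counts the words that precede it.
    return len(text[:offset].split())


def within_words(text, first, second, distance):
    """True if an occurrence of second lies within `distance` words of first."""
    def word_numbers(word):
        numbers = []
        start = text.find(word)
        while start != -1:
            numbers.append(word_number(text, start))
            start = text.find(word, start + 1)
        return numbers

    return any(abs(a - b) <= distance
               for a in word_numbers(first)
               for b in word_numbers(second))
```

Note that `text.find` also matches inside longer words, so a hit in the middle of a word is counted as if it started a new word; a real implementation would have to check word boundaries.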


another 2 cents

Rik




Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread R. David Murray

On Sun, 17 Jun 2001, Chris McDonough wrote:
 index_object, because the splitter return has all the words
 in order, even the dupes... as you iterate, you can mutate

Is this part of the current formal Splitter Interface? If not,
it needs to be if other code is going to depend on it.

Oh, yeah, and where is the formal Splitter interface documented <grin>?
I don't see anything in SearchIndex, and a search for "splitter interface"
on zope.org didn't turn up anything useful.

--RDM





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Andreas Jung

The Splitter interface is not really documented. However, Zope 2.4
has much better support for third-party splitters.

Andreas
- Original Message -
From: R. David Murray  [EMAIL PROTECTED]
To: Chris McDonough [EMAIL PROTECTED]
Cc: Erik Enge [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, June 18, 2001 11:39 AM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase
indexingrevisited)


 On Sun, 17 Jun 2001, Chris McDonough wrote:
  index_object, because the splitter return has all the words
  in order, even the dupes... as you iterate, you can mutate

 Is this part of the current formal Splitter Interface? If not,
 it needs to be if other code is going to depend on it.

 Oh, yeah, and where is the formal Splitter interface documented <grin>?
 I don't see anything in SearchIndex, and a search for "splitter interface"
 on zope.org didn't turn up anything useful.

 --RDM








Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Dieter Maurer

Rik Hoekstra writes:
  This raises the question how dependent the splitter is on the particularities of the
  document source - I do not really see how different splitters could be useful
  for one single document. This is perhaps less obvious than it appears, as you
  may want to use different splitters for documents in different languages. Taken
  as a whole I would say choosing a splitter would be a decision that had to be
  taken at indexing time anyway. But perhaps it's just my imagination that is
  lacking. 
There are lots of things you may want to change based on
experience with your index:

  *  change the set of token boundary characters
     They define where words are broken out.

  *  change the set of removed characters
     They are removed from the words, usually for
     normalization.

     In German, e.g., you can write both Auto-Lackierer
     and Autolackierer. You want to normalize
     these different spellings.

  *  change the set of composing characters

     German is very rich in composite terms.
     You may want to index under each component term.
     For this, you need the rules on how the composition
     is built.
     For text, it is usually '-'. But if you have
     computer sources, '_' or ':' may be relevant, too.

Of course, the search must follow the same splitting rules
as the indexing did. Changing the rules (the splitter
or its configuration) after indexing will make the index
inconsistent.
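To make the three knobs above concrete, here is a small sketch of a configurable splitter. It is not the Zope Splitter; `make_splitter` and its parameters `boundary`, `removed` and `composing` are hypothetical names, and lower-casing is an added assumption.

```python
def make_splitter(boundary=" \t\n.,;!?", removed="", composing=""):
    """Return a split(text) function configured by three character sets."""
    def split(text):
        # Boundary characters separate words.
        for char in boundary:
            text = text.replace(char, " ")
        words = []
        for token in text.lower().split():
            # Removed characters are dropped to normalize spelling variants.
            for char in removed:
                token = token.replace(char, "")
            # Composing characters break a composite term into its components.
            parts = [token]
            for char in composing:
                parts = [piece for part in parts
                         for piece in part.split(char) if piece]
            whole = "".join(parts)
            if whole:
                words.append(whole)
            if len(parts) > 1:
                # Index under each component term as well as the whole term.
                words.extend(parts)
        return words
    return split


plain = make_splitter()
normalizing = make_splitter(removed="-")
compound = make_splitter(composing="-")
```

With these, `plain("Auto-Lackierer")` and `normalizing("Auto-Lackierer")` produce different terms, which illustrates why an index built with one configuration cannot be searched consistently with another.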


Dieter




Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Andreas Jung

These are good ideas to improve the TextIndex. I already encouraged Erik
to put it all together into a Fishbowl proposal.

Andreas
- Original Message -
From: Dieter Maurer [EMAIL PROTECTED]
To: Rik Hoekstra [EMAIL PROTECTED]
Cc: Chris McDonough [EMAIL PROTECTED]; Erik Enge
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, June 18, 2001 4:59 PM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase
indexingrevisited)


 Rik Hoekstra writes:
   This raises the question how dependent the splitter is on the
   particularities of the document source - I do not really see how
   different splitters could be useful for one single document. This is
   perhaps less obvious than it appears, as you may want to use different
   splitters for documents in different languages. Taken as a whole I
   would say choosing a splitter would be a decision that had to be taken
   at indexing time anyway. But perhaps it's just my imagination that is
   lacking.
 There are lots of things you may want to change based on
 experience with your index:

   *  change the set of token boundary characters
      They define where words are broken out.

   *  change the set of removed characters
      They are removed from the words, usually for
      normalization.

      In German, e.g., you can write both Auto-Lackierer
      and Autolackierer. You want to normalize
      these different spellings.

   *  change the set of composing characters

      German is very rich in composite terms.
      You may want to index under each component term.
      For this, you need the rules on how the composition
      is built.
      For text, it is usually '-'. But if you have
      computer sources, '_' or ':' may be relevant, too.

 Of course, the search must follow the same splitting rules
 as the indexing did. Changing the rules (the splitter
 or its configuration) after indexing will make the index
 inconsistent.


 Dieter




Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Erik Enge

On Sat, 16 Jun 2001 [EMAIL PROTECTED] wrote:

 Lexis-Nexis:  Sean w/2 Upton  (where w/2 is within 2 words)

This wouldn't be hard to make happen.  I don't know if it is better to do
it before or after the parsers, though.  Maybe a more user-friendly alias
would be best as a default?






Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Erik Enge

On Fri, 15 Jun 2001, Chris McDonough wrote:

 Once you're satisfied with the implementation, would you be willing
 to submit the module to the collector?

Do you think you (or someone else for that matter) could have a look at
[1] the method that returns the position in the document - positionInDoc()
- to see how that could be made to run much faster?  Maybe it is how it is
used...  It is too slow to be very useful when indexing large amounts of
data.

Anyway, I suck at making Python fast (or using it the right way,
whichever I've fallen prey to this time ;-), and any hints would be greatly
appreciated.

I've been indexing and searching a lot this weekend, and bar that problem
with the indexing-speed it seems ok and I have no issues submitting it to
the Collector.

[1] URL:http://nittin.net/erik/software/PositionIndex/PositionIndex.py





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Chris McDonough

It just occurred to me that depending on the splitter to do
positions makes it impossible to alter the splitter without
reindexing the whole text index... but I think this is a
reasonable tradeoff.  Other opinions welcome.

On Sun, 17 Jun 2001 15:57:20 -0400
 Chris McDonough [EMAIL PROTECTED] wrote:
 On Sun, 17 Jun 2001 21:05:47 +0200 (CEST)
  Erik Enge [EMAIL PROTECTED] wrote:
  On Fri, 15 Jun 2001, Chris McDonough wrote:
  
   Once you're satisfied with the implementation, would you be willing
   to submit the module to the collector?
  
  Do you think you (or someone else for that matter) could have a look at
  [1] the method that returns the position in the document - positionInDoc()
  - to see how that could be made to run much faster?  Maybe it is how it is
  used...  It is too slow to be very useful when indexing large amounts of
  data.
 
 Erik,
 
 It looks like you call proximityInsert for each item
 returned from the splitter on the doc source.  Instead of
 looking for the position in the source document by splitting
 the source up again within proximityInsert, you can keep a
 simple counter while you iterate over the splitter return in
 index_object, because the splitter return has all the words
 in order, even the dupes... as you iterate, you can mutate
 the position entry for that word/documentId pair within
 proximityInsert.  You never actually need to manually split
 the document source, instead just always rely on the
 splitter to bust up the doc, and manipulate the position
 list in place.  This is not the most efficient way, but it's
 more efficient than your current way.
 
 Therefore, the bit in index_object becomes:
 
 i = 0
 for word in splitter(source):
     self.proximityInsert(word, documentId, i)
     i = i + 1
 
 The proximityInsert method becomes:
 
 def proximityInsert(self, word, documentId, i):
     """Insert proximity information about this wid (word id)
     in the index' proximity bucket."""
     wid = self.getWid(word)
     prox = self._proximity
     if not prox.has_key(wid):
         prox[wid] = IOBTree()
     # A known wid may still be new to this document, so use a default.
     positions = prox[wid].get(documentId, [])
     if i in positions:
         return
     positions.append(i)
     # Reassign so the BTree notices the mutated list.
     prox[wid][documentId] = positions
     self._p_changed = 1
 
 .. and the positionInDoc method goes away.
 
 I didn't scan too hard for what else in the source this
 would break.
 
  Anyway, I suck at making Python fast (or using it the right way,
  whichever I've fallen prey to this time ;-), and any hints would be
  greatly appreciated.
  
  I've been indexing and searching a lot this weekend, and bar that problem
  with the indexing-speed it seems ok and I have no issues submitting it to
  the Collector.
 
 Cool...
 
  
  [1] URL:http://nittin.net/erik/software/PositionIndex/PositionIndex.py
  
 





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Chris McDonough

On Sun, 17 Jun 2001 21:05:47 +0200 (CEST)
 Erik Enge [EMAIL PROTECTED] wrote:
 On Fri, 15 Jun 2001, Chris McDonough wrote:
 
  Once you're satisfied with the implementation, would you be willing
  to submit the module to the collector?
 
 Do you think you (or someone else for that matter) could have a look at
 [1] the method that returns the position in the document - positionInDoc()
 - to see how that could be made to run much faster?  Maybe it is how it is
 used...  It is too slow to be very useful when indexing large amounts of
 data.

Erik,

It looks like you call proximityInsert for each item
returned from the splitter on the doc source.  Instead of
looking for the position in the source document by splitting
the source up again within proximityInsert, you can keep a
simple counter while you iterate over the splitter return in
index_object, because the splitter return has all the words
in order, even the dupes... as you iterate, you can mutate
the position entry for that word/documentId pair within
proximityInsert.  You never actually need to manually split
the document source, instead just always rely on the
splitter to bust up the doc, and manipulate the position
list in place.  This is not the most efficient way, but it's
more efficient than your current way.

Therefore, the bit in index_object becomes:

i = 0
for word in splitter(source):
    self.proximityInsert(word, documentId, i)
    i = i + 1

The proximityInsert method becomes:

# assumes: from BTrees.IOBTree import IOBTree
def proximityInsert(self, word, documentId, i):
    """Insert proximity information about this wid (word id)
    in the index' proximity bucket."""
    wid = self.getWid(word)
    prox = self._proximity
    if not prox.has_key(wid):
        prox[wid] = IOBTree()
    # A known wid may still be new to this document, so use a default.
    positions = prox[wid].get(documentId, [])
    if i in positions:
        return
    positions.append(i)
    # Reassign so the BTree notices the mutated list.
    prox[wid][documentId] = positions
    self._p_changed = 1

.. and the positionInDoc method goes away.

I didn't scan too hard for what else in the source this
would break.
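For readers who want to try the counter idea outside Zope, the following is a stand-alone sketch. It uses plain dictionaries in place of the BTree-based `_proximity` attribute and the words themselves in place of word ids; both are simplifying assumptions, and `index_document` and `adjoined` are illustrative names, not part of PositionIndex.

```python
def index_document(proximity, doc_id, words):
    """Record the position of every word, using a running counter."""
    for position, word in enumerate(words):
        # proximity maps word -> {doc_id: [positions]}
        proximity.setdefault(word, {}).setdefault(doc_id, []).append(position)


def adjoined(proximity, doc_id, first, second):
    """True if `second` directly follows `first` somewhere in the document."""
    first_positions = proximity.get(first, {}).get(doc_id, [])
    second_positions = set(proximity.get(second, {}).get(doc_id, []))
    return any(position + 1 in second_positions for position in first_positions)


proximity = {}
index_document(proximity, 1, "the quick fox saw the lazy fox".split())
```

Because the splitter output already lists the words in document order, duplicates included, no second pass over the source text is needed to learn the positions.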

 Anyway, I suck at making Python fast (or using it the right way,
 whichever I've fallen prey to this time ;-), and any hints would be
 greatly appreciated.
 
 I've been indexing and searching a lot this weekend, and bar that problem
 with the indexing-speed it seems ok and I have no issues submitting it to
 the Collector.

Cool...

 
 [1] URL:http://nittin.net/erik/software/PositionIndex/PositionIndex.py
 





RE: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Dieter Maurer

[EMAIL PROTECTED] writes:
  A lot of folks who do power searches, say, librarians or other trained
  researchers, familiar with the bells and whistles of more powerful search
  engines, will want a simple operator for proximity, with the ability to
  specify proximity depth:
  
  For example:
  
  Lexis-Nexis:     Sean w/2 Upton  (where w/2 is within 2 words)
                   Also, Lexis doesn't count stop-words in proximity
                   indexes.
  Folio/Nextpage:  Sean Upton@2
  
  IMHO, the syntax is clean and very brief in the Lexis-Nexis case and should
  supplement a more generic 
   Sean ... Upton
  style search.
I do not think it is a good idea to have an infix operator
for proximity searches. An infix operator combines just two words, but
proximity searches may involve more than two words:
a set of words near together (e.g. in one paragraph, in one sentence,
or within x words).
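A proximity test over a whole set of words, as opposed to a two-word infix operator, can be sketched as a sliding window over the merged position lists. The function name `all_within` and the dictionary input are assumptions made for this illustration; `window` is expected to be at least 1.

```python
def all_within(positions_by_word, window):
    """True if one occurrence of every word fits in a span of `window` words."""
    if not positions_by_word or not all(positions_by_word.values()):
        return False
    # Merge every occurrence into one list ordered by position.
    events = sorted((position, word)
                    for word, positions in positions_by_word.items()
                    for position in positions)
    counts = {}
    left = 0
    for position, word in events:
        counts[word] = counts.get(word, 0) + 1
        # Shrink from the left until the span is at most `window` words wide.
        while position - events[left][0] >= window:
            old = events[left][1]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            left += 1
        # Every word is present inside the current span.
        if len(counts) == len(positions_by_word):
            return True
    return False
```

The input is one position list per word for a single document, for example `{"zope": [0, 10], "catalog": [3], "index": [12]}`.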


Dieter




Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-16 Thread Erik Enge

On Fri, 15 Jun 2001, Chris McDonough wrote:

 Once you're satisfied with the implementation, would you be willing
 to submit the module to the collector?

Will do.  Have you thought about how users actually are to use
exact-phrase?  What I'm thinking I will do here (currently I've only been
testing explicitly with "adjoinedby" in the query) is to insert
"adjoinedby" in phrased searches:

"erik enge"       -> erik adjoinedby enge
"erik ... enge"   -> erik near enge

What do you think?
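The rewriting proposed above could be sketched as follows, assuming phrases are written in double quotes and that `...` inside a phrase means "near". The function name `rewrite_phrases` is made up for this example and is not part of PositionIndex.

```python
import re


def rewrite_phrases(query):
    """Expand double-quoted phrases into explicit adjoinedby / near operators."""
    def expand(match):
        rewritten = []
        operator = None
        for word in match.group(1).split():
            if word == "...":
                # An ellipsis loosens the next join from adjoinedby to near.
                operator = "near"
                continue
            if rewritten:
                rewritten.append(operator or "adjoinedby")
            rewritten.append(word)
            operator = None
        return " ".join(rewritten)

    # Only text between double quotes is rewritten; the rest is left alone.
    return re.sub(r'"([^"]*)"', expand, query)
```

Text outside the quotes passes through unchanged, so existing operators in a query are not affected.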

I'll be submitting PositionIndex.py and ResultList.py in a day or two.





RE: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-16 Thread sean . upton

A lot of folks who do power searches, say, librarians or other trained
researchers, familiar with the bells and whistles of more powerful search
engines, will want a simple operator for proximity, with the ability to
specify proximity depth:

For example:

Lexis-Nexis:     Sean w/2 Upton  (where w/2 is within 2 words)
                 Also, Lexis doesn't count stop-words in proximity
                 indexes.
Folio/Nextpage:  Sean Upton@2

IMHO, the syntax is clean and very brief in the Lexis-Nexis case and should
supplement a more generic
    Sean ... Upton
style search.
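Recognizing the Lexis-Nexis style operator is straightforward; the sketch below only extracts (left word, distance, right word) triples and leaves evaluation to the index. `parse_proximity` is an illustrative name, and stop-word handling is not covered.

```python
import re

# Matches tokens such as w/2 or w/10 and captures the distance.
PROXIMITY = re.compile(r"^w/(\d+)$")


def parse_proximity(query):
    """Split a query like 'Sean w/2 Upton' into (left, distance, right) triples."""
    tokens = query.split()
    triples = []
    for i in range(1, len(tokens) - 1):
        match = PROXIMITY.match(tokens[i])
        if match:
            triples.append((tokens[i - 1], int(match.group(1)), tokens[i + 1]))
    return triples
```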

Sean

-Original Message-
From: Chris McDonough [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 16, 2001 2:59 AM
To: Erik Enge
Cc: [EMAIL PROTECTED]
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase
indexingrevisited)


Erik Enge wrote:
 
 On Fri, 15 Jun 2001, Chris McDonough wrote:
 
  Once you're satisfied with the implementation, would you be willing
  to submit the module to the collector?
 
 Will do.  Have you thought about how users actually are to use
 exact-phrase?  What I'm thinking I will do here (currently I've only been
 testing explicitly with "adjoinedby" in the query) is to insert
 "adjoinedby" in phrased searches:
 
 "erik enge"       -> erik adjoinedby enge
 "erik ... enge"   -> erik near enge
 
 What do you think?

These both look like good spellings, and I think "erik near enge" would
be a good alias for "erik ... enge" as well.

- C




Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-15 Thread Erik Enge

On Thu, 14 Jun 2001, Chris McDonough wrote:

 Excellent!  I haven't looked at it in detail, but thanks very much for
 contributing it! Maybe we can roll some of this work into a
 position-aware Text Index

It is actually a TextIndex on steroids.  Remove the _proximity attribute
and a couple of methods and what you are left with is a standard
TextIndex.  So I think what you already have is a position-aware
TextIndex.  That's how I'm planning to use it anyway :)

 or maybe even a new kind of Pluggable Index.

:-)





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-15 Thread Erik Enge

On Thu, 14 Jun 2001, Erik Enge wrote:

 To be really useful I think the PossitionIndex' _proximity dictionary
 needs to be turned into a BTree of some sort, but apart from that I
 don't know what is missing.

It's now using BTrees.  And I renamed it to PositionIndex (thanks to
Chris Withers for this :-).
 
 And speed might be a problem, haven't really tested that yet.  Will
 during the weekend though.

I indexed 30,000 objects using PositionIndex and searching (both
exact-phrase and near) is very fast.  It doesn't seem to be bloated,
either (the _proximity attribute, that is).

Do you guys have a test suite for indexes?  Maybe one I can apply to
this index of mine?





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-15 Thread Chris McDonough

Erik,

Once you're satisfied with the implementation, would you be willing to submit
the module to the collector?

- C

- Original Message -
From: Erik Enge [EMAIL PROTECTED]
To: Chris McDonough [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, June 15, 2001 11:53 AM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase
indexingrevisited)


 On Thu, 14 Jun 2001, Erik Enge wrote:

  To be really useful I think the PossitionIndex' _proximity dictionary
  needs to be turned into a BTree of some sort, but apart from that I
  don't know what is missing.

 It's now using BTrees.  And I renamed it to PositionIndex (thanks to
 Chris Withers for this :-).

  And speed might be a problem, haven't really tested that yet.  Will
  during the weekend though.

 I indexed 30,000 objects using PositionIndex and searching (both
 exact-phrase and near) is very fast.  It doesn't seem to be bloated,
 either (the _proximity attribute, that is).

 Do you guys have a test suite for indexes?  Maybe one I can apply to
 this index of mine?





Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-14 Thread Chris McDonough

Excellent!  I haven't looked at it in detail, but thanks very much for
contributing it! Maybe we can roll some of this work into a position-aware
Text Index, or maybe even a new kind of Pluggable Index.

- C

- Original Message -
From: Erik Enge [EMAIL PROTECTED]
To: Chris McDonough [EMAIL PROTECTED]
Cc: Oren Yosifon [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, June 14, 2001 12:45 PM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase
indexingrevisited)


 On Thu, 14 Jun 2001, Erik Enge wrote:

  Me got a patch: URL:http://nittin.net/erik/software/PossitionIndex.

 And I should mention that it has only been tested on Zope 2.3.2.

 (BTW, thanks, Chris, for suggesting how to code it.)



