Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-06-24 Thread Chris McDonough

Abel, many thanks for this analysis, I've put this into the
Collector...

On Sat, 23 Jun 2001 22:59:32 +0200
 abel deuring [EMAIL PROTECTED] wrote:
 Erik,
 
 I'm afraid that your patch does not solve all the
 problems you mentioned
 in an earlier mail.

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-06-23 Thread abel deuring

Erik Enge wrote:
 
 On Wed, 30 May 2001, Erik Enge wrote:
 
  I'm going bug hunting...
 
 I'm back :)
 
 I think I found the bug.  It is in the query_hook() method in
 lib/python/SearchIndex/GlobbingLexicon.py.  The method seems to say: if I
 can't find a '*' or a '?' in the word, go to the else-clause, and the
 else-clause says sod off.
 
 Since it iterates over the query, 'word' is actually a list if you use
 parens in your query, and you won't find any wildcards there.  I think.
 
 Add a dash of recursiveness, and it seems to be solved (for me):
 
     def erik_hook(self, q):
         """doc string"""
         words = []
         for w in q:
             if ( (self.multi_wc in w) or
                  (self.single_wc in w) ):
                 wids = self.get(w)
                 for wid in wids:
                     if words:
                         words.append(Or)
                     words.append(wid)
             else:
                 words.append(self.erik_hook(w))
         return words or ['']

     def query_hook(self, q):
         """expand wildcards"""
         words = []
         for w in q:
             if ( (self.multi_wc in w) or
                  (self.single_wc in w) ):
                 wids = self.get(w)
                 for wid in wids:
                     if words:
                         words.append(Or)
                     words.append(wid)
             else:
                 words.append(self.erik_hook(w))
 
 Not really tested, but it seems to work.  This might have been resolved in
 CVS, I don't know, should I post it as a bug?

Erik,

I'm afraid that your patch does not solve all the problems you mentioned
in an earlier mail.

You are right that the implementation of query_hook in Zope 2.3.2 and
2.4.0b1 cannot handle words with wildcards in nested lists, but your
patch leads to endless recursion if you enter the simplest possible
query: just one word without wildcards. In this case, the test
if ( (self.multi_wc in w)... evaluates to false, hence self.erik_hook is
called for this word. There the word is iterated character by character,
the test is false again for each character, and erik_hook is called
again, and so on.
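
The recursion can be reproduced outside Zope with a small sketch in
modern Python. ToyLexicon and overflows are hypothetical names made up
for this illustration; only the control flow of erik_hook is taken from
the patch quoted above, and the string "or" stands in for the Or
constant.

```python
class ToyLexicon:
    """Hypothetical stand-in for GlobbingLexicon (not the Zope class)."""
    multi_wc = "*"
    single_wc = "?"

    def get(self, pattern):
        # Assumption: pretend no word in the lexicon matches any pattern.
        return []

    def erik_hook(self, q):
        # Same control flow as the patch quoted above.
        words = []
        for w in q:
            if self.multi_wc in w or self.single_wc in w:
                for wid in self.get(w):
                    if words:
                        words.append("or")
                    words.append(wid)
            else:
                # For a plain word, w is a string; iterating over it yields
                # one-character strings, which land here again and again.
                words.append(self.erik_hook(w))
        return words or [""]


def overflows(query):
    """Return True if erik_hook exhausts the recursion limit for this query."""
    try:
        ToyLexicon().erik_hook(query)
    except RecursionError:
        return True
    return False
```

With this sketch, overflows(["abc"]) is true, while overflows(["ab*"])
and overflows([["ab*"]]) are false: only the words without wildcards
trigger the problem.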

The statement q = parse(s) in UnTextIndex.query (and
PositionIndex.query) before the call to query_hook can return nested
lists, so query_hook must be aware of this.

This can be done with:

    def query_hook(self, q):
        """expand wildcards"""
        words = []
        for w in q:
            if type(w) is type([]):
                words.append(self.query_hook(w))
            else:
                if ( (self.multi_wc in w) or
                     (self.single_wc in w) ):
                    wids = self.get(w)
                    for wid in wids:
                        if words:
                            words.append(Or)
                        words.append(wid)
                else:
                    words.append(w)
        # if words is empty, return something that will make textindex's
        # __getitem__ return an empty result list
        return words or ['']

You also mentioned the strange results of queries like eri* and enge.
These are caused by another bug in query_hook:

The results from the wildcard expansion are simply inserted into the
result list. Example: ab* and xyz may be expanded by query_hook into

['aba', 'or', 'abb', 'or', 'abc', 'and', 'xyz']

Since UnTextIndex.evaluate looks first for 'and' operators, this is
equivalent to

['aba', 'or', 'abb', 'or', ['abc', 'and', 'xyz']]

(The funny (or confusing) side effect is that ab* and xyz may return
different results compared with xyz and ab*, because aba and xyz
probably gives results different from those for abc and xyz.)

but we need a result like

[['aba', 'or', 'abb', 'or', 'abc'], 'and', 'xyz']
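
To make the precedence problem concrete, here is a minimal sketch in
modern Python. It is not the UnTextIndex.evaluate code: the evaluate
function and the POSTINGS table are made up for illustration, and the
only behavior borrowed from the description above is that 'and' is
resolved before 'or'.

```python
# Hypothetical postings: word -> set of document ids (made-up data).
POSTINGS = {
    "aba": {1, 2},
    "abb": {3},
    "abc": {4, 5},
    "xyz": {2, 5, 9},
}


def evaluate(query):
    """Evaluate a (possibly nested) token list; 'and' is applied before 'or'."""
    # Resolve operands first: nested lists recurse, words look up postings.
    items = []
    for tok in query:
        if isinstance(tok, list):
            items.append(evaluate(tok))
        elif tok in ("and", "or"):
            items.append(tok)
        else:
            items.append(POSTINGS.get(tok, set()))
    # First pass: collapse every "left and right" into an intersection.
    i = 0
    while i < len(items):
        if items[i] == "and":
            items[i - 1:i + 2] = [items[i - 1] & items[i + 1]]
        else:
            i += 1
    # Second pass: union whatever remains.
    result = set()
    for item in items:
        if item != "or":
            result |= item
    return result


flat = ["aba", "or", "abb", "or", "abc", "and", "xyz"]
flat_reversed = ["xyz", "and", "aba", "or", "abb", "or", "abc"]
grouped = [["aba", "or", "abb", "or", "abc"], "and", "xyz"]
grouped_reversed = ["xyz", "and", ["aba", "or", "abb", "or", "abc"]]
```

With this made-up data, evaluate(flat) gives {1, 2, 3, 5} and
evaluate(flat_reversed) gives {2, 3, 4, 5}, so the order of the operands
changes the result. Both grouped forms give {2, 5}.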

The version of query_hook below fixes the problem:

    def query_hook(self, q):
        """expand wildcards"""
        words = []
        for w in q:
            if type(w) is type(''):
                if ( (self.multi_wc in w) or
                     (self.single_wc in w) ):
                    wids = self.get(w)
                    alternatives = []
                    for wid in wids:
                        if alternatives:
                            alternatives.append(Or)
                        alternatives.append(wid)
                    words.append(alternatives or [''])
                else:
                    words.append(w)
            else:
                words.append(self.query_hook(w))
        # if words is empty, return something that will make textindex's
        # __getitem__ return an empty result list
        return words or ['']
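
For anyone who wants to try the grouping behavior without a Zope
installation, the following is a rough standalone sketch in modern
Python. ToyLexicon and its canned expansions table are hypothetical;
get() returns the matching words themselves rather than word ids, and
the string "or" stands in for the Or constant.

```python
class ToyLexicon:
    """Hypothetical stand-in for GlobbingLexicon with canned expansions."""
    multi_wc = "*"
    single_wc = "?"

    def __init__(self, expansions):
        # Assumption: pattern -> list of matching words, supplied by hand.
        self.expansions = expansions

    def get(self, pattern):
        return self.expansions.get(pattern, [])

    def query_hook(self, q):
        """Expand wildcards, keeping each expansion in its own sublist."""
        words = []
        for w in q:
            if isinstance(w, str):
                if self.multi_wc in w or self.single_wc in w:
                    alternatives = []
                    for wid in self.get(w):
                        if alternatives:
                            alternatives.append("or")
                        alternatives.append(wid)
                    # An empty sublist would lose the term, so use [''].
                    words.append(alternatives or [""])
                else:
                    words.append(w)
            else:
                # Nested list produced by parentheses in the query.
                words.append(self.query_hook(w))
        return words or [""]


lexicon = ToyLexicon({"ab*": ["aba", "abb", "abc"]})
simple = lexicon.query_hook(["ab*", "and", "xyz"])
nested = lexicon.query_hook(["xyz", "and", ["ab*"]])
no_match = lexicon.query_hook(["zz*"])
```

Here simple is [['aba', 'or', 'abb', 'or', 'abc'], 'and', 'xyz'],
nested is ['xyz', 'and', [['aba', 'or', 'abb', 'or', 'abc']]], and
no_match is [['']].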

You also mentioned the parse result

['abc', '...', '...', '...', 'def']

which you could not reproduce. Playing with Catalogs, I accidentally
produced the corresponding query: 'abc ... def', i.e., the double
quotation marks are part of the query string. UnTextIndex.quotes splits
the string between two quotation marks 

Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-30 Thread Chris McDonough

Thanks for tracking this down... If you're so inclined, please put this
in the Collector (with a description of the problem, as well as a way to
reproduce it, the patch alone isn't nearly as helpful) so it doesn't get
dropped on the floor.  I doubt very much that it's fixed in CVS.

- C





Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-29 Thread Erik Enge

On Thu, 24 May 2001, Michel Pelletier wrote:

 I don't think you are using a globbing vocabulary.

I think I am:

 print_info(applic.Catalog(word='scripto*'))
unsplitted ['scripto*']
unl: ['scripto*']
unq: [104623, 'or', 112198, 'or', 151568]
Length: 6
Content: [mybrains instance at 1226d358, mybrains instance at 127bb540, mybrains 
instance at 12bd8138, mybrains instance at 127bb658, mybrains instance at 
1226c620, mybrains instance at 12092eb0]
 print_info(applic.Catalog(word='(scripto*)'))
unsplitted []
unsplitted ['scripto*']
unl: ['scripto*']
unsplitted []
unl: [['scripto*']]
unq: ['scripto*']
unq: [['scripto*']]
Length: 0
Content: []


The unsplitted, unl and unq lines are my debug output, but you can see
what happens: without parens the '*' has its desired effect; with parens,
it doesn't.

Got a clue?  Is this my bug, or ZCatalog's?





Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-29 Thread Michel Pelletier

On Tue, 29 May 2001, Erik Enge wrote:

 On Thu, 24 May 2001, Michel Pelletier wrote:
 
 The unsplitted, unl and unq lines are my debug output, but you can see
 what happens: without parens the '*' has its desired effect; with parens,
 it doesn't.
 
 Got a clue?  Is this my bug, or ZCatalog's?

Must be ZCatalog's.  I'm guessing the paren matching takes a different
code path that doesn't expand wildcards.

-Michel





Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-25 Thread Erik Enge

On Thu, 24 May 2001, Michel Pelletier wrote:

 I don't think you are using a globbing vocabulary.

But globbing works for other queries.  In the same catalog.
 
 If you are not using a glob vocab, I suspect it stripped out the ? and
 is hitting on 'eri'.  Do you have that word anywhere?

I tried searching for:

   eri

and that gave me four results.  No globbing, then?
 
 Then again, where did you get these objects?  If you were looking at the
 wrong point in the code, the wildcards may not have been expanded yet.

Could be it...
 
  
   [['erik', '...', '...', '...', 'enge']]
 
 Where do you see this?

I can't reproduce it right now, I'll let you know if I see it again.
 
 Make sure you are using a globbing vocab.  Note that you can't change
 a catalog's vocabulary once the catalog is made, so you have to make a
 new catalog.

How can I change it for a new one?





Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-24 Thread Casey Duncan

Erik Enge wrote:
 
 Hi,
 
 is it me, or is this just not working:
 
 (word1 or word*) and (wor?3)
 
 ie. wildcards in TextIndex queries.  I can't seem to make it work, and I'm
 not able to track down where it stops working.  Should it work in the
 first place?
 
 Zope 2.3.2
 
 Thanks.
 

Works great for me. Perhaps you are using a Vocabulary that has Globbing
turned off?

-- 
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`--




Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-24 Thread Erik Enge

On Thu, 24 May 2001, Casey Duncan wrote:

 Works great for me. Perhaps you are using a Vocabulary that has
 Globbing turned off?

I'm not sure, how do I check?

This query works:

wil?car*

This doesn't:

(wil?car* or something else) and (word1 and word2)

I can't see that the query parsers in UnTextIndex.py transform them
differently, but I might be missing something obvious.





Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-24 Thread Casey Duncan

Erik Enge wrote:
 
 On Thu, 24 May 2001, Casey Duncan wrote:
 
  Works great for me. Perhaps you are using a Vocabulary that has
  Globbing turned off?
 
 I'm not sure, how do I check?
 
 This query works:
 
 wil?car*
 
 This doesn't:
 
 (wil?car* or something else) and (word1 and word2)

I'm not sure how well grouping with parens is supported right now. I
know phrase matching isn't supported very well.

 
 I can't see that the query parsers in UnTextIndex.py transform them
 differently, but I might be missing something obvious.

-- 
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`--




Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-24 Thread Michel Pelletier

On Thu, 24 May 2001, Erik Enge wrote:

 This query works:
 
   wil?car*
 
 This doesn't:
 
   (wil?car* or something else) and (word1 and word2)

If the first works, then you are using a globbing vocabulary.  The second
one should work, but maybe there is a bug.  Or perhaps your search
criteria are so strict that you are getting no results.

 I can't see that the query parsers in UnTextIndex.py transform them
 differently, but I might be missing something obvious.

There's _nothing_ obvious in that particular chunk of code.

-Michel





Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-24 Thread Christian Robottom Reis

On Thu, 24 May 2001, Erik Enge wrote:

 Good, then it's not just me.  Is the overall design philosophy for
 ZCatalog/Catalog/SearchIndex documented anywhere?  (By the way, in
 lib/python/SearchIndex/TextIndex.py, what are sws and cv3?)

I'm trying to get a knot of knowledge into my head by studying the
SearchIndex modules. I'm writing up, more or less, what's happening at
http://wiki.async.com.br/index.php?SearchIndex but I'm not moving very
fast, as the code isn't exactly trivial.

If you want to stop by and help, by all means. :-)

Take care,
--
/\/\ Christian Reis, Senior Engineer, Async Open Source, Brazil
~\/~ http://async.com.br/~kiko/ | [+55 16] 274 4311





Re: [Zope-dev] Wildcards in TextIndex query. Do they work?

2001-05-24 Thread Michel Pelletier

On Thu, 24 May 2001, Erik Enge wrote:

 On Thu, 24 May 2001, Michel Pelletier wrote:
 
  If the first works, then you are using a globbing vocabulary.  The
  second one should work, but maybe there is a bug.  Or perhaps your
  search criteria are so strict that you are getting no results.
 
 Hm.  Something isn't right here.

I don't think you are using a globbing vocabulary.

 This:
   
eric
 
 got me 70 hits.
 
 This:
 
eri?
 
 got me 4 hits.

If you are not using a glob vocab, I suspect it stripped out the ? and is
hitting on 'eri'.  Do you have that word anywhere?

 That's a bit strange, if you ask me :)
 
 This:
 
(erik) and (enge)
 
 returned 1 hit
 
 This:
 
(erik) and (eng?)
 
 gave me none.

Which could make sense if you were not using a glob vocab.

 The first one looked like this after the parsers had nibbled on it:
 
   [['erik'], 'and', ['enge']]
 
 And the latter one:
 
   [['erik'], 'and', ['eng?']]

This one should look like [['erik'], 'and', ['enge', 'engs', 'engf',
...]] and match all the words that match the pattern eng?.  If this isn't
being expanded, then you are not using a globbing vocabulary.
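
As a rough illustration of the expansion described above (this is not
the GlobbingLexicon implementation; the expand function and the word
list are made up, and Python's fnmatch module is used as a stand-in for
whatever matching the lexicon really does):

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary, for illustration only.
VOCABULARY = ["enge", "engf", "engs", "england", "erik", "eri"]


def expand(pattern, vocabulary=VOCABULARY):
    """Return the vocabulary words matching a '*' / '?' glob pattern."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]


def literal_lookup(word, vocabulary=VOCABULARY):
    """What a lookup without globbing would do: exact match only."""
    return [w for w in vocabulary if w == word]
```

Here expand("eng?") returns ['enge', 'engf', 'engs'], while
literal_lookup("eng?") returns an empty list, because no indexed word
contains a literal '?'.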

Then again, where did you get these objects?  If you were looking at the
wrong point in the code, the wildcards may not have been expanded yet.

 
  [['erik', '...', '...', '...', 'enge']]

Where do you see this?

 Where should I look next to figure out what's going on?

Make sure you are using a globbing vocab.  Note that you can't change a
catalog's vocabulary once the catalog is made, so you have to make a new
catalog.

   I can't see that the query parsers in UnTextIndex.py transform them
   differently, but I might be missing something obvious.
  
  There's _nothing_ obvious in that particular chunk of code.
 
 Good, then it's not just me.  Is the overall design philosophy for
 ZCatalog/Catalog/SearchIndex documented anywhere?

The catalog has evolved over the past four years.  Most of the text index
query parser code was written by someone long gone from this company, and
certainly way before my time.  The catalog is, in fact, the evolution of a
completely different product called ZTables, now long dead in the annals
of Principia history. 

This person did not document their design, so the answer is no.  I had
some UML models once, but my modeling tool ate them.

  (By the way, in
 lib/python/SearchIndex/TextIndex.py, what are sws and cv3?)

Very old consulting projects, long dead.

-Michel

