Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
Abel, many thanks for this analysis, I've put this into the Collector...

On Sat, 23 Jun 2001 22:59:32 +0200 abel deuring [EMAIL PROTECTED] wrote:
> Erik, I'm afraid that your patch does not solve all the problems you
> mentioned in an earlier mail.

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
Erik Enge wrote:
> On Wed, 30 May 2001, Erik Enge wrote:
> > I'm going bug hunting...
>
> I'm back :) I think I found the bug. In
> lib/python/SearchIndex/GlobbingLexicon.py in the query_hook() method.
> It seems to say that: if I can't find a '*' or a '?' in the word, then
> go to else-clause, where the else-clause says sod off. Since it
> iterates over the query, 'word' is actually a list if you use parens in
> your query, and you won't find any wildcards there. I think.
>
> Add a dash of recursiveness, and it seems to be solved (for me):
>
>     def erik_hook(self, q):
>         """doc string"""
>         words = []
>         for w in q:
>             if ( (self.multi_wc in w) or (self.single_wc in w) ):
>                 wids = self.get(w)
>                 for wid in wids:
>                     if words:
>                         words.append(Or)
>                     words.append(wid)
>             else:
>                 words.append(self.erik_hook(w))
>         return words or ['']
>
>     def query_hook(self, q):
>         """expand wildcards"""
>         words = []
>         for w in q:
>             if ( (self.multi_wc in w) or (self.single_wc in w) ):
>                 wids = self.get(w)
>                 for wid in wids:
>                     if words:
>                         words.append(Or)
>                     words.append(wid)
>             else:
>                 words.append(self.erik_hook(w))
>
> Not really tested, but it seems to work. This might have been resolved
> in CVS, I don't know, should I post it as a bug?

Erik,

I'm afraid that your patch does not solve all the problems you mentioned in an earlier mail. You are right that the implementation of query_hook in Zope 2.3.2 and 2.4.0b1 cannot handle words with wildcards in nested lists, but your patch will lead to endless recursion if you enter the simplest possible query: just one word without wildcards. In this case, if ( (self.multi_wc in w)... evaluates to false, hence self.erik_hook is called for this word, where if ( (self.multi_wc in w)... is again false, and erik_hook is called again...

The statement q = parse(s) in UnTextIndex.query (and PositionIndex.query) before the call to query_hook can return nested lists, so query_hook must be aware of this.
This can be done with:

    def query_hook(self, q):
        """expand wildcards"""
        words = []
        for w in q:
            if type(w) is type([]):
                words.append(self.query_hook(w))
            else:
                if ( (self.multi_wc in w) or (self.single_wc in w) ):
                    wids = self.get(w)
                    for wid in wids:
                        if words:
                            words.append(Or)
                        words.append(wid)
                else:
                    words.append(w)
        # if words is empty, return something that will make textindex's
        # __getitem__ return an empty result list
        return words or ['']

You also mentioned the strange results of queries like "eri* and enge". These are caused by another bug in query_hook: the results from the wildcard expansion are simply inserted into the result list. Example: "ab* and xyz" may be expanded by query_hook into

    ['aba', 'or', 'abb', 'or', 'abc', 'and', 'xyz']

Since UnTextIndex.evaluate looks first for 'and' operators, this is equivalent to

    ['aba', 'or', 'abb', 'or', ['abc', 'and', 'xyz']]

(The funny (or confusing) side effect is that "ab* and xyz" may return different results compared with "xyz and ab*", because "aba and xyz" probably gives results different from those for "abc and xyz".)

But we need a result like

    [['aba', 'or', 'abb', 'or', 'abc'], 'and', 'xyz']

The version of query_hook below fixes the problem:

    def query_hook(self, q):
        """expand wildcards"""
        words = []
        for w in q:
            if type(w) is type(''):
                if ( (self.multi_wc in w) or (self.single_wc in w) ):
                    wids = self.get(w)
                    alternatives = []
                    for wid in wids:
                        if alternatives:
                            alternatives.append(Or)
                        alternatives.append(wid)
                    words.append(alternatives or [''])
                else:
                    words.append(w)
            else:
                words.append(self.query_hook(w))
        # if words is empty, return something that will make textindex's
        # __getitem__ return an empty result list
        return words or ['']

You also mentioned the parse result

    ['abc', '...', '...', '...', 'def']

which you could not reproduce. Playing with Catalogs, I accidentally produced the corresponding query: 'abc ... def', i.e., the double quotation marks are part of the query string. UnTextIndex.quotes splits the string between two quotation marks
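The grouping behaviour of the last query_hook version can be tried outside Zope with a small stand-alone sketch. Everything here is an assumption made for illustration: ToyLexicon, its word list, the plain string 'or' standing in for the Or operator object, and the use of Python's fnmatch module for pattern matching. The real GlobbingLexicon.get() returns word ids rather than words.

```python
import fnmatch

OR = 'or'  # stand-in for the Or operator object used by the real index


class ToyLexicon:
    """Hypothetical stand-in for GlobbingLexicon; works on words, not ids."""

    multi_wc = '*'
    single_wc = '?'

    def __init__(self, words):
        self.words = sorted(words)

    def get(self, pattern):
        # Shell-style matching is only an approximation of the real lexicon.
        return [w for w in self.words if fnmatch.fnmatchcase(w, pattern)]

    def query_hook(self, q):
        """Expand wildcards, keeping each expansion in its own sub-list."""
        words = []
        for w in q:
            if isinstance(w, str):
                if self.multi_wc in w or self.single_wc in w:
                    alternatives = []
                    for match in self.get(w):
                        if alternatives:
                            alternatives.append(OR)
                        alternatives.append(match)
                    # an empty expansion must still yield an empty result
                    words.append(alternatives or [''])
                else:
                    words.append(w)
            else:
                # nested list produced by parentheses in the query
                words.append(self.query_hook(w))
        return words or ['']


lexicon = ToyLexicon(['aba', 'abb', 'abc', 'xyz'])
flat = lexicon.query_hook(['ab*', 'and', 'xyz'])
nested = lexicon.query_hook([['ab*'], 'and', ['xyz']])
print(flat)    # [['aba', 'or', 'abb', 'or', 'abc'], 'and', 'xyz']
print(nested)  # [[['aba', 'or', 'abb', 'or', 'abc']], 'and', ['xyz']]
```

Because the alternatives stay in their own sub-list, the 'and' can no longer bind to just the last expanded word, so "ab* and xyz" and "xyz and ab*" are grouped the same way.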
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
Thanks for tracking this down...

If you're so inclined, please put this in the Collector (with a description of the problem, as well as a way to reproduce it, the patch alone isn't nearly as helpful) so it doesn't get dropped on the floor. I doubt very much that it's fixed in CVS.

- C

Erik Enge wrote:
> On Wed, 30 May 2001, Erik Enge wrote:
> > I'm going bug hunting...
>
> I'm back :) I think I found the bug. In
> lib/python/SearchIndex/GlobbingLexicon.py in the query_hook() method.
> It seems to say that: if I can't find a '*' or a '?' in the word, then
> go to else-clause, where the else-clause says sod off. Since it
> iterates over the query, 'word' is actually a list if you use parens in
> your query, and you won't find any wildcards there. I think.
>
> Add a dash of recursiveness, and it seems to be solved (for me):
>
>     def erik_hook(self, q):
>         """doc string"""
>         words = []
>         for w in q:
>             if ( (self.multi_wc in w) or (self.single_wc in w) ):
>                 wids = self.get(w)
>                 for wid in wids:
>                     if words:
>                         words.append(Or)
>                     words.append(wid)
>             else:
>                 words.append(self.erik_hook(w))
>         return words or ['']
>
>     def query_hook(self, q):
>         """expand wildcards"""
>         words = []
>         for w in q:
>             if ( (self.multi_wc in w) or (self.single_wc in w) ):
>                 wids = self.get(w)
>                 for wid in wids:
>                     if words:
>                         words.append(Or)
>                     words.append(wid)
>             else:
>                 words.append(self.erik_hook(w))
>
> Not really tested, but it seems to work. This might have been resolved
> in CVS, I don't know, should I post it as a bug?
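A quick way to check how the quoted erik_hook behaves on a plain word is the sketch below. It is a rough modern-Python transcription, not the Zope code: PatchedLexicon, the stubbed get() that always returns an empty list, and the helper recurses_forever() are all hypothetical names introduced only for this test.

```python
OR = 'or'  # stand-in for the Or operator object


class PatchedLexicon:
    """Hypothetical container for a transcription of the quoted erik_hook."""

    multi_wc = '*'
    single_wc = '?'

    def get(self, pattern):
        # Stub: the lookup result does not matter for the recursion check.
        return []

    def erik_hook(self, q):
        words = []
        for w in q:
            if self.multi_wc in w or self.single_wc in w:
                for wid in self.get(w):
                    if words:
                        words.append(OR)
                    words.append(wid)
            else:
                # For a plain string this recurses on the string itself,
                # then on each character, and a one-character string
                # yields itself again when iterated.
                words.append(self.erik_hook(w))
        return words or ['']


def recurses_forever(query):
    """Return True if erik_hook hits Python's recursion limit on query."""
    try:
        PatchedLexicon().erik_hook(query)
    except RecursionError:
        return True
    return False


print(recurses_forever(['erik']))      # True
print(recurses_forever(['eri*']))      # False
print(recurses_forever([['eri*']]))    # False
```

Under these assumptions the wildcard queries terminate, while a single word without wildcards never reaches a base case.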
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
On Thu, 24 May 2001, Michel Pelletier wrote:
> I don't think you are using a globbing vocabulary.

I think I am:

    print_info(applic.Catalog(word='scripto*'))
    unsplitted ['scripto*']
    unl: ['scripto*']
    unq: [104623, 'or', 112198, 'or', 151568]
    Length: 6
    Content: [mybrains instance at 1226d358, mybrains instance at 127bb540,
              mybrains instance at 12bd8138, mybrains instance at 127bb658,
              mybrains instance at 1226c620, mybrains instance at 12092eb0]

    print_info(applic.Catalog(word='(scripto*)'))
    unsplitted []
    unsplitted ['scripto*']
    unl: ['scripto*']
    unsplitted []
    unl: [['scripto*']]
    unq: ['scripto*']
    unq: [['scripto*']]
    Length: 0
    Content: []

The unsplitted, unl and unq lines are my debug output, but you can see what happens: without parens the '*' has its desired effect, with parens it doesn't.

Got a clue? Is this my bug, or ZCatalog's?
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
On Tue, 29 May 2001, Erik Enge wrote:
> On Thu, 24 May 2001, Michel Pelletier wrote:
>
> the unsplitted, unl and unq are my debug flags, but you can see what
> happens: without parens the '*' has its desired effect, with, it
> doesn't.
>
> Got a clue? Is this my bug, or ZCatalog's?

Must be ZCatalog's. I'm guessing the paren matching takes a different code path that doesn't expand wildcards.

-Michel
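One way the parenthesised query could slip through unexpanded is sketched below. This is a guess at the shape of the unpatched hook, reconstructed only from the descriptions in this thread; FlatLexicon, its word list, the string 'or', and the fnmatch-based get() are illustrative assumptions, and words are returned instead of word ids.

```python
import fnmatch

OR = 'or'  # stand-in for the Or operator object


class FlatLexicon:
    """Hypothetical lexicon whose hook only looks at the top-level list."""

    multi_wc = '*'
    single_wc = '?'

    def __init__(self, words):
        self.words = sorted(words)

    def get(self, pattern):
        return [w for w in self.words if fnmatch.fnmatchcase(w, pattern)]

    def query_hook(self, q):
        words = []
        for w in q:
            # For a nested list, "'*' in w" is a list membership test,
            # so ['scripto*'] is not recognised as containing a wildcard.
            if self.multi_wc in w or self.single_wc in w:
                for match in self.get(w):
                    if words:
                        words.append(OR)
                    words.append(match)
            else:
                words.append(w)
        return words or ['']


lexicon = FlatLexicon(['other', 'scripto', 'scriptorium'])
print(lexicon.query_hook(['scripto*']))    # ['scripto', 'or', 'scriptorium']
print(lexicon.query_hook([['scripto*']]))  # [['scripto*']]
```

The second result has the same shape as the "unq: [['scripto*']]" line in the debug output quoted earlier in the thread.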
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
On Thu, 24 May 2001, Michel Pelletier wrote:
> I don't think you are using a globbing vocabulary.

But globbing works for other queries. In the same catalog.

> If you are not using a glob vocab, I suspect it stripped out the ? and
> is hitting on 'eri'. Do you have that word anywhere?

I tried searching for:

    eri

and that gave me four results. No globbing, then?

> Then again, where did you get these objects? If you were looking at
> the wrong point in the code, the wildcards may not have been expanded
> yet.

Could be it...

> > [['erik', '...', '...', '...', 'enge']]
>
> Where do you see this?

I can't reproduce it right now, I'll let you know if I see it again.

> Make sure you are using a globbing vocab. Note that you can't change a
> catalog's vocabulary once the catalog is made, so you have to make a
> new catalog.

How can I change it for a new one?
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
Erik Enge wrote:
> Hi,
>
> is it me, or is this just not working:
>
>     (word1 or word*) and (wor?3)
>
> i.e. wildcards in TextIndex queries. I can't seem to make it work, and
> I'm not able to track down where it stops working. Should it work in
> the first place?
>
> Zope 2.3.2
>
> Thanks.

Works great for me. Perhaps you are using a Vocabulary that has Globbing turned off?

-- 
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`--
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
On Thu, 24 May 2001, Casey Duncan wrote:
> Works great for me. Perhaps you are using a Vocabulary that has
> Globbing turned off?

I'm not sure, how do I check?

This query works:

    wil?car*

This doesn't:

    (wil?car* or something else) and (word1 and word2)

I can't see that the query-parsers in UnTextIndex.py transform them differently, but I might be missing something obvious.
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
Erik Enge wrote:
> On Thu, 24 May 2001, Casey Duncan wrote:
> > Works great for me. Perhaps you are using a Vocabulary that has
> > Globbing turned off?
>
> I'm not sure, how do I check?
>
> This query works:
>
>     wil?car*
>
> This doesn't:
>
>     (wil?car* or something else) and (word1 and word2)

I'm not sure how well grouping with parens is supported right now. I know phrase matching isn't supported very well.

> I can't see that the query-parsers in UnTextIndex.py transform them
> differently, but I might be missing something obvious.

-- 
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`--
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
On Thu, 24 May 2001, Erik Enge wrote:
> This query works:
>
>     wil?car*
>
> This doesn't:
>
>     (wil?car* or something else) and (word1 and word2)

If the first works, then you are using a globbing vocabulary. The second one should work, but maybe there is a bug. Or perhaps your search criteria are so strict that you are getting no results.

> I can't see that the query-parsers in UnTextIndex.py transform them
> differently, but I might be missing something obvious.

There's _nothing_ obvious in that particular chunk of code.

-Michel
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
On Thu, 24 May 2001, Erik Enge wrote:
> Good, then it's just not me. Is the overall design philosophy for
> ZCatalog/Catalog/SearchIndex documented anywhere?
>
> (By the way, from lib/python/SearchIndex/TextIndex.py, what is sws and
> cv3?)

I'm trying to get a knot of knowledge into my head by studying the SearchIndex modules. I'm writing more or less what's happening out on

    http://wiki.async.com.br/index.php?SearchIndex

but I'm not moving so fast, as the code isn't too trivial. If you want to stop by and help, by all means. :-)

Take care,
-- 
/\/\ Christian Reis, Senior Engineer, Async Open Source, Brazil
~\/~ http://async.com.br/~kiko/ | [+55 16] 274 4311
Re: [Zope-dev] Wildcards in TextIndex query. Do they work?
On Thu, 24 May 2001, Erik Enge wrote:
> On Thu, 24 May 2001, Michel Pelletier wrote:
> > If the first works, then you are using a globbing vocabulary. The
> > second one should work, but maybe there is a bug. Or perhaps your
> > search criteria are so strict that you are getting no results.
>
> Hm. Something isn't right here.

I don't think you are using a globbing vocabulary.

> This:
>
>     eric
>
> got me 70 hits. This:
>
>     eri?
>
> got me 4 hits.

If you are not using a glob vocab, I suspect it stripped out the ? and is hitting on 'eri'. Do you have that word anywhere?

> That's a bit strange, if you ask me :) This:
>
>     (erik) and (enge)
>
> returned 1 hit. This:
>
>     (erik) and (eng?)
>
> gave me none.

Which could make sense if you were not using a glob vocab.

> The first one looked like this after the parsers had nibbled on it:
>
>     [['erik'], 'and', ['enge']]
>
> And the latter one:
>
>     [['erik'], 'and', ['eng?']]

This one should look like

    [['erik'], 'and', ['enge', 'engs', 'engf', ...]]

and match all the words that match the pattern eng?. If this isn't being expanded, then you are not using a globbing vocabulary. Then again, where did you get these objects? If you were looking at the wrong point in the code, the wildcards may not have been expanded yet.

>     [['erik', '...', '...', '...', 'enge']]

Where do you see this?

> Where should I look next to figure out what's going on?

Make sure you are using a globbing vocab. Note that you can't change a catalog's vocabulary once the catalog is made, so you have to make a new catalog.

> > > I can't see that the query-parsers in UnTextIndex.py transform
> > > them differently, but I might be missing something obvious.
> >
> > There's _nothing_ obvious in that particular chunk of code.
>
> Good, then it's just not me. Is the overall design philosophy for
> ZCatalog/Catalog/SearchIndex documented anywhere?

The catalog has evolved over the past four years. Most of the text index query parser code was written by someone long gone from this company, and certainly way before my time.
The catalog is, in fact, the evolution of a completely different product called ZTables, now long dead in the annals of Principia history. This person did not document their design, so the answer is no. I had some UML models once, but my modeling tool ate them.

> (By the way, from lib/python/SearchIndex/TextIndex.py, what is sws and
> cv3?)

Very old consulting projects, long dead.

-Michel
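The difference between a wildcard that is expanded and one that is silently stripped, as discussed in this message, can be illustrated with a toy vocabulary. This is only a sketch: the word list, the expand() helper, and the use of Python's fnmatch module are assumptions for illustration and say nothing about how the Zope vocabulary is actually stored.

```python
import fnmatch

# Hypothetical vocabulary; the real one lives inside the catalog.
VOCABULARY = ['eng', 'enge', 'engf', 'engine', 'engs', 'eri', 'eric', 'erik']


def expand(pattern, vocabulary=VOCABULARY):
    """Return the words a shell-style pattern would match."""
    return [w for w in vocabulary if fnmatch.fnmatchcase(w, pattern)]


print(expand('eng?'))  # ['enge', 'engf', 'engs']
print(expand('eri?'))  # ['eric', 'erik']
print(expand('eri'))   # ['eri']
```

With globbing, 'eri?' matches only four-letter words starting with 'eri'; if the '?' were stripped first, the query would reduce to the literal word 'eri', which is a different set of hits.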