Re: [Zope-dev] ZCatalog text index search bugs?

2000-06-13 Thread Michel Pelletier

Dieter Maurer wrote:
> 
> R. David Murray writes:
>  > I am very confused.
>  >
>  >
>  >   The TextIndex search does an AND, not an OR, of the search
>  >   words: if you ask it to find "foo bar", it returns only
>  >   objects matching *both* "foo" and "bar", rather than objects
>  >   matching *either* "foo" or "bar" (which Jason expected).
> This is definitely not the case!
> 
>  > Indeed, if you do a search that includes a word that is not on an
>  > item, the item is not returned.  So how is that working?
> That is a bug I discovered and analysed yesterday.
>   The index lookup is done by "index[word]". This raises
>   a KeyError exception, if "word" is not in the index. The exception
>   aborts the search; it returns without a hit.
>   The behaviour is correct for "and" but of course
>   wrong for "or".

This is fixed in 2.2b1, where the index failing to find a word should
no longer raise an exception and the search should not abort.  Can you
confirm that when you get a chance?
 
-- 

-Michel Pelletier

http://www.zope.org/Members/michel/MyWiki

Visit WikiCentral for the latest Zen:

http://www.zope.org/Members/WikiCentral

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )




Re: [Zope-dev] ZCatalog text index search bugs?

2000-06-13 Thread Dieter Maurer

R. David Murray writes:
 > I am very confused.
 > 
 > I'm looking at the SearchIndex source under 2.1.4 (2.1.6 seems to be
 > the same).  In Lexicon.py the 'query' method defines the default_operator
 > to be 'or'.  I can't see that TextIndex overrides this when it calls
 > it.
 > 
 > But the response to PR 1141 (against 2.1.6) in the collector says:
 > 
 >   The TextIndex search does an AND, not an OR, of the search
 >   words: if you ask it to find "foo bar", it returns only
 >   objects matching *both* "foo" and "bar", rather than objects
 >   matching *either* "foo" or "bar" (which Jason expected).
This is definitely not the case!

 > Indeed, if you do a search that includes a word that is not on an
 > item, the item is not returned.  So how is that working?
That is a bug I discovered and analysed yesterday.
  The index lookup is done by "index[word]". This raises
  a KeyError exception, if "word" is not in the index. The exception
  aborts the search; it returns without a hit.
  The behaviour is correct for "and" but of course
  wrong for "or".
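
A minimal sketch of this failure mode, written in present-day Python and not taken from the SearchIndex source. The toy `index` dict and the function names `search_or_buggy` and `search_or_tolerant` are assumptions made for illustration; only the `index[word]` lookup and the resulting KeyError come from the description above.

```python
# Toy inverted index: word -> set of document ids (hypothetical data).
index = {
    "foo": {1, 2},
    "bar": {2, 3},
}


def search_or_buggy(index, words):
    """OR search using index[word]; one unindexed word aborts everything."""
    try:
        hits = set()
        for word in words:
            hits |= index[word]  # raises KeyError for an unindexed word
        return hits
    except KeyError:
        return set()  # the whole search returns without a hit


def search_or_tolerant(index, words):
    """OR search that treats an unindexed word as matching nothing."""
    hits = set()
    for word in words:
        hits |= index.get(word, set())
    return hits


print(search_or_buggy(index, ["foo", "missing"]))     # set()
print(search_or_tolerant(index, ["foo", "missing"]))  # {1, 2}
```

For an "and" search the empty result is the right answer in any case, which would explain why the abort looks like a default "and".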

After getting several complaints about reporting bugs that were
already fixed in CVS, I checked out the CVS version today.
The relevant code has undergone quite a few modifications.
It will take me some time to see whether the bug remains.
It is already too late today; maybe I will look tomorrow.

 > So I think 'or' searching is broken, and that text indexes being
 > a default 'and' search is just an accident.
You are right!

 > (*) I recall reading that the 'near' operator, which is used if
 > the splitter breaks up a word in the search string, is not really
 > supported and that the 'and' operator is used instead.
The "near" operator in 2.1.6 raises an exception, which means
no hits.

I posted a patch some days ago that fixes "near" searches,
except for objects that cannot be stored in the ZODB, such as
LocalFS objects. For such objects, it uses an "and" to
approximate "near".
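
The idea of falling back from "near" to "and" can be sketched as follows. This is not the patch itself: the data layout (a word-to-documents `index` plus a per-document `positions` table), the function name `search_near`, and the `max_distance` parameter are all hypothetical, chosen only to show a proximity test that degrades to a plain "and" when no position data is available for a document.

```python
def search_near(index, positions, word_a, word_b, max_distance=3):
    """Return ids of documents where word_a and word_b occur close together.

    Documents without position data are accepted if they contain both
    words, i.e. "near" is approximated by "and" for them.
    """
    # Both words must be present at all: this is the "and" part.
    candidates = index.get(word_a, set()) & index.get(word_b, set())
    hits = set()
    for doc in candidates:
        doc_positions = positions.get(doc)
        if doc_positions is None:
            hits.add(doc)  # no positions known: fall back to "and"
            continue
        pos_a = doc_positions.get(word_a, [])
        pos_b = doc_positions.get(word_b, [])
        if any(abs(a - b) <= max_distance for a in pos_a for b in pos_b):
            hits.add(doc)
    return hits


# Hypothetical data: all three documents contain both words.
index = {"zope": {1, 2, 3}, "catalog": {1, 2, 3}}
positions = {
    1: {"zope": [0], "catalog": [1]},   # adjacent words
    2: {"zope": [0], "catalog": [40]},  # far apart
    # document 3 has no position data
}

print(sorted(search_near(index, positions, "zope", "catalog")))  # [1, 3]
```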

 > If I can reproduce this in 2.2.0b I'll file it in the collector.
I will watch for any posts from you.
There is no need for both of us to do the same work.


Dieter





[Zope-dev] ZCatalog text index search bugs?

2000-06-12 Thread R. David Murray

I am very confused.

I'm looking at the SearchIndex source under 2.1.4 (2.1.6 seems to be
the same).  In Lexicon.py the 'query' method defines the default_operator
to be 'or'.  I can't see that TextIndex overrides this when it calls
it.

But the response to PR 1141 (against 2.1.6) in the collector says:

  The TextIndex search does an AND, not an OR, of the search
  words: if you ask it to find "foo bar", it returns only
  objects matching *both* "foo" and "bar", rather than objects
  matching *either* "foo" or "bar" (which Jason expected).

Indeed, if you do a search that includes a word that is not on an
item, the item is not returned.  So how is that working?

A possible answer is:  if you do a search like 'something or
somethingelse', this *also* does not return the object if one of
those words is not on the object.  So is 'or' searching broken?

Note that if you do a search like "something or with", this returns
the object, "with" being a stop word.   So does "something with".
On the other hand, "something and with" does *not* return the
object.

So I think 'or' searching is broken, and that text indexes being
a default 'and' search is just an accident.

Following up on the 'something and with', though:  Since "with" is
a stop word, it can never be on the object.  Since the user entering
search words into the search form doesn't know what the list of
stop words is, this strikes me as broken behavior.  Anyone disagree?
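
To make the 'something and with' case concrete, here is a small sketch in present-day Python. The stop-word set, the toy index, and the helpers `search_and` and `drop_stop_words` are hypothetical and not taken from the Zope source. The sketch only illustrates why an 'and' with a stop word can never match, and one possible way around it: dropping stop words from the query before evaluating it.

```python
STOP_WORDS = {"with", "the", "a"}  # hypothetical stop-word list

# Toy inverted index; stop words are never stored in it.
index = {"something": {7}}


def search_and(index, words):
    """AND search: a document must contain every query word."""
    result = None
    for word in words:
        docs = set(index.get(word, set()))
        result = docs if result is None else result & docs
    return result if result is not None else set()


def drop_stop_words(words, stop_words=STOP_WORDS):
    """Remove stop words from the query, since they cannot match anything."""
    return [w for w in words if w not in stop_words]


query = ["something", "with"]
print(search_and(index, query))                   # set()
print(search_and(index, drop_stop_words(query)))  # {7}
```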

I also have a problem with a word such as "T-shirt".  If I search
on "T-shirt", my object that has that word in its text index does
not show up.  The splitter should be breaking that into "t" and
"shirt", right?  Is the problem that single letters are discarded
by the Splitter, therefore T is like a stop word (but it isn't in
the stopword table), therefore the implicit 'and' search(*) fails?
To corroborate this, a search for "something t" finds that
record, but "something and t" does not.  This can't be the whole
answer, though, since searching on just 'shirt' does *not*
return the object.
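
The single-letter hypothesis can be sketched like this. The function `split_words` and its `min_length` parameter are made up for illustration and are not the real Splitter; the sketch assumes a splitter that lower-cases the text, breaks on non-alphanumeric characters, and silently drops one-letter tokens.

```python
import re


def split_words(text, min_length=2):
    """Lower-case, split on non-alphanumerics, drop too-short tokens."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [t for t in tokens if len(t) >= min_length]


print(split_words("T-shirt"))      # ['shirt']
print(split_words("something t"))  # ['something']
```

Under this assumption "T-shirt" would be indexed as just "shirt", so a search on 'shirt' alone would be expected to match. That it does not is consistent with this not being the whole answer.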

(*) I recall reading that the 'near' operator, which is used if
the splitter breaks up a word in the search string, is not really
supported and that the 'and' operator is used instead.

I can't tell yet if this bug is (these bugs are?) fixed in 2.2.0b1
since I can't see the source release yet.  Looking at the a1 source,
things have moved around a bit.  But I see that "default_operator"
is still set to 'or', so I suspect these bugs may remain...

If I can reproduce this in 2.2.0b I'll file it in the collector.

--RDM



