On Wed, 26 Feb 2014, Alexander Wagner wrote:
> http://invenio-software.org/ticket/131

There are some more tickets that are open in this regard, notably:

   http://invenio-software.org/ticket/453

>        041__a:"eng"
>
> vs.
>
>        (041__a:"eng")

Note that CDS also emits the following warning in the 2nd case:

   No exact match found for (041__a:"eng"), using 041 a: eng instead...

This substitute query is wrongly guessed, which leads to wrong results.

The trouble stems from the following.  There are physics terms such as
'SU(1)' that we do not want to interpret as a parenthesised search, but
rather match literally.  Upon seeing '(041__a:"eng")', the system
interprets it similarly, i.e. not as a "composed search", but as a "math
search", so to speak.  This is mostly because there is no blank within
the parenthesised expression.  Adding something tautological to create a
Boolean expression would overcome this interpretation, for example:

  (041__a:"eng" eng)

would return the same number of hits as 041__a:"eng".

In summary, the best way to use parentheses in order to express
"composed searches" is not to put them around "singletons", but
always around "Boolean expressions", i.e. things containing at least
some white space.
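
To make the rule above concrete, here is a minimal sketch of the
whitespace heuristic as described in this message.  It is not the actual
Invenio code; the function name is_composed_search is hypothetical, and
the sketch only looks at the outermost pair of parentheses.

```python
def is_composed_search(query: str) -> bool:
    """Hypothetical heuristic, not Invenio's implementation.

    Treat a fully parenthesised query as a Boolean ("composed") search
    only if there is white space somewhere inside the parentheses.
    """
    q = query.strip()
    # Only queries wrapped in parentheses are candidates at all.
    if not (q.startswith("(") and q.endswith(")")):
        return False
    inner = q[1:-1]
    # No blank inside: fall back to the literal "math search" reading.
    return any(ch.isspace() for ch in inner)


print(is_composed_search('(041__a:"eng")'))      # singleton, no blank
print(is_composed_search('(041__a:"eng" eng)'))  # Boolean expression
print(is_composed_search('SU(1)'))               # physics term
```

Under this assumption, only the second query is treated as a composed
search, which matches the behaviour discussed above.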

> (ind:"val1" and ind:"val2") and ((ind:"val3" or ind:"val4") or
> ind:"val5")

This use is perfectly OK.

> I fear there's still a bug in the in bracket handling.

Yes, e.g. see the above ticket #453.

We may try to improve the parenthesised expression check for word
boundaries so that it behaves more properly for queries like "(xy:zzy)",
e.g. by giving preference to the "composed search" interpretation.  There
are situations like "(p,q)", though, where one wants to retain the "math
search" interpretation that we are favouring now...
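
One possible shape of such an improved check, purely as a sketch: give
preference to the composed reading when the singleton inside the
parentheses looks like a fielded term, and keep the literal reading
otherwise.  The name prefers_composed and the "field:value" pattern are
assumptions made for illustration, not existing Invenio code.

```python
import re

# Assumed shape of a fielded term: index name or MARC tag, colon, value.
_FIELDED = re.compile(r'^\w+:\S+$')


def prefers_composed(query: str) -> bool:
    """Hypothetical refinement of the whitespace heuristic."""
    q = query.strip()
    if not (q.startswith("(") and q.endswith(")")):
        return False
    inner = q[1:-1].strip()
    # Blanks inside the parentheses: Boolean expression, as before.
    if any(ch.isspace() for ch in inner):
        return True
    # No blanks: composed only if the content looks like field:value.
    return _FIELDED.match(inner) is not None


print(prefers_composed('(xy:zzy)'))        # fielded singleton
print(prefers_composed('(041__a:"eng")'))  # fielded singleton
print(prefers_composed('(p,q)'))           # stays a literal match
```

With this pattern, "(xy:zzy)" and '(041__a:"eng")' would be read as
composed searches, while "(p,q)" would keep the literal interpretation.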

Best regards
-- 
Tibor Simko
