Hi Sam, all,

Apologies if this is already known, but today Sam stumbled upon the
following query, which returned zero results:

037:hep-ph/0105155 OR 037:astro-ph/0104076


This happens because Invenio treats slashes as delimiters of a regex
query. Unfortunately, this usage is ambiguous, because the '/'
character can also be part of a normal token.

So the query is wrongly tokenized:

['+', 'hep-ph/0105155 OR 037:astro-ph/0104076', '037', 'a']

Quoting the values doesn't help either:

037:"hep-ph/0105155" OR "037:astro-ph/0104076"

because the pattern matches anything between two slashes:

re_pattern_regexp_quotes = re.compile("\/(.*?)\/")
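
To illustrate, here is a minimal sketch that applies only the pattern
above to Sam's query. It is not the actual Invenio tokenizer, just the
regex on its own, and the variable names `query` and `captured` are
mine:

```python
import re

# Same pattern as above, written as a raw string (the behavior is identical).
re_pattern_regexp_quotes = re.compile(r"\/(.*?)\/")

query = '037:hep-ph/0105155 OR 037:astro-ph/0104076'

# The slash in the first report number is paired with the slash in the
# second one, so everything in between is taken as a regex.
captured = re_pattern_regexp_quotes.findall(query)
print(captured)  # ['0105155 OR 037:astro-ph']
```

The boolean operator ends up inside the supposed regex, which is
consistent with the broken token list shown above.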

It would be possible to use a negative lookbehind together with
escaping, but that requires two operations (changing the regex, then
replacing the escape sequences):

In [78]: re.compile('(?<!\\\\)\/(.*?)(?<!\\\\)\/').findall('037:hep/123 OR 037:exp/567')
Out[78]: ['123 OR 037:exp']

In [79]: re.compile('(?<!\\\\)\/(.*?)(?<!\\\\)\/').findall('037:hep\/123 OR 037:exp\/567')
Out[79]: []
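
Put together, the two operations could look roughly like the sketch
below. The names `re_unescaped_slashes`, `find_regexp_parts` and
`unescape_slashes` are hypothetical and do not exist in Invenio, and
the `980:/^THES/` query is a made-up example of an intentional regex
search:

```python
import re

# Hypothetical variant of the pattern: a slash only counts as a regex
# delimiter when it is not preceded by a backslash.
re_unescaped_slashes = re.compile(r"(?<!\\)/(.*?)(?<!\\)/")

def find_regexp_parts(query):
    """Operation 1: return the spans delimited by unescaped slashes."""
    return re_unescaped_slashes.findall(query)

def unescape_slashes(token):
    """Operation 2: turn each escaped slash back into a plain '/'."""
    return token.replace("\\/", "/")

escaped = r"037:hep\/123 OR 037:exp\/567"
print(find_regexp_parts(escaped))        # []
print(unescape_slashes(escaped))         # 037:hep/123 OR 037:exp/567
print(find_regexp_parts("980:/^THES/"))  # ['^THES']
```

This still asks users to escape every slash in a report number by
hand, and it does not handle a literal backslash directly in front of
a delimiter.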

I guess it is a harder problem.

Roman


On Thu, Mar 1, 2012 at 1:51 PM, Carli Samuele <[email protected]> wrote:
> '(037:hep-th/0112017) | (037:hep-th/0112020)'
