Hi Sam, all,
Apologies if this is already known, but Sam today stumbled upon the
following query, which returned zero results:
037:hep-ph/0105155 OR 037:astro-ph/0104076
This is because Invenio treats slashes as delimiters marking a regex
query. Unfortunately this usage is ambiguous, because the '/' character
can also be part of a normal token, so the query is wrongly tokenized:
['+', 'hep-ph/0105155 OR 037:astro-ph/0104076', '037', 'a']
Using quotes doesn't help either:
037:"hep-ph/0105155" OR "037:astro-ph/0104076"
because the pattern captures anything between a pair of slashes:
re_pattern_regexp_quotes = re.compile("\/(.*?)\/")
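To illustrate, here is a minimal standalone sketch (only the pattern is taken from above; the rest is not Invenio code) showing what that pattern captures for Sam's query. The variable `query` is just an illustrative name.

```python
import re

# Same pattern as quoted above, written as a raw string.
re_pattern_regexp_quotes = re.compile(r"\/(.*?)\/")

# Sam's query: each report number contains one slash.
query = "037:hep-ph/0105155 OR 037:astro-ph/0104076"

# The non-greedy group spans from the first slash to the second one,
# so it swallows the end of one report number and the start of the next.
print(re_pattern_regexp_quotes.findall(query))
# ['0105155 OR 037:astro-ph']
```

So the two slashes belonging to two different report numbers are taken as the opening and closing delimiters of one regex.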
It would be possible to use a negative lookbehind together with
escaping, but that requires two operations (change the regex, then
replace the escapes):
In [78]: re.compile('(?<!\\\\)\/(.*?)(?<!\\\\)\/').findall('037:hep/123 OR 037:exp/567')
Out[78]: ['123 OR 037:exp']
In [79]: re.compile('(?<!\\\\)\/(.*?)(?<!\\\\)\/').findall('037:hep\/123 OR 037:exp\/567')
Out[79]: []
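Put together, the two operations could look roughly like the sketch below. The names `RE_UNESCAPED_SLASHES`, `find_regex_parts` and `unescape_slashes` are hypothetical, made up for this example, and are not part of Invenio. It also assumes that users would write `\/` for a literal slash, which is exactly the burden this approach puts on them.

```python
import re

# Hypothetical pattern: a slash only counts as a regex delimiter
# when it is not preceded by a backslash.
RE_UNESCAPED_SLASHES = re.compile(r"(?<!\\)/(.*?)(?<!\\)/")


def find_regex_parts(query):
    """Operation 1: return the bodies of /.../ segments with unescaped slashes."""
    return RE_UNESCAPED_SLASHES.findall(query)


def unescape_slashes(query):
    """Operation 2: turn every escaped slash back into a plain slash."""
    return query.replace("\\/", "/")


escaped = r"037:hep\/123 OR 037:exp\/567"
print(find_regex_parts(escaped))   # []
print(unescape_slashes(escaped))   # 037:hep/123 OR 037:exp/567

# A real regex segment is still found next to an escaped report number.
print(find_regex_parts(r"title:/^quark/ OR 037:exp\/567"))  # ['^quark']
```

Note that this simple lookbehind does not handle an escaped backslash directly in front of a slash, and unescaped queries like Sam's would still be tokenized wrongly.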
I guess it is a harder problem.
Roman
On Thu, Mar 1, 2012 at 1:51 PM, Carli Samuele <[email protected]> wrote:
> '(037:hep-th/0112017) | (037:hep-th/0112020)'