
Just to chip in here: what about introducing precedence in this case?
That is, always consider the index (or MARC field) references first,
splitting the query on them before applying the regular expression
that looks for regexp queries.
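The field-first precedence above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the field-reference syntax (a three-digit MARC tag with optional indicators/subfield, or a lowercase index name, followed by a colon), the helper name `split_on_fields` and the tuples it returns are all hypothetical, not Invenio's actual query parser.

```python
import re

# Assumed syntax, for illustration only: a field reference is a
# three-digit MARC tag with optional indicators/subfield (037, 037__a)
# or a lowercase index name (title), followed by a colon, at the start
# of a whitespace-separated word.
FIELD_REF = re.compile(r"(?:^|(?<=\s))(\d{3}[0-9a-z_]{0,3}|[a-z]+):")
# A boolean operator left dangling at the end of a value ("... OR").
TRAILING_OP = re.compile(r"^(.*?)\s+(OR|AND|NOT)$")


def split_on_fields(query):
    """Split on field references first, then decide per value whether
    it is a regexp.

    Returns (field, value, is_regexp, following_operator) tuples. A value
    counts as a regexp only if the whole value is wrapped in slashes, so
    a '/' inside an ordinary token such as hep-ph/0105155 is left alone.
    """
    refs = list(FIELD_REF.finditer(query))
    # (field, start, end) spans; text before the first reference has no field.
    first = refs[0].start() if refs else len(query)
    spans = [(None, 0, first)] if query[:first].strip() else []
    for i, ref in enumerate(refs):
        end = refs[i + 1].start() if i + 1 < len(refs) else len(query)
        spans.append((ref.group(1), ref.end(), end))

    terms = []
    for field, start, end in spans:
        value, op = query[start:end].strip(), None
        m = TRAILING_OP.match(value)
        if m:
            value, op = m.group(1), m.group(2)
        is_regexp = len(value) >= 2 and value[0] == value[-1] == "/"
        terms.append((field, value, is_regexp, op))
    return terms


print(split_on_fields("037:hep-ph/0105155 OR 037:astro-ph/0104076"))
# [('037', 'hep-ph/0105155', False, 'OR'), ('037', 'astro-ph/0104076', False, None)]
```

With this ordering the failing query yields two plain report-number terms, because neither value is wrapped in slashes as a whole. A query without field references, such as `hep-ph/0105155 OR astro-ph/0104076`, comes back as a single unsplit term, which is exactly the case where field-first precedence alone does not help.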

Of course, we are still stuck when:
hep-ph/0105155 OR astro-ph/0104076

But precedence is also our friend here: let a space or "OR" split the
query before doing anything else. However, this could be ambiguous
with respect to an OR or a space inside a regular expression. We could
then try both interpretations and choose the one giving back the most
specialised results, and/or inform the user about the ambiguity.
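The "try both interpretations" idea can be sketched like this, as a hedged illustration only. The helper names (`parse_split_first`, `parse_regexp_first`, `interpret`), the token labels, and the `count_hits` callback standing in for the search engine are hypothetical, and "most specialised" is assumed here to mean "fewest non-zero hits".

```python
import re

# Same shape as the Invenio pattern quoted in this thread: a lazy match
# between two slashes.
REGEXP_SPAN = re.compile(r"/(.*?)/")


def _classify(token):
    """Label a single whitespace-free token."""
    if token == "OR":
        return ("op", "OR")
    if len(token) >= 2 and token[0] == token[-1] == "/":
        return ("regexp", token[1:-1])
    return ("word", token)


def parse_split_first(query):
    """Interpretation A: split on spaces (and OR) first, then look for regexps."""
    return [_classify(t) for t in query.split()]


def parse_regexp_first(query):
    """Interpretation B: extract /.../ spans first, then split the rest."""
    terms, pos = [], 0
    for m in REGEXP_SPAN.finditer(query):
        terms += [_classify(t) for t in query[pos:m.start()].split()]
        terms.append(("regexp", m.group(1)))
        pos = m.end()
    terms += [_classify(t) for t in query[pos:].split()]
    return terms


def interpret(query, count_hits):
    """Return (chosen_parse, is_ambiguous).

    count_hits is a hypothetical stand-in for the search engine: it takes
    a parse and returns the number of matching records. The parse with the
    fewest non-zero hits is treated as the most specialised one.
    """
    candidates = [parse_split_first(query), parse_regexp_first(query)]
    if candidates[0] == candidates[1]:
        return candidates[0], False
    scored = [(count_hits(p), i, p) for i, p in enumerate(candidates)]
    nonzero = [s for s in scored if s[0] > 0]
    best = min(nonzero or scored)
    return best[2], True
```

For `hep-ph/0105155 OR astro-ph/0104076` the two parses differ (two report numbers joined by OR, versus a regexp `0105155 OR astro-ph` between the words `hep-ph` and `0104076`), so the ambiguity flag is set and could be reported to the user. A query such as `higgs boson` parses identically both ways and is not flagged.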

Cheers,
Jan

On 03/01/2012 04:19 PM, Roman Chyla wrote:
> Hi Sam, all,
> 
> Apologies if this is already known, but today Sam stumbled upon the
> following query and got zero results:
> 
> 037:hep-ph/0105155 OR 037:astro-ph/0104076
> 
> 
> This is because Invenio uses slashes as brackets marking a regexp
> query. Unfortunately this usage is ambiguous, because the '/'
> character can also be part of a normal token.
> 
> So the query is wrongly tokenized as:
> 
> ['+', 'hep-ph/0105155 OR 037:astro-ph/0104076', '037', 'a']
> 
> Using quotes doesn't help either:
> 
> 037:"hep-ph/0105155" OR "037:astro-ph/0104076"
> 
> because the pattern treats anything between two slashes as a regexp:
> 
> re_pattern_regexp_quotes = re.compile("\/(.*?)\/")
> 
> It would be possible to use a negative lookbehind plus escaping, but
> that requires two operations (changing the regex and unescaping the
> escaped slashes):
> 
> In [78]: re.compile('(?<!\\\\)\/(.*?)(?<!\\\\)\/').findall('037:hep/123 OR 037:exp/567')
> Out[78]: ['123 OR 037:exp']
> 
> In [79]: re.compile('(?<!\\\\)\/(.*?)(?<!\\\\)\/').findall('037:hep\/123 OR 037:exp\/567')
> Out[79]: []
> 
> I guess it is a harder problem.
> 
> Roman
> 
> 
> On Thu, Mar 1, 2012 at 1:51 PM, Carli Samuele
> <[email protected]> wrote:
>> '(037:hep-th/0112017) | (037:hep-th/0112020)'
>>
>> --
>> |--
>> | Samuele Carli
>> |--
>> | Contacts:
>> |
>> |       Home page   : www.csspace.net
>> |       E-mail      : carlisamuele _at_ csspace.net
>> |       Icq         : 60401601
>> |       MSN         : [email protected] (no e-mails here!)
>> |       Skype       : wohthan
>> |       jabber/gtalk: [email protected]
>> |--
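The two-step escaping workaround Roman describes can be sketched as below: a minimal illustration, assuming users write a literal slash as `\/`. The helper names `find_regexps` and `unescape_slashes` are hypothetical; the pattern is Roman's, rewritten as a raw string so the backslashes are easier to read.

```python
import re

# Roman's pattern, as a raw string: a '/' only delimits a regexp
# when it is not preceded by a backslash.
RE_UNESCAPED_SLASHES = re.compile(r"(?<!\\)/(.*?)(?<!\\)/")


def find_regexps(query):
    """Operation 1: return the bodies of /.../ regexps, ignoring escaped slashes."""
    return RE_UNESCAPED_SLASHES.findall(query)


def unescape_slashes(term):
    r"""Operation 2: turn an escaped '\/' back into a literal '/' in a plain term."""
    return term.replace("\\/", "/")


print(find_regexps(r"037:hep\/123 OR title:/^higgs/"))  # ['^higgs']
print(unescape_slashes(r"hep-ph\/0105155"))             # hep-ph/0105155
```

The cost is the one Roman points out: every literal slash in a report number has to be escaped by the user, and the parser must unescape plain terms afterwards.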


-- 
--
Jan Åge Lavik

CERN System Librarian
GS-SIS

Office: 3-1-014
Mailbox: C27800
