It is difficult for me to understand your code (or maybe it is too late and I 
should go to bed).
For example: what are you doing in line 167?
match.group(u'book') is matched by this part of the pattern (see the method
comment):
...(?!\s)(?P<book>[\d]*[^\d]+)(?<!\s)...
That means everything from the first non-whitespace character to the last
non-whitespace character at the end of the book name. The group may contain
digits, but only at the beginning.
So, if I read it correctly, your regexp makes no sense:
167     +            regex_book = re.compile(u'^[1-4]?[\. ]{0,2}%s' % 
book.lower(),
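
To illustrate what the book group captures, here is a minimal sketch. It
assumes Python 3 with raw strings instead of the u'' literals, and it uses
only the book fragment of the pattern (the rest of the real pattern is left
out). The name extract_book and the sample references are made up for the
example.

```python
import re

# Only the book-name fragment of the reference pattern; the remainder of the
# real pattern is omitted, so this is an approximation for illustration.
BOOK_FRAGMENT = re.compile(
    r'^\s*(?!\s)(?P<book>[\d]*[^\d]+)(?<!\s)\s*', re.UNICODE)


def extract_book(reference):
    """Return the text captured by the 'book' group, or None (hypothetical helper)."""
    match = BOOK_FRAGMENT.match(reference)
    return match.group('book') if match else None


print(extract_book('1. John 3:16'))  # 1. John
print(extract_book('  John 3'))      # John
print(extract_book('Jo2hn'))         # Jo (digits are only allowed in front)
```

Note that the captured name already contains the leading digit and the dot,
so prepending [1-4]?[\. ]{0,2} to it a second time has nothing left to match.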

I guess what you want is something like this. Change line 232:
- u'^\s*(?!\s)(?P<book>[\d]*[^\d]+)(?<!\s)\s*'
+ u'^\s*(?!\s)(?P<book>(?P<booknum>[\d]*)\s*(?P<bookbase>[^\d]+))(?<!\s)\s*'
Then you can build your new regexp:
regex_book = re.compile(u'^%s\.? ?%s' % (match.group(u'booknum'),
    match.group(u'bookbase')), re.UNICODE | re.IGNORECASE)
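
A rough sketch of how the nested groups could be used, assuming Python 3 and
raw strings. The pattern is again only the book fragment, build_book_regex is
a made-up name, and the re.escape() call on bookbase is my addition, not part
of your code.

```python
import re

# Book fragment with nested groups for the leading number and the base name.
REFERENCE = re.compile(
    r'^\s*(?!\s)(?P<book>(?P<booknum>[\d]*)\s*(?P<bookbase>[^\d]+))(?<!\s)\s*',
    re.UNICODE)


def build_book_regex(reference):
    """Build a regex that accepts '1John', '1 John', '1.John' and '1. John'."""
    match = REFERENCE.match(reference)
    if match is None:
        return None
    # Escaping bookbase is an assumption added here so that user input is
    # never interpreted as regex syntax.
    return re.compile(
        r'^%s\.? ?%s' % (match.group('booknum'),
                         re.escape(match.group('bookbase'))),
        re.UNICODE | re.IGNORECASE)


regex_book = build_book_regex('1 John 3:16')
print(bool(regex_book.match('1. john')))  # True
print(bool(regex_book.match('2 John')))   # False
```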

Did I see correctly that you add the strings to the completer? In that case:
why do you want to make a soft decision at all? The problem is that it
produces behaviour that is opaque to the user. The book name recognition I
wrote is so general because internationalized names differ a lot. If you
attach meaning to the dot because we use it in German, it might be
counterproductive for other languages. With your current algorithm, users can
type regular expressions and the code will interpret them.
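
To show what I mean by the code interpreting user input, here is a small
sketch, assuming Python 3. The book list and the name soft_matches are
invented for the example, and since the quoted line 167 is cut off I have
guessed re.UNICODE as the only flag.

```python
import re

# Made-up list of book names for the demonstration.
BOOKS = ['Genesis', 'Joshua', 'Judges', 'John', '1 John']


def soft_matches(book, names):
    """Match names the way line 167 does, with the user text left unescaped."""
    regex_book = re.compile(r'^[1-4]?[\. ]{0,2}%s' % book.lower(), re.UNICODE)
    return [name for name in names if regex_book.match(name.lower())]


# The user's text is treated as a regular expression, not as a book name.
print(soft_matches('j.*s', BOOKS))  # ['Joshua', 'Judges']

# Unbalanced input does not simply fail to match, it raises an exception.
try:
    soft_matches('(', BOOKS)
except re.error as error:
    print('invalid pattern:', error)
```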

My suggestion:
Make a hard decision.
If you want to do it the best way possible:
Take the match.group(u'book') string and escape all reserved characters:
bookname = match.group(u'book')
# escape reserved characters ('|' is a metacharacter as well)
for character in u'\\.^$*+?{}[]()|':
    bookname = bookname.replace(character, u'\\' + character)
Then you can replace each run of whitespace with an arbitrary amount of
whitespace and do a case-insensitive match:
re.compile(u'\s*%s\s*' % u'\s*'.join(bookname.split()),
    re.UNICODE | re.IGNORECASE)
Use regex.match() and not regex.search(). Otherwise John would be found in
the Epistle of John as well. (And users will enter two-letter shortcuts and
find something completely different.)
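
Put together, the hard decision could look roughly like this. It is a sketch
under the assumption of Python 3 and raw strings, and build_hard_regex is a
made-up name.

```python
import re


def build_hard_regex(bookname):
    """Compile a literal, case-insensitive, whitespace-tolerant book matcher."""
    # Escape reserved characters, backslash first so that the escapes added
    # below are not escaped again ('|' included, it is a metacharacter too).
    for character in '\\.^$*+?{}[]()|':
        bookname = bookname.replace(character, '\\' + character)
    # Any run of whitespace in the name may be any amount of whitespace.
    return re.compile(
        r'\s*%s\s*' % r'\s*'.join(bookname.split()),
        re.UNICODE | re.IGNORECASE)


regex = build_hard_regex('1. John')
print(bool(regex.match('  1.   john ')))  # True
print(bool(regex.match('1x John')))       # False, the dot is literal

john = build_hard_regex('John')
print(john.match('Epistle of John'))         # None
print(bool(john.search('Epistle of John')))  # True, hence match() not search()
```

Python also ships re.escape(), which might replace the manual loop, but I
have kept the loop here so the sketch follows the steps above.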
-- 
https://code.launchpad.net/~orangeshirt/openlp/bibles/+merge/95805
Your team OpenLP Core is subscribed to branch lp:openlp.
