Alex Feddor wrote:
Hi

I am looking for a method that enables advanced text string search. The string.find() method and the re module do not seem to support what I am looking for. The idea is as follows:

Text ="FDA meeting was successful. New drug is approved for whole sale distribution!" I would like to scan the text using AND and OR operators, and get -1 (or some other value) if the search terms are not found in the text.

Example 01:
search criteria:  "FDA" AND ( "approve*" OR "supported")
The catch is that in Text the words FDA and approve are not adjacent (other words appear in between).

Bring on your hardest searches...

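class Pattern(object): pass  # common base class for all pattern nodes

class Logical(Pattern):
    # Base for binary operators; subclasses set self.op and self.op_name.
    # Every pattern returns the index of its first match, or len(text)
    # as the internal "not found" sentinel.
    def __init__(self, pat1, pat2):
        self.pat1 = pat1
        self.pat2 = pat2
    def __call__(self, text):
        a, b = self.pat1(text), self.pat2(text)
        # apply the boolean operator to "was each side found?"
        if self.op(a != len(text), b != len(text)):
            return min((a, b))  # the earliest match wins
        return len(text)
    def __str__(self):
        return '(%s %s %s)' % (self.pat1, self.op_name, self.pat2)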

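class P(Pattern):
    # Leaf pattern: a literal substring located with str.find().
    # '*' is matched literally here, so P('approve*') does NOT match 'approved'.
    def __init__(self, pat):
        self.pat = pat
    def __call__(self, text):
        ret = text.find(self.pat)
        return ret if ret != -1 else len(text)
    def __str__(self):
        return '"%s"' % self.pat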

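class NOT(Pattern):
    # Inverts a pattern. See the caveat below: when the inner pattern is
    # absent there is no real match position, so len(text) - 1 stands in.
    def __init__(self, pat):
        self.op_name = 'NOT'
        self.pat = pat
    def __call__(self, text):
        ret = self.pat(text)
        return ret - 1 if ret == len(text) else len(text)
    def __str__(self):
        return '%s (%s)' % (self.op_name, self.pat)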

class XOR(Logical):
    def __init__(self, pat1, pat2):
        self.op_name = 'XOR'
        self.op = lambda a, b: not(a and b) and (a or b)
        super().__init__(pat1, pat2)

class OR(Logical):
    def __init__(self, pat1, pat2):
        self.op_name = 'OR'
        self.op = lambda a, b: a or b
        super().__init__(pat1, pat2)

class AND(Logical):
    def __init__(self, pat1, pat2):
        self.op_name = 'AND'
        self.op = lambda a, b: a and b
        super().__init__(pat1, pat2)

class Suite(object):
    # Top-level wrapper: converts the internal len(text) sentinel to -1.
    def __init__(self, pat):
        self.pat = pat
    def __call__(self, text):
        ret = self.pat(text)
        return ret if ret != len(text) else -1
    def __str__(self):
        return '[%s]' % self.pat

pat1 = P('FDA')
pat2 = P('approve*')
pat3 = P('supported')
p = Suite(AND(pat1, OR(pat2, pat3)))
print(p(''))
print(p('FDA'))
print(p('FDA supported'))
print(p('supported FDA'))
print(p('blah FDA bloh supported blih'))
print(p('blah FDA bleh supported bloh supported blih '))
p = Suite(AND(OR(pat1, pat2), XOR(pat2, NOT(pat3))))
print(p)
print(p(''))
print(p('FDA'))
print(p('FDA supported'))
print(p('supported sdc FDA sd'))
print(p('blah blih FDA bluh'))
print(p('blah blif supported blog'))

#################

I guess I went a bit overboard here (I had too much time on my hands). It works by function composition: instead of evaluating the expression directly, you compose a function (or, more accurately, a callable class) that evaluates the logical expression and returns the index of the first item that matches it. It currently uses str's built-in find(), but I guess it wouldn't be very hard to adapt it to use the re-based myfind() below (only the P class would need to change).
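Since only P needs to change, here is a minimal sketch of a regex-backed replacement. The name RP is hypothetical (it is not part of the code above), and it assumes a trailing * in a search term should mean "any further word characters", so that 'approve*' matches 'approved'. It keeps P's convention of returning len(text) when nothing matches, so it could be dropped into AND/OR/XOR unchanged.

```python
import re


class RP(object):
    """Hypothetical regex-backed stand-in for P.

    Returns the index of the first match, or len(text) when there is none.
    Assumption: a trailing '*' means "any further word characters".
    """

    def __init__(self, pat):
        self.pat = pat
        term = re.escape(pat.rstrip('*'))  # other characters match literally
        if pat.endswith('*'):
            # prefix match: word boundary before, any word characters after
            self.regex = re.compile(r'\b%s\w*' % term)
        else:
            # exact whole-word match, like myfind()
            self.regex = re.compile(r'\b%s\b' % term)

    def __call__(self, text):
        m = self.regex.search(text)
        return m.start() if m else len(text)

    def __str__(self):
        return '"%s"' % self.pat


text = "FDA meeting was successful. New drug is approved for whole sale distribution!"
print(RP('approve*')(text))   # index where 'approved' starts
print(RP('supported')(text))  # len(text), i.e. not found
```

In the original code you would derive RP from Pattern rather than object so it fits in with the other classes.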

The Suite class is only there to turn the not-found sentinel from len(text) into -1. (I used len(text) internally because it simplifies the code a lot: min() then picks the earliest real match without special-casing a missing one.)
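To see why len(text) is a convenient internal sentinel, here is a small standalone sketch comparing it with str.find()'s -1. The helper name first_index is hypothetical; it just mimics what Logical and Suite do together for a plain OR of words.

```python
def first_index(text, *words):
    """Hypothetical helper: index of the earliest of `words` in `text`, else -1."""
    # str.find() reports "absent" as -1, which min() would wrongly prefer,
    # so map it to len(text), a value larger than any real match index.
    positions = [text.find(w) for w in words]
    positions = [p if p != -1 else len(text) for p in positions]
    best = min(positions)
    # translate the internal sentinel back to -1 at the very end (Suite's job)
    return best if best != len(text) else -1


text = 'blah FDA bloh supported blih'
print(min(text.find('zzz'), text.find('FDA')))  # -1: the naive min() is fooled
print(first_index(text, 'zzz', 'FDA'))          # 5
print(first_index(text, 'zzz', 'yyy'))          # -1
```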

Caveat: The NOT class cannot reliably convert a False to True because I don't know what index number to use.

The code is written to save vertical space, so it is not the most readable in the world.

No guarantee that it is bug-free.

Idea:
Override the operators on the Pattern class so we could write P("Hello") & P("World") instead of AND(P("Hello"), P("World")).
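A rough sketch of that idea follows. It re-declares trimmed copies of the classes so it runs on its own, turns op into a method instead of a lambda assigned in __init__ (a design choice of this sketch, not of the code above), and leaves out NOT (which could map to ~ via __invert__) because of the caveat about its index.

```python
class Pattern(object):
    # Operator overloads build the same tree as the explicit classes:
    # a & b -> AND(a, b), a | b -> OR(a, b), a ^ b -> XOR(a, b)
    def __and__(self, other):
        return AND(self, other)

    def __or__(self, other):
        return OR(self, other)

    def __xor__(self, other):
        return XOR(self, other)


class P(Pattern):
    def __init__(self, pat):
        self.pat = pat

    def __call__(self, text):
        ret = text.find(self.pat)
        return ret if ret != -1 else len(text)  # len(text) means "not found"

    def __str__(self):
        return '"%s"' % self.pat


class Logical(Pattern):
    op_name = '?'

    def __init__(self, pat1, pat2):
        self.pat1, self.pat2 = pat1, pat2

    def op(self, a, b):
        raise NotImplementedError

    def __call__(self, text):
        a, b = self.pat1(text), self.pat2(text)
        if self.op(a != len(text), b != len(text)):
            return min(a, b)  # earliest match wins
        return len(text)

    def __str__(self):
        return '(%s %s %s)' % (self.pat1, self.op_name, self.pat2)


class AND(Logical):
    op_name = 'AND'

    def op(self, a, b):
        return a and b


class OR(Logical):
    op_name = 'OR'

    def op(self, a, b):
        return a or b


class XOR(Logical):
    op_name = 'XOR'

    def op(self, a, b):
        return a != b


class Suite(object):
    def __init__(self, pat):
        self.pat = pat

    def __call__(self, text):
        ret = self.pat(text)
        return ret if ret != len(text) else -1

    def __str__(self):
        return '[%s]' % self.pat


p = Suite(P('FDA') & (P('approve') | P('supported')))
print(p)                                  # [("FDA" AND ("approve" OR "supported"))]
print(p('blah FDA bloh supported blih'))  # 5
print(p('supported only'))                # -1
```

Python's precedence is & above ^ above |, so P('a') & P('b') | P('c') groups as (a AND b) OR c; add parentheses, as above, for any other grouping.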

Example 02:
search criteria: "Ben"
The catch is that the code should find only the exact word Ben, not words whose first three letters are Ben, such as Benquick, Benseek, etc. Only Ben is the word we are looking for.

The second one was easier...

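import re
def myfind(pattern, text):
    # Return the index of the first whole-word match of `pattern`
    # (\b marks a word boundary), or None when there is none.
    pattern = r'(.*?)\b(%s)\b(.*)' % pattern
    # re.DOTALL lets '.' cross newlines, so multi-line text works too
    m = re.match(pattern, text, re.DOTALL)
    if m:
        return len(m.group(1))
    return None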

textfound = 'This is a Ben test string'
texttrick = 'This is a Benquick Benseek McBen QuickBenSeek string'
textnotfound = 'He is away'
textmulti = 'Our Ben found another Ben which is quite odd'
pat = 'Ben'
print(myfind(pat, textfound))    # 10
print(myfind(pat, texttrick))    # None
print(myfind(pat, textnotfound)) # None
print(myfind(pat, textmulti))    # 4

If you only want to test for existence, simply:

pattern = 'Ben'
text = 'This is a Ben test string'
if re.match(r'(.*?)\b(%s)\b(.*)' % pattern, text, re.DOTALL):
    print('found')
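For both uses above, re.search() can do the scanning itself, so the (.*?) prefix group is not needed, and m.start() gives the index directly. A leading (.*?) without re.DOTALL also stops at a newline, which re.search() avoids. The sketch below (find_word is a hypothetical name) returns -1 when nothing matches, like str.find() and like the original question asked, and assumes the search term is a plain word, hence re.escape().

```python
import re


def find_word(word, text):
    """Hypothetical alternative to myfind(): index of the first whole-word
    occurrence of `word` in `text`, or -1 if there is none (like str.find)."""
    # re.escape() makes characters such as '.' or '+' match literally;
    # drop it if you want to pass a regular expression instead of a word.
    m = re.search(r'\b%s\b' % re.escape(word), text)
    return m.start() if m else -1


print(find_word('Ben', 'This is a Ben test string'))                             # 10
print(find_word('Ben', 'This is a Benquick Benseek McBen QuickBenSeek string'))  # -1
print(find_word('Ben', 'Our Ben found another Ben which is quite odd'))          # 4
print(find_word('Ben', 'line one\nBen on line two'))                             # 9
```

For a plain existence test, bool(re.search(r'\bBen\b', text)) is enough.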

I would really appreciate your advice: code samples or links showing how the above can be achieved! If possible, I would appreciate a solution that uses a free-of-charge module.

Standard library is free of charge, no?

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
