On Jul 31, 3:56 pm, Mensanator <[EMAIL PROTECTED]> wrote:
> On Jul 31, 3:07 pm, [EMAIL PROTECTED] wrote:
>
> > I am using regular expressions to search a string (always full
> > sentences, maybe more than one sentence) for common abbreviations and
> > remove the periods. I need to break the string into different
> > sentences but split('.') doesn't solve the whole problem because of
> > possible periods in the middle of a sentence.
>
> > So I have...
>
> > ----------------
>
> > import re
>
> > middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
>
> > # this will find abbreviations like e.g. or i.e. in the middle of a sentence.
> > # then I want to remove the periods.
>
> > ----------------
>
> > I want to keep the ie or eg but just take out the periods. Any
> > ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
> > string will take out the entire abbreviation with the alphanumeric
> > characters included.
>
> >>> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
> >>> s = 'A test, i.e., an example.'
> >>> a = middle_abbr.search(s) # find the abbreviation
> >>> b = re.compile('\.') # period pattern
> >>> c = b.sub('',a.group(0)) # remove periods from abbreviation
> >>> d = middle_abbr.sub(c,s) # substitute new abbr for old
> >>> d
> 'A test, ie, an example.'
A more versatile version, which also handles several different
abbreviations in the same string:

import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

s = 'A test, i.e., an example.'
a = middle_abbr.search(s)           # find the abbreviation
b = re.compile(r'\.')               # period pattern
c = b.sub('', a.group(0))           # remove periods from abbreviation
d = middle_abbr.sub(c, s)           # substitute new abbr for old
print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr."""
a = middle_abbr.search(s)           # find the abbreviation
c = b.sub('', a.group(0))           # remove periods from abbreviation
# replaces EVERY match with c, so this is only safe when all the
# abbreviations in s are the same one
d = middle_abbr.sub(c, s)           # substitute new abbr for old
print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""
done = False
while not done:
    a = middle_abbr.search(s)                # find the next abbreviation
    if a:
        c = b.sub('', a.group(0))            # remove periods from abbreviation
        s = middle_abbr.sub(c, s, count=1)   # substitute new abbr for old ONCE
    else:                                    # repeat until all removed
        done = True
print(s)

## A test, ie, an example.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
## A multi-test, eg, one with different abbr.

-- 
http://mail.python.org/mailman/listinfo/python-list
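The search-and-replace-once loop can also be collapsed into a single pass. In Python's re module, the replacement passed to sub() may be a function: it is called once per match with the match object, and whatever it returns is substituted for that match. The sketch below assumes Python 3 and reuses the same pattern (written as a raw string); the names strip_periods, remove_abbr_periods, abbr_groups and remove_abbr_periods_backref are hypothetical helpers for illustration only. The second variant does the same job with capturing groups and the backreferences \1 and \2 in the replacement string.

```python
import re

# Same pattern as in the thread, as a raw string so "\." reaches the regex engine intact.
middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')


def strip_periods(match):
    """Hypothetical helper: return the matched abbreviation with its periods removed."""
    return match.group(0).replace('.', '')


def remove_abbr_periods(text):
    """Hypothetical helper: sub() calls strip_periods once per match,
    so i.e. and e.g. in the same string each get their own replacement."""
    return middle_abbr.sub(strip_periods, text)


# Alternative: capture the two characters and put them back without the periods.
abbr_groups = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')


def remove_abbr_periods_backref(text):
    """Hypothetical helper: \\1 and \\2 refer to the two captured characters."""
    return abbr_groups.sub(r'\1\2', text)


s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

print(remove_abbr_periods(s))
print(remove_abbr_periods_backref(s))
```

Because every match gets its own replacement, there is no need to search for one abbreviation, substitute it once, and repeat; the string is scanned a single time and different abbreviations can be mixed freely.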