Anthra Norell wrote:
superpollo wrote:
hi.

what is the most pythonic way to substitute substrings?

eg: i want to apply:

foo --> bar
baz --> quux
quuux --> foo

so that:

fooxxxbazyyyquuux --> barxxxquuxyyyfoo

bye

So it goes. The more it matters, the sillier the mistakes. The method __init__ () was incomplete and __call__ () was missing. Sorry about that. Here is the whole thing again:


class Translator:

   r"""
      Will translate any number of targets, handling them correctly if some overlap.

      Making Translator
          T = Translator (definitions, [eat = 1])
          'definitions' is a sequence of pairs: ((target, substitute),(t2, s2), ...)
          'eat' says whether untargeted sections pass (translator) or are skipped (extractor).
              Makes a translator by default (eat = False)
              T.eat is an instance attribute that can be changed at any time.
          Definitions example:
              (('a','A'),('b','B'),('ab','ab'),('abc','xyz'),   # ('ab','ab') see Tricks.
               ('\x0c', 'page break'), ('\r\n','\n'), ('   ','\t'))
          Order doesn't matter.

      Running
          translation = T (source)

      Tricks
          Deletion:  ('target', '')
          Exception: (('\n',''), ('\n\n','\n\n'))       # Eat LF except paragraph breaks.
          Exception: (('\n', '\r\n'), ('\r\n','\r\n'))  # Unix to DOS, would leave DOS unchanged
          Translation cascade:
              # Rejoin text lines per paragraph Unix or DOS, inserting inter-word space if missing
              Mark_LF = Translator ((('\n','+LF+'),('\r\n','+LF+'),('\r\n\r\n','\r\n\r\n'),('\n\n','\n\n')))  # Pick positively identifiable mark for Unix and DOS end of lines
              Single_Space_Mark = Translator (((' +LF+', ' '),('+LF+', ' '),('-+LF+', '')))
              no_lf_text = Single_Space_Mark (Mark_LF (text))
          Translation cascade:
              # Nesting calls
              reptiles = T_latin_english (T_german_latin (reptilien))

      Limitations
          1. The number of substitutions and the maximum size of input depend on the respective
              capabilities of the Python re module.
          2. Regular expressions will not work as such.

      Author:
          Frederic Rentsch (anthra.nor...@bluewin.ch).
   """

   def __init__ (self, definitions, eat = 0):

       '''
           definitions: a sequence of pairs of strings. ((target, substitute), (t, s), ...)
           eat: False (0) means translate: unaffected data passes unaltered.
                True (1) means extract: unaffected data doesn't pass (gets eaten).
                Extraction filters typically require substitutes to end with some separator,
                else they fuse together. (E.g. ' ', '\t' or '\n')
           'eat' is an attribute that can be switched anytime.

       '''
       self.eat = eat
       self.compile_sequence_of_pairs (definitions)

   def compile_sequence_of_pairs (self, definitions):

       '''
           Argument 'definitions' is a sequence of pairs:
           (('target 1', 'substitute 1'), ('t2', 's2'), ...)
           Order doesn't matter.
       '''
       import re
       self.definitions = definitions
       targets, substitutes = zip (*definitions)
       re_targets = [re.escape (item) for item in targets]
       # Reverse sort puts a longer target ahead of any shorter target that is its prefix.
       re_targets.sort (reverse = True)
       self.targets_set = set (targets)
       self.table = dict (definitions)
       regex_string = '|'.join (re_targets)
       self.regex = re.compile (regex_string, re.DOTALL)

   def __call__ (self, s):
       hits = self.regex.findall (s)
       nohits = self.regex.split (s)
       valid_hits = set (hits) & self.targets_set  # Ignore targets with illegal re modifiers.
       if valid_hits:
           substitutes = [self.table [item] for item in hits if item in valid_hits]
           if self.eat:
               return ''.join (substitutes)
           else:
               # split () returns one more piece than there are hits: pair each
               # untargeted piece with the substitute that follows it, then
               # append the final piece.
               zipped = zip (nohits, substitutes)
               return ''.join ([nohit + substitute for nohit, substitute in zipped]) + nohits [-1]
       else:
           if self.eat:
               return ''
           else:
               return s
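
For comparison, here is a minimal sketch of the same single-pass idea applied to the example from the original question. It is not the class above: the function name multi_replace and its 'eat' keyword are made up for this illustration, it sorts targets by length rather than by reverse string order, and it assumes every target is a non-empty string.

```python
import re

def multi_replace(text, pairs, eat=False):
    """Apply all (target, substitute) pairs in one pass over text.

    Hypothetical helper for illustration; assumes non-empty targets.
    """
    table = dict(pairs)
    # Try longer targets first so a target is never shadowed by its own prefix.
    targets = sorted(table, key=len, reverse=True)
    regex = re.compile('|'.join(re.escape(t) for t in targets))
    if eat:
        # Extractor mode: keep only the substitutes, drop untargeted text.
        return ''.join(table[m.group(0)] for m in regex.finditer(text))
    # Translator mode: untargeted text passes through unchanged.
    return regex.sub(lambda m: table[m.group(0)], text)

pairs = [('foo', 'bar'), ('baz', 'quux'), ('quuux', 'foo')]
print(multi_replace('fooxxxbazyyyquuux', pairs))            # barxxxquuxyyyfoo
print(multi_replace('fooxxxbazyyyquuux', pairs, eat=True))  # barquuxfoo
```

Because every substitution is made in one scan, the 'foo' produced from 'quuux' is never translated again into 'bar', which is what chained str.replace calls would risk depending on their order.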

--
http://mail.python.org/mailman/listinfo/python-list
