On Jan 18, 4:26 pm, Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> On Mon, 18 Jan 2010 06:23:44 -0800, Iain King wrote:
> > On Jan 18, 2:17 pm, Adi Eyal <a...@digitaltrowel.com> wrote:
> [...]
> >> Using regular expressions the answer is short (and sweet)
> >>
> >> mapping = {
> >>     "foo" : "bar",
> >>     "baz" : "quux",
> >>     "quuux" : "foo"
> >> }
> >>
> >> pattern = "(%s)" % "|".join(mapping.keys())
> >> repl = lambda x : mapping.get(x.group(1), x.group(1))
> >> s = "fooxxxbazyyyquuux"
> >> re.subn(pattern, repl, s)
> >
> > Winner! :)
>
> What are the rules for being declared "Winner"? For the simple case
> given, calling s.replace three times is much faster: more than twice as
> fast.
>
> But a bigger problem is that the above "winner" may not work correctly if
> there are conflicts between the target strings (e.g. 'a'->'X',
> 'aa'->'Y'). The problem is that the result you get depends on the order
> of the searches, BUT as given, that order is non-deterministic.
> dict.keys() returns in an arbitrary order, which means the caller can't
> specify the order except by accident. For example:
>
> >>> repl = lambda x : m[x.group(1)]
> >>> m = {'aa': 'Y', 'a': 'X'}
> >>> pattern = "(%s)" % "|".join(m.keys())
> >>> subn(pattern, repl, 'aaa') # expecting 'YX'
> ('XXX', 3)
>
> The result that you get using this method will be consistent but
> arbitrary and unpredictable.
>
> For those who care, here's my timing code:
>
> from timeit import Timer
>
> setup = """
> mapping = {"foo" : "bar", "baz" : "quux", "quuux" : "foo"}
> pattern = "(%s)" % "|".join(mapping.keys())
> repl = lambda x : mapping.get(x.group(1), x.group(1))
> repl = lambda x : mapping[x.group(1)]
> s = "fooxxxbazyyyquuux"
> from re import subn
> """
>
> t1 = Timer("subn(pattern, repl, s)", setup)
> t2 = Timer(
>     "s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')",
>     "s = 'fooxxxbazyyyquuux'")
>
> And the results on my PC:
>
> >>> min(t1.repeat(number=100000))
> 1.1273870468139648
> >>> min(t2.repeat(number=100000))
> 0.49491715431213379
>
> --
> Steven

Adi elicited that response from me because his solution was vastly more
succinct than everything else that had appeared up until that point, while
still meeting the OP's requirements. The OP never cared about overlap
between two 'find' strings, just between the 'find' and 'replace' strings
(though I did take it into account in my second post for the sake of
completeness). His code could have been a little cleaner; I'd have
trimmed it to:

    import re

    mapping = {"foo": "bar", "baz": "quux", "quuux": "foo"}
    pattern = "(%s)" % "|".join(mapping)
    repl = lambda x : mapping[x.group(1)]
    s = "fooxxxbazyyyquuux"
    re.subn(pattern, repl, s)

but apart from that it was very pythonic: explicit, succinct, and all the
heavy work is done by the module (i.e. in compiled C code in the majority
case of CPython). It can be 'upgraded' to cover the find-find overlap if
you really want (I *believe* regexps will match the leftmost option in a
group first):

    import re

    # A list of pairs keeps the search order under the caller's control.
    subs = [("foo", "bar"), ("baz", "quux"), ("quuux", "foo")]
    pattern = "(%s)" % "|".join(x[0] for x in subs)
    mapping = dict(subs)
    repl = lambda x : mapping[x.group(1)]
    s = "fooxxxbazyyyquuux"
    re.subn(pattern, repl, s)

Anyway, there's no prize for winning, but by all means: if you think
someone else's code, and not a variation on this, should win for most
pythonic, then make your nomination :)

Iain
--
http://mail.python.org/mailman/listinfo/python-list
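
For anyone wanting to check the leftmost-alternative behaviour mentioned
above, here is a minimal sketch using the 'a'/'aa' conflict from the quoted
message. The helper name multi_replace is made up for illustration, and the
re.escape call is an addition that was not in the posted code (it only
matters if a 'find' string contains regex metacharacters). The re module
tries alternatives separated by '|' from left to right, so the order of the
pairs decides which one wins:

```python
import re

def multi_replace(subs, text):
    """Replace every 'find' string in a single pass over text.

    subs is a list of (find, replace) pairs; when two 'find' strings
    could match at the same position, the pair listed first wins.
    """
    mapping = dict(subs)
    # Escape each 'find' string so it is matched literally.
    pattern = "|".join(re.escape(find) for find, _ in subs)
    return re.sub(pattern, lambda m: mapping[m.group(0)], text)

print(multi_replace([("aa", "Y"), ("a", "X")], "aaa"))  # YX
print(multi_replace([("a", "X"), ("aa", "Y")], "aaa"))  # XXX
print(multi_replace(
    [("foo", "bar"), ("baz", "quux"), ("quuux", "foo")],
    "fooxxxbazyyyquuux"))  # barxxxquuxyyyfoo
```

So listing longer 'find' strings before their prefixes gives the 'YX'
result, and because the text is scanned once, a replacement such as 'foo'
is never re-replaced by a later pair.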