On 2015-08-18 22:42, Laurent Pointal wrote:
Hello,

I want to make a replacement in a string, to ensure that ellipses are
surrounded by spaces (this is not a typographical problem, but a preparation
for later text chunking).

I tried with regular expressions and the SRE_Pattern.sub() method, but I
have an unexpected duplication of the replacement pattern:


The code:

ellipfind_re = re.compile(r"((?=\.\.\.)|…)", re.IGNORECASE|re.VERBOSE)
ellipfind_re.sub(' ... ',
        "C'est un essai... avec différents caractères… pour voir.")

And I retrieve:

"C'est un essai ... ... avec différents caractères ...  pour voir."
                    ^^^

I tested with/without group capture, same result.

My Python version:
Python 3.4.3 (default, Mar 26 2015, 22:03:40)
[GCC 4.9.2] on linux

Any ideas?

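(?=...) is a lookahead, not a non-capturing group; a non-capturing group is
written (?:...). A lookahead matches without consuming any characters, so
sub() inserts ' ... ' in front of the three dots and leaves the dots
themselves in place. That is the duplication you are seeing.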

The regex should be r"((?:\.\.\.)|…)", which can be simplified to just
r"\.\.\.|…" for your use-case. (You don't need the
re.IGNORECASE|re.VERBOSE either!)
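
A minimal sketch comparing the two patterns on your sample string, in case it
helps. The variable names are mine, and the last pattern (padded_re), which
also swallows any whitespace already next to the ellipsis so you don't end up
with doubled spaces, is an assumption about what you want for chunking, not
something you asked for:

```python
import re

text = "C'est un essai... avec différents caractères… pour voir."

# Original pattern: (?=\.\.\.) is a zero-width lookahead, so sub() inserts
# the replacement before the dots without consuming them.
lookahead_re = re.compile(r"(?=\.\.\.)|…")
broken = lookahead_re.sub(" ... ", text)
# "C'est un essai ... ... avec différents caractères ...  pour voir."

# Corrected pattern: the three dots are matched and replaced.
ellipsis_re = re.compile(r"\.\.\.|…")
fixed = ellipsis_re.sub(" ... ", text)
# "C'est un essai ...  avec différents caractères ...  pour voir."

# Optional variant (assumption): absorb surrounding whitespace as well,
# so exactly one space remains on each side of the ellipsis.
padded_re = re.compile(r"\s*(?:\.\.\.|…)\s*")
normalized = padded_re.sub(" ... ", text)
# "C'est un essai ... avec différents caractères ... pour voir."

print(broken)
print(fixed)
print(normalized)
```

Note that the padded variant leaves a leading or trailing space when the
ellipsis sits at the very start or end of the string, so you may want to
strip() the result.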

--
https://mail.python.org/mailman/listinfo/python-list