New submission from Patrick Maupin:

The addition of a capturing group in a re.split() pattern, e.g. using '(\n)' instead of '\n', causes a factor of 10 performance degradation.
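A minimal sketch of the comparison described above. This is not the attached splitter2.py; the test string, its size, and the function names are illustrative assumptions, and the measured ratio will vary by Python version and machine.

```python
import re
import timeit

# Hypothetical test data: many short lines (far smaller than the 5GB file
# mentioned in the report), chosen only to make the difference measurable.
text = "some text on a line\n" * 100000

plain = re.compile(r"\n")       # separators are discarded
captured = re.compile(r"(\n)")  # separators are kept in the result list


def split_plain():
    return plain.split(text)


def split_captured():
    return captured.split(text)


if __name__ == "__main__":
    # Report the best of three runs for each variant.
    for fn in (split_plain, split_captured):
        best = min(timeit.repeat(fn, number=5, repeat=3))
        print("{}: {:.3f}s".format(fn.__name__, best))
```

Both calls return the same text pieces; the capturing variant additionally interleaves each matched separator between them, so its result list is roughly twice as long.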
I use re.split() a lot, but never noticed the issue before. It was extremely noticeable on 1000 patterns in a 5GB file, though, requiring 40 seconds instead of 4.

I have attached a script demonstrating the issue. I have tested on 2.7 and 3.4, but have no reason to believe it doesn't exist on other versions as well.

Thanks,
Pat

----------
components: Regular Expressions
files: splitter2.py
messages: 245137
nosy: Patrick Maupin, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split performance degraded significantly by capturing group
type: performance
versions: Python 2.7, Python 3.4

Added file: http://bugs.python.org/file39676/splitter2.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24426>
_______________________________________