New submission from Patrick Maupin:

The addition of a capturing group in a re.split() pattern, e.g. using '(\n)' 
instead of '\n', causes a factor of 10 performance degradation.

I use re.split() a lot, but never noticed the issue before.  It was extremely 
noticeable on 1000 patterns in a 5GB file, though, requiring 40 seconds instead 
of 4.

I have attached a script demonstrating the issue.  I have tested on 2.7 and 
3.4, but have no reason to believe it doesn't exist on other versions as well.
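
For a quick check without the attachment, a minimal sketch along the following
lines compares the two patterns.  It is not the attached splitter2.py: the
helper name time_split and the sample text are assumptions made up for
illustration, and the measured ratio will depend on the machine, the input and
the Python version.

```python
import re
import timeit


def time_split(pattern, text, repeat=3):
    """Return the best wall-clock time (seconds) for one split of text."""
    compiled = re.compile(pattern)
    # number=1 so each measurement is a single full split of the text.
    return min(timeit.repeat(lambda: compiled.split(text),
                             number=1, repeat=repeat))


# Hypothetical sample input: many short lines (not the file from the report).
text = "some line of text\n" * 200000

plain = time_split(r"\n", text)       # separators are discarded
captured = time_split(r"(\n)", text)  # separators are kept in the result

print("plain:    %.4f s" % plain)
print("captured: %.4f s" % captured)
```

Note that the two calls do not return the same list: with the capturing group,
the matched separators are included in the result, so the output is roughly
twice as long.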

Thanks,
Pat

----------
components: Regular Expressions
files: splitter2.py
messages: 245137
nosy: Patrick Maupin, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split performance degraded significantly by capturing group
type: performance
versions: Python 2.7, Python 3.4
Added file: http://bugs.python.org/file39676/splitter2.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24426>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
