New submission from Anthony Sottile <[email protected]>:
I did some profiling of running this script (a few files, including SVGs, are
attached):
```python
import io
import tokenize

# picked as the second longest file in cpython
with open('Lib/test/test_socket.py', 'rb') as f:
    bio = io.BytesIO(f.read())


def main():
    for _ in range(10):
        bio.seek(0)
        for _ in tokenize.tokenize(bio.readline):
            pass


if __name__ == '__main__':
    exit(main())
```
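For context on the attached `out.pstats`: a stats file like that is typically
produced with the stdlib profiler. A minimal sketch, assuming it replaces the
`exit(main())` call in the script above (the exact invocation is my
assumption, not from the report):

```python
import cProfile
import pstats

# run the benchmark under the profiler and dump raw stats to disk
cProfile.run('main()', 'out.pstats')

# inspect the hottest functions by cumulative time
stats = pstats.Stats('out.pstats')
stats.sort_stats('cumulative').print_stats(10)
```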
The first profile is from before the optimization; the second is from after.
The optimization takes the execution from ~6300 ms to ~4500 ms on my machine
(a ~29% reduction in run time, or equivalently a ~40% speedup, depending on
how you calculate it).
(I'll attach the pstats and SVGs as they're generated; it seems I can only
attach one file at a time.)
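For readers following along: the title points at repeated `re.compile(...)`
calls in `tokenize`'s inner loop. A minimal sketch of the caching technique
such a fix would use (illustrative only; the `_compile` helper and the sample
input below are mine, not the actual patch):

```python
import functools
import re


@functools.lru_cache(maxsize=None)
def _compile(expr):
    # compile each distinct pattern once; later calls hit the cache
    return re.compile(expr, re.UNICODE)


lines = ['spam = 1', 'eggs = 2'] * 1000  # stand-in for tokenizer input

# before: re.compile(pattern) inside the loop pays lookup cost every iteration
# after: the cached helper pays the compile cost only on the first call
for line in lines:
    _compile(r'\w+ = \d+').match(line)
```

Note that `re` keeps its own internal cache of compiled patterns, but the
per-call lookup and argument handling still show up in profiles, which is why
caching at the call site helps.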
----------
components: Library (Lib)
files: out.pstats
messages: 385572
nosy: Anthony Sottile
priority: normal
severity: normal
status: open
title: tokenize spends a lot of time in `re.compile(...)`
type: performance
versions: Python 3.10, Python 3.9
Added file: https://bugs.python.org/file49759/out.pstats
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue43014>
_______________________________________