As for universal newlines: it seems that either converting when the file is read (the default behavior) or first doing a simple replace of “\r\n” is an easy solution.
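To make the two options concrete, here is a minimal sketch. The sample string and the temporary file are made up for illustration; the only behavior it relies on is that open() in text mode translates newlines on read by default (newline=None).

```python
import os
import tempfile

# Made-up sample text with mixed line endings.
raw = "first\r\nsecond\r\nthird\n"

# Option 1: normalize a string already in memory with a single replace.
normalized = raw.replace("\r\n", "\n")

# Option 2: let open() do the translation while reading.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # newline="" on write disables translation, so "\r\n" is stored as-is.
    with open(path, "w", newline="") as f:
        f.write(raw)
    # Default newline=None on read: "\r\n" (and a lone "\r") become "\n".
    with open(path) as f:
        from_file = f.read()

print(normalized == from_file)
```

One difference to keep in mind: the replace only handles “\r\n”, while reading in universal newlines mode also converts a lone “\r”.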
I’m still confused about the use case, though. It seems to involve large amounts of text where you need to access individual lines, but where a list of lines doesn’t make sense. I can see that. But I’m having trouble imagining a use case where you need that and where performance is critical for building the data structure.

But if all of those are requirements, I’d think a custom data structure would be ideal: one that is both a large single string and a sequence (and iterable) of lines. That reminds me of a ragged array, which I have implemented as an extension to numpy (both in pure Python and in Cython).

Perhaps a C-implemented (or accelerated) class that does all this would be a nice third-party package and, if proven useful, a candidate for the stdlib in the future.

I imagine you wouldn’t want the dependency, but it would be interesting to benchmark a numpy solution. Numpy isn’t very memory efficient for strings (UCS-4), but it should be fast.

Final note: it seems the regex solution for a single char is performant, but overly complicated. I’ve been very happy that I can do almost anything with string methods, and rarely need to reach for regex. For something this simple, it would be nice to have a string method.

-CHB

On Sun, Jun 19, 2022 at 3:34 PM Jonathan Fine <jfine2...@gmail.com> wrote:

> Hi
>
> This is a nice problem, well presented. Here's four comments / questions.
>
> 1. How does the introduction of faster CPython in Python 3.11 affect the
> benchmarks?
> 2. Is there an across-the-board change that would speedup this
> line-offsets task?
> 3. To limit splitlines memory use (at small performance cost), chunk the
> input string into say 4 kb blocks.
> 4. Perhaps anything done here for strings should also be done for bytes.
>
> --
> Jonathan
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/AETGT5HDF3QOFODOWKB4X45ZE4CZ7Y3M/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7XP7FB5X6W7OVVZK3CWTWZLBTAA3SVHK/
Code of Conduct: http://python.org/psf/codeofconduct/