As for universal newlines: it seems that either converting when the file is
read (the default behavior of open() in text mode) or simply replacing
“\r\n” with “\n” first would be a simple solution.
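A minimal sketch of both options. The helper names are hypothetical, and the
data is assumed to fit in memory. io.TextIOWrapper with its default
newline=None applies the same universal-newlines translation that open()
uses in text mode. The string-level version has to replace “\r\n” before any
lone “\r”; otherwise each “\r\n” would become two newlines.

```python
import io


def read_normalized(raw: bytes, encoding: str = "utf-8") -> str:
    # Decode the way open(path) does in text mode: with the default
    # newline=None, "\r\n" and "\r" are both translated to "\n" on read.
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as f:
        return f.read()


def normalize_newlines(text: str) -> str:
    # Replace "\r\n" first; replacing "\r" first would turn it into "\n\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = b"first\r\nsecond\rthird\nfourth"
assert read_normalized(raw) == normalize_newlines(raw.decode("utf-8"))
print(normalize_newlines(raw.decode("utf-8")).split("\n"))
# ['first', 'second', 'third', 'fourth']
```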

I’m still confused about the use case, though. It seems to involve large
amounts of text where you need to access individual lines, but where a
list of lines doesn’t make sense. I can see that. But I’m having trouble
imagining a use case where you need that and the performance of building
the data structure is critical.

But if all of those are requirements, I’d think a custom data structure
would be ideal: one that is both a single large string and a sequence (and
iterable) of lines.
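A minimal pure-Python sketch of such a structure, under two assumptions: the
class name LineIndexedText and its offset() method are hypothetical, and
newlines have already been normalized to “\n”. It stores the text once plus a
list of line-start offsets (essentially a ragged-array layout), and
collections.abc.Sequence supplies iteration, in, index() and count() on top
of __len__ and __getitem__.

```python
from collections.abc import Sequence


class LineIndexedText(Sequence):
    r"""Hypothetical: one big str that is also a read-only sequence of lines.

    Assumes line endings are already normalized to "\n".
    """

    def __init__(self, text: str) -> None:
        self.text = text
        # starts[i] is the offset in self.text where line i begins.
        starts = [0]
        pos = text.find("\n")
        while pos != -1:
            starts.append(pos + 1)
            pos = text.find("\n", pos + 1)
        # A trailing "\n" (or an empty text) does not begin another line,
        # matching str.splitlines().
        if starts[-1] == len(text):
            starts.pop()
        self.starts = starts

    def __len__(self) -> int:
        return len(self.starts)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("line index out of range")
        start = self.starts[index]
        # The line runs up to the next "\n", or to the end of the text.
        end = self.text.find("\n", start)
        return self.text[start:] if end == -1 else self.text[start:end]

    def offset(self, index: int) -> int:
        # Character offset where line `index` starts in the full text.
        return self.starts[index]


doc = LineIndexedText("alpha\nbeta\ngamma\n")
print(len(doc), doc[1], doc.offset(2))  # 3 beta 11
```

A C-accelerated version could keep the same interface while speeding up the
offset scan in __init__ and storing the offsets more compactly (for example
in an array.array) instead of as a list of Python ints.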

That reminds me of a ragged array, which I have implemented as an
extension to numpy (in both pure Python and Cython).

Perhaps a C-implemented (or C-accelerated) class that does all this would
make a nice third-party package and, if it proves useful, could be added to
the stdlib in the future.

I imagine you wouldn’t want the dependency, but it would be interesting to
benchmark a numpy solution. Numpy isn’t very memory efficient for strings
(it stores them as UCS-4, i.e. 4 bytes per character), but it should be
fast.

Final note: it seems the regex solution for a single character is
performant, but overly complicated. I’ve been very happy that I can do
almost anything with string methods and rarely need to reach for regex. For
something this simple, it would be nice to have a string method.
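For comparison, a sketch of the regex approach next to a plain str.find()
loop. find_all() is a hypothetical helper standing in for the string method
that doesn't exist; like re.finditer(), it reports non-overlapping matches,
and it works on bytes as well as str. benchmark() is likewise only an
illustrative timeit harness.

```python
import functools
import re
import timeit


def regex_offsets(text: str) -> list[int]:
    # Regex approach: offset of every "\n" in the text.
    return [m.start() for m in re.finditer("\n", text)]


def find_all(s, sub):
    # Hypothetical helper: offsets of every non-overlapping occurrence of
    # `sub` in `s`, using only str.find() / bytes.find().
    if not sub:
        raise ValueError("empty substring")
    offsets = []
    pos = s.find(sub)
    while pos != -1:
        offsets.append(pos)
        pos = s.find(sub, pos + len(sub))
    return offsets


def find_offsets(text: str) -> list[int]:
    return find_all(text, "\n")


def benchmark(n_lines: int = 100_000, repeat: int = 5) -> None:
    # Illustrative harness: best-of-`repeat` wall time for each approach.
    text = "some line of text\n" * n_lines
    for func in (regex_offsets, find_offsets):
        timer = functools.partial(func, text)
        best = min(timeit.repeat(timer, number=1, repeat=repeat))
        print(f"{func.__name__}: {best:.4f} s")
```

No timings are claimed here; they will depend on the text and on the
interpreter version, so running benchmark() under different Pythons is also
one way to look at Jonathan's question about 3.11.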

-CHB




On Sun, Jun 19, 2022 at 3:34 PM Jonathan Fine <jfine2...@gmail.com> wrote:

> Hi
>
> This is a nice problem, well presented. Here are four comments / questions.
>
> 1. How does the introduction of faster CPython in Python 3.11 affect the
> benchmarks?
> 2. Is there an across-the-board change that would speed up this
> line-offsets task?
> 3. To limit splitlines memory use (at a small performance cost), chunk the
> input string into, say, 4 KB blocks.
> 4. Perhaps anything done here for strings should also be done for bytes.
>
> --
> Jonathan
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/AETGT5HDF3QOFODOWKB4X45ZE4CZ7Y3M/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
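Jonathan's third and fourth points fit together in one sketch, assuming the
goal is a list of line-start offsets rather than the lines themselves. The
function names are hypothetical. It applies splitlines() to one chunk at a
time (a default chunk size of 4096 elements, i.e. characters for str and
bytes for bytes), carrying the last, possibly incomplete, line into the next
chunk so that a “\r\n” split across a chunk boundary is still treated as one
line ending. Because it only uses slicing and splitlines(), the same code
accepts str or bytes.

```python
def line_offsets_chunked(text, chunk_size=4096):
    """Start offset of every line, as text.splitlines(keepends=True) would
    split it, computed one chunk at a time. Works for str and bytes."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    offsets = []
    pos = 0           # offset in `text` where `carry` begins
    carry = text[:0]  # empty str or bytes, matching the input type
    for start in range(0, len(text), chunk_size):
        pieces = (carry + text[start:start + chunk_size]).splitlines(keepends=True)
        # The last piece may be an incomplete line, or end in "\r" whose "\n"
        # is in the next chunk, so carry it over instead of counting it yet.
        carry = pieces.pop()
        for piece in pieces:
            offsets.append(pos)
            pos += len(piece)
    if carry:
        offsets.append(pos)
    return offsets


def line_offsets_simple(text):
    # Reference version: split the whole input at once.
    offsets, pos = [], 0
    for line in text.splitlines(keepends=True):
        offsets.append(pos)
        pos += len(line)
    return offsets


print(line_offsets_chunked("a\r\nb", chunk_size=2))  # [0, 3]
```

A line much longer than chunk_size is re-split on every chunk it spans, so
very long lines make this slower. Note also that bytes.splitlines()
recognizes fewer line boundaries than str.splitlines().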
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
_______________________________________________
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7XP7FB5X6W7OVVZK3CWTWZLBTAA3SVHK/
