[Python-ideas] Re: Add a line_offsets() method to str

2022-06-21 Inada Naoki
On Tue, Jun 21, 2022 at 3:49 AM Marco Sulla wrote: > > On Sun, 19 Jun 2022 at 03:06, Inada Naoki wrote: > > FWIW, I had proposed str.iterlines() to fix incompatibility between > > IO.readlines() and str.splitlines(). > > It's a good idea IMHO. In your mind, str.iterlines() will find only > \n,
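
str.iterlines() does not exist; it was only proposed. Below is a minimal pure-Python sketch of what it might do, assuming it splits only on "\n" and keeps each terminator, the way a text stream opened with newline="\n" is read. Under that assumption a lone "\r" does not end a line, whereas str.splitlines() treats it as a line break. The function name iterlines is hypothetical.

```python
import io
from typing import Iterator


def iterlines(text: str) -> Iterator[str]:
    """Hypothetical str.iterlines(): lazily yield lines split only on "\\n".

    Each yielded line keeps its terminator, mirroring readlines() on a
    text stream that uses newline="\\n".
    """
    start = 0
    while True:
        end = text.find("\n", start)
        if end == -1:
            # Trailing text without a final "\n" is still a line.
            if start < len(text):
                yield text[start:]
            return
        yield text[start:end + 1]
        start = end + 1


sample = "a\nb\r\nc\rd"
print(list(iterlines(sample)))                        # ['a\n', 'b\r\n', 'c\rd']
print(io.StringIO(sample, newline="\n").readlines())  # ['a\n', 'b\r\n', 'c\rd']
print(sample.splitlines())                            # ['a', 'b', 'c', 'd']
```

The generator never builds the full list of lines, which is the property that distinguishes it from str.splitlines().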

2022-06-20 Steve Jorgensen
Steve Jorgensen wrote: > Jonathan Slenders wrote: > > Hi everyone, > > Today was the 3rd time I came across a situation where it was needed to > > retrieve all the positions of the line endings (or beginnings) in a very > > long python string as efficiently as possible. First time, it was needed

2022-06-20 Steve Jorgensen
Jonathan Slenders wrote: > Hi everyone, > Today was the 3rd time I came across a situation where it was needed to > retrieve all the positions of the line endings (or beginnings) in a very > long python string as efficiently as possible. First time, it was needed in > prompt_toolkit, where I spent
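
For a single terminator, a plain-Python baseline (not something proposed in the thread) is to call str.find() repeatedly. Each call scans in C, so Python-level work is one loop iteration per line ending. The helper name newline_offsets is illustrative.

```python
def newline_offsets(text: str, newline: str = "\n") -> list[int]:
    """Return the index of every occurrence of `newline` in `text`.

    Illustrative helper, not a stdlib API.
    """
    if not newline:
        raise ValueError("empty terminator")  # find("") would never advance
    offsets = []
    pos = text.find(newline)
    while pos != -1:
        offsets.append(pos)
        # Skip past the whole terminator before searching again.
        pos = text.find(newline, pos + len(newline))
    return offsets


print(newline_offsets("first\nsecond\nthird"))  # [5, 12]
```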

2022-06-20 MRAB
On 2022-06-20 16:12, Christopher Barker wrote: Hmm - I’m a bit confused about how you handle mixed / multiple line endings. If you use splitlines(), then it will remove the line endings, so if there are two-char line endings, then you’ll get off by one errors, yes? I would think you could

2022-06-20 Christopher Barker
Hmm - I’m a bit confused about how you handle mixed / multiple line endings. If you use splitlines(), then it will remove the line endings, so if there are two-char line endings, then you’ll get off by one errors, yes? I would think you could look for “\n”, and get the correct answer ( with
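
The off-by-one problem is easy to reproduce. In the minimal sketch below (function names are illustrative), summing len(line) + 1 over str.splitlines() assumes every stripped terminator was one character long. str.splitlines(keepends=True) leaves "\r\n" and the other terminators attached, so a running total of line lengths lands exactly on the line starts that str.splitlines() uses.

```python
from itertools import accumulate


def line_starts_naive(text: str) -> list[int]:
    # Wrong for "\r\n": assumes every stripped line ending was one character.
    return [0, *accumulate(len(line) + 1 for line in text.splitlines())][:-1]


def line_starts(text: str) -> list[int]:
    # keepends=True leaves each terminator ("\n", "\r\n", "\r", "\x85", ...)
    # on its line, so cumulative lengths land on the next line's start.
    return [0, *accumulate(len(line) for line in text.splitlines(keepends=True))][:-1]


text = "a\r\nb\nc"
print(line_starts_naive(text))  # [0, 2, 4]  ("b" actually starts at 3)
print(line_starts(text))        # [0, 3, 5]
```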

2022-06-20 Christopher Barker
If you are working with bytes, then numpy could be perfect— not a small dependency of course, but it should work, and work fast. And a cython method would be quite easy to write, but of course substantially harder to distribute :-( -CHB On Sun, Jun 19, 2022 at 5:30 PM Jonathan Slenders wrote:
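
Below is a minimal sketch of the numpy route for bytes. It assumes the data is already a bytes object and that only b"\n" matters, so "\r\n" endings are located by their "\n". np.frombuffer views the bytes without copying, and np.flatnonzero returns the indices where the comparison is true. The comparison itself still allocates a boolean array the size of the input.

```python
import numpy as np


def newline_offsets_bytes(data: bytes) -> np.ndarray:
    """Indices of every b"\\n" in `data`, found with vectorized numpy ops."""
    # Zero-copy, read-only view of the bytes as an array of uint8 values.
    arr = np.frombuffer(data, dtype=np.uint8)
    # The comparison builds a boolean mask; flatnonzero returns its True indices.
    return np.flatnonzero(arr == ord("\n"))


print(newline_offsets_bytes(b"ab\ncd\r\nef").tolist())  # [2, 6]
```

These are byte offsets. For UTF-8 text containing non-ASCII characters, they will not match str indices.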

2022-06-19 Jonathan Slenders
Thanks all for all the responses! That's quite a bit to think about. A couple of thoughts: 1. First, I do support a transition to UTF-8, so I understand we don't want to add more methods that deal with character offsets. (I'm familiar with how strings work in Rust.) However, does that mean we

2022-06-19 Christopher Barker
As for the universal new lines— it seems that either converting when the file is read (default behavior) or a simple replace of “\r\n” first is a simple solution. I’m still confused about the use case though. It seems it involves large amounts of text, where you need to access individual lines,
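
Below is a minimal sketch of the "replace \r\n first" approach combined with access to individual lines. It assumes the text can be normalized up front, so all offsets refer to the normalized string rather than the original. LineIndex is a hypothetical helper. It records line starts once, then slices out individual lines and uses the standard bisect module to map an offset back to its line number.

```python
import bisect


class LineIndex:
    """Hypothetical helper: index line starts once, then look lines up cheaply."""

    def __init__(self, text: str) -> None:
        # Normalize "\r\n" first, then lone "\r", so "\n" is the only terminator.
        # Offsets below refer to this normalized text, not the original.
        self.text = text.replace("\r\n", "\n").replace("\r", "\n")
        self.starts = [0]
        pos = self.text.find("\n")
        while pos != -1:
            self.starts.append(pos + 1)
            pos = self.text.find("\n", pos + 1)

    def line(self, lineno: int) -> str:
        """Return line `lineno` (0-based) without its terminator."""
        start = self.starts[lineno]
        if lineno + 1 < len(self.starts):
            end = self.starts[lineno + 1] - 1  # drop the "\n"
        else:
            end = len(self.text)
        return self.text[start:end]

    def lineno_at(self, offset: int) -> int:
        """Return the 0-based number of the line containing `offset`."""
        return bisect.bisect_right(self.starts, offset) - 1


idx = LineIndex("ab\r\ncd\ref")
print(idx.line(1), idx.lineno_at(4))  # cd 1
```

Unlike str.splitlines(), this sketch treats a trailing "\n" as starting a final empty line.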

2022-06-19 Jonathan Fine
Hi This is a nice problem, well presented. Here's four comments / questions. 1. How does the introduction of faster CPython in Python 3.11 affect the benchmarks? 2. Is there an across-the-board change that would speedup this line-offsets task? 3. To limit splitlines memory use (at small

2022-06-19 n . musolino
Jonathan Slenders wrote: > Hi everyone, > Today was the 3rd time I came across a situation where it was needed to > retrieve all the positions of the line endings (or beginnings) in a very > long python string as efficiently as possible. > > [...] > > Would it make sense to add a

2022-06-19 Stephen J. Turnbull
Jonathan Slenders writes: > Good catch! One correction here, I somewhat mixed up the benchmarks. I > forgot both projects of mine required support for universal line endings > exactly like splitlines() does this out of the box. I can't remember ever seeing an application where such a method

2022-06-18 Inada Naoki
On Sat, Jun 18, 2022 at 5:13 AM Jonathan Slenders wrote: > First time, it was needed in prompt_toolkit, where I spent a crazy amount of > time looking for the most performant solution. > Third time is for the Rich/Textual project from Will McGugan. (See: >

2022-06-18 MRAB
On 2022-06-18 21:55, Jonathan Slenders wrote: Good catch! One correction here, I somewhat mixed up the benchmarks. I forgot both projects of mine required support for universal line endings exactly like splitlines() does this out of the box. That requires a more complex regex pattern. I was

2022-06-18 Jeremiah Gabriel Pascual
> This makes me realize that `str.indexes(char)` is actually not what I need, > but really a `str.line_offsets()` which returns exactly the positions that > `str.splitlines()` would use. Does that make sense? I'm also thinking of a generic `str.split_indices(char)` that handles all characters
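
str.split_indices() is only an idea floated here, not an existing method. The plain function below sketches one plausible reading: it returns (start, end) pairs whose slices reproduce exactly what str.split(sep) returns, without copying any substrings. The name and semantics are assumptions.

```python
def split_indices(text: str, sep: str) -> list[tuple[int, int]]:
    """Hypothetical: spans whose slices equal text.split(sep)."""
    if not sep:
        raise ValueError("empty separator")  # same restriction as str.split
    spans = []
    start = 0
    while True:
        end = text.find(sep, start)
        if end == -1:
            spans.append((start, len(text)))  # last piece runs to the end
            return spans
        spans.append((start, end))
        start = end + len(sep)


text = "a,bb,,c"
spans = split_indices(text, ",")
print(spans)                          # [(0, 1), (2, 4), (5, 5), (6, 7)]
print([text[s:e] for s, e in spans])  # ['a', 'bb', '', 'c']
```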

2022-06-18 Paul Moore
On Sat, 18 Jun 2022 at 23:15, Eric V. Smith via Python-ideas wrote: > > On 6/18/2022 5:34 PM, Paul Moore wrote: > > After all, it has the > > advantage of working on older versions of Python (and given that one > > of your use cases is Textual, I can't imagine anyone would be happy if > > that

2022-06-18 Eric V. Smith via Python-ideas
On 6/18/2022 5:34 PM, Paul Moore wrote: After all, it has the advantage of working on older versions of Python (and given that one of your use cases is Textual, I can't imagine anyone would be happy if that required Python 2.12+...) Guido's "no 2.8" shirt apparently didn't stop 2.9 through

2022-06-18 Paul Moore
On Sat, 18 Jun 2022 at 21:57, Jonathan Slenders wrote: > > Good catch! One correction here, I somewhat mixed up the benchmarks. I forgot > both projects of mine required support for universal line endings exactly > like splitlines() does this out of the box. That requires a more complex >

2022-06-18 Jonathan Slenders
Good catch! One correction here, I somewhat mixed up the benchmarks. I forgot both projects of mine required support for universal line endings exactly like splitlines() does this out of the box. That requires a more complex regex pattern. I was actually using: re.compile(r"\n|\r(?!\n)") And then
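
Below is a minimal sketch showing how that pattern yields line-ending offsets via re.finditer. The function name is illustrative. Because "\r" only matches when no "\n" follows, a "\r\n" pair is reported once, at its "\n". This is still narrower than str.splitlines(), which also breaks on "\v", "\f", "\x1c" through "\x1e", "\x85", "\u2028" and "\u2029".

```python
import re

# Pattern quoted in the message: "\n", or a "\r" that is not followed by "\n".
LINE_END = re.compile(r"\n|\r(?!\n)")


def line_end_offsets(text: str) -> list[int]:
    """Offset of the last character of each "\\n", "\\r" or "\\r\\n" ending."""
    return [m.start() for m in LINE_END.finditer(text)]


print(line_end_offsets("a\r\nb\rc\nd"))  # [2, 4, 6]
```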

2022-06-18 Lucas Wiman
I'm a little confused by the benchmark. Using re looks pretty competitive in terms of speed, and should be much more memory efficient.

    # https://www.gutenberg.org/cache/epub/100/pg100.txt (5.7mb; ~170K lines)
    with open('/tmp/shakespeare.txt', 'r') as f:
        text = f.read()
    import re
    from
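
The harness below can rerun this kind of comparison. It assumes synthetic text with mixed line endings in place of the local Shakespeare file. It only times the two approaches and reproduces no numbers from the thread; results will vary by machine and Python version. Both functions are illustrative and return line-start offsets, so their outputs are checked against each other before timing.

```python
import re
import timeit
from itertools import accumulate


def starts_via_splitlines(text: str) -> list[int]:
    # Builds one new str per line, then sums their lengths.
    return [0, *accumulate(len(line) for line in text.splitlines(keepends=True))][:-1]


def starts_via_regex(text: str) -> list[int]:
    # No per-line substrings; only short-lived match objects.
    # Agrees with splitlines() only for "\r\n", "\r" and "\n" terminators.
    if not text:
        return []
    starts = [0, *(m.end() for m in re.finditer(r"\r\n|\r|\n", text))]
    if starts[-1] == len(text):
        starts.pop()  # a trailing terminator does not start a new line
    return starts


if __name__ == "__main__":
    # Synthetic stand-in for the Gutenberg file, with mixed line endings.
    text = "".join(f"line {i}\r\n" if i % 3 else f"line {i}\n" for i in range(100_000))
    assert starts_via_splitlines(text) == starts_via_regex(text)
    for fn in (starts_via_splitlines, starts_via_regex):
        seconds = timeit.timeit(lambda fn=fn: fn(text), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```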

2022-06-18 Christopher Barker
My first thought is that you are using the wrong data structure here— perhaps a list of lines would make more sense than one big ol’ string. That being said, a way to get all the indexes of a character could be useful I’d support that idea. -CHB On Fri, Jun 17, 2022 at 10:13 PM Jonathan
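
Pure Python can already approximate a way to get all the indexes of a character. The minimal sketch below uses a hypothetical name, char_indexes, not a proposed API. It yields positions lazily via re.finditer, with re.escape so that characters such as "." or "*" match literally.

```python
import re
from typing import Iterator


def char_indexes(text: str, char: str) -> Iterator[int]:
    """Hypothetical helper: lazily yield every index where `char` occurs."""
    if len(char) != 1:
        # Raised on the first next() call, since this is a generator.
        raise ValueError("expected a single character")
    # re.escape makes regex metacharacters like "." match literally.
    for match in re.finditer(re.escape(char), text):
        yield match.start()


print(list(char_indexes("a.b.c", ".")))  # [1, 3]
```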