On Tue, Jun 21, 2022 at 3:49 AM Marco Sulla wrote:
>
> On Sun, 19 Jun 2022 at 03:06, Inada Naoki wrote:
> > FWIW, I had proposed str.iterlines() to fix incompatibility between
> > IO.readlines() and str.splitlines().
>
> It's a good idea IMHO. In your mind, str.iterlines() will find only
> \n,
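Here is a minimal sketch of the incompatibility Inada mentions, using a made-up sample string. `str.splitlines()` breaks on more characters than io line reading does, so the two can disagree about where lines end. Note that `str.iterlines()` is only the proposal under discussion, not an existing method.

```python
import io

text = "alpha\x0cbeta\ngamma\n"

# str.splitlines() also breaks on form feed (\x0c), vertical tab,
# \x85, \u2028 and a few others, not just on "\n", "\r" and "\r\n".
split = text.splitlines(keepends=True)

# io.StringIO defaults to newline="\n", so readlines() ends lines only at "\n".
read = io.StringIO(text).readlines()

print(split)  # ['alpha\x0c', 'beta\n', 'gamma\n']
print(read)   # ['alpha\x0cbeta\n', 'gamma\n']
```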
Steve Jorgensen wrote:
> Jonathan Slenders wrote:
> > Hi everyone,
> > Today was the 3rd time I came across a situation where it was needed to
> > retrieve all the positions of the line endings (or beginnings) in a very
> > long python string as efficiently as possible. First time, it was needed
Jonathan Slenders wrote:
> Hi everyone,
> Today was the 3rd time I came across a situation where it was needed to
> retrieve all the positions of the line endings (or beginnings) in a very
> long python string as efficiently as possible. First time, it was needed in
> prompt_toolkit, where I spent
On 2022-06-20 16:12, Christopher Barker wrote:
> Hmm - I’m a bit confused about how you handle mixed / multiple line
> endings. If you use splitlines(), then it will remove the line endings,
> so if there are two-char line endings, then you’ll get off by one
> errors, yes?
> I would think you could
Hmm - I’m a bit confused about how you handle mixed / multiple line
endings. If you use splitlines(), then it will remove the line endings, so
if there are two-char line endings, then you’ll get off by one errors, yes?
I would think you could look for “\n”, and get the correct answer ( with
If you are working with bytes, then numpy could be perfect: not a small
dependency, of course, but it should work, and work fast.
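A minimal sketch of the numpy route, assuming the data is already a `bytes` object (the sample value is made up). The results are byte offsets, not character offsets. Since a `0x0A` byte never occurs inside a multi-byte UTF-8 sequence, the newline positions are reliable for UTF-8 input; lone `\r` line endings are not handled here.

```python
import numpy as np

data = b"first line\nsecond line\r\nthird line\n"

# Zero-copy view of the bytes as an array of uint8 values.
arr = np.frombuffer(data, dtype=np.uint8)

# Byte offsets of every b"\n"; a "\r\n" pair is found by its "\n".
newline_positions = np.flatnonzero(arr == ord("\n"))

# Line starts: 0, plus the byte after each "\n". A trailing "\n"
# produces a final start equal to len(data) (an empty last "line").
line_starts = np.concatenate(([0], newline_positions + 1))

print(newline_positions.tolist())  # [10, 23, 34]
print(line_starts.tolist())        # [0, 11, 24, 35]
```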
And a cython method would be quite easy to write, but of course
substantially harder to distribute :-(
-CHB
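The off-by-one concern can be made concrete with a small made-up example. Reconstructing offsets from `splitlines()` by adding one character per removed line ending goes wrong as soon as a `\r\n` appears, whereas `splitlines(keepends=True)` keeps the endings, so summing the lengths gives exact line starts.

```python
from itertools import accumulate

text = "one\r\ntwo\nthree"

# Naive: assume every line ending removed by splitlines() was one character.
naive = [0, *accumulate(len(line) + 1 for line in text.splitlines()[:-1])]

# keepends=True keeps "\r\n" intact, so summing lengths gives exact starts.
exact = [0, *accumulate(len(line) for line in text.splitlines(keepends=True)[:-1])]

print(naive)  # [0, 4, 8]  -> index 4 is the "\n" of "\r\n", not "t"
print(exact)  # [0, 5, 9]
```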
On Sun, Jun 19, 2022 at 5:30 PM Jonathan Slenders wrote:
Thanks all for all the responses! That's quite a bit to think about.
A couple of thoughts:
1. First, I do support a transition to UTF-8, so I understand we don't want
to add more methods that deal with character offsets. (I'm familiar with
how strings work in Rust.) However, does that mean we
As for universal newlines, it seems that either converting when the
file is read (the default behavior) or first doing a simple replace of
“\r\n” is an easy solution.
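A sketch of the "replace `\r\n` first" idea on a made-up string, with one caveat worth stating: the resulting offsets index the normalized string, not the original, so they drift by one for every `\r\n` that precedes them.

```python
import re

raw = "a\r\nb\rc\nd"

# Replace "\r\n" before lone "\r"; the other order would turn
# each "\r\n" into two line endings ("\n\n").
normalized = raw.replace("\r\n", "\n").replace("\r", "\n")

# Every line ending is now a single "\n".
ends = [m.start() for m in re.finditer("\n", normalized)]

print(ends)  # [1, 3, 5]
# Drift: "b" is at index 3 in raw but at index 2 in normalized.
```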
I’m still confused about the use case, though. It seems it involves large
amounts of text, where you need to access individual lines,
Hi
This is a nice problem, well presented. Here are four comments / questions.
1. How does the introduction of faster CPython in Python 3.11 affect the
benchmarks?
2. Is there an across-the-board change that would speed up this line-offsets
task?
3. To limit splitlines memory use (at small
Jonathan Slenders wrote:
> Hi everyone,
> Today was the 3rd time I came across a situation where it was needed to
> retrieve all the positions of the line endings (or beginnings) in a very
> long python string as efficiently as possible.
>
> [...]
>
> Would it make sense to add a
Jonathan Slenders writes:
> Good catch! One correction here, I somewhat mixed up the benchmarks. I
> forgot both projects of mine required support for universal line endings
> exactly like splitlines() does this out of the box.
I can't remember ever seeing an application where such a method
On Sat, Jun 18, 2022 at 5:13 AM Jonathan Slenders wrote:
> First time, it was needed in prompt_toolkit, where I spent a crazy amount of
> time looking for the most performant solution.
> Third time is for the Rich/Textual project from Will McGugan. (See:
>
On 2022-06-18 21:55, Jonathan Slenders wrote:
> Good catch! One correction here, I somewhat mixed up the benchmarks. I
> forgot both projects of mine required support for universal line endings
> exactly like splitlines() does this out of the box. That requires a more
> complex regex pattern. I was
> This makes me realize that `str.indexes(char)` is actually not what I need,
> but really a `str.line_offsets()` which returns exactly the positions that
> `str.splitlines()` would use. Does that make sense?
I'm also thinking of a generic `str.split_indices(char)` that handles all
characters
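The proposed `str.line_offsets()` would return the positions that `str.splitlines()` itself uses. Until something like that exists, a pure-Python approximation can be built from the boundaries documented for `str.splitlines()`. The name `line_offsets` below is hypothetical; this sketch returns the index where each line-ending sequence starts, counting a `\r\n` pair once.

```python
import re

# Boundaries recognised by str.splitlines(), as listed in the Python docs.
# "\r\n" comes first so the pair is matched as one boundary.
_SPLITLINES_BOUNDARY = re.compile(
    "\r\n|[\n\r\v\f\x1c\x1d\x1e\x85\u2028\u2029]"
)


def line_offsets(text: str) -> list[int]:
    """Hypothetical stand-in for the proposed str.line_offsets().

    Returns the index where each line-ending sequence starts, using the
    same boundaries as text.splitlines().
    """
    return [m.start() for m in _SPLITLINES_BOUNDARY.finditer(text)]


sample = "a\r\nb\x0bc\u2028d\x85e\n"
print(line_offsets(sample))  # [1, 4, 6, 8, 10]
print(sample.splitlines())   # ['a', 'b', 'c', 'd', 'e']
```

Unlike the narrower `\n|\r(?!\n)` pattern, this also reports vertical tabs, form feeds, `\x85`, `\u2028` and the other separators that `splitlines()` honours.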
On Sat, 18 Jun 2022 at 23:15, Eric V. Smith via Python-ideas wrote:
>
> On 6/18/2022 5:34 PM, Paul Moore wrote:
> > After all, it has the
> > advantage of working on older versions of Python (and given that one
> > of your use cases is Textual, I can't imagine anyone would be happy if
> > that
On 6/18/2022 5:34 PM, Paul Moore wrote:
> After all, it has the
> advantage of working on older versions of Python (and given that one
> of your use cases is Textual, I can't imagine anyone would be happy if
> that required Python 2.12+...)
Guido's "no 2.8" shirt apparently didn't stop 2.9 through
On Sat, 18 Jun 2022 at 21:57, Jonathan Slenders wrote:
>
> Good catch! One correction here, I somewhat mixed up the benchmarks. I forgot
> both projects of mine required support for universal line endings exactly
> like splitlines() does this out of the box. That requires a more complex
>
Good catch! One correction here, I somewhat mixed up the benchmarks. I
forgot both projects of mine required support for universal line endings
exactly like splitlines() does this out of the box. That requires a more
complex regex pattern. I was actually using:
re.compile(r"\n|\r(?!\n)")
And then
I'm a little confused by the benchmark. Using re looks pretty competitive
in terms of speed, and should be much more memory efficient.
# https://www.gutenberg.org/cache/epub/100/pg100.txt (5.7 MB; ~170K lines)
with open('/tmp/shakespeare.txt', 'r') as f:
    text = f.read()
import re
from
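One way to check the speed and memory question for oneself is a sketch like the following. It uses synthetic text (an assumption, not the Gutenberg file above) and compares line-start offsets computed via `splitlines(keepends=True)` with offsets computed from the `\n|\r(?!\n)` pattern quoted in the thread. The helper names are hypothetical, and no timings are claimed here; they depend on the machine and Python version.

```python
import re
import timeit
from itertools import accumulate

# Synthetic input (assumption): ~170K short lines mixing "\r\n" and "\n".
text = "To be, or not to be\r\nthat is the question\n" * 85_000

# Pattern quoted in the thread: "\n", or a "\r" not followed by "\n".
LINE_END = re.compile(r"\n|\r(?!\n)")


def starts_via_splitlines(s: str) -> list[int]:
    """Line-start offsets; builds one string object per line first."""
    lines = s.splitlines(keepends=True)
    return [0, *accumulate(map(len, lines[:-1]))] if lines else []


def starts_via_regex(s: str) -> list[int]:
    """Line-start offsets; walks match objects, no per-line strings."""
    starts = [0, *(m.end() for m in LINE_END.finditer(s))]
    # splitlines() yields no empty final line after a trailing ending,
    # so drop an offset equal to len(s).
    if starts[-1] == len(s):
        starts.pop()
    return starts


# Both agree here because the text only uses "\n" and "\r\n".
assert starts_via_splitlines(text) == starts_via_regex(text)

for fn in (starts_via_splitlines, starts_via_regex):
    seconds = timeit.timeit(lambda: fn(text), number=5)
    print(f"{fn.__name__}: {seconds / 5 * 1000:.1f} ms per call")
```

Both versions still build a list of integers; the regex version presumably saves memory by never creating the ~170K intermediate line strings.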
My first thought is that you are using the wrong data structure here;
perhaps a list of lines would make more sense than one big ol’ string.
That being said, a way to get all the indexes of a character could be
useful. I’d support that idea.
-CHB
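For comparison, getting all the indexes of a character is already possible in pure Python by looping over `str.find`. A minimal sketch (the name `iter_indexes` is hypothetical): because the scanning happens inside `str.find`, the Python-level loop runs once per match rather than once per character.

```python
from collections.abc import Iterator


def iter_indexes(text: str, char: str) -> Iterator[int]:
    """Yield every index at which `char` occurs in `text`.

    Hypothetical helper: str.find does the scanning in C, so this
    Python loop runs once per match rather than once per character.
    """
    pos = text.find(char)
    while pos != -1:
        yield pos
        pos = text.find(char, pos + 1)


print(list(iter_indexes("a\nbb\nccc\n", "\n")))  # [1, 4, 8]
```

Being a generator, it lets callers consume offsets lazily instead of materialising a list of lines.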
On Fri, Jun 17, 2022 at 10:13 PM Jonathan