[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-11-03 Thread Andrew Barnert via Python-ideas
On Nov 2, 2019, at 20:33, Random832 wrote: > >> On Sun, Oct 27, 2019, at 03:10, Andrew Barnert wrote: >>> On Oct 26, 2019, at 19:59, Random832 wrote: >>> >>> A string representation consisting of (say) a UTF-8 string, plus an >>> auxiliary list of byte indices of, say, 256-codepoint-long

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-11-02 Thread Random832
On Sun, Oct 27, 2019, at 03:10, Andrew Barnert wrote: > On Oct 26, 2019, at 19:59, Random832 wrote: > > > > A string representation consisting of (say) a UTF-8 string, plus an > > auxiliary list of byte indices of, say, 256-codepoint-long chunks [along > > with perhaps a flag to say that the

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-28 Thread Steven D'Aprano
I think that we're more or less in broad agreement, but I wanted to comment on this: On Sun, Oct 27, 2019 at 09:41:00PM -0700, Andrew Barnert wrote: > Yes, that’s the whole point of the message you were responding to: > extended grapheme clusters are the Unicode approximation of > characters;

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Andrew Barnert via Python-ideas
On Oct 27, 2019, at 18:00, Steven D'Aprano wrote: > > On Sun, Oct 27, 2019 at 10:07:41AM -0700, Andrew Barnert via Python-ideas > wrote: > >>> File "/home/rosuav/tmp/demo.py", line 1 >>> print("Hello, world!') >>>^ >>> SyntaxError: EOL while scanning string literal >>

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Andrew Barnert via Python-ideas
On Oct 27, 2019, at 05:49, Chris Angelico wrote: >> Given zero-based indexing, and the string: >> >>"abÇÐεф" >> >> the index of "ф" better damn well be 5 rather than 8 (UTF-8), 10 >> (UTF-16) or 20 (UTF-32) or I'll be knocking on the API designer's door >> with a pitchfork and a flaming
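The numbers quoted above can be checked directly. The following is only an illustrative sketch: the helper name `byte_offset` is made up for this example, and the little-endian codec names are chosen so that no byte-order mark is counted.

```python
# The string from the quoted message, written with escapes so that every
# character is unambiguously a single precomposed code point.
s = "ab\u00c7\u00d0\u03b5\u0444"   # "abÇÐεф"
target = "\u0444"                  # "ф"

def byte_offset(text, sub, encoding):
    """Byte offset of `sub` in `text` once `text` is encoded with `encoding`.

    Hypothetical helper for illustration only: it encodes the prefix that
    precedes `sub` and counts the resulting bytes.
    """
    prefix = text[:text.index(sub)]
    return len(prefix.encode(encoding))

code_point_index = s.index(target)               # index in code points
utf8_offset = byte_offset(s, target, "utf-8")
utf16_offset = byte_offset(s, target, "utf-16-le")
utf32_offset = byte_offset(s, target, "utf-32-le")

print(code_point_index, utf8_offset, utf16_offset, utf32_offset)
```

With this string the four values come out as 5, 8, 10 and 20, matching the figures in the quoted text.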

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Andrew Barnert via Python-ideas
> On Oct 27, 2019, at 05:38, Steven D'Aprano wrote: > >> On Sun, Oct 27, 2019 at 12:10:22AM -0700, Andrew Barnert via Python-ideas >> wrote: >> >> If you redesign your find, re.search, etc. APIs to not return >> character indexes, then I think you can get away with not having >>

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Chris Angelico
On Sun, Oct 27, 2019 at 11:43 PM Steven D'Aprano wrote: > > On Sun, Oct 27, 2019 at 12:10:22AM -0700, Andrew Barnert via Python-ideas > wrote: > > > If you redesign your find, re.search, etc. APIs to not return > > character indexes, then I think you can get away with not having > >

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Steven D'Aprano
On Sun, Oct 27, 2019 at 12:10:22AM -0700, Andrew Barnert via Python-ideas wrote: > If you redesign your find, re.search, etc. APIs to not return > character indexes, then I think you can get away with not having > character-indexable strings. If string.index(c) doesn't return the index of c in

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Steven D'Aprano
On Sun, Oct 27, 2019 at 03:33:16PM +1100, Steven D'Aprano wrote: > else: > assert c <= '\U0001': Oops, misplaced a zero there. That was supposed to be '\U0010'. -- Steven

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Random832
On Sun, Oct 27, 2019, at 03:39, Andrew Barnert via Python-ideas wrote: > (Actually, IIRC, one of the two has a str type that, despite being 2.x, > is unicode rather than bytes, but with some extra undocumented > functionality to smuggle bytes around in a str and have it sometimes > work.) I do

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Andrew Barnert via Python-ideas
On Oct 26, 2019, at 21:33, Steven D'Aprano wrote: > > IronPython and Jython use whatever .Net and Java use. Which makes them sequences of UTF-16 code units, not code points. Which is allowed for the Python 2.x unicode type, but would violate the rules for 3.x str, but neither one has a 3.x.

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-27 Thread Andrew Barnert via Python-ideas
On Oct 26, 2019, at 19:59, Random832 wrote: > > A string representation consisting of (say) a UTF-8 string, plus an > auxiliary list of byte indices of, say, 256-codepoint-long chunks [along with > perhaps a flag to say that the chunk is all-ASCII or not] would provide O(1) > random access,
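A rough sketch of the representation described in the quoted paragraph, for readers who want something concrete. This is not code from the thread: the class name `ChunkedUTF8` and its layout are assumptions, and the optional all-ASCII flag per chunk is left out. Lookup jumps to the start of the right chunk and then scans at most 255 code points, so the work per index is bounded by the chunk size.

```python
class ChunkedUTF8:
    """Hypothetical string-like wrapper: UTF-8 bytes plus a chunk index."""

    CHUNK = 256  # code points per chunk, as in the quoted proposal

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.length = len(text)
        # offsets[k] is the byte position of code point number k * CHUNK.
        self.offsets = []
        pos = 0
        for i, ch in enumerate(text):
            if i % self.CHUNK == 0:
                self.offsets.append(pos)
            pos += len(ch.encode("utf-8"))

    def __len__(self):
        return self.length

    def _next(self, pos):
        """Return the byte position of the code point following `pos`."""
        pos += 1
        # Continuation bytes in UTF-8 have the bit pattern 10xxxxxx.
        while pos < len(self.data) and (self.data[pos] & 0xC0) == 0x80:
            pos += 1
        return pos

    def __getitem__(self, index):
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        # Jump to the chunk, then scan a bounded number of code points.
        pos = self.offsets[index // self.CHUNK]
        for _ in range(index % self.CHUNK):
            pos = self._next(pos)
        return self.data[pos:self._next(pos)].decode("utf-8")


text = "ab\u00c7\u00d0\u03b5\u0444" * 200
chunked = ChunkedUTF8(text)
print(chunked[5], chunked[1199], len(chunked))
```

The index costs one integer per 256 code points, which is the space/speed trade-off the surrounding messages discuss.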

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread David Mertz
PEP 393 The Unicode string type is changed to support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes) ... Ah, OK. I get it. One byte representation is only ASCII, which happens to match utf-8. Well, the latin-1 oddness. But the

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Steven D'Aprano
On Sat, Oct 26, 2019 at 11:34:34PM -0400, David Mertz wrote: > What does actual CPython do currently to find that s[1_000_000], assuming > utf-8 internal representation? CPython doesn't use a UTF-8 internal representation. MicroPython *may*, but I don't know if they do anything fancy to avoid

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Chris Angelico
On Sun, Oct 27, 2019 at 2:37 PM David Mertz wrote: > What does actual CPython do currently to find that s[1_000_000], assuming > utf-8 internal representation? > Mu. CPython does not have a UTF-8 internal representation. ChrisA

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread David Mertz
Ok, true enough that dereferencing and limited linear search is still O(1). I could have phrased that slightly more precisely. But the trade-off part is true. Indexing into character 1 million of a utf-32 string is just one memory offset calculation, then following the reference. Indexing into

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Random832
On Sat, Oct 26, 2019, at 20:26, David Mertz wrote: > Absolutely, utf-8 is a wonderful encoding. And indeed, worst case is > the same storage requirement as utf-16 or utf-32. For O(1) random > access into all strings, we have to eat 32-bits per character, one way > or the other, but of course

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Random832
On Wed, Oct 23, 2019, at 19:00, Christopher Barker wrote: > On Sun, Oct 13, 2019 at 12:52 PM Andrew Barnert via Python-ideas > wrote: > > The main problem is that a str is a sequence of single-character str, each > > of which is a one-element sequence of itself, etc. forever. If you wanted >

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Andrew Barnert via Python-ideas
On Oct 26, 2019, at 16:28, Steven D'Aprano wrote: > >> On Sun, Oct 13, 2019 at 12:41:55PM -0700, Andrew Barnert via Python-ideas >> wrote: >> On Oct 13, 2019, at 12:02, Steve Jorgensen wrote: > [...] >>> This proposal is a serious breakage of backward compatibility, so >>> would be something

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread David Mertz
Absolutely, utf-8 is a wonderful encoding. And indeed, worst case is the same storage requirement as utf-16 or utf-32. For O(1) random access into all strings, we have to eat 32-bits per character, one way or the other, but of course there are space/speed trade-offs one could make for intermediate

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Steven D'Aprano
On Sat, Oct 26, 2019 at 07:38:19PM -0400, David Mertz wrote: > On Sat, Oct 26, 2019, 7:29 PM Steven D'Aprano > > > > (At worst, a code-point in UTF-8 takes three bytes, compared to four in > > UTF-16 or UTF-32.) > > > > http://www.fileformat.info/info/unicode/char/1/index.htm Oops, you're

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread David Mertz
On Sat, Oct 26, 2019, 7:29 PM Steven D'Aprano > (At worst, a code-point in UTF-8 takes three bytes, compared to four in > UTF-16 or UTF-32.) > http://www.fileformat.info/info/unicode/char/1/index.htm

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Steven D'Aprano
On Sun, Oct 13, 2019 at 12:41:55PM -0700, Andrew Barnert via Python-ideas wrote: > On Oct 13, 2019, at 12:02, Steve Jorgensen wrote: [...] > > This proposal is a serious breakage of backward compatibility, so > > would be something for Python 4.x, not 3.x. > > I’m pretty sure almost nobody

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-26 Thread Steven D'Aprano
On Fri, Oct 25, 2019 at 08:44:17PM -0700, Ben Rudiak-Gould wrote: > Nothing good can come of decomposing strings into Unicode code points. Sure there is. In Python, it's the fastest way to calculate the digit sum of an integer. It's also useful for implementing classical encryption algorithms,
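The digit-sum remark presumably refers to the common idiom of converting the integer to a string and iterating over its characters. A minimal sketch follows, with `digit_sum` as an illustrative name; no timing claim is made here.

```python
def digit_sum(n):
    """Sum the decimal digits of an integer by iterating over str(n)."""
    # abs() drops the sign so that "-" is never passed to int().
    return sum(int(digit) for digit in str(abs(n)))


print(digit_sum(12345), digit_sum(-907))
```

This only works because iterating a `str` yields its characters one at a time, which is the behaviour the thread proposes to change.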

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-25 Thread Ben Rudiak-Gould
Since this is Python 4000, where everything's made up and the points don't matter... I think there shouldn't be a char type, and also strings shouldn't be iterable, or indexable by integers, or anything else that makes them appear to be tuples of code points. Nothing good can come of decomposing

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-25 Thread Andrew Barnert via Python-ideas
On Oct 25, 2019, at 06:26, Serhiy Storchaka wrote: > > 25.10.19 15:53, Andrew Barnert via Python-ideas wrote: >> If you were designing a new Python-like language today, or if you had a time >> machine back to the 90s, it would be a different story. > > Interesting, how far in past you will need

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-25 Thread Serhiy Storchaka
25.10.19 15:53, Andrew Barnert via Python-ideas wrote: If you were designing a new Python-like language today, or if you had a time machine back to the 90s, it would be a different story. Interesting, how far in past you will need to travel? Initially builtin types did not have methods or

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-25 Thread Andrew Barnert via Python-ideas
On Oct 25, 2019, at 01:34, Paul Moore wrote: > > On Thu, 24 Oct 2019 at 23:47, Andrew Barnert via Python-ideas > wrote: >> But again, I don’t think either of these is the reason Python strings being >> iterable is a problem; I think it really is primarily about them being >> iterables of

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-25 Thread Paul Moore
On Thu, 24 Oct 2019 at 23:47, Andrew Barnert via Python-ideas wrote: > But again, I don’t think either of these is the reason Python strings being > iterable is a problem; I think it really is primarily about them being > iterables of strings. The *real* problem is that there's a whole load of

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-24 Thread Andrew Barnert via Python-ideas
On Oct 24, 2019, at 14:13, Greg Ewing wrote: > > I'm thinking of things like a function to recursively flatten > a nested list. You probably want it to stop when it gets to a > string, and not flatten the string into a list of characters. A function to recursively flatten a nested list should

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-24 Thread Greg Ewing
Christopher Barker wrote: wouldn't it? once you got to an object that couldn't be iterated, you'd know you had an atomic value. I'm thinking of things like a function to recursively flatten a nested list. You probably want it to stop when it gets to a string, and not flatten the string into a
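The flattening case mentioned here is usually handled today with an explicit string check. The sketch below shows that workaround; the function name and the choice to treat `bytes` and `bytearray` as atomic as well are assumptions of this example.

```python
from collections.abc import Iterable


def flatten(items):
    """Recursively flatten nested iterables, treating strings as atoms."""
    for item in items:
        # Without the str/bytes exclusion, "ab" would be split into "a", "b",
        # and each one-character string would then recurse without end.
        if isinstance(item, Iterable) and not isinstance(
            item, (str, bytes, bytearray)
        ):
            yield from flatten(item)
        else:
            yield item


print(list(flatten([1, ["ab", [2, "c"]], "de"])))
```

The special case is exactly what the proposal in this thread hopes to make unnecessary.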

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-24 Thread Christopher Barker
On Thu, Oct 24, 2019 at 1:13 AM Greg Ewing wrote: > Christopher Barker wrote: > > I've always wondered > > how disruptive it would be to add a char type > > I'm not sure if it would help much. Usually the problem with > strings being sequences of strings lies in the fact that they're > sequences

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-24 Thread Greg Ewing
Christopher Barker wrote: I've always wondered how disruptive it would be to add a char type I'm not sure if it would help much. Usually the problem with strings being sequences of strings lies in the fact that they're sequences at all. Code that operates generically on nested sequences often

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-24 Thread Anders Hovmöller
> On 24 Oct 2019, at 01:02, Christopher Barker wrote: > > >> On Sun, Oct 13, 2019 at 12:52 PM Andrew Barnert via Python-ideas >> wrote: > >> The main problem is that a str is a sequence of single-character str, each >> of which is a one-element sequence of itself, etc. forever. If you

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-23 Thread Christopher Barker
There's a reason I've never actually proposed adding a char On Wed, Oct 23, 2019 at 5:34 PM Andrew Barnert wrote: > Well, just adding a char type (and presumably a way of defining char literals) wouldn’t be too disruptive. sure. > But changing str to iterate chars instead of strs, that

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-23 Thread Andrew Barnert via Python-ideas
On Oct 23, 2019, at 16:00, Christopher Barker wrote: > >> On Sun, Oct 13, 2019 at 12:52 PM Andrew Barnert via Python-ideas >> wrote: > >> The main problem is that a str is a sequence of single-character str, each >> of which is a one-element sequence of itself, etc. forever. If you wanted to

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-23 Thread Christopher Barker
On Sun, Oct 13, 2019 at 12:52 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote: > The main problem is that a str is a sequence of single-character str, each > of which is a one-element sequence of itself, etc. forever. If you wanted > to change this, I think it would make more
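The "etc. forever" point in the quoted text is easy to see interactively. In this small illustration the helper `descend` is a made-up name.

```python
def descend(obj, steps):
    """Index into obj[0] repeatedly, `steps` times, and return the result."""
    for _ in range(steps):
        obj = obj[0]
    return obj


s = "spam"
first = s[0]

# A one-character str is itself a sequence whose only element equals itself,
# so indexing never reaches a non-sequence "character" object.
print(type(first).__name__, first[0] == first, descend(s, 1000))
```

Any other nested sequence would eventually run out of levels; a string never does.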

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-13 Thread Steve Jorgensen
Yup. I think you're absolutely right. After I posted this, I had a better idea: https://mail.python.org/archives/list/python-ideas@python.org/thread/OVP6SIOFNGGENJAJHXOS2AEUUPWSSRD2/

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-13 Thread Chris Angelico
On Mon, Oct 14, 2019 at 6:49 AM Andrew Barnert via Python-ideas wrote: > And finally, if you want to break strings, it’s probably worth at least > considering making UTF-8 strings first-class objects. They can’t be randomly > accessed, but with an iterable-plus API like files, with seek/tell,

[Python-ideas] Re: Python 4000: Have stringlike objects provide sequence views rather than being sequences

2019-10-13 Thread Andrew Barnert via Python-ideas
On Oct 13, 2019, at 12:02, Steve Jorgensen wrote: > > There are many cases in which it is awkward that testing whether an object is > a sequence returns `True` for instances of `str`, `bytes`, etc. > > This proposal is a serious breakage of backward compatibility, so would be > something
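For reference, the awkwardness described in the original post looks like this in current Python. The helper name `is_nonstring_sequence` is illustrative, not an existing API.

```python
from collections.abc import Sequence


def is_nonstring_sequence(obj):
    """True for sequences such as list or tuple, False for string-like types."""
    # str, bytes and bytearray all pass the plain Sequence check, so they
    # have to be excluded by hand.
    return isinstance(obj, Sequence) and not isinstance(
        obj, (str, bytes, bytearray)
    )


print(isinstance("abc", Sequence), is_nonstring_sequence("abc"))
print(isinstance([1, 2], Sequence), is_nonstring_sequence([1, 2]))
```

Under the proposal, the explicit exclusion would no longer be needed because string-like objects would expose sequence views instead of being sequences themselves.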