On 7 Jun 2014 00:53, "Paul Sokolovsky" wrote:
>
> Yes. Except for one small detail - Python3 specifies these code points
> to be Unicode code points. And Unicode is a very bloated thing.
I rather suspect users of East Asian & African scripts might have a
different notion of what constitutes "bloa
On 7 June 2014 00:52, Paul Sokolovsky wrote:
> > At heart, this is exactly what the Python 3 "str" type is. The
> > universal convention is "code points".
>
> Yes. Except for one small detail - Python3 specifies these code points
> to be Unicode code points. And Unicode is a very bloated thing.
>
Hello,
On Fri, 06 Jun 2014 11:59:31 -0400
Terry Reedy wrote:
[]
> The other problem is that a small slice view of a large object keeps
> the large object alive, so a view user needs to think carefully about
> whether to make a copy or create a view, and later to copy views to
> delete the bas
On 06/06/2014 05:59 PM, Terry Reedy wrote:
The other problem is that a small slice view of a large object keeps the
large object alive, so a view user needs to think carefully about
whether to make a copy or create a view, and later to copy views to
delete the base object. This is not for beginne
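A minimal sketch of the view-versus-copy trade-off described above, using CPython's built-in memoryview on a bytearray (str itself offers no view type). The names `big`, `view` and `copy` are illustrative only.

```python
# Illustrative names only; this is a sketch, not code from the thread.
big = bytearray(b"x" * 1_000_000)

# A 10-byte view shares storage with the 1 MB buffer and references it.
view = memoryview(big)[:10]
assert view.obj is big

# Copying the slice produces an independent 10-byte object.
copy = bytes(view)

# Once the view is released and the name dropped, only the small copy remains.
view.release()
del big
print(len(copy))
```

As long as `view` is alive, the full buffer stays alive too, which is the cost the message warns about.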
On 6/6/2014 4:53 AM, Hrvoje Niksic wrote:
On 06/04/2014 05:52 PM, Mark Lawrence wrote:
Out of idle curiosity is there anything that stops MicroPython, or any
other implementation for that matter, from providing views of a string
rather than copying every time? IIRC memoryviews in CPython rely
On Fri, Jun 6, 2014 at 8:15 PM, Paul Sokolovsky wrote:
> I'm sorry if I was somehow related to that, my
> bringing in the formal language spec was more a rhetorical figure, a
> response to people claiming O(1) requirement.
This was exactly why this whole discussion came up, though. We were
debati
Hello,
On Fri, 6 Jun 2014 21:48:41 +1000
Tim Delaney wrote:
> On 6 June 2014 21:34, Paul Sokolovsky wrote:
>
> >
> > On Fri, 06 Jun 2014 20:11:27 +0900
> > "Stephen J. Turnbull" wrote:
> >
> > > Paul Sokolovsky writes:
> > >
> > > > That kinda means "string is atomic", instead of your
> > >
On 06/06/2014 09:53, Hrvoje Niksic wrote:
On 06/04/2014 05:52 PM, Mark Lawrence wrote:
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise),
re.finditer('\\S+', string) also provides the same behaviour and
gives me the sliced string, so there's no
Hello,
On Fri, 06 Jun 2014 09:32:25 +0100
Mark Lawrence wrote:
> On 04/06/2014 16:52, Mark Lawrence wrote:
> > On 04/06/2014 16:32, Steve Dower wrote:
> >>
> >> If copying into a separate list is a problem (memory-wise),
> >> re.finditer('\\S+', string) also provides the same behaviour and
> >>
On 6 June 2014 21:34, Paul Sokolovsky wrote:
>
> On Fri, 06 Jun 2014 20:11:27 +0900
> "Stephen J. Turnbull" wrote:
>
> > Paul Sokolovsky writes:
> >
> > > That kinda means "string is atomic", instead of your "characters
> > > are atomic".
> >
> > I would be very surprised if a language that be
On 6 June 2014 21:15, Paul Sokolovsky wrote:
> Hello,
>
> On Thu, 5 Jun 2014 23:15:54 +1000
> Nick Coghlan wrote:
>
>> On 5 June 2014 22:37, Paul Sokolovsky wrote:
>> > On Thu, 5 Jun 2014 22:20:04 +1000
>> > Nick Coghlan wrote:
>> >> problems caused by trusting the locale encoding to be correct
Hello,
On Fri, 06 Jun 2014 20:11:27 +0900
"Stephen J. Turnbull" wrote:
> Paul Sokolovsky writes:
>
> > That kinda means "string is atomic", instead of your "characters
> > are atomic".
>
> I would be very surprised if a language that behaved that way was
> called a "Python subset". No index
Hello,
On Thu, 5 Jun 2014 23:15:54 +1000
Nick Coghlan wrote:
> On 5 June 2014 22:37, Paul Sokolovsky wrote:
> > On Thu, 5 Jun 2014 22:20:04 +1000
> > Nick Coghlan wrote:
> >> problems caused by trusting the locale encoding to be correct, but
> >> the startup code will need non-trivial changes
Paul Sokolovsky writes:
> That kinda means "string is atomic", instead of your "characters are
> atomic".
I would be very surprised if a language that behaved that way was
called a "Python subset". No indexing, no slicing, no regexps, no
.split(), no .startswith(), no sorted() or .sort(), ...!
Steven D'Aprano wrote:
I don't know about car engine controllers, but presumably they have
diagnostic ports, and they may sometimes output text. If they output
text, then at least hypothetically car mechanics in Russia might prefer
their car to output "правда" and "ложный" rather than "true" an
Hello,
On Thu, 5 Jun 2014 22:38:13 +1000
Nick Coghlan wrote:
> On 5 June 2014 22:10, Stefan Krah wrote:
> > Paul Sokolovsky wrote:
> >> In this regard, I'm glad to participate in mind-resetting
> >> discussion. So, let's reiterate - there's nothing like "the best",
> >> "the only right", "the
Hello,
On Thu, 5 Jun 2014 22:21:30 +1000
Tim Delaney wrote:
> On 5 June 2014 22:01, Paul Sokolovsky wrote:
>
> >
> > All these changes are what let me dream on and speculate on
> > possibility that Python4 could offer an encoding-neutral string type
> > (which means based on bytes)
> >
>
> To
On 06/04/2014 05:52 PM, Mark Lawrence wrote:
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise), re.finditer('\\S+',
string) also provides the same behaviour and gives me the sliced string, so
there's no need to index for anything.
Out of idl
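The re.finditer suggestion quoted above can be sketched as follows. The sample string is made up; the point is only that each match object hands back the token and its offsets without building a list of all tokens first.

```python
import re

text = "blink the  LED"  # hypothetical sample input

# finditer yields one match at a time, so no list of all tokens is built.
for match in re.finditer(r"\S+", text):
    # group() is the token itself; span() gives its start and end offsets.
    print(match.group(), match.span())
```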
On 04/06/2014 16:52, Mark Lawrence wrote:
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise),
re.finditer('\\S+', string) also provides the same behaviour and gives
me the sliced string, so there's no need to index for anything.
Out of idle cur
On Fri, Jun 06, 2014 at 12:51:11PM +1200, Greg Ewing wrote:
> Steven D'Aprano wrote:
> >(1) I asked if it would be okay for MicroPython to *optionally* use
> >nominally Unicode strings limited to ASCII. Pretty much the only
> >response to this has been Guido saying "That would be a pretty lousy
>
Steven D'Aprano wrote:
> (1) I asked if it would be okay for MicroPython to *optionally* use
> nominally Unicode strings limited to ASCII. Pretty much the only
> response to this has been Guido saying "That would be a pretty lousy
> option", and since nobody has really defended the suggestion,
Paul Sokolovsky wrote:
All these changes are what let me dream on and speculate on
possibility that Python4 could offer an encoding-neutral string type
(which means based on bytes)
Can you elaborate on exactly what you have in mind?
You seem to want something different from Python 3 str,
Python
Steven D'Aprano wrote:
(1) I asked if it would be okay for MicroPython to *optionally* use
nominally Unicode strings limited to ASCII. Pretty much the only
response to this has been Guido saying "That would be a pretty lousy
option",
It would be limiting to have this as the *only* way of
deali
On 6 Jun 2014 05:13, "Glenn Linderman" wrote:
>
> On 6/5/2014 11:41 AM, Daniel Holth wrote:
>>
>> discover new things
>> like dance-encoded strings, bytes decoded using an incorrect encoding
>> intended to be transcoded into the correct encoding later, surrogates
>> that work perfectly until .enco
On 04/06/2014 02:51, Chris Angelico wrote:
On Wed, Jun 4, 2014 at 3:17 PM, Nick Coghlan wrote:
It would. The downsides of a UTF-8 representation would be slower
iteration and much slower (O(N)) indexing/slicing.
There's no reason for iteration to be slower. Slicing would get
O(slice offset
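To make the O(N) indexing cost concrete, here is a rough sketch of looking up the N-th code point in UTF-8 data by scanning for lead bytes. The helper names `byte_offset` and `code_point_at` are hypothetical and do not come from MicroPython or CPython.

```python
def byte_offset(data: bytes, index: int) -> int:
    """Return the byte offset of code point number `index` in UTF-8 `data`."""
    if index < 0:
        raise IndexError(index)
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError(index)


def code_point_at(data: bytes, index: int) -> str:
    """Decode the single code point at position `index` (linear-time lookup)."""
    start = byte_offset(data, index)
    end = start + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[start:end].decode("utf-8")
```

The scan touches every byte before the requested position, so the cost grows with the offset rather than with the total string length.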
On 6/5/2014 11:41 AM, Daniel Holth wrote:
discover new things
like dance-encoded strings, bytes decoded using an incorrect encoding
intended to be transcoded into the correct encoding later, surrogates
that work perfectly until .encode(), str(bytes), APIs that disagree
with you about whether the
On 6/5/2014 3:10 AM, Paul Sokolovsky wrote:
Hello,
On Wed, 04 Jun 2014 22:15:30 -0400
Terry Reedy wrote:
think you are again batting at a strawman. If you mean 'read from a
file', and all you want to do is read bytes from and write bytes to
external 'files', then there is obviously no need to
On Thu, Jun 5, 2014 at 11:59 AM, Paul Moore wrote:
> On 5 June 2014 14:15, Nick Coghlan wrote:
>> As I've said before in other contexts, find me Windows, Mac OS X and
>> JVM developers, or educators and scientists that are as concerned by
>> the text model changes as folks that are primarily focu
On 5 June 2014 14:15, Nick Coghlan wrote:
> As I've said before in other contexts, find me Windows, Mac OS X and
> JVM developers, or educators and scientists that are as concerned by
> the text model changes as folks that are primarily focused on Linux
> system (including network) programming, an
On Wed, Jun 04, 2014 at 11:17:18AM +1000, Steven D'Aprano wrote:
> There is a discussion over at MicroPython about the internal
> representation of Unicode strings. Micropython is aimed at embedded
> devices, and so minimizing memory use is important, possibly even
> more important than performa
On 5 June 2014 22:37, Paul Sokolovsky wrote:
> On Thu, 5 Jun 2014 22:20:04 +1000
> Nick Coghlan wrote:
>> problems caused by trusting the locale encoding to be correct, but the
>> startup code will need non-trivial changes for that to happen - the
>> C.UTF-8 locale may even become widespread befo
On 5 June 2014 22:10, Stefan Krah wrote:
> Paul Sokolovsky wrote:
>> In this regard, I'm glad to participate in mind-resetting discussion.
>> So, let's reiterate - there's nothing like "the best", "the only right",
>> "the only correct", "righter than", "more correct than" in CPython's
>> impleme
Hello,
On Thu, 5 Jun 2014 22:20:04 +1000
Nick Coghlan wrote:
[]
> problems caused by trusting the locale encoding to be correct, but the
> startup code will need non-trivial changes for that to happen - the
> C.UTF-8 locale may even become widespread before we get there).
... And until those go
On 5 June 2014 22:01, Paul Sokolovsky wrote:
>
> All these changes are what let me dream on and speculate on
> possibility that Python4 could offer an encoding-neutral string type
> (which means based on bytes)
>
To me, an "encoding neutral string type" means roughly "characters are
atomic", and
On 5 June 2014 22:01, Paul Sokolovsky wrote:
>> Aside from
>> some of the POSIX locale handling issues on Linux, many of the
>> concerns are with the usability of bytes and bytearray, not with str -
>> that's why binary interpolation is coming back in 3.5, and there will
>> likely be other usabili
Paul Sokolovsky wrote:
> In this regard, I'm glad to participate in mind-resetting discussion.
> So, let's reiterate - there's nothing like "the best", "the only right",
> "the only correct", "righter than", "more correct than" in CPython's
> implementation of Unicode storage. It is *arbitrary*. W
Hello,
On Thu, 5 Jun 2014 21:43:16 +1000
Nick Coghlan wrote:
> On 5 June 2014 21:25, Paul Sokolovsky wrote:
> > Well, I understand the plan - hoping that people will "get over
> > this". And I'm personally happy to stay away from this "trolling",
> > but any discussion related to Unicode goes i
On 5 June 2014 21:25, Paul Sokolovsky wrote:
> Well, I understand the plan - hoping that people will "get over this".
> And I'm personally happy to stay away from this "trolling", but any
> discussion related to Unicode goes in circles and returns to feeling
> that Unicode at the central role as p
On 5 June 2014 17:54, Stephen J. Turnbull wrote:
> What matters to you is that str (unicode) is an opaque type -- there
> is no specification of the internal representation in the language
> reference, and in fact several different ones coexist happily across
> existing Python implementations -- a
Hello,
On Thu, 05 Jun 2014 16:54:11 +0900
"Stephen J. Turnbull" wrote:
> Paul Sokolovsky writes:
>
> > Please put that in perspective when alarming over O(1) indexing of
> > inherently problematic niche datatype. (Again, it's not my or
> > MicroPython's fault that it was forced as standard s
Hello,
On Wed, 04 Jun 2014 22:15:30 -0400
Terry Reedy wrote:
> On 6/4/2014 6:52 PM, Paul Sokolovsky wrote:
>
> > "Well" is subjective (or should be defined formally based on the
> > requirements). With my MicroPython hat on, an implementation which
> > receives a string, transcodes it, leading
Serhiy Storchaka writes:
> Yes, I remember. I think that hybrid FSR-UTF16 (like FSR, but UTF-16 is
> used instead of UCS4) is the better choice for CPython. I suppose that
> with populating emoticons and other icon characters in nearest 5 or 10
> years, even English text will often contain
05.06.14 05:25, Terry Reedy wrote:
I mentioned it as an alternative during the '393 discussion. I more than
half agree that the FSR is the better choice for CPython, which had no
particular attachment to UTF-16 in the way that I think Jython, for
instance, does.
Yes, I remember. I think t
Paul Sokolovsky writes:
> Please put that in perspective when alarming over O(1) indexing of
> inherently problematic niche datatype. (Again, it's not my or
> MicroPython's fault that it was forced as standard string type. Maybe
> if CPython seriously considered now-standard UTF-8 encoding, re
04.06.14 23:50, Glenn Linderman wrote:
3) (Most space efficient) One cached entry, that caches the last
codepoint/byte position referenced. UTF-8 is able to be traversed in
either direction, so "next/previous" codepoint access would be
relatively fast (and such are very common operations, e
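One possible reading of the single-cached-entry idea, sketched under assumptions: the class name `CachedUtf8` and its attributes are invented for illustration, and the cache simply remembers the last (code point index, byte offset) pair so that nearby lookups walk only a short distance in either direction.

```python
class CachedUtf8:
    """UTF-8 bytes plus one cached (code point index, byte offset) pair."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        self.cp = 0   # code point index of the cached position
        self.off = 0  # byte offset of the cached position

    def _is_start(self, off: int) -> bool:
        # Any byte that is not 0b10xxxxxx begins a code point.
        return self.data[off] & 0xC0 != 0x80

    def offset_of(self, index: int) -> int:
        """Walk from the cached position to `index`, then update the cache."""
        if index < 0 or not self.data:
            raise IndexError(index)
        cp, off = self.cp, self.off
        while cp < index:  # walk forward one code point at a time
            off += 1
            while off < len(self.data) and not self._is_start(off):
                off += 1
            if off >= len(self.data):
                raise IndexError(index)
            cp += 1
        while cp > index:  # walk backward one code point at a time
            off -= 1
            while not self._is_start(off):
                off -= 1
            cp -= 1
        self.cp, self.off = cp, off
        return off

    def __getitem__(self, index: int) -> str:
        start = self.offset_of(index)
        end = start + 1
        while end < len(self.data) and not self._is_start(end):
            end += 1
        return self.data[start:end].decode("utf-8")
```

Sequential access such as `s[i]` followed by `s[i + 1]` costs one short step, while a jump to a distant index still costs a scan proportional to the distance.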
05.06.14 03:03, Greg Ewing wrote:
Serhiy Storchaka wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't
use iterators. They use indices, str.find and/or regular expressions.
Common use case is quickly find substring starting from current
position using str.find or
05.06.14 03:08, Greg Ewing wrote:
Serhiy Storchaka wrote:
A language which doesn't support O(1) indexing is not Python, it is
only Python-like language.
That's debatable, but even if it's true, I don't think
there's anything wrong with MicroPython being only a
"Python-like language". As
Glenn Linderman writes:
> 3) (Most space efficient) One cached entry, that caches the last
> codepoint/byte position referenced. UTF-8 is able to be traversed in
> either direction, so "next/previous" codepoint access would be
> relatively fast (and such are very common operations, even whe
On 6/4/2014 6:54 PM, Serhiy Storchaka wrote:
05.06.14 00:21, Terry Reedy wrote:
On 6/4/2014 3:41 AM, Jeff Allen wrote:
Jython uses UTF-16 internally -- probably the only sensible choice in a
Python that can call Java. Indexing is O(N), fundamentally. By
"fundamentally", I mean for those s
On 6/4/2014 6:52 PM, Paul Sokolovsky wrote:
"Well" is subjective (or should be defined formally based on the
requirements). With my MicroPython hat on, an implementation which
receives a string, transcodes it, leading to bigger size, just to
immediately transcode back and send out - is awful, en
Hello,
On Thu, 05 Jun 2014 12:08:21 +1200
Greg Ewing wrote:
> Serhiy Storchaka wrote:
> > A language which doesn't support O(1) indexing is not Python, it is
> > only Python-like language.
>
> That's debatable, but even if it's true, I don't think
> there's anything wrong with MicroPython being
On Thu, Jun 5, 2014 at 10:03 AM, Greg Ewing wrote:
> StringPositions could support the following operations:
>
>StringPosition + int --> StringPosition
>StringPosition - int --> StringPosition
>StringPosition - StringPosition --> int
>
> These would be computed by counting characters f
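The three StringPosition operations listed above could look roughly like this. It is a sketch under assumptions, not the proposal's actual design: the position is modelled as a byte offset into UTF-8 data, and the constructor and attribute names are invented.

```python
class StringPosition:
    """Hypothetical opaque position: a byte offset into UTF-8 data."""

    def __init__(self, data: bytes, offset: int = 0):
        self.data = data
        self.offset = offset

    def __add__(self, n: int) -> "StringPosition":
        # StringPosition + int --> StringPosition
        if n < 0:
            return self - (-n)
        off = self.offset
        for _ in range(n):
            if off >= len(self.data):
                raise IndexError("position past end of string")
            off += 1
            while off < len(self.data) and self.data[off] & 0xC0 == 0x80:
                off += 1
        return StringPosition(self.data, off)

    def __sub__(self, other):
        if isinstance(other, StringPosition):
            # StringPosition - StringPosition --> int (code points in between)
            lo, hi = sorted((self.offset, other.offset))
            count = sum(1 for b in self.data[lo:hi] if b & 0xC0 != 0x80)
            return count if self.offset >= other.offset else -count
        # StringPosition - int --> StringPosition
        if other < 0:
            return self + (-other)
        off = self.offset
        for _ in range(other):
            if off == 0:
                raise IndexError("position before start of string")
            off -= 1
            while self.data[off] & 0xC0 == 0x80:
                off -= 1
        return StringPosition(self.data, off)
```

Each operation counts code points between two byte offsets, so its cost depends on the distance moved rather than on the absolute position in the string.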
Hello,
On Thu, 05 Jun 2014 12:03:17 +1200
Greg Ewing wrote:
> Serhiy Storchaka wrote:
> > html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize
> > don't use iterators. They use indices, str.find and/or regular
> > expressions. Common use case is quickly find substring starting
> > fr
Glenn Linderman wrote:
so algorithms that walk two strings at a time cannot use the same
StringPosition to do so... yep, this is quite divergent from CPython and
Python.
They can, it's just that at most one of the indexing
operations would be fast; the StringPosition would
devolve into an in
Glenn Linderman wrote:
For that kind of thing, you don't need an actual character
index, just some way of referring to a place in a string.
I think you meant codepoint index, rather than character index.
Probably, but what I said is true either way.
This starts to diverge from Python code
On 6/4/2014 5:08 PM, Glenn Linderman wrote:
On 6/4/2014 5:03 PM, Greg Ewing wrote:
Serhiy Storchaka wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize
don't use iterators. They use indices, str.find and/or regular
expressions. Common use case is quickly find substring star
On 6/4/2014 5:03 PM, Greg Ewing wrote:
Serhiy Storchaka wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize
don't use iterators. They use indices, str.find and/or regular
expressions. Common use case is quickly find substring starting from
current position using str.find or
Serhiy Storchaka wrote:
A language which doesn't support O(1) indexing is not Python, it is only
Python-like language.
That's debatable, but even if it's true, I don't think
there's anything wrong with MicroPython being only a
"Python-like language". As has been pointed out, fitting
Python onto
Serhiy Storchaka wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't
use iterators. They use indices, str.find and/or regular expressions.
Common use case is quickly find substring starting from current position
using str.find or re.search, process found token, advance
On Wed, Jun 4, 2014 at 5:11 PM, Paul Sokolovsky wrote:
> On Wed, 4 Jun 2014 16:12:23 -0600
> Eric Snow wrote:
>> Actually, there is a "formal, implementation-independent language
>> spec":
>>
>> https://docs.python.org/3/reference/
>
> Opening that link in browser, pressing Ctrl+F and pasting you
05.06.14 00:21, Terry Reedy wrote:
On 6/4/2014 3:41 AM, Jeff Allen wrote:
Jython uses UTF-16 internally -- probably the only sensible choice in a
Python that can call Java. Indexing is O(N), fundamentally. By
"fundamentally", I mean for those strings that have not yet noticed that
they con
Hello,
On Wed, 4 Jun 2014 16:12:23 -0600
Eric Snow wrote:
> On Wed, Jun 4, 2014 at 3:14 PM, Paul Sokolovsky
> wrote:
> > That said, and unlike previous attempts to develop a small Python
> > implementations (which of course existed), we're striving to be
> > exactly a Python language implementa
On Thu, Jun 5, 2014 at 8:52 AM, Paul Sokolovsky wrote:
> "Well" is subjective (or should be defined formally based on the
> requirements). With my MicroPython hat on, an implementation which
> receives a string, transcodes it, leading to bigger size, just to
> immediately transcode back and send o
05.06.14 01:04, Terry Reedy wrote:
PS. You do not seem to be aware of how well the current PEP393
implementation works. If you are going to write any more about it, I
suggest you run Tools/Stringbench/stringbench.py for timings.
AFAIK stringbench is ASCII-only, so it likely is compatible
Hello,
On Wed, 04 Jun 2014 18:04:52 -0400
Terry Reedy wrote:
> On 6/4/2014 5:14 PM, Paul Sokolovsky wrote:
>
> > That said, and unlike previous attempts to develop a small Python
> > implementations (which of course existed), we're striving to be
> > exactly a Python language implementation, no
On Wed, Jun 4, 2014 at 3:14 PM, Paul Sokolovsky wrote:
> That said, and unlike previous attempts to develop a small Python
> implementations (which of course existed), we're striving to be exactly
> a Python language implementation, not a Python-like language
> implementation. As there's no formal
On 6/4/2014 5:14 PM, Paul Sokolovsky wrote:
That said, and unlike previous attempts to develop a small Python
implementations (which of course existed), we're striving to be exactly
a Python language implementation, not a Python-like language
implementation. As there's no formal, implementation-
On 6/4/2014 2:28 PM, Chris Angelico wrote:
On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman wrote:
8) (Content specific variable size caches) Index each codepoint that is a
different byte size than the previous codepoint, allowing indexing to be
used in the intervals. Worst case size is like 2,
On Thu, 05 Jun 2014 00:14:32 +0300, Paul Sokolovsky wrote:
> That said, and unlike previous attempts to develop a small Python
> implementations (which of course existed), we're striving to be exactly
> a Python language implementation, not a Python-like language
> implementation. As there's no fo
On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman wrote:
> 8) (Content specific variable size caches) Index each codepoint that is a
> different byte size than the previous codepoint, allowing indexing to be
> used in the intervals. Worst case size is like 2, best case size is a single
> entry for
On 6/4/2014 3:41 AM, Jeff Allen wrote:
Jython uses UTF-16 internally -- probably the only sensible choice in a
Python that can call Java. Indexing is O(N), fundamentally. By
"fundamentally", I mean for those strings that have not yet noticed that
they contain no supplementary (>0x) characters
On 6/4/2014 3:41 AM, Jeff Allen wrote:
Jython uses UTF-16 internally -- probably the only sensible choice in a
Python that can call Java. Indexing is O(N), fundamentally. By
"fundamentally", I mean for those strings that have not yet noticed that
they contain no supplementary (>0x) characters
Hello,
On Wed, 4 Jun 2014 11:25:51 -0700
Guido van Rossum wrote:
> This thread has devolved into a flame war. I think we should trust the
> Micropython implementers (whoever they are -- are they participating
> here?)
I'm a regular contributor. I'm not sure if the author, Damien George,
is on
<mailto:pmis...@gmail.com>
Cc: python-dev <mailto:python-dev@python.org>
Subject: Re: [Python-Dev] Internal representation of strings and
Micropython
If we're voting I think representing Unicode internally in micropython
as utf-8 with O(N) indexing is a great idea, partly because I'
On Wed, Jun 04, 2014 at 03:32:25PM +, Steve Dower wrote:
> Steven D'Aprano wrote:
> > The language semantics says that a string is an array of code points. Every
> > index relates to a single code point, no code point extends over two or more
> > indexes.
> > There's a 1:1 relationship between
Hello,
On Wed, 04 Jun 2014 20:52:14 +0300
Serhiy Storchaka wrote:
[]
> > That's sad, I agree.
>
> Other languages (Go, Rust) can be happy without O(1) indexing of
> strings. All string and regex operations work with iterators or
> cursors, and I believe this approach is not significant worse t
This thread has devolved into a flame war. I think we should trust the
Micropython implementers (whoever they are -- are they participating here?)
to know their users and let them do what feels right to them. We should
just ask them not to claim full compatibility with any particular Python
version
Serhiy Storchaka writes:
> It would be interesting to collect a statistic about how many indexing
> operations happened during the life of a string in typical (Micro)Python
> program.
Probably irrelevant (I doubt anybody is going to be writing
programmers' editors in MicroPython), but by far
04.06.14 20:05, Paul Sokolovsky wrote:
On Wed, 04 Jun 2014 19:49:18 +0300
Serhiy Storchaka wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize
don't use iterators. They use indices, str.find and/or regular
expressions. Common use case is quickly find substring starting
04.06.14 17:49, Paul Sokolovsky wrote:
On Thu, 5 Jun 2014 00:26:10 +1000
Chris Angelico wrote:
On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka
wrote:
04.06.14 10:03, Chris Angelico wrote:
Right, which is why I don't like the idea. But you don't need
non-ASCII characters to blin
04.06.14 19:52, MRAB wrote:
In order to avoid indexing, you could use some kind of 'cursor' class to
step forwards and backwards along strings. The cursor could include
both the codepoint index and the byte index.
So you need different string library and different regular expression
libr
Hello,
On Wed, 04 Jun 2014 19:49:18 +0300
Serhiy Storchaka wrote:
[]
> > But show me real-world case for that. Common usecase is scanning
> > string left-to-right, that should be done using iterator and thus
> > O(N). Right-to-left scanning would be order(s) of magnitude less
> > frequent, as an
On 2014-06-04 14:33, Nick Coghlan wrote:
On 4 June 2014 15:39, wrote:
On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote:
There's a general expectation that indexing will be O(1) because
all the builtin containers that support that syntax use it for
O(1) lookup operations.
Depend
04.06.14 18:38, Paul Sokolovsky wrote:
Any non-trivial text parsing uses indices or regular expressions (and
regular expressions themselves use indices internally).
I keep hearing this stuff, and unfortunately so far don't have enough
time to collect all that stuff and provide detailed resp
For Jython and IronPython, UTF-16 may be best internal encoding.
Recent languages (Swiffy, Golang, Rust) chose UTF-8 as internal encoding.
Using utf-8 is simple and efficient. For example, no need for utf-8
copy of the string when writing to file
and serializing to JSON.
When implementing Python
Hello,
On Thu, 5 Jun 2014 01:00:52 +1000
Chris Angelico wrote:
> On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky
> wrote:
> >> > But you need non-ASCII characters to display a title of MP3
> >> > track.
> >
> > Yes, but to display a title, you don't need to do codepoint access
> > at random -
On 04/06/2014 16:32, Steve Dower wrote:
If copying into a separate list is a problem (memory-wise), re.finditer('\\S+',
string) also provides the same behaviour and gives me the sliced string, so
there's no need to index for anything.
Out of idle curiosity is there anything that stops Micro
Paul Sokolovsky wrote:
> You just shouldn't write inefficient programs, voila. But if you want, you
> can keep writing inefficient programs, they just will be inefficient. Peace.
Can I nominate this for QOTD? :)
Cheers,
Steve
Hello,
On Wed, 04 Jun 2014 17:40:14 +0300
Serhiy Storchaka wrote:
> 04.06.14 17:02, Paul Moore wrote:
> > On 4 June 2014 14:39, Serhiy Storchaka wrote:
> >> I think that breaking O(1) expectation for indexing makes the
> >> implementation significantly incompatible with Python. Virtually al
Steven D'Aprano wrote:
> The language semantics says that a string is an array of code points. Every
> index relates to a single code point, no code point extends over two or more
> indexes.
> There's a 1:1 relationship between code points and indexes. How is direct
> indexing "likely to be incorre
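A small illustration of the two positions in this exchange, assuming the disagreement is about code points versus user-perceived characters: each index does map to exactly one code point, yet one visible character may span more than one code point, depending on normalization.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

# Same visible character, different number of code points.
print(len(decomposed), len(composed))

# Indexing the decomposed form returns the bare letter without its accent.
print(decomposed[0] == "e")
```

Indexing is well defined in both cases; whether the result is what the program wants depends on the form of the input text.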
On Wed, Jun 4, 2014 at 10:12 AM, Steven D'Aprano wrote:
> On Wed, Jun 04, 2014 at 01:14:04PM +, Steve Dower wrote:
>> I agree with Daniel. Directly indexing into text suggests an
>> attempted optimization that is likely to be incorrect for a set of
>> strings.
>
> I'm afraid I don't understa
04.06.14 17:02, Paul Moore wrote:
On 4 June 2014 14:39, Serhiy Storchaka wrote:
I think that breaking O(1) expectation for indexing makes the implementation
significantly incompatible with Python. Virtually all string operations in
Python operate with indices.
I don't use indexing on str
On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky wrote:
>> > But you need non-ASCII characters to display a title of MP3 track.
>
> Yes, but to display a title, you don't need to do codepoint access at
> random - you need to either take a block of memory (length in bytes) and
> do something with i
Hello,
On Thu, 5 Jun 2014 00:26:10 +1000
Chris Angelico wrote:
> On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka
> wrote:
> > 04.06.14 10:03, Chris Angelico wrote:
> >
> >> Right, which is why I don't like the idea. But you don't need
> >> non-ASCII characters to blink an LED or turn a
On Wed, Jun 04, 2014 at 01:38:57PM +0300, Paul Sokolovsky wrote:
> That's another reason why people don't like Unicode enforced upon them
Enforcing design and language decisions is the job of the programming
language. You might as well complain that Python forces C doubles as the
floating point
On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka wrote:
> 04.06.14 10:03, Chris Angelico wrote:
>
>> Right, which is why I don't like the idea. But you don't need
>> non-ASCII characters to blink an LED or turn a servo, and there is
>> significant resistance to the notion that appending a n
04.06.14 10:03, Chris Angelico wrote:
Right, which is why I don't like the idea. But you don't need
non-ASCII characters to blink an LED or turn a servo, and there is
significant resistance to the notion that appending a non-ASCII
character to a long ASCII-only string requires the whole str
On Wed, Jun 04, 2014 at 01:14:04PM +, Steve Dower wrote:
> I agree with Daniel. Directly indexing into text suggests an
> attempted optimization that is likely to be incorrect for a set of
> strings.
I'm afraid I don't understand this argument. The language semantics says
that a string i
On 4 June 2014 14:39, Serhiy Storchaka wrote:
> I think that breaking O(1) expectation for indexing makes the implementation
> significantly incompatible with Python. Virtually all string operations in
> Python operate with indices.
I don't use indexing on strings except in rare situations. Sure I
MicroPython is going to be significantly incompatible with Python
anyway. But you should be able to run your mp code on regular Python.
On Wed, Jun 4, 2014 at 9:39 AM, Serhiy Storchaka wrote:
> 04.06.14 04:17, Steven D'Aprano wrote:
>
>> Would either of these trade-offs be acceptable while