Re: [pypy-dev] Making the most of internal UTF8

2020-02-26 Thread Dan Stromberg
On Wed, Feb 26, 2020 at 10:41 AM Matt Billenstein via pypy-dev <pypy-dev@python.org> wrote: > You can think of 'u' as being the default in python3 where 'b' was the > default in python2 (not ascii) - but most stdlib functions would accept > bytes as strings. > > So in python3, you don't need 'u'
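As a small illustration of the point about prefixes, here is a minimal sketch, assuming a Python 3 interpreter (CPython 3 or PyPy3). The variable names are made up for the example and do not come from the thread.

```python
# Python 3: the 'u' prefix is accepted but redundant; 'b' creates a bytes object.
plain = 'abc'        # str (unicode text), the default in Python 3
prefixed = u'abc'    # also str; the prefix changes nothing
raw = b'abc'         # bytes, a distinct type

same_type = type(plain) is type(prefixed)   # both are str
raw_is_text = isinstance(raw, str)          # bytes is not str in Python 3
decoded = raw.decode('ascii')               # explicit decode gives back text
```

Mixing the two types requires an explicit encode or decode step, which is the main behavioural difference from Python 2 described above.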

Re: [pypy-dev] Making the most of internal UTF8

2020-02-26 Thread Matt Billenstein via pypy-dev
On Wed, Feb 26, 2020 at 09:08:49AM -0600, Jerry Spicklemire wrote: > In other words, within the range of ASCII characters, the UTF8 representation > is identical to the ASCII representation. So, does that mean we can put the > 'u' > and 'b' prefix nightmares behind us? It would help some diehards
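The quoted claim that ASCII and UTF-8 agree within the ASCII range can be checked directly. This is a rough sketch with illustrative sample strings (not taken from the thread), assuming Python 3:

```python
# For pure-ASCII text, the ASCII and UTF-8 encodings produce the same bytes.
text = "plain ASCII text"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
identical = (ascii_bytes == utf8_bytes)

# Outside the ASCII range the encodings diverge: UTF-8 uses multi-byte
# sequences, and the ASCII codec refuses the character altogether.
non_ascii = "caf\u00e9"                      # "café"
utf8_len = len(non_ascii.encode("utf-8"))    # 'é' takes two bytes in UTF-8
try:
    non_ascii.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The byte-level identity holds only for the encoded form; str and bytes remain separate types in Python 3, so the prefixes still carry meaning.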

Re: [pypy-dev] Making the most of internal UTF8

2020-02-26 Thread Antonio Cuni
To expand Armin's answer, the two most "visible" effects for end users are: - some_unicode.encode('utf-8') is essentially for free (because it is already UTF-8 internally) - some_bytes.decode('utf-8') is very cheap (it just needs to check that some_bytes is valid utf-8) ciao, Anto On Wed, Feb 26
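The two operations mentioned above behave the same on every Python 3 implementation; only their cost differs. The sketch below shows the validity check that decoding performs. The helper name try_decode is hypothetical, introduced only for this example, and the code says nothing about how PyPy implements the check internally.

```python
def try_decode(data):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

# Round trip: encoding text and decoding the result gives the text back.
some_unicode = "caf\u00e9"
round_trip = try_decode(some_unicode.encode("utf-8"))

# 0xff can never appear in well-formed UTF-8, so validation fails here.
invalid = try_decode(b"\xff\xfe")
```

Because decoding still has to validate the input, it is described as very cheap rather than free, whereas encoding can reuse the existing internal representation.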

Re: [pypy-dev] Making the most of internal UTF8

2020-02-26 Thread Armin Rigo
Hi Jerry, On Wed, 26 Feb 2020 at 16:09, Jerry Spicklemire wrote: > Is there a tutorial about how to best take advantage of PyPy's internal UTF8? For better or for worse, this is only an internal feature. It has no effect for the end user. In particular, Python programs written for PyPy3.6 and fo

[pypy-dev] Making the most of internal UTF8

2020-02-26 Thread Jerry Spicklemire
Is there a tutorial about how to best take advantage of PyPy's internal UTF8? The docs say that PyPy now uses UTF8 internally to represent unicode. So, for an old codger, that sounds like we are back to a point where ASCII characters just act normally again, like in Python v.2, since ASCII IS UT

Re: [pypy-dev] Help needed: are you running windows with a non-ascii interface?

2020-02-26 Thread Armin Rigo
Hi again, On Wed, 26 Feb 2020 at 14:28, Armin Rigo wrote: > In particular the first escaped character \Uf44f really should be > two characters, '\x92O', and there is similar mangling later. Also > the first of the two unicodes is much shorter on CPython3. Finally, > the very last character

Re: [pypy-dev] Help needed: are you running windows with a non-ascii interface?

2020-02-26 Thread Armin Rigo
Hi Matti, On Wed, 26 Feb 2020 at 11:59, Matti Picus wrote: > - check what pypy3 returns for time.tzname? There is no code to decode > it, so it is probably a string of bytes. What encoding is it in? On a french Windows I get, in CPython 3.6, a tuple of two unicodes that seem correct; and on PyPy3
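One way to run the check discussed here is to print the type and an ASCII-escaped form of each time.tzname entry, so that mangled characters show up as escapes instead of depending on the console encoding. This is only a sketch; the helper name describe_tzname is hypothetical.

```python
import time

def describe_tzname():
    """Return (type name, ASCII-escaped repr) for each entry of time.tzname."""
    return [(type(name).__name__, ascii(name)) for name in time.tzname]

for type_name, escaped in describe_tzname():
    print(type_name, escaped)
```

Running the same snippet under CPython 3 and PyPy3 on the affected Windows machine would make any difference in length or escaped characters easy to compare.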

[pypy-dev] Help needed: are you running windows with a non-ascii interface?

2020-02-26 Thread Matti Picus
Someone on Stack Overflow asked why PyPy cannot run pandas. Here is the error, reformatted from the garbled original: https://gist.github.com/mattip/374e8ba49e2dd2e2b0d5c46a5cd612ed While there was an off-by-one error with the conversion when building time.tzname from the OS C call, I think th