Re: IDLE "Codepage" Switching?

2023-01-18 Thread Eryk Sun

On 1/17/23, Stephen Tucker  wrote:
>
> 1. Can anybody explain the behaviour in IDLE (Python version 2.7.10)
> reported below? (It seems that the way it renders a given sequence of bytes
> depends on the sequence.)

In 2.x, IDLE tries to decode a byte string via unicode() before
writing to the Tk text widget. However, if the locale encoding (e.g.
the process ANSI code page) fails to decode one or more characters,
IDLE lets Tk figure out how to decode the byte string.

Python 2.7 has an older version of Tk that has peculiar behavior on
Windows when bytes in the range 0x80-0xBF are written to a text box.
Bytes in this range get translated to native wide characters (16-bit
characters) in the halfwidth/fullwidth Unicode block, i.e. translated
to Unicode U+FF80 - U+FFBF.

If IDLE decodes using code page 1252, then the ordinals 0x81, 0x8d,
0x8f, 0x90 and 0x9d can't be decoded. IDLE thus passes the undecoded
byte string to Tk. The example you provided that demonstrates the
behavior contains ordinal 0x9d (157).
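
The five undefined ordinals can be checked against Python's own cp1252 codec. A minimal sketch for Python 3 (the helper name undecodable_bytes is made up for illustration, it is not part of IDLE):

```python
def undecodable_bytes(encoding):
    """Return the single-byte values that the given codec refuses to decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


# Prints ['0x81', '0x8d', '0x8f', '0x90', '0x9d'] for code page 1252.
print([hex(b) for b in undecodable_bytes("cp1252")])
```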

I get similar behavior for the other undefined ordinal values in code
page 1252. For example, using IDLE 2.7.18 on Windows:

>>> print '\x81\xa1'
ﾁﾡ
>>> print 'a\xa1'
a¡

In the first case, ordinal 0x81 causes decoding to fail in IDLE, so
the byte string is passed as is to Tk, which maps it to
'\uff81\uffa1'. In the second case, OTOH, "\xa1" is decoded by IDLE as
"¡".
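
The two-step behavior can be modelled roughly as follows. This is only a sketch of the observed output, written for Python 3; it is not IDLE's or Tk's actual code, and the function name render_like_idle27 and the default code page are assumptions for illustration.

```python
def render_like_idle27(data, locale_encoding="cp1252"):
    """Model of what ends up on screen for a byte string (assumed behavior)."""
    try:
        # Step 1: IDLE 2.x tries the locale encoding first.
        return data.decode(locale_encoding)
    except UnicodeDecodeError:
        # Step 2 (assumed model of the old Tk on Windows): bytes 0x80-0xBF
        # land in U+FF80-U+FFBF, every other byte keeps its Latin-1 value.
        return "".join(
            chr(0xFF00 + b) if 0x80 <= b <= 0xBF else chr(b)
            for b in data
        )


print(render_like_idle27(b"\x81\xa1"))  # decoding fails, both bytes are shifted
print(render_like_idle27(b"a\xa1"))     # decoding succeeds, prints a¡
```

With this model, bytes(range(158, 169)) decodes cleanly under cp1252, while bytes(range(157, 169)) contains 0x9d and so takes the fallback path for the whole string, which matches the two outputs in the original report.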

> 2. Does the IDLE in Python 3.x behave the same way?

No, in 3.x only Unicode str() objects are written to the Tk text
widget. Moreover, the text widget doesn't have the same behavior in
newer versions. It ignores bytes in the control-block range 0x80-0x9F,
and it decodes bytes in the range 0xA0-0xBF normally.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: IDLE "Codepage" Switching?

2023-01-18 Thread Peter J. Holzer

On 2023-01-18 11:05:24 -0500, Thomas Passin wrote:
> On 1/18/2023 5:43 AM, Stephen Tucker wrote:
> > Thanks for these responses.
> > 
> > I was encouraged to read that I'm not the only one to find this all
> > confusing.
> > 
> > I have investigated a little further.
> > 
> > 1. I produced the following IDLE log:
> > 
> > >>> mylongstr = ""
> > >>> for thisCP in range (1, 256):
> >         mylongstr += chr (thisCP) + " " + str (ord (chr (thisCP))) + ", "
> >
> > >>> print mylongstr
> > 1, 2, 3, 4, 5, 6, 7, 8, 9,
> >   10, 11, 12,
> >   13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
> > 31,   32, ! 33, " 34, # 35, $ 36, % 37, & 38, ' 39, ( 40, ) 41, * 42, + 43,
> > , 44, - 45, . 46, / 47, 0 48, 1 49, 2 50, 3 51, 4 52, 5 53, 6 54, 7 55, 8
> > 56, 9 57, : 58, ; 59, < 60, = 61, > 62, ? 63, @ 64, A 65, B 66, C 67, D 68,
> > E 69, F 70, G 71, H 72, I 73, J 74, K 75, L 76, M 77, N 78, O 79, P 80, Q
> > 81, R 82, S 83, T 84, U 85, V 86, W 87, X 88, Y 89, Z 90, [ 91, \ 92, ] 93,
> > ^ 94, _ 95, ` 96, a 97, b 98, c 99, d 100, e 101, f 102, g 103, h 104, i
> > 105, j 106, k 107, l 108, m 109, n 110, o 111, p 112, q 113, r 114, s 115,
> > t 116, u 117, v 118, w 119, x 120, y 121, z 122, { 123, | 124, } 125, ~
> > 126, 127, ﾀ 128, ﾁ 129, ﾂ 130, ﾃ 131, ﾄ 132, ﾅ 133, ﾆ 134, ﾇ 135, ﾈ 136, ﾉ
> > 137, ﾊ 138, ﾋ 139, ﾌ 140, ﾍ 141, ﾎ 142, ﾏ 143, ﾐ 144, ﾑ 145, ﾒ 146, ﾓ 147,
> > ﾔ 148, ﾕ 149, ﾖ 150, ﾗ 151, ﾘ 152, ﾙ 153, ﾚ 154, ﾛ 155, ﾜ 156, ﾝ 157, ﾞ
> > 158, ﾟ 159, ﾠ 160, ﾡ 161, ﾢ 162, ﾣ 163, ﾤ 164, ﾥ 165, ﾦ 166, ﾧ 167, ﾨ 168,
> > ﾩ 169, ﾪ 170, ﾫ 171, ﾬ 172, ﾭ 173, ﾮ 174, ﾯ 175, ﾰ 176, ﾱ 177, ﾲ 178, ﾳ
> > 179, ﾴ 180, ﾵ 181, ﾶ 182, ﾷ 183, ﾸ 184, ﾹ 185, ﾺ 186, ﾻ 187, ﾼ 188, ﾽ 189,
> > ﾾ 190, ﾿ 191, À 192, Á 193, Â 194, Ã 195, Ä 196, Å 197, Æ 198, Ç 199, È
> > 200, É 201, Ê 202, Ë 203, Ì 204, Í 205, Î 206, Ï 207, Ð 208, Ñ 209, Ò 210,
> > Ó 211, Ô 212, Õ 213, Ö 214, × 215, Ø 216, Ù 217, Ú 218, Û 219, Ü 220, Ý
> > 221, Þ 222, ß 223, à 224, á 225, â 226, ã 227, ä 228, å 229, æ 230, ç 231,
> > è 232, é 233, ê 234, ë 235, ì 236, í 237, î 238, ï 239, ð 240, ñ 241, ò
> > 242, ó 243, ô 244, õ 245, ö 246, ÷ 247, ø 248, ù 249, ú 250, û 251, ü 252,
> > ý 253, þ 254, ÿ 255,
> > >>>
> > 
> > 2. I copied and pasted the IDLE log into a text file and ran a program on
> > it that told me about every byte in the log.
> > 
> > 3. I discovered the following:
> > 
> > Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;

Which might mean that they are also UTF-8-encoded (there is no
difference between UTF-8-encoding and ASCII-encoding for these
characters).
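
A quick check of that claim in Python 3 (the variable name same_encoding is just for illustration):

```python
# For code points 1..127 the ASCII and UTF-8 encodings are the same single byte.
same_encoding = all(
    chr(code).encode("utf-8") == chr(code).encode("ascii") == bytes([code])
    for code in range(1, 128)
)
print(same_encoding)  # True
```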


> > Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded
> > characters whose codepoints were FF00 hex more than the byte values (hence
> > the strange glyphs);
> > 
> > Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded
> > characters - without any offset being added to their codepoints in the
> > meantime!
> > 
> > I thought you might just be interested in this - there does seem to be some
> > method in IDLE's mind, at least.
> 
> This has nothing to do with IDLE.  The UTF-8 encoding of those code points
> uses two bytes instead of one.  See

That's not the peculiar thing. The peculiar thing is that characters
U+0080 to U+00BF are recoded to U+FF80 to U+FFBF (but U+00C0 to U+00FF
are printed normally).
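
The difference is easy to see in a byte dump, because the shifted code points need three UTF-8 bytes instead of two. A small Python 3 sketch (utf8_hex is a hypothetical helper name):

```python
def utf8_hex(code_point):
    """Return the UTF-8 encoding of a code point as space-separated hex bytes."""
    return " ".join("%02X" % b for b in chr(code_point).encode("utf-8"))


for cp in (0x0080, 0x00BF, 0x00C0, 0x00FF, 0xFF80, 0xFFBF):
    print("U+%04X -> %s" % (cp, utf8_hex(cp)))
# U+0080 -> C2 80
# U+00BF -> C2 BF
# U+00C0 -> C3 80
# U+00FF -> C3 BF
# U+FF80 -> EF BE 80
# U+FFBF -> EF BE BF
```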

I have no idea what's happening here. I can only urge Stephen to use
Python 3.x instead of Python 2.7. Python2 has been deprecated for years
and reached its official end of life 3 years ago. There really
shouldn't be any reason to use Python 2.7 any more except
reverse-engineering old applications in order to port them to Python 3.

In particular, the type "str" is very different in Python2 and Python3.
In Python2 it is a sequence of bytes (similar to the Python3 type
"bytes") and in Python3 it is a sequence of (Unicode) characters
(similar to the Python2 type "unicode").
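
In Python 3 terms the distinction looks like this (a minimal sketch; the variable names are arbitrary):

```python
text = "\u00a1"              # str: one Unicode character (inverted exclamation mark)
raw = text.encode("cp1252")  # bytes: the single byte 0xA1 in code page 1252

print(type(text).__name__, len(text))        # str 1
print(type(raw).__name__, len(raw), raw[0])  # bytes 1 161
print(len(text.encode("utf-8")))             # 2, the same character needs two bytes in UTF-8
print(raw.decode("cp1252") == text)          # True
```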

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



Re: IDLE "Codepage" Switching?

2023-01-18 Thread Thomas Passin


On 1/18/2023 5:43 AM, Stephen Tucker wrote:

> Thanks for these responses.
>
> I was encouraged to read that I'm not the only one to find this all
> confusing.
>
> I have investigated a little further.
>
> 1. I produced the following IDLE log:
>
> >>> mylongstr = ""
> >>> for thisCP in range (1, 256):
>         mylongstr += chr (thisCP) + " " + str (ord (chr (thisCP))) + ", "
>
> >>> print mylongstr
> 1, 2, 3, 4, 5, 6, 7, 8, 9,
>   10, 11, 12,
>   13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
> 31,   32, ! 33, " 34, # 35, $ 36, % 37, & 38, ' 39, ( 40, ) 41, * 42, + 43,
> , 44, - 45, . 46, / 47, 0 48, 1 49, 2 50, 3 51, 4 52, 5 53, 6 54, 7 55, 8
> 56, 9 57, : 58, ; 59, < 60, = 61, > 62, ? 63, @ 64, A 65, B 66, C 67, D 68,
> E 69, F 70, G 71, H 72, I 73, J 74, K 75, L 76, M 77, N 78, O 79, P 80, Q
> 81, R 82, S 83, T 84, U 85, V 86, W 87, X 88, Y 89, Z 90, [ 91, \ 92, ] 93,
> ^ 94, _ 95, ` 96, a 97, b 98, c 99, d 100, e 101, f 102, g 103, h 104, i
> 105, j 106, k 107, l 108, m 109, n 110, o 111, p 112, q 113, r 114, s 115,
> t 116, u 117, v 118, w 119, x 120, y 121, z 122, { 123, | 124, } 125, ~
> 126, 127, ﾀ 128, ﾁ 129, ﾂ 130, ﾃ 131, ﾄ 132, ﾅ 133, ﾆ 134, ﾇ 135, ﾈ 136, ﾉ
> 137, ﾊ 138, ﾋ 139, ﾌ 140, ﾍ 141, ﾎ 142, ﾏ 143, ﾐ 144, ﾑ 145, ﾒ 146, ﾓ 147,
> ﾔ 148, ﾕ 149, ﾖ 150, ﾗ 151, ﾘ 152, ﾙ 153, ﾚ 154, ﾛ 155, ﾜ 156, ﾝ 157, ﾞ
> 158, ﾟ 159, ﾠ 160, ﾡ 161, ﾢ 162, ﾣ 163, ﾤ 164, ﾥ 165, ﾦ 166, ﾧ 167, ﾨ 168,
> ﾩ 169, ﾪ 170, ﾫ 171, ﾬ 172, ﾭ 173, ﾮ 174, ﾯ 175, ﾰ 176, ﾱ 177, ﾲ 178, ﾳ
> 179, ﾴ 180, ﾵ 181, ﾶ 182, ﾷ 183, ﾸ 184, ﾹ 185, ﾺ 186, ﾻ 187, ﾼ 188, ﾽ 189,
> ﾾ 190, ﾿ 191, À 192, Á 193, Â 194, Ã 195, Ä 196, Å 197, Æ 198, Ç 199, È
> 200, É 201, Ê 202, Ë 203, Ì 204, Í 205, Î 206, Ï 207, Ð 208, Ñ 209, Ò 210,
> Ó 211, Ô 212, Õ 213, Ö 214, × 215, Ø 216, Ù 217, Ú 218, Û 219, Ü 220, Ý
> 221, Þ 222, ß 223, à 224, á 225, â 226, ã 227, ä 228, å 229, æ 230, ç 231,
> è 232, é 233, ê 234, ë 235, ì 236, í 237, î 238, ï 239, ð 240, ñ 241, ò
> 242, ó 243, ô 244, õ 245, ö 246, ÷ 247, ø 248, ù 249, ú 250, û 251, ü 252,
> ý 253, þ 254, ÿ 255,
> >>>
>
> 2. I copied and pasted the IDLE log into a text file and ran a program on
> it that told me about every byte in the log.
>
> 3. I discovered the following:
>
> Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;
>
> Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded
> characters whose codepoints were FF00 hex more than the byte values (hence
> the strange glyphs);
>
> Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded
> characters - without any offset being added to their codepoints in the
> meantime!
>
> I thought you might just be interested in this - there does seem to be some
> method in IDLE's mind, at least.


This has nothing to do with IDLE.  The UTF-8 encoding of those code 
points uses two bytes instead of one.  See


https://stackoverflow.com/questions/8732025/why-degree-symbol-differs-from-utf-8-from-unicode






Re: IDLE "Codepage" Switching?

2023-01-18 Thread Stephen Tucker

Thanks for these responses.

I was encouraged to read that I'm not the only one to find this all
confusing.

I have investigated a little further.

1. I produced the following IDLE log:

>>> mylongstr = ""
>>> for thisCP in range (1, 256):
        mylongstr += chr (thisCP) + " " + str (ord (chr (thisCP))) + ", "


>>> print mylongstr
1, 2, 3, 4, 5, 6, 7, 8, 9,
 10, 11, 12,
 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31,   32, ! 33, " 34, # 35, $ 36, % 37, & 38, ' 39, ( 40, ) 41, * 42, + 43,
, 44, - 45, . 46, / 47, 0 48, 1 49, 2 50, 3 51, 4 52, 5 53, 6 54, 7 55, 8
56, 9 57, : 58, ; 59, < 60, = 61, > 62, ? 63, @ 64, A 65, B 66, C 67, D 68,
E 69, F 70, G 71, H 72, I 73, J 74, K 75, L 76, M 77, N 78, O 79, P 80, Q
81, R 82, S 83, T 84, U 85, V 86, W 87, X 88, Y 89, Z 90, [ 91, \ 92, ] 93,
^ 94, _ 95, ` 96, a 97, b 98, c 99, d 100, e 101, f 102, g 103, h 104, i
105, j 106, k 107, l 108, m 109, n 110, o 111, p 112, q 113, r 114, s 115,
t 116, u 117, v 118, w 119, x 120, y 121, z 122, { 123, | 124, } 125, ~
126, 127, ﾀ 128, ﾁ 129, ﾂ 130, ﾃ 131, ﾄ 132, ﾅ 133, ﾆ 134, ﾇ 135, ﾈ 136, ﾉ
137, ﾊ 138, ﾋ 139, ﾌ 140, ﾍ 141, ﾎ 142, ﾏ 143, ﾐ 144, ﾑ 145, ﾒ 146, ﾓ 147,
ﾔ 148, ﾕ 149, ﾖ 150, ﾗ 151, ﾘ 152, ﾙ 153, ﾚ 154, ﾛ 155, ﾜ 156, ﾝ 157, ﾞ
158, ﾟ 159, ﾠ 160, ﾡ 161, ﾢ 162, ﾣ 163, ﾤ 164, ﾥ 165, ﾦ 166, ﾧ 167, ﾨ 168,
ﾩ 169, ﾪ 170, ﾫ 171, ﾬ 172, ﾭ 173, ﾮ 174, ﾯ 175, ﾰ 176, ﾱ 177, ﾲ 178, ﾳ
179, ﾴ 180, ﾵ 181, ﾶ 182, ﾷ 183, ﾸ 184, ﾹ 185, ﾺ 186, ﾻ 187, ﾼ 188, ﾽ 189,
ﾾ 190, ﾿ 191, À 192, Á 193, Â 194, Ã 195, Ä 196, Å 197, Æ 198, Ç 199, È
200, É 201, Ê 202, Ë 203, Ì 204, Í 205, Î 206, Ï 207, Ð 208, Ñ 209, Ò 210,
Ó 211, Ô 212, Õ 213, Ö 214, × 215, Ø 216, Ù 217, Ú 218, Û 219, Ü 220, Ý
221, Þ 222, ß 223, à 224, á 225, â 226, ã 227, ä 228, å 229, æ 230, ç 231,
è 232, é 233, ê 234, ë 235, ì 236, í 237, î 238, ï 239, ð 240, ñ 241, ò
242, ó 243, ô 244, õ 245, ö 246, ÷ 247, ø 248, ù 249, ú 250, û 251, ü 252,
ý 253, þ 254, ÿ 255,
>>>

2. I copied and pasted the IDLE log into a text file and ran a program on
it that told me about every byte in the log.

3. I discovered the following:

Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;

Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded
characters whose codepoints were FF00 hex more than the byte values (hence
the strange glyphs);

Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded
characters - without any offset being added to their codepoints in the
meantime!

I thought you might just be interested in this - there does seem to be some
method in IDLE's mind, at least.

Stephen Tucker.








On Wed, Jan 18, 2023 at 9:41 AM Peter J. Holzer  wrote:

> On 2023-01-17 22:58:53 -0500, Thomas Passin wrote:
> > On 1/17/2023 8:46 PM, rbowman wrote:
> > > On Tue, 17 Jan 2023 12:47:29 +, Stephen Tucker wrote:
> > > > 2. Does the IDLE in Python 3.x behave the same way?
> > >
> > > fwiw
> > >
> > > Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
> > > Type "help", "copyright", "credits" or "license()" for more information.
> > > str = ""
> > > for c in range(140, 169):
> > >  str += chr(c) + " "
> > >
> > > print(str)
> > > Œ   Ž ‘ ’ “ ” • – — ˜ ™ š › œ   ž Ÿ   ¡ ¢ £ ¤ ¥
> > > ¦ § ¨
> > >
> > >
> > > I don't know how this will appear since Pan is showing the icon for a
> > > character not in its set.  However, even with more undefined characters
> > > the printable one do not change. I get the same output running Python3
> > > from the terminal so it's not an IDLE thing.
> >
> > I'm not sure what explanation is being asked for here.  Let's take Python3,
> > so we can be sure that the strings are in unicode.  The font being used by
> > the console isn't mentioned, but there's no reason it should have glyphs for
> > any random unicode character.
>
> Also note that the characters between 128 (U+0080) and 159 (U+009F)
> inclusive aren't printable characters. They are control characters.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
> --
> https://mail.python.org/mailman/listinfo/python-list
>

Re: IDLE "Codepage" Switching?

2023-01-18 Thread Peter J. Holzer

On 2023-01-17 22:58:53 -0500, Thomas Passin wrote:
> On 1/17/2023 8:46 PM, rbowman wrote:
> > On Tue, 17 Jan 2023 12:47:29 +, Stephen Tucker wrote:
> > > 2. Does the IDLE in Python 3.x behave the same way?
> > 
> > fwiw
> > 
> > Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
> > Type "help", "copyright", "credits" or "license()" for more information.
> > str = ""
> > for c in range(140, 169):
> >  str += chr(c) + " "
> > 
> > print(str)
> >                       ¡ ¢ £ ¤ ¥
> > ¦ § ¨
> > 
> > 
> > I don't know how this will appear since Pan is showing the icon for a
> > character not in its set.  However, even with more undefined characters
> > the printable one do not change. I get the same output running Python3
> > from the terminal so it's not an IDLE thing.
> 
> I'm not sure what explanation is being asked for here.  Let's take Python3,
> so we can be sure that the strings are in unicode.  The font being used by
> the console isn't mentioned, but there's no reason it should have glyphs for
> any random unicode character.

Also note that the characters between 128 (U+0080) and 159 (U+009F)
inclusive aren't printable characters. They are control characters.
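
This can be confirmed with the unicodedata module in Python 3 (a minimal sketch; the name c1_controls is only for illustration):

```python
import unicodedata

# Code points U+0080..U+009F whose general category is "Cc" (control).
c1_controls = [c for c in range(0x80, 0xA0) if unicodedata.category(chr(c)) == "Cc"]

print(len(c1_controls))            # 32, i.e. the whole range
print(chr(0x9D).isprintable())     # False
print(chr(0xA1).isprintable())     # True
```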

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



Re: IDLE "Codepage" Switching?

2023-01-17 Thread Thomas Passin


On 1/17/2023 8:46 PM, rbowman wrote:

> On Tue, 17 Jan 2023 12:47:29 +, Stephen Tucker wrote:
>
> > 2. Does the IDLE in Python 3.x behave the same way?
>
> fwiw
>
> Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
> Type "help", "copyright", "credits" or "license()" for more information.
> str = ""
> for c in range(157, 169):
>     str += chr(c) + ""
>
> print(str)
>  ¡¢£¤¥¦§¨
> str = ""
> for c in range(140, 169):
>     str += chr(c) + " "
>
> print(str)
>                       ¡ ¢ £ ¤ ¥
> ¦ § ¨
>
> I don't know how this will appear since Pan is showing the icon for a
> character not in its set.  However, even with more undefined characters
> the printable ones do not change. I get the same output running Python3
> from the terminal so it's not an IDLE thing.


I'm not sure what explanation is being asked for here.  Let's take 
Python3, so we can be sure that the strings are in unicode.  The font 
being used by the console isn't mentioned, but there's no reason it 
should have glyphs for any random unicode character.  In my case, I see 
the same missing and printable characters as in the previous post 
(above).  The font is Source Code Pro Medium.


Changing the console's code page won't magically provide the missing glyphs.

I wrote these characters to a file using utf-8 encoding and opened it in 
an editor that recognized the content as utf-8 (EditPlus).  It displayed 
the same characters but had fewer leading spaces (i.e., missing glyphs), 
and did not show any default "missing-character" glyphs.  The editor is 
using the Cousine font.


The second factor that could be in play is what the default character 
encoding is, which is set by Windows and could be different in different 
places (locales).  I don't recall just now how Python3 handles this. 
Since Python2 strings are not unicode unless specified, and Python2 
probably handles the locale/default encoding differently from Python3, 
it would not be a surprise if the two give different results.
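
For what it's worth, Python 3 will report which encodings it picked up. A small sketch; the values depend entirely on the machine and locale, so no output is shown, and the variable names are arbitrary:

```python
import locale
import sys

preferred = locale.getpreferredencoding(False)  # encoding derived from the locale
stdout_encoding = sys.stdout.encoding           # encoding used for print() output

print("locale preferred encoding:", preferred)
print("stdout encoding:", stdout_encoding)
```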


If you print such a Python2 string, you will get glyphs for (non-ascii) 
ord(chr) > 127 that come from the Windows code page table, which will be 
different from what Python3 will display.
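
To illustrate how much the code page matters, here is the same byte decoded with three common Windows code pages (a Python 3 sketch; the choice of code pages is just an example):

```python
raw = b"\xa1"

# The same byte value maps to different characters depending on the code page.
decoded = {name: raw.decode(name) for name in ("cp1252", "cp437", "cp850")}

for name, char in decoded.items():
    print(name, repr(char))
# cp1252 '¡'
# cp437 'í'
# cp850 'í'
```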


Python3 uses Windows Unicode API functions, and isn't subject to the 
same limitations as Python2 was - Python2 had to go through the Windows 
code page apparatus and didn't use the Unicode API.  See PEP 528 
(https://peps.python.org/pep-0528/).


IDLE sets up its own window itself, and probably uses a different font 
from the default Windows console, so there could be some differences 
there too, especially as to whether missing glyphs show a visible symbol 
or not.


Code Page 65001 was often claimed to be for utf-8.  It's not really 
correct in general, but it's OK for many utf-8 characters.  But in 
Python2, the codecs module does not know about code page 65001 - unless 
you apply a simple patch - so if you try to set the console to cp65001, 
you cannot get anything printed.  You get an exception raised instead.
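
Whether a given interpreter knows the name can be checked with codecs.lookup. As far as I know, Python 3.8 and later treat cp65001 as an alias of UTF-8, but I have not verified every version, so this sketch only reports what it finds (codec_name is a made-up helper):

```python
import codecs


def codec_name(label):
    """Return the canonical codec name for label, or None if it is unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


print(codec_name("cp65001"))
print(codec_name("utf-8"))
```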


Yes, it's all confusing, and especially with Python2.



Re: IDLE "Codepage" Switching?

2023-01-17 Thread rbowman

On Tue, 17 Jan 2023 12:47:29 +, Stephen Tucker wrote:

> 2. Does the IDLE in Python 3.x behave the same way?

fwiw

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license()" for more information.
str = ""
for c in range(157, 169):
    str += chr(c) + ""


print(str)
 ¡¢£¤¥¦§¨
str = ""
for c in range(140, 169):
    str += chr(c) + " "


print(str)
                      ¡ ¢ £ ¤ ¥ 
¦ § ¨ 


I don't know how this will appear since Pan is showing the icon for a 
character not in its set.  However, even with more undefined characters 
the printable ones do not change. I get the same output running Python3 
from the terminal so it's not an IDLE thing.

IDLE "Codepage" Switching?

2023-01-17 Thread Stephen Tucker

I have four questions.

1. Can anybody explain the behaviour in IDLE (Python version 2.7.10)
reported below? (It seems that the way it renders a given sequence of bytes
depends on the sequence.)

2. Does the IDLE in Python 3.x behave the same way?

3. If it does, is this as it should behave?

4. If it is, then why is it as it should behave?
==
>>> mylongstr = ""
>>> for thisCP in range (157, 169):
        mylongstr += chr (thisCP) + " "


>>> print mylongstr
ﾝ ﾞ ﾟ ﾠ ﾡ ﾢ ﾣ ﾤ ﾥ ﾦ ﾧ ﾨ
>>> mylongstr = ""
>>> for thisCP in range (158, 169):
        mylongstr += chr (thisCP) + " "


>>> print mylongstr
ž Ÿ   ¡ ¢ £ ¤ ¥ ¦ § ¨
>>> mylongstr = ""
>>> for thisCP in range (157, 169):
        mylongstr += chr (thisCP) + " "


>>> print mylongstr
ﾝ ﾞ ﾟ ﾠ ﾡ ﾢ ﾣ ﾤ ﾥ ﾦ ﾧ ﾨ
==

Stephen Tucker.
