On 31/10/17 17:23, Stefan Ram wrote:
Ned Batchelder <n...@nedbatchelder.com> writes:
    def wrapped_join(values, sep):

   Ok, here's a report on me seeing non-breaking spaces in
   posts in this NG. I have written this report so that you
   can see that it's not my newsreader that is converting
   something, because there is no newsreader involved.

   Here are some relevant lines from Ned's above post:

|From: Ned Batchelder <n...@nedbatchelder.com>
|Newsgroups: comp.lang.python
|Subject: Re: How to join elements at the beginning and end of the list
|Message-ID: <mailman.95.1509464977.1490.python-l...@python.org>

Hm.  That suggests the mail-to-news gateway has a hand in things.

|Content-Type: text/plain; charset=utf-8; format=flowed
|Content-Transfer-Encoding: 8bit
|     def wrapped_join(values, sep):

[snippety snip]

|od -c tmp.txt
|...
|0012620   s   u   l   a   t   e       i   t   :  \n  \n       Â       Â
|0012640       Â           d   e   f       w   r   a   p   p   e   d   _
|...
|
|od -x tmp.txt
|...
|0012620 7573 616c 6574 6920 3a74 0a0a c220 c2a0
|0012640 c2a0 20a0 6564 2066 7277 7061 6570 5f64
|...

   And you can see, there are two octet pairs »c220« and
   »c2a0« in the post (directly preceding »def wrapped«).
   (Compare with the Content-Type and Content-Transfer-Encoding
   given above.) (Read table with a monospaced font:)

                         corresponding
Codepoint      UTF-8    ISO-8859-1      interpretation

U+0020?        c2 20    20?             SPACE?
U+00A0         c2 a0    a0              NON-BREAKING SPACE

   This makes it clear that there really are codepoints
   U+00A0 in what I get from the server, i.e., non-breaking
   spaces directly in front of »def wrapped«.

And? Why does that bother you? A non-breaking space is a perfectly valid thing to put into a UTF-8 encoded message. The 0xc2 0x20 byte pair that you misidentify as a space is another matter entirely.
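
Both halves of that claim are easy to check. This is a minimal sketch using only Python's built-in UTF-8 codec; the helper name is_valid_utf8 is made up for illustration, and only the byte values come from the posts above.

```python
# U+00A0 NO-BREAK SPACE encodes to the two bytes c2 a0 in UTF-8.
nbsp = "\u00a0"
assert nbsp.encode("utf-8") == b"\xc2\xa0"
assert b"\xc2\xa0".decode("utf-8") == nbsp

# A plain SPACE (U+0020) is a single byte in UTF-8, never c2 20.
assert " ".encode("utf-8") == b"\x20"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"\xc2\xa0"))  # the NBSP encoding
print(is_valid_utf8(b"\xc2\x20"))  # the pair read as "space" in the table
```

The first call should report the sequence as valid and the second as invalid, since 0x20 is not a continuation byte.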

0xc2 0x20 is not a space in UTF-8. It is an invalid code sequence: 0xc2 is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xbf, which 0x20 is not. I don't know how or where it was generated, but it really shouldn't have been. It might have been Ned's MUA, or some obscure bug in the mail-to-news gateway. Does anyone in a position to know have any opinions?
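
One thing worth ruling out before blaming the MUA or the gateway: od -x prints 16-bit words in host byte order, so on a little-endian machine each word shows its two bytes swapped relative to file order. The first word of the quoted dump, 7573, lines up with "s u" (0x73 0x75) in the od -c output, which suggests that is what happened here. The following is a minimal sketch, assuming little-endian od output, that rebuilds the byte stream from the quoted words and decodes it strictly:

```python
# The two `od -x` lines quoted above, as 16-bit hex words.
words = (
    "7573 616c 6574 6920 3a74 0a0a c220 c2a0 "
    "c2a0 20a0 6564 2066 7277 7061 6570 5f64"
).split()

# Assumption: the dump was made on a little-endian machine, so each
# printed word has its two bytes swapped relative to file order.
raw = b"".join(bytes.fromhex(word)[::-1] for word in words)

# Strict decoding: raises UnicodeDecodeError on any invalid sequence.
text = raw.decode("utf-8")

print(raw.hex(" "))
print(ascii(text))
print(b"\xc2\x20" in raw)  # does the suspicious pair occur in file order?
```

If the assumption holds, the bytes in front of "def" come out as 20 c2 a0 c2 a0 c2 a0 20, that is, a space, three non-breaking spaces, and another space, and the c2 20 pair does not appear in this excerpt at all.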

--
Rhodri James *-* Kynesim Ltd
