On Thu, 11 Jun 2015 10:09 am, Chris Angelico wrote:

> On Thu, Jun 11, 2015 at 3:11 AM, Steven D'Aprano <st...@pearwood.info>
> wrote:
>> (Oh, and for the record, there are at least two non-breaking spaces in
>> Unicode, U+00A0 "NO-BREAK SPACE" and U+202F "NARROW NO-BREAK SPACE".)
>>
>> http://www.unicode.org/charts/PDF/U0080.pdf
>> http://www.unicode.org/charts/PDF/U2000.pdf
>
> And U+FEFF "ZERO WIDTH NO-BREAK SPACE",

No, despite the name, that is not a space character; it is a formatting
character. Due to Unicode's stability policy, the name is stuck forever, but
it should not be treated as a space character:

py> import unicodedata
py> unicodedata.category(' ')
'Zs'
py> unicodedata.category('\u00A0')  # NBSP
'Zs'
py> unicodedata.category('\uFEFF')  # ZWNBSP
'Cf'

Ideally, outside of the BOM, you should never come across a ZWNBSP. You
should use U+2060 WORD JOINER instead. But if you do come across one outside
of the BOM, it should be treated as a legitimate non-space character:

http://www.unicode.org/faq/utf_bom.html#bom6

Although ZWNBSP is a "default ignorable" code point, I believe that the font
is well within its rights to show it with a visible glyph:

"Fonts can contain glyphs intended for visible display of default ignorable
code points that would otherwise be rendered invisibly when not supported."

http://www.unicode.org/faq/unsup_char.html


> notable because it's also used as
> the byte-order mark (as its counterpart, U+FFFE, is unallocated). I've
> been fighting with VLC Media Player over the font it uses for subtitles;
> for some bizarre reason, that font represents U+FEFF not with zero pixels
> of emptiness, but with a box containing the letters "ZWN" "BSP" on two
> lines. Yeah, because that totally takes up zero width and looks like blank
> space.

Why do the subtitles contain ZWNBSP in the first place? Surely they're not
English subtitles?


-- 
Steven
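
P.S. For anyone who wants to check the categories above for themselves, here
is a minimal sketch using only the standard library's unicodedata module. The
names CANDIDATES, is_space_separator and replace_interior_zwnbsp are made up
for illustration. The clean-up policy (drop one leading U+FEFF as a BOM, swap
any other U+FEFF for U+2060 WORD JOINER) is an assumption about what you
might want, not something Unicode requires.

```python
import unicodedata

# Code points mentioned in this thread, plus a plain space for comparison.
CANDIDATES = ["\u0020", "\u00A0", "\u202F", "\u2060", "\uFEFF"]


def is_space_separator(ch):
    """Return True if ch has general category Zs (space separator)."""
    return unicodedata.category(ch) == "Zs"


def replace_interior_zwnbsp(text):
    """Hypothetical clean-up helper (an assumed policy, not a standard one).

    A single leading U+FEFF is treated as a BOM and dropped; any remaining
    U+FEFF is replaced with U+2060 WORD JOINER.
    """
    if text.startswith("\uFEFF"):
        text = text[1:]
    return text.replace("\uFEFF", "\u2060")


for ch in CANDIDATES:
    # Show code point, official name, general category, and str.isspace().
    print("U+%04X %-26s %s isspace=%s" % (
        ord(ch), unicodedata.name(ch), unicodedata.category(ch), ch.isspace()))
```

The first three entries should report category Zs, and the last two (WORD
JOINER and ZWNBSP) should report Cf, which is why neither of those two counts
as whitespace for str.isspace().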