Re: Python NBSP DWIM

2015-06-10 Thread Chris Angelico
On Thu, Jun 11, 2015 at 1:27 PM, Steven D'Aprano wrote:
> On Thu, 11 Jun 2015 01:05 pm, Chris Angelico wrote:
> [...]
>>> Why do the subtitles contain ZWNBSP in the first place? Surely they're
>>> not English subtitles?
>>
>> No, they're not :) The character comes up in the Cantonese and
>> Japane

Re: Python NBSP DWIM

2015-06-10 Thread Steven D'Aprano
On Thu, 11 Jun 2015 01:05 pm, Chris Angelico wrote:
[...]
>> Why do the subtitles contain ZWNBSP in the first place? Surely they're
>> not English subtitles?
>
> No, they're not :) The character comes up in the Cantonese and
> Japanese subs for Once Upon A December.
>
> http://youtu.be/CEpcUeWP0b

Re: Python NBSP DWIM

2015-06-10 Thread Chris Angelico
On Thu, Jun 11, 2015 at 1:18 PM, wrote:
> On Wed, Jun 10, 2015, at 23:05, Chris Angelico wrote:
>> http://youtu.be/CEpcUeWP0bg
>> http://youtu.be/WFZAaHrHens
>
> An example of the actual subtitle text would be more useful than a
> youtube link to the video, since we're unlikely to be able to see

Re: Python NBSP DWIM

2015-06-10 Thread random832
On Wed, Jun 10, 2015, at 23:05, Chris Angelico wrote:
> http://youtu.be/CEpcUeWP0bg
> http://youtu.be/WFZAaHrHens
An example of the actual subtitle text would be more useful than a
youtube link to the video, since we're unlikely to be able to see
what context the character appears in if our client

Re: Python NBSP DWIM

2015-06-10 Thread Chris Angelico
On Thu, Jun 11, 2015 at 12:26 PM, Steven D'Aprano wrote:
> No, despite the name, that is not a space character, it is a formatting
> character. Due to Unicode's stability policy, the name is stuck forever,
> but it should not be treated as a space character:
>
> py> unicodedata.category(' ')
> 'Zs
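The category check quoted above can be repeated for each of the characters named in this thread. The following is only a sketch: the list of code points and the `report` dictionary are chosen here for illustration and do not come from any of the posts.

```python
import unicodedata

# Code points mentioned in the thread: the ordinary space, the two
# no-break spaces, and U+FEFF (ZERO WIDTH NO-BREAK SPACE).
chars = ["\u0020", "\u00a0", "\u202f", "\ufeff"]

# Map each character to (general category, result of str.isspace()).
report = {}
for ch in chars:
    report[ch] = (unicodedata.category(ch), ch.isspace())
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {report[ch]}")

# U+FEFF is a format character (Cf), not a space separator (Zs),
# so the default split() does not break on it.
print("a\ufeffb".split())
```

The three characters in category `Zs` report `isspace()` as true, while U+FEFF is in `Cf` and does not, which is why `str.split()` with no arguments leaves it inside the token.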

Re: Python NBSP DWIM

2015-06-10 Thread Steven D'Aprano
On Thu, 11 Jun 2015 10:09 am, Chris Angelico wrote:
> On Thu, Jun 11, 2015 at 3:11 AM, Steven D'Aprano
> wrote:
>> (Oh, and for the record, there are at least two non-breaking spaces in
>> Unicode, U+00A0 "NO-BREAK SPACE" and U+202F "NARROW NO-BREAK SPACE".)
>>
>> http://www.unicode.org/charts/PD

Re: Python NBSP DWIM

2015-06-10 Thread Chris Angelico
On Thu, Jun 11, 2015 at 11:02 AM, wrote:
>
> On Wed, Jun 10, 2015, at 20:09, Chris Angelico wrote:
> > And U+FEFF "ZERO WIDTH NO-BREAK SPACE", notable because it's also used as
> > the byte-order mark (as its counterpart, U+FFFE, is unallocated). I've
> > been fighting with VLC Media Player ov

Re: Python NBSP DWIM

2015-06-10 Thread random832
On Wed, Jun 10, 2015, at 20:09, Chris Angelico wrote:
> And U+FEFF "ZERO WIDTH NO-BREAK SPACE", notable because it's also used as
> the byte-order mark (as its counterpart, U+FFFE, is unallocated). I've
> been fighting with VLC Media Player over the font it uses for subtitles; for
> some bizarre
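The byte-order-mark role of U+FEFF mentioned in the quote can be seen with the standard codecs. This is a small sketch; the sample bytes in `raw` are made up for illustration and are not taken from the subtitle files discussed in the thread.

```python
import codecs

# Hypothetical input: UTF-8 text that starts with a byte-order mark.
raw = codecs.BOM_UTF8 + "subtitle".encode("utf-8")

# The plain utf-8 codec keeps the BOM as a leading U+FEFF character.
plain = raw.decode("utf-8")

# The utf-8-sig codec drops a leading BOM if one is present.
stripped = raw.decode("utf-8-sig")

# Encoded as UTF-16, U+FEFF reveals the byte order. Read with the wrong
# order, the same two bytes would decode as U+FFFE, which is not an
# assigned character.
big_endian = "\ufeff".encode("utf-16-be")
little_endian = "\ufeff".encode("utf-16-le")

print(repr(plain), repr(stripped))
print(big_endian, little_endian)
```

A U+FEFF that appears in the middle of a string is not touched by `utf-8-sig`; only a leading one is treated as a signature.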

Re: Python NBSP DWIM

2015-06-10 Thread Chris Angelico
On Thu, Jun 11, 2015 at 3:11 AM, Steven D'Aprano wrote:
> (Oh, and for the record, there are at least two non-breaking spaces in
> Unicode, U+00A0 "NO-BREAK SPACE" and U+202F "NARROW NO-BREAK SPACE".)
>
> http://www.unicode.org/charts/PDF/U0080.pdf
> http://www.unicode.org/charts/PDF/U2000.pdf
An

Re: Python NBSP DWIM

2015-06-10 Thread Steven D'Aprano
On Thu, 11 Jun 2015 12:28 am, Skip Montanaro wrote:
> On Wed, Jun 10, 2015 at 8:28 AM, Tim Chase
> wrote:
>> Is this a bug?
>
> Looks like it's been reported a few times with slightly different context:
>
> https://bugs.python.org/issue6537
> https://bugs.python.org/issue16623
> https://bugs.py

Re: Python NBSP DWIM

2015-06-10 Thread random832
On Wed, Jun 10, 2015, at 11:03, Laura Creighton wrote:
> In these unicode days, this thinking may need to be revisited. There
> are many languages where whitespace does not separate words -- either
> words aren't separated, or in Vietnamese, spaces separate syllables,
> so entire words have spaces

Re: Python NBSP DWIM

2015-06-10 Thread Laura Creighton
In a message of Wed, 10 Jun 2015 09:28:24 -0500, Skip Montanaro writes:
>On Wed, Jun 10, 2015 at 8:28 AM, Tim Chase
> wrote:
>> Is this a bug?
>
>Looks like it's been reported a few times with slightly different context:
>
>https://bugs.python.org/issue6537
>https://bugs.python.org/issue16623
>http

Re: Python NBSP DWIM

2015-06-10 Thread Skip Montanaro
On Wed, Jun 10, 2015 at 8:28 AM, Tim Chase wrote:
> Is this a bug?
Looks like it's been reported a few times with slightly different context:
https://bugs.python.org/issue6537
https://bugs.python.org/issue16623
https://bugs.python.org/issue20491
https://bugs.python.org/issue1390608
The couple t

Re: Python NBSP DWIM

2015-06-10 Thread Mark Lawrence
On 10/06/2015 14:28, Tim Chase wrote:
str.split() doesn't seem to respect non-breaking space:
Python 3.4.2 (default, Oct 8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(repr("hello\N{NO-BREAK SPACE}world".split(

Python NBSP DWIM

2015-06-10 Thread Tim Chase
str.split() doesn't seem to respect non-breaking space:
Python 3.4.2 (default, Oct 8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(repr("hello\N{NO-BREAK SPACE}world".split()))
['hello', 'world']
What's the purpos
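For readers who want the no-break space to stay inside a token, one possible workaround is to tokenise on ASCII whitespace only. This is a sketch rather than anything proposed in the thread; the sample string and the variable names are invented for illustration.

```python
import re

# Hypothetical sample: a no-break space between the first two words,
# an ordinary space before the third.
text = "hello\N{NO-BREAK SPACE}world again"

# Default split() breaks on every character for which str.isspace()
# is true, and that includes U+00A0.
unicode_tokens = text.split()

# With re.ASCII, \S means "anything except space, \t, \n, \r, \f, \v",
# so the no-break space is kept inside its token.
ascii_tokens = re.findall(r"\S+", text, flags=re.ASCII)

print(unicode_tokens)
print(ascii_tokens)
```

Using `re.findall` on runs of non-whitespace, rather than `re.split` on whitespace, avoids empty strings when the input has leading or trailing spaces, which matches how `split()` with no arguments behaves.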