D'oh! Stupid bug:

> Is this the same code points identified by `str.isspace`?

>
> I haven't checked -- so I will:
>
> and the answer is no:
>
> wrong, the answer is yes:

$ python weird_spaces.py
x x x x᠎x x x x x x x x x x x xx x x xx
['x', 'x', 'x', 'x\u180ex', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x',
'x', 'x\u200bx', 'x', 'x', 'x\ufeffx']
out of 20, 17 were used as split chars
out of 20, 17 were True according to .isspace

That makes far more sense.
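
Matching counts don't strictly prove that both tests pick the same characters, so here is a rough per-character check. It is only a sketch: the names `candidates`, `splits`, `spaces` and `disagree` are mine and not part of weird_spaces.py, and the code points are copied from that script.

```python
# Candidate code points, copied from weird_spaces.py.
candidates = ("\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005"
              "\u2006\u2007\u2008\u2009\u200A\u200B\u202F\u205F\u3000\uFEFF")

# A character acts as a separator if splitting "x<c>x" yields two pieces.
splits = {c: len(f"x{c}x".split()) == 2 for c in candidates}

# What str.isspace says about the same character.
spaces = {c: c.isspace() for c in candidates}

# Collect any code point where the two tests disagree.
disagree = [f"U+{ord(c):04X}" for c in candidates if splits[c] != spaces[c]]
print("disagreements:", disagree)
```

If the list printed is empty, the two tests agree on every one of the 20 candidates, not just on the total.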

Since I'm at it, the three that are not treated as whitespace are:

U+180E MONGOLIAN VOWEL SEPARATOR
U+200B ZERO WIDTH SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE

Excluding the Mongolian vowel separator makes some sense (though I don't
know Mongolian in the least). But I wonder what the point of a zero-width
space is if it's NOT going to be a separator.
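
A possible explanation, as far as I understand the documented rule: `str.isspace` counts a character as whitespace when its general category is Zs or its bidirectional class is WS, B, or S. The sketch below looks up those properties with the standard `unicodedata` module. The `info` dict is just my own scaffolding, and the values depend on the Unicode version bundled with your Python.

```python
import unicodedata

# The three non-whitespace characters, plus a plain space for comparison.
chars = "\u180E\u200B\uFEFF\u0020"

# Map each code point to (name, general category, bidi class, isspace).
info = {
    f"U+{ord(c):04X}": (
        unicodedata.name(c),
        unicodedata.category(c),
        unicodedata.bidirectional(c),
        c.isspace(),
    )
    for c in chars
}

for code, (name, cat, bidi, is_space) in info.items():
    print(f"{code} {name}: category={cat} bidi={bidi} isspace={is_space}")
```

On a recent Python, the three odd ones should show up as format characters (category Cf) rather than space separators, which would explain why neither `split()` nor `.isspace` treats them as whitespace.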

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

#!/usr/bin/env python
# weird_spaces.py

weird_spaces = ("x\u0020x\u00A0x\u1680x\u180Ex\u2000x\u2001x\u2002"
                "x\u2003x\u2004x\u2005x\u2006x\u2007x\u2008x\u2009"
                "x\u200Ax\u200Bx\u202Fx\u205Fx\u3000x\uFEFFx")

print(weird_spaces)
splitted = weird_spaces.split()
print(splitted)

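# The string alternates x, candidate, x, ..., x, so n candidates give 2n + 1 chars.
total_spacelike = (len(weird_spaces) - 1) // 2
# Each candidate that split() treats as whitespace adds one more piece.
num_split = len(splitted) - 1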

print(f"out of {total_spacelike}, {num_split} were used as split chars")

isspace = [c.isspace() for c in weird_spaces if c != 'x']

# print(isspace)

print(f"out of {total_spacelike}, {sum(isspace)} were True according to .isspace")


_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/S3QCQSB2IZJ6CSR4IGXMJBL6NZN6YT6A/
Code of Conduct: http://python.org/psf/codeofconduct/
