Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:

> On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote:
> > > It's a feature. See help(str.split): "If sep is not specified or is
> > > None, any whitespace string is a separator."
> >
> > Define "any whitespace".
>
> Any string for which isspace returns True.

Define what white space means to isspace().

> > Why is it different in <type 'str'> and <type 'unicode'>?
>
> >>> '\xa0'.isspace()
> False
> >>> u'\xa0'.isspace()
> True

Here is another "space":

  >>> u'\uFEFF'.isspace()
  False

isspace() is inconsistent.

> For byte strings, Python doesn't know whether 0xA0 is a whitespace
> because it depends on the encoding whether the number 160 corresponds to
> a whitespace character. For unicode strings, code point 160 is
> unquestionably a whitespace, because it is a no-break SPACE.

I question it. And so does the sre module:

  \s    Matches any whitespace character; equivalent to [ \t\n\r\f\v]

Where is the NO-BREAK SPACE in there?

> > Why does split() split when it says NO-BREAK?
>
> Precisely. It says NO-BREAK. It doesn't say NO-SPLIT.

That is a stupid answer.

-- 
Peter Kleiweg  L:NL,af,da,de,en,ia,nds,no,sv,(fr,it)  S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/kleiweg/ls.html
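
For anyone who wants to repeat the checks from this thread, here is a
minimal sketch. It assumes Python 3, which is not what the session quoted
above ran on: there, bytes stands in for the old <type 'str'> and str for
the old <type 'unicode'>, and \s in the re module follows Unicode for str
patterns unless re.ASCII is given. The helper name describe() is made up
for illustration.

```python
import re
import unicodedata

# Assumption: Python 3. bytes plays the role of the old <type 'str'>,
# str plays the role of the old <type 'unicode'>.
NBSP = "\u00a0"   # NO-BREAK SPACE
BOM = "\ufeff"    # ZERO WIDTH NO-BREAK SPACE

def describe(ch):
    """Return (isspace() result, Unicode general category) for one character."""
    return ch.isspace(), unicodedata.category(ch)

# U+00A0 is a space separator (Zs); U+FEFF is a format character (Cf),
# which is why isspace() treats the two differently.
nbsp_info = describe(NBSP)
bom_info = describe(BOM)

# A byte string carries no encoding, so only ASCII whitespace counts.
bytes_nbsp_is_space = b"\xa0".isspace()

# split() with no separator splits on whatever isspace() accepts.
split_result = ("a" + NBSP + "b").split()

# With re.ASCII, \s is limited to [ \t\n\r\f\v]; without it, \s on a
# str pattern also matches NO-BREAK SPACE.
ascii_match = re.search(r"\s", NBSP, re.ASCII)
unicode_match = re.search(r"\s", NBSP)
```

The general category is the useful part: it shows that the isspace()
results for U+00A0 and U+FEFF come from the Unicode character database
rather than from the words "SPACE" or "NO-BREAK" in the character names.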