Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:

> On Sun, 2007-07-22 at 17:15 +0200, Peter Kleiweg wrote:
> > Is this a bug or a feature?
> >
> >
> > Python 2.4.4 (#1, Oct 19 2006, 11:55:22)
> > [GCC 2.95.3 20010315 (SuSE)] on linux2
> >
> > >>> a = 'a b c\240d e'
> > >>> a
> > 'a b c\xa0d e'
> > >>> a.split()
> > ['a', 'b', 'c\xa0d', 'e']
> > >>> a = a.decode('latin-1')
> > >>> a
> > u'a b c\xa0d e'
> > >>> a.split()
> > [u'a', u'b', u'c', u'd', u'e']
>
> It's a feature. See help(str.split): "If sep is not specified or is
> None, any whitespace string is a separator."
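The session above can be reproduced in Python 3 as a minimal sketch, assuming Python 3 semantics where `bytes` plays the role of the old 8-bit `str` and `str` plays the role of the old `unicode` type. It is not the Python 2.4 code path, only an illustration of the same split: byte strings split on ASCII whitespace only, while text strings use the Unicode whitespace definition, which includes U+00A0.

```python
# Python 3 sketch: bytes stands in for the old 8-bit str,
# str stands in for the old unicode type.
import unicodedata

raw = b'a b c\240d e'          # 8-bit data; \240 is octal for byte 0xA0
text = raw.decode('latin-1')   # decode to a Unicode string (U+00A0)

# bytes.split() with no separator splits only on ASCII whitespace,
# so the 0xA0 byte stays inside a field.
byte_fields = raw.split()

# str.split() with no separator splits on Unicode whitespace,
# and U+00A0 counts as whitespace.
text_fields = text.split()

print(byte_fields)                           # [b'a', b'b', b'c\xa0d', b'e']
print(text_fields)                           # ['a', 'b', 'c', 'd', 'e']
print(unicodedata.name('\xa0'))              # NO-BREAK SPACE
print('\xa0'.isspace(), b'\xa0'.isspace())   # True False
```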
Define "any whitespace". Why is it different for <type 'str'> and <type 'unicode'>? And why does split() split on '\xa0' when the character's name says NO-BREAK?

-- 
Peter Kleiweg  L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/kleiweg/ls.html
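For the "define any whitespace" question, Python 3's documented rule for `str.isspace()` is that a character is whitespace if its Unicode general category is Zs ("Separator, space") or its bidirectional class is WS, B, or S. U+00A0 is category Zs, so it qualifies; "no-break" refers to line breaking in text layout, not to field splitting. The sketch below lists the Latin-1 code points that count as whitespace under that rule. The helper `split_ascii_ws` is hypothetical (not a standard library function) and shows one way to split only on the six ASCII whitespace characters that `bytes.split()` uses, so that a no-break space stays inside a token.

```python
# Sketch: which Latin-1 code points does Python 3's str.isspace()
# treat as whitespace (i.e. what a no-argument str.split() splits on)?
import re
import unicodedata

latin1_ws = [chr(cp) for cp in range(256) if chr(cp).isspace()]

for ch in latin1_ws:
    name = unicodedata.name(ch, '<control>')  # control chars have no name
    print(f'U+{ord(ch):04X} {unicodedata.category(ch)} '
          f'{unicodedata.bidirectional(ch)} {name}')

# Hypothetical helper: split only on ASCII whitespace
# (space, \t, \n, \r, \f, \v), leaving U+00A0 inside tokens.
ASCII_WS = re.compile(r'[ \t\n\r\f\v]+')

def split_ascii_ws(s):
    # Drop empty strings produced by leading/trailing whitespace.
    return [tok for tok in ASCII_WS.split(s) if tok]

print(split_ascii_ws('a b c\xa0d e'))  # ['a', 'b', 'c\xa0d', 'e']
```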