2010/11/23 Alexander Belopolsky <alexander.belopol...@gmail.com>:
> This discussion motivated me to start looking into how well Python
> library itself is prepared to deal with len(chr(i)) = 2. I was not
> surprised to find that textwrap does not handle the issue that well:
>
> >>> len(wrap(' \U00010140' * 80, 20))
> 12
> >>> len(wrap(' \U00000140' * 80, 20))
> 8
>
> That module should probably be rewritten to properly implement the
> Unicode line breaking algorithm
> <http://unicode.org/reports/tr14/tr14-22.html>.
>
> Yet finding a bug in a str object method after a 5 min review was a
> bit discouraging:
>
> >>> 'xyz'.center(20, '\U00010140')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: The fill character must be exactly one character long
>
> Given the apparent difficulty of writing even basic text processing
> algorithms in presence of surrogate pairs, I wonder how wise it is to
> expose Python users to them.

This was already discussed two years ago:
http://mail.python.org/pipermail/python-dev/2008-July/080900.html

So yes, wrap() and center() should be fixed.

-- 
Amaury Forgeot d'Arc
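
For anyone who wants to see where the len(chr(i)) = 2 in the quoted text comes from, here is a minimal sketch. It does not need a narrow build: it counts UTF-16 code units by encoding the string, which should match what len() reports when str is stored as UTF-16. The helper names utf16_units and surrogate_pair are made up for this illustration and are not part of any library.

```python
def utf16_units(s):
    """Count UTF-16 code units in s.

    Assumption: this mirrors what len() returns on a build that stores
    str as UTF-16, where non-BMP characters take two units.
    """
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(ch):
    """Return the (high, low) surrogates encoding one non-BMP character."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is in the BMP; no surrogate pair needed")
    offset = cp - 0x10000
    # The top 10 bits go into the high surrogate, the low 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# The two fill characters from the quoted examples.
non_bmp = "\U00010140"  # outside the BMP: two UTF-16 code units
bmp = "\u0140"          # inside the BMP: one UTF-16 code unit

units_non_bmp = utf16_units(non_bmp)
units_bmp = utf16_units(bmp)
high, low = surrogate_pair(non_bmp)
```

With two code units per character, a string method that checks "exactly one character" by length rejects '\U00010140' as a fill character, and a wrapper that counts length in code units fits fewer such characters on each line.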