New submission from Gregory P. Smith:

For bytes, \v (0x0b) is not considered a line break. For unicode, it is.
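
The difference can be observed directly on a Python 3 interpreter. The following is only a small sketch: the helper name ascii_line_breaks is hypothetical and not part of CPython. It probes each code point in the 0..127 range and records which ones cause splitlines() to split a str or a bytes value.

```python
def ascii_line_breaks(kind):
    """Return the ASCII code points that kind.splitlines() treats as line breaks.

    `kind` is either str or bytes. This helper is illustrative only.
    """
    found = []
    for code in range(128):
        if kind is str:
            sample = "a" + chr(code) + "b"
        else:
            sample = b"a" + bytes([code]) + b"b"
        # A split into more than one piece means `code` acted as a line break.
        if len(sample.splitlines()) > 1:
            found.append(code)
    return found


str_breaks = ascii_line_breaks(str)
bytes_breaks = ascii_line_breaks(bytes)

print("str:  ", [hex(c) for c in str_breaks])
print("bytes:", [hex(c) for c in bytes_breaks])

# The specific case from the report: \v splits a str but not a bytes value.
print("a\vb".splitlines())
print(b"a\vb".splitlines())
```

On Python 3 the str list should contain \v (0x0b) along with the other control characters that unicode treats as line breaks, while the bytes list should contain only \n and \r.
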
This traces back to the Objects/stringlib/ code, where unicode defers to the
decision made by the ascii_linebreak table in Objects/unicodeobject.c, which
contains 7 line breaks in the 0..127 character range:

    static unsigned char ascii_linebreak[] = {
        0, 0, 0, 0, 0, 0, 0, 0,
    /*         0x000A, * LINE FEED */
    /*         0x000B, * LINE TABULATION */
    /*         0x000C, * FORM FEED */
    /*         0x000D, * CARRIAGE RETURN */
        0, 0, 1, 1, 1, 1, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0,
    /*         0x001C, * FILE SEPARATOR */
    /*         0x001D, * GROUP SEPARATOR */
    /*         0x001E, * RECORD SEPARATOR */
        0, 0, 0, 0, 1, 1, 1, 0,

Whereas Objects/stringlib/stringdefs.h, used by bytes, only considers \r and \n.

I think these should be consistent. But making this change likely breaks
existing code in weird ways.

This does come up when porting from 2 to 3: a str ('') value containing one of
those other characters was not split by splitlines() in 2.x but is split by
splitlines() in 3.x.

----------
messages: 246538
nosy: gregory.p.smith
priority: normal
severity: normal
status: open
title: bytes and unicode splitlines() methods differ on what is a line break

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24601>
_______________________________________