From: "Doug Ewell" <[EMAIL PROTECTED]>
> Jill Ramonsky <Jill dot Ramonsky at Aculab dot com> wrote:
>
> > Here's a better idea.
> > Let's just stick with the idea that ANY C0 or C1 control has no place
> > being anywhere in a line of text, and so any sequence of one or more of
> > them will be interpreted as a line-break!
>
> Tab?

And <ESC>? (Think of the ANSI color sequences generated by the colorized versions of "ls" or "man" on Linux, or of ISO 2022 charset selectors.) And <BEL>? And <SO>, <SI>, <DLE>? (Think of the ISO 646 extension mechanisms, or of SJIS.) And <US>? (Think of tabular text data in record sets: is a data-cell delimiter in a text data file a line break?)

Quite a few encoding rules use controls that do not (and must not) imply a line break. An application may need to convert such sequences using internal Unicode parsing and generation even if the resulting string is downcast to a final 7-bit or 8-bit subset, or it may need to insert non-textual sequences into Unicode strings (for example, in attributed text). So I also think it would be excessive to treat all C0 and C1 characters as line breaks.
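
To make the difference concrete, here is a minimal Python sketch contrasting the two policies. It assumes, as my own choice rather than anything stated in the thread, that the "real" line breaks are the mandatory breaks of UAX #14 (LF, VT, FF, CR, NEL, LS, PS, with CR LF counted once); the function names split_mandatory and split_any_control are hypothetical.

```python
import re
from typing import List

# Assumption: "real" line breaks = UAX #14 mandatory breaks
# (BK: VT, FF, LS, PS; CR; LF; NL: NEL). CR LF is tried first so it counts once.
_MANDATORY_BREAK = re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

# The proposal under discussion: any run of C0 (U+0000..U+001F)
# or C1 (U+0080..U+009F) controls is treated as one line break.
_ANY_CONTROL_RUN = re.compile(r"[\x00-\x1f\x80-\x9f]+")


def split_mandatory(text: str) -> List[str]:
    """Split only at mandatory breaks; TAB, ESC, BEL, SO, SI, DLE, US survive."""
    return _MANDATORY_BREAK.split(text)


def split_any_control(text: str) -> List[str]:
    """Split at every run of C0/C1 controls, as the proposal would."""
    return _ANY_CONTROL_RUN.split(text)


if __name__ == "__main__":
    colored = "\x1b[31mred\x1b[0m text\n"  # ANSI color sequence, as from "ls"
    record = "name\x1fage\x1f42\r\n"        # US-delimited record
    for sample in (colored, record, "a\tb\nc"):
        print(repr(split_mandatory(sample)), repr(split_any_control(sample)))
```

Under the "any control" policy the ANSI sequence is torn apart at each ESC, the US-delimited record turns into three "lines", and a tab splits its line, while the mandatory-break policy leaves all of them inside a single line.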

