Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
2017-07-25 0:35 GMT+02:00 Doug Ewell via Unicode : > J Decker wrote: > > > I generally accepted any utf-8 encoding up to 31 bits though ( since > > I was going from the original spec, and not what was effective limit > > based on unicode codepoint space) > > Hey, everybody:

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Doug Ewell via Unicode
J Decker wrote: > I generally accepted any utf-8 encoding up to 31 bits though ( since > I was going from the original spec, and not what was effective limit > based on unicode codepoint space) Hey, everybody: Don't do that. UTF-8 has been constrained to the Unicode code space (maximum

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread J Decker via Unicode
On Mon, Jul 24, 2017 at 1:50 PM, Philippe Verdy wrote: > 2017-07-24 21:12 GMT+02:00 J Decker via Unicode : > >> >> >> If you don't have that last position in a variable, just use 3 tests but > NO loop at all: if all 3 tests are failing, you know the input

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
2017-07-24 22:50 GMT+02:00 Philippe Verdy : > 2017-07-24 21:12 GMT+02:00 J Decker via Unicode : > >> >> >> On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode < >> unicode@unicode.org> wrote: >> >>> Hi Folks, >>> >>> 2. (Bug) The sending

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
2017-07-24 21:12 GMT+02:00 J Decker via Unicode : > > > On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode < > unicode@unicode.org> wrote: > >> Hi Folks, >> >> 2. (Bug) The sending application performs the folding process - inserts >> CRLF plus white space

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread J Decker via Unicode
On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode < unicode@unicode.org> wrote: > Hi Folks, > > 2. (Bug) The sending application performs the folding process - inserts > CRLF plus white space characters - and the receiving application does the > unfolding process but doesn't

RE: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Costello, Roger L. via Unicode
Hi Folks, Thank you very much for your fantastic comments! Below I summarized the issue and your comments. At the bottom is a set of proposed requirements (for my clients) on applications that receive iCalendar files. Some questions: - Have I captured all your comments? Any more comments? -

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Doug Ewell via Unicode
Costello, Roger L. wrote: > Suppose an application splits a UTF-8 multi-octet sequence. The > application then sends the split sequence to a client. The client must > restore the original sequence. > > Question: is it possible to split a UTF-8 multi-octet sequence in such > a way that the client

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
Also note that the maximum line-length in that RFC is a SHOULD and not a MUST. This is intended to give a reasonable hint for the limit used in implementations that process data in the given format: The RFC suggests a maximum line length of 75 "characters", excluding the CRLF+SPACE continuation

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
But at the same time that RFC makes a direct reference to UTF-8 as being the default charset, so an implementation of the RFC cannot be agnostic to what is UTF-8 and will not break in the middle of a conforming UTF-8 sequence. When the limit is reached, that implementation knows that it cannot
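
As a rough illustration of the point above, here is a minimal Python sketch of a folder that never breaks inside a UTF-8 sequence. The function name `fold_line`, the 75-octet default, and the choice to count the leading space of a continuation line toward the limit are assumptions made for this example, not something taken from the RFC text quoted in the thread.

```python
def fold_line(text: str, limit: int = 75) -> bytes:
    """Fold one content line at `limit` octets without splitting a UTF-8 sequence.

    Hypothetical helper: the name and the way the limit is counted are
    assumptions for illustration only.
    """
    data = text.encode("utf-8")
    chunks = []
    start = 0
    first = True
    while start < len(data):
        # Continuation lines start with one space; here it counts toward the limit.
        room = limit if first else limit - 1
        end = min(start + room, len(data))
        # If the cut would land on a continuation octet (10xxxxxx),
        # back up until the cut sits in front of a lead octet.
        while start < end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
        first = False
    # Fold marker: CRLF followed by a single space.
    return b"\r\n ".join(chunks)


if __name__ == "__main__":
    folded = fold_line("é" * 100)  # 200 octets of two-octet sequences
    for line in folded.split(b"\r\n"):
        print(len(line), line.decode("utf-8"))
```

Because every cut falls in front of a lead octet, each physical line decodes as valid UTF-8 on its own, so a receiver that decodes before unfolding still gets the right text.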

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Steffen Nurpmeso via Unicode
"Costello, Roger L. via Unicode" wrote: |Suppose an application splits a UTF-8 multi-octet sequence. The application \ |then sends the split sequence to a client. The client must restore \ |the original sequence. | |Question: is it possible to split a UTF-8 multi-octet

Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Costello, Roger L. via Unicode
Hello Unicode Experts! Suppose an application splits a UTF-8 multi-octet sequence. The application then sends the split sequence to a client. The client must restore the original sequence. Question: is it possible to split a UTF-8 multi-octet sequence in such a way that the client cannot
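
A minimal Python sketch of the scenario in the question, assuming the split is an iCalendar-style fold (CRLF followed by one space or tab) as discussed in the replies. The helper name `unfold` is hypothetical. It removes the fold markers at the octet level and only then decodes, so a sequence that was split across the fold is rejoined before the decoder sees it.

```python
import re


def unfold(raw: bytes) -> str:
    """Remove fold markers (CRLF + one space or tab) from octets, then decode.

    Hypothetical helper for illustration; it assumes the input was folded
    by inserting exactly CRLF plus a single whitespace octet.
    """
    joined = re.sub(rb"\r\n[ \t]", b"", raw)
    return joined.decode("utf-8")


if __name__ == "__main__":
    # "é" is the two-octet sequence C3 A9; here the fold lands between the two octets.
    split_in_the_middle = b"caf\xc3\r\n \xa9"
    print(unfold(split_in_the_middle))
```

Decoding each physical line separately before unfolding would fail on this input, since neither `C3` nor `A9` is a complete sequence by itself.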