What to backup after corruption of code units?

2013-08-27 Thread Xue Fuqiao
Hi list, I'm reading Unicode 6.2.0 and have a question. In Section 2.5, Encoding Forms: For example, when randomly accessing a string, a program can find the boundary of a character with limited backup. In UTF-16, if a pointer points to a leading surrogate, a single backup is required. In

Re: What to backup after corruption of code units?

2013-08-27 Thread Bill Poser
"Backup" in this context refers to moving back to previous bytes in order to find the boundary between the previous, valid character and the corrupted character that you have encountered. In other words, if you have a string consisting of N bytes and at byte K you determine that the current sequence of
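
As a rough illustration of moving back through bytes to find a character boundary, here is a minimal Python sketch for UTF-8. The function name `utf8_char_start` and the sample data are hypothetical and not taken from the thread. It assumes that continuation bytes have the bit pattern 10xxxxxx and that a character spans at most four bytes, so at most three backward steps are needed. If the byte it stops on is still a continuation byte, the sequence is ill-formed at that point.

```python
def utf8_char_start(data: bytes, k: int) -> int:
    """Return the index of the first byte of the UTF-8 sequence containing data[k].

    Steps backward over at most three continuation bytes (10xxxxxx).
    On ill-formed input the returned index may still point at a
    continuation byte, which signals corruption to the caller.
    """
    start = k
    steps = 0
    while start > 0 and steps < 3 and (data[start] & 0xC0) == 0x80:
        start -= 1
        steps += 1
    return start


# Hypothetical sample: 'a', EURO SIGN (three bytes: E2 82 AC), 'b'
sample = "a\u20acb".encode("utf-8")
print(utf8_char_start(sample, 3))  # last byte of the euro sign
```

Calling the function on a byte that already starts a sequence returns the same index, so no backward step is taken in that case.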

Re: What to backup after corruption of code units?

2013-08-27 Thread Stephan Stiller
All good replies. It means the program needs to go back (a.k.a. back up), but I'd say backtracking would make for better wording in TUS. Stephan

RE: What to backup after corruption of code units?

2013-08-27 Thread Phillips, Addison
Back up here refers to decrementing the pointer in the string. If you have a string consisting of the following UTF-16 code units, for example:

00C0 0020 20AC D800 DC00 00C5
0    1    2    3    4    5

If you set the pointer to code unit number 4 (counting from 0), you'll be
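
The pointer decrement described in this message can be sketched in Python using the same six code units. The helper names (`is_lead`, `is_trail`, `utf16_char_start`) are hypothetical. The sketch assumes the string is held as a list of 16-bit integers and treats an unpaired trailing surrogate as its own boundary.

```python
def is_lead(unit: int) -> bool:
    """True for a leading (high) surrogate, 0xD800..0xDBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit: int) -> bool:
    """True for a trailing (low) surrogate, 0xDC00..0xDFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def utf16_char_start(units: list[int], i: int) -> int:
    """Return the index of the first code unit of the character containing units[i].

    Backs up by one only when units[i] is a trailing surrogate that is
    preceded by a leading surrogate; otherwise the pointer is already
    on a boundary.
    """
    if is_trail(units[i]) and i > 0 and is_lead(units[i - 1]):
        return i - 1
    return i


# The code units from the message, indexed 0..5
units = [0x00C0, 0x0020, 0x20AC, 0xD800, 0xDC00, 0x00C5]
print(utf16_char_start(units, 4))  # pointer on DC00, the trailing surrogate
```

Under these assumptions, a pointer at index 4 moves back to index 3, while a pointer at index 3 (the leading surrogate) stays where it is.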

Re: What to backup after corruption of code units?

2013-08-27 Thread Philippe Verdy
The term is probably badly chosen, but it means that you must read backward from the start position. The term backup is not related to any data copying/saving operation. - In UTF-16 there's an error in your citation: if you find a leading surrogate (in 0xD800..0xDBFF), you are already at the correct