Hi all, I have just joined this list and have not searched the archives, so apologies if this has already been discussed.
The problem (a character changing when a file is saved) happens in v168 and in v172, as far as I have seen. It does not happen with plain ASCII files, nor with Unicode files saved as UTF-8, only with files saved as UTF-16. It looks as though the buffer stores Unicode as UTF-8, and I guess the error may be in the conversion from UTF-8 to UTF-16 when saving. It does not show in the buffer; you do not see the effect until you make a change, save the file, then reload it.

Here are two examples; in each, the char on the 1st line (Cyrillic) was changed to the char on the 2nd line.

char  codepoint  utf8 bytes
м     x043C      D0BC
¼     x00BC      C2BC

char  codepoint  utf8 bytes
в     x0432      D0B2
²     x00B2      C2B2

The character that changed was at offset 131072, i.e. 128kB into the file. When I make the file bigger, it happens in 2 places: at 128kB and also at 256kB.

This change seems to happen only if the 2 bytes of the character straddle the 128kB boundary. If the boundary falls between characters, the change does not happen. The 128kB is counted in bytes of the UTF-8 buffer, not bytes of the UTF-16 file on disk.

However, in v172 another change occurs at the point of 128kB of bytes on disk. I did not /notice/ this in v168. This change does not depend on a character straddling the 128kB boundary; it always happens. It may be that these changes only happen if the characters at the boundary are multi-byte, and not if they are single-byte.

I used SciTE to manipulate some text that was to be published as subtitles on videos. Fortunately a proof-reader noticed the oddities before too much damage was done. It happened in 4 different files.

Regards
Jim

PS
How I figured SciTE is using UTF-8 in its buffer: select a curly quote “ and it says 3 selected; select a Cyrillic char, e.g. в, and it says 2 selected; select an ASCII char and it says 1 selected. The selection size shown matches the number of UTF-8 bytes.

How I figured the offset of the bad character: put the cursor before the bad char, press Shift+Ctrl+Home, and read the "Selected" number in the status bar. It showed 131071.
( == x1FFFF; +1 -> x20000 == 128k )

To find the byte offset in the UTF-16 format on disk, I opened the file in UltraEdit v9 and viewed it as hex.

jh
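
For anyone who wants to check the numbers in this report, here is a minimal Python sketch. It is not SciTE code: the helper names (utf8_hex, straddlers, naive_blockwise_decode) are made up for illustration, and the block-wise decoding is only a guess at one failure mode that would give the same symptom, namely that the second UTF-8 byte of the straddling character (BC or B2) ends up read on its own as a single byte.

```python
# Illustration only: hypothetical helpers, not taken from SciTE or Scintilla.
BLOCK = 128 * 1024  # 131072 bytes, the boundary reported in the post


def utf8_hex(ch):
    """Return the UTF-8 bytes of one character as upper-case hex."""
    return ch.encode("utf-8").hex().upper()


def straddlers(text, block=BLOCK):
    """List (byte_offset, char) for characters whose UTF-8 bytes span a block boundary."""
    found = []
    offset = 0  # byte offset of the current character in the UTF-8 form
    for ch in text:
        size = len(ch.encode("utf-8"))
        # First and last byte lie in different blocks: the character straddles.
        if offset // block != (offset + size - 1) // block:
            found.append((offset, ch))
        offset += size
    return found


def naive_blockwise_decode(data, block=BLOCK):
    """ASSUMED failure mode: each block is decoded on its own, and a trail
    byte (10xxxxxx) that begins a block is read as a lone Latin-1 byte."""
    out = []
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        i = 0
        # Stray continuation bytes at the start of the block become single chars.
        while i < len(chunk) and chunk[i] & 0xC0 == 0x80:
            out.append(chr(chunk[i]))
            i += 1
        # A lead byte cut off at the end of the block is silently dropped.
        out.append(chunk[i:].decode("utf-8", errors="ignore"))
    return "".join(out)


# Build a buffer where "м" starts at byte 131071, as in the report.
sample = "a" * 131071 + "м" + "b"
print(utf8_hex("м"), utf8_hex("¼"))       # byte values from the first table
print(hex(131071 + 1))                    # offset arithmetic from the PS
print(straddlers(sample))                 # which characters cross a 128kB boundary
print(naive_blockwise_decode(sample.encode("utf-8"))[131071])
```

With the sample above, the straddling м comes back as ¼, matching the first table. Shifting the text by one byte so the boundary falls between characters leaves it unchanged, which fits the observation that only straddling characters are affected. The same straddlers check could be run over a real file's text to list the characters at risk before saving as UTF-16.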
