--- In [email protected], "brucexs" <bswit...@...> wrote:
>
> I thought that Korean and other similar languages could have
> thousands of characters. How are these represented if there are
> only around 10 lead bytes and around 200 usable following bytes.
> Is 2000 enough?
Potential lead bytes are anywhere from decimal 128 to 255, a lot more than 10 of them. Potential trail bytes are from decimal 64 to 255. A lead byte is not a "character" on its own and can't be used in a single-byte context.

I've been trying to understand what happens with the smart quote when forxtra copies ’| from the hlp file to the clipboard. None of the double-byte characters in code page 949 actually use a trail byte in the decimal range 91-96 or 123-128. So ’ (decimal 146, a potential lead byte) is followed by | (decimal 124), a byte that is never used as a trail byte in this code page. Since no double-byte character ’| exists in the code page, I guess Windows sees the vertical bar as a separate character in the file name. But the ’ (lead byte) by itself would also be unacceptable, and while ’_ would be acceptable, it would be some other character. Ideally, I guess, you would substitute __ (two underscores) for ’|.

Japanese code pages do not exclude the same trail-byte ranges as Korean; in Shift-JIS, for example, 92 (backslash) and 124 (vertical bar) are valid trail bytes. But for Korean, if there is a backslash or vertical bar anywhere in your result string, I think it will always still be an invalid file name even after your DBCS processing, because neither byte can ever be the second half of a double-byte character.

Regards,
Sheri
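The quoted question ("Is 2000 enough?") can be checked with a little arithmetic. This Python sketch assumes the usual Windows CP949 ranges, lead bytes 0x81-0xFE and trail bytes 0x41-0x5A, 0x61-0x7A and 0x81-0xFE, which agree with the gaps at 91-96 and 123-128 described above. It treats every lead byte as accepting every trail byte, which the real table does not, so the product is an upper bound on the code space rather than a count of assigned characters.

```python
# Simplified byte ranges for Windows code page 949 (an assumption for this sketch):
# lead bytes 0x81-0xFE; trail bytes 0x41-0x5A, 0x61-0x7A, 0x81-0xFE.
LEAD_BYTES = range(0x81, 0xFF)
TRAIL_BYTES = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B)) | set(range(0x81, 0xFF))


def code_space_upper_bound() -> int:
    """Pair every lead byte with every trail byte; the real table assigns fewer."""
    return len(LEAD_BYTES) * len(TRAIL_BYTES)


def never_trail_bytes(lo: int, hi: int) -> list[int]:
    """Decimal values in [lo, hi] that never follow a lead byte in this model."""
    return [b for b in range(lo, hi + 1) if b not in TRAIL_BYTES]


if __name__ == "__main__":
    print(len(LEAD_BYTES), len(TRAIL_BYTES))  # 126 178
    print(code_space_upper_bound())           # 22428
    # [64, 91, 92, 93, 94, 95, 96, 123, 124, 125, 126, 127, 128, 255]
    print(never_trail_bytes(64, 255))
```

Even as a rough bound, about 22,000 two-byte slots is comfortably more than the 11,172 modern Hangul syllables that CP949 encodes, and the list of excluded values shows why 92 (backslash) and 124 (vertical bar) can never be swallowed into a Korean double-byte character.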
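A minimal sketch of the suggested substitution, assuming the file name is available as raw CP949 bytes and using the same simplified byte ranges (lead 0x81-0xFE; trail 0x41-0x5A, 0x61-0x7A, 0x81-0xFE). `sanitize_cp949_filename` is a hypothetical helper, not part of forxtra or any Windows API. It walks the bytes the way a DBCS-aware reader would: a valid lead+trail pair is kept intact, while an orphaned lead byte and any standalone backslash or vertical bar each become an underscore, so ’| (bytes 0x92 0x7C) turns into __.

```python
# Simplified CP949 byte ranges (an assumption for this sketch, not a full code-page table).
LEAD_BYTES = range(0x81, 0xFF)
TRAIL_BYTES = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B)) | set(range(0x81, 0xFF))
# Bytes that are never a CP949 trail byte and are illegal in Windows file names.
STANDALONE_FORBIDDEN = {0x5C, 0x7C}  # backslash, vertical bar


def sanitize_cp949_filename(raw: bytes) -> bytes:
    """Hypothetical helper: make CP949 file-name bytes safe for the cases discussed.

    A valid lead+trail pair is copied unchanged. An orphaned lead byte and any
    standalone backslash or vertical bar each become '_', so b'\\x92|' -> b'__'.
    Other characters Windows forbids (<>:"/?*) are deliberately left alone.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b in LEAD_BYTES and i + 1 < len(raw) and raw[i + 1] in TRAIL_BYTES:
            out += raw[i:i + 2]  # genuine double-byte character: keep both bytes
            i += 2
        elif b in LEAD_BYTES or b in STANDALONE_FORBIDDEN:
            out += b"_"          # lead byte with no valid partner, or a bare \ or |
            i += 1
        else:
            out.append(b)        # ordinary single-byte character
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(sanitize_cp949_filename(b"it\x92|s.txt"))  # b'it__s.txt'
```

Replacing the orphaned lead byte and the bar separately, rather than as a unit, means a name like ’.txt keeps its dot (it becomes _.txt). Because a backslash or bar is never a valid CP949 trail byte, none of them survive the pass.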
