--- In [email protected], "brucexs" <bswit...@...> wrote:
>
> I thought that Korean and other similar languages could have
> thousands of characters. How are these represented if there are
> only around 10 lead bytes and around 200 usable following bytes.
> Is 2000 enough? 

Potential lead bytes are between decimal 128 and 255, a lot more than 10 of 
them. Potential trail bytes are between decimal 64 and 255. The lead byte 
values don't exist as "characters" of their own and can't be used in a 
single-byte context.
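
To put a number on "is 2000 enough", here is a rough Python sketch. It uses the potential ranges quoted above as upper bounds (not every slot is assigned), and it assumes Python's built-in "cp949" codec behaves like the Windows code page. The function names are made up for illustration.

```python
# Potential ranges as stated above (decimal, inclusive). They are upper
# bounds only; not every lead/trail combination is an assigned character.
LEAD_BYTES = range(128, 256)
TRAIL_BYTES = range(64, 256)


def max_double_byte_slots() -> int:
    """Count the potential lead/trail combinations."""
    return len(LEAD_BYTES) * len(TRAIL_BYTES)


def decodes_alone(byte_value: int, codec: str = "cp949") -> bool:
    """Return True if a single byte is a complete character in the codec.

    Assumption: Python's "cp949" codec stands in for Windows code page 949.
    """
    try:
        bytes([byte_value]).decode(codec)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(max_double_byte_slots())
    print(decodes_alone(146))  # the smart quote byte, a potential lead byte
    print(decodes_alone(65))   # plain ASCII "A"
```

By that count there are 128 x 192 = 24,576 potential combinations, so thousands of characters fit easily.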

I've been trying to understand what happens with the smart quote when forxtra 
copies ’| from the hlp file to the clipboard. None of the double-byte character 
pages for code page 949 actually use trail bytes within the decimal ranges 
91-96 or 123-128. So ’ (decimal 146, a potential lead byte) is followed by a 
trail byte | (decimal 124) that is not actually used in this particular code 
page. Since no double-byte character ’| actually exists in the code page, I 
guess Windows sees that vertical bar as an ordinary vertical bar in the file 
name. But the ’ (lead byte) by itself would also be unacceptable, and while ’_ 
would be acceptable, it would be some other character. Ideally, I guess you 
would substitute __ (two underscores) for ’|.
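
Here is a minimal Python sketch of that substitution idea, not how forxtra or Windows actually does it. The byte ranges are my assumption for code page 949 (lead 129-254; trail 65-90, 97-122, 129-254), which agrees with the unused 91-96 and 123-128 gaps mentioned above. sanitize_name is a hypothetical helper, and it only checks ranges, not whether a given pair is really assigned.

```python
# Assumed code page 949 byte ranges (decimal, inclusive upper bound via range).
LEAD = range(129, 255)                                     # 0x81-0xFE
TRAILS = (range(65, 91), range(97, 123), range(129, 255))  # 0x41-0x5A, 0x61-0x7A, 0x81-0xFE
INVALID_IN_NAMES = frozenset(b'\\/:*?"<>|')                # rejected in Windows file names


def is_trail(value: int) -> bool:
    """Return True if the byte value falls in an assumed trail byte range."""
    return any(value in r for r in TRAILS)


def sanitize_name(raw: bytes) -> bytes:
    """Hypothetical helper: replace bytes that cannot appear in a file name."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b in LEAD and i + 1 < len(raw) and is_trail(raw[i + 1]):
            out += raw[i:i + 2]  # plausible double-byte character: keep both bytes
            i += 2
        elif b >= 128 or b in INVALID_IN_NAMES:
            out += b"_"          # orphaned high byte or reserved character
            i += 1
        else:
            out.append(b)        # ordinary single-byte character
            i += 1
    return bytes(out)


if __name__ == "__main__":
    # Smart quote byte (146) followed by vertical bar (124).
    print(sanitize_name(bytes([146, 124])))
```

The smart quote is replaced because 124 is not a usable trail byte, and the vertical bar is then replaced as a reserved single-byte character, which gives the two underscores.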

Japanese double-byte characters do not exclude the same decimal trail byte 
ranges as Korean ones do. But for Korean, if there is a backslash or vertical 
bar anywhere in your result string, I think it will always still be an invalid 
file name, even after completing your DBCS processing.
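
One way to check the Japanese/Korean difference is to ask a codec whether any lead byte combined with a given trail byte decodes to a single character. This is only a sketch: it assumes Python's built-in "shift_jis" and "cp949" codecs match the Windows code pages closely enough, and trail_byte_in_use is a name I made up.

```python
def trail_byte_in_use(codec: str, trail: int) -> bool:
    """Return True if some lead byte followed by `trail` decodes to one character.

    Assumption: the Python codec is a fair stand-in for the Windows code page.
    """
    for lead in range(0x81, 0xFF):
        try:
            text = bytes([lead, trail]).decode(codec)
        except UnicodeDecodeError:
            continue  # this pair is not a valid sequence in the codec
        if len(text) == 1:
            return True  # both bytes were consumed as one double-byte character
    return False


if __name__ == "__main__":
    for codec in ("shift_jis", "cp949"):
        for trail in (0x5C, 0x7C):  # backslash and vertical bar
            print(codec, hex(trail), trail_byte_in_use(codec, trail))
```

For Shift-JIS both bytes turn up as trail bytes (the katakana "so" is 0x83 0x5C, for example), so a backslash or vertical bar byte may be half of a real character there. For cp949 neither is ever a trail byte, so such a byte is always the literal character.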

Regards,
Sheri
