2006/8/21, bill lam <[EMAIL PROTECTED]>:
June Kim wrote:
> There were a few postings on box drawing with non-ASCII characters,
> especially those whose on-screen display width differs from the byte
> length of their UTF-8 encoding. (For example, a Hangul syllable is
> displayed two ASCII characters wide but takes 3 bytes in UTF-8, so
> the boxing is broken.)
>
> It looks like this problem won't be solved quickly. Hence, I'd
> like to have a pure J boxdraw verb and modify it to handle those
> special cases (when the character is Korean, treat it as a length-2
> character, etc.). Do you have a pure J boxdraw verb?
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
>
I guess it would be a rather difficult problem if you have Korean text mixed
with numerals or Latin letters, whose width is 1: the alignment problem is
still there whether the box is drawn with ASCII or Unicode characters.
--
regards,
bill
You are right, bill. However, there are two options:
1. I almost never handle high-bit characters other than Hangul, so I
can simply test each byte (value >= 128) to tell Hangul from the
rest.
2. If I decode the characters into Unicode, I can easily check whether
the code point lies in the Hangul syllable range (U+AC00 - U+D7A3).
Once I know whether a character is Hangul or not, and given that a
Korean character is as wide as two ASCII letters, it's easy to
do the rest.
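
The two options above can be sketched roughly as follows. This is a Python
illustration only, not a J verb, and the function names (display_width,
display_width_utf8, pad) are made up for the example. It assumes a fixed-width
font in which every Hangul syllable takes two columns and every other character
takes one; the byte-based variant further assumes that all high-bit bytes
belong to 3-byte Hangul sequences, as in option 1.

```python
# Precomposed Hangul syllables occupy U+AC00 through U+D7A3.
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def display_width(text: str) -> int:
    """Option 2: decode first, then count columns per code point.

    Assumption: Hangul syllables are 2 columns wide, everything else 1.
    """
    return sum(2 if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST else 1
               for ch in text)


def display_width_utf8(data: bytes) -> int:
    """Option 1: work directly on UTF-8 bytes.

    Assumption: every byte >= 128 is part of a 3-byte Hangul sequence,
    so each group of three high-bit bytes is one 2-column character.
    """
    high = sum(1 for b in data if b >= 128)
    ascii_count = len(data) - high
    return ascii_count + 2 * (high // 3)


def pad(text: str, width: int) -> str:
    """Right-pad text with spaces to the given number of display columns."""
    return text + " " * max(0, width - display_width(text))


if __name__ == "__main__":
    sample = "J \ud55c\uae00"  # "J " followed by two Hangul syllables
    print(display_width(sample))
    print(display_width_utf8(sample.encode("utf-8")))
    print("|" + pad(sample, 10) + "|")
```

A box-drawing routine would then size each cell by display_width rather than
by character or byte count, and pad the cell contents with pad before drawing
the borders.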