Re: Is \p{EastAsianFullwidth} worth implementing?

Dan Kogai Wed, 18 Sep 2002 19:53:44 -0700

On Thursday, Sep 19, 2002, at 11:39 Asia/Tokyo, Autrijus Tang wrote:
> Hi there.  Recently I needed to do some hacking based on the
> EastAsianWidth property (cf. http://www.unicode.org/unicode/reports/tr11/)
> of Unicode characters.  Naturally, I tried the regular expression \p{}
> and \P{} syntax, to no avail.


Speaking of EastAsianWidth, I needed that property when I wrote 
unidump (under Encode/bin, not installed by default).  The relevant 
part looks like this:

     # Generated from lib/unicore/EastAsianWidth.txt.
     # Whitespace inside a bracketed character class is not ignored
     # under /x, so the table is kept in a plain string, stripped of
     # whitespace, and only then compiled.
     #
     our $IsFullWidth = do {
         my $ranges = q{
              \x{1100}-\x{1159}
              \x{115F}-\x{115F}
              \x{2329}-\x{232A}
              \x{2E80}-\x{2E99}
              \x{2E9B}-\x{2EF3}
              \x{2F00}-\x{2FD5}
              \x{2FF0}-\x{2FFB}
              \x{3000}-\x{303E}
              \x{3041}-\x{3096}
              \x{3099}-\x{30FF}
              \x{3105}-\x{312C}
              \x{3131}-\x{318E}
              \x{3190}-\x{31B7}
              \x{31F0}-\x{321C}
              \x{3220}-\x{3243}
              \x{3251}-\x{327B}
              \x{327F}-\x{32CB}
              \x{32D0}-\x{32FE}
              \x{3300}-\x{3376}
              \x{337B}-\x{33DD}
              \x{33E0}-\x{33FE}
              \x{3400}-\x{4DB5}
              \x{4E00}-\x{9FA5}
              \x{A000}-\x{A48C}
              \x{A490}-\x{A4C6}
              \x{AC00}-\x{D7A3}
              \x{F900}-\x{FA2D}
              \x{FA30}-\x{FA6A}
              \x{FE30}-\x{FE46}
              \x{FE49}-\x{FE52}
              \x{FE54}-\x{FE66}
              \x{FE68}-\x{FE6B}
              \x{FF01}-\x{FF60}
              \x{FFE0}-\x{FFE6}
              \x{20000}-\x{2A6D6}
         };
         $ranges =~ s/\s+//g;    # drop the layout whitespace
         qr/\A[$ranges]\z/;      # matches exactly one wide character
     };

> Naturally, I can hack up a local patch to unicore/{Canonical,Exact}.pl
> and parse the yet-unused unicore/EastAsianWidth.txt to add the desired
> properties in, namely (better names welcome):
>
>       \p{En}          \p{EastAsianNeutral}
>       \p{Ea}          \p{EastAsianAmbiguous}
>       \p{Eh}          \p{EastAsianHalfwidth}
>       \p{Ew}          \p{EastAsianWide}
>       \p{Ef}          \p{EastAsianFullwidth}
>       \p{Ena}         \p{EastAsianNarrow}
>
> But as it overrides core modules' behaviour, I'd hesitate to release
> it as a CPAN module (Unicode::EastAsianWidth), and would rather suggest
> it be included in core perl.
>
> Are there any hidden drawbacks or other problems with this idea?

Ideally, full/half width would not be part of a character encoding at 
all, but we all know we need it in practice, especially when you need 
to render those characters neatly in fixed-width fonts (that's why I 
came up with the quick and dirty hack above -- unidump is a 
Unicode-savvy hexdump).  So I second the idea of adding East Asian 
Width properties SOMEHOW.
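
A minimal sketch of that fixed-width use case, assuming a wide 
character takes two columns and everything else takes one.  The helper 
name columns() is hypothetical (it is not part of unidump), only a 
handful of ranges from the table above are listed for brevity, and 
combining characters are not treated specially:

```perl
use strict;
use warnings;

# Illustrative subset of the wide ranges from the table above.
my $wide = qr/[\x{1100}-\x{1159}\x{3000}-\x{303E}\x{3041}-\x{3096}\x{4E00}-\x{9FA5}\x{AC00}-\x{D7A3}\x{FF01}-\x{FF60}]/;

# Hypothetical helper: number of terminal columns a string occupies,
# counting 2 for each wide character and 1 for anything else.
sub columns {
    my ($str) = @_;
    my $n = 0;
    $n += ($_ =~ $wide) ? 2 : 1 for split //, $str;
    return $n;
}

print columns("abc"), "\n";                 # 3
print columns("\x{3042}\x{3044}"), "\n";    # 4
```

With a count like this, a hexdump can pad each row so the text column 
stays aligned.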

I said somehow because I am not sure it requires tweaking the core.  
I think we can reach the goal in the same manner as my humble 
Encode::InCharset, a module I declined to add to Encode.
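
To illustrate what "without tweaking the core" could look like, here 
is a rough sketch that assumes the module would rely on Perl's 
user-defined character properties (a sub whose name starts with In or 
Is and returns tab-separated hexadecimal ranges, one per line).  The 
property name InEastAsianFullwidth is hypothetical, and only a few 
ranges from the table above are included:

```perl
use strict;
use warnings;

# Hypothetical property name.  Perl resolves \p{InEastAsianFullwidth}
# by calling the sub of that name in the current package.
sub InEastAsianFullwidth {
    return join '', map { "$_->[0]\t$_->[1]\n" } (
        [qw(3000 303E)],
        [qw(3041 3096)],
        [qw(4E00 9FA5)],
        [qw(FF01 FF60)],
    );
}

for my $ch ("A", "\x{3042}", "\x{FF21}") {
    printf "U+%04X %s\n", ord($ch),
        $ch =~ /\A\p{InEastAsianFullwidth}\z/ ? "wide" : "not wide";
}
```

A CPAN module could export such subs into the caller's package, which 
would give \p{} and \P{} support without patching unicore.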

Dan the Man with Too Many Character Properties to Remember, Too Few to 
Feel Practical
