On Thursday, Sep 19, 2002, at 11:39 Asia/Tokyo, Autrijus Tang wrote:
> Hi there. Recently I need to do some hacking based on the EastAsianWidth
> property (cf. http://www.unicode.org/unicode/reports/tr11/) of unicode
> characters. Naturally, I tried the regular expression \p{} and \P{} syntax,
> with no avail.

Come to think of EastAsianWidth, I needed that property when I wrote
unidump (under Encode/bin, not installed by default). It looks as follows:

    # Generated out of lib/unicore/EastAsianWidth.txt
    # will it work ?
    #
    our $IsFullWidth =
        qr/^[
             \x{1100}-\x{1159}
             \x{115F}-\x{115F}
             \x{2329}-\x{232A}
             \x{2E80}-\x{2E99}
             \x{2E9B}-\x{2EF3}
             \x{2F00}-\x{2FD5}
             \x{2FF0}-\x{2FFB}
             \x{3000}-\x{303E}
             \x{3041}-\x{3096}
             \x{3099}-\x{30FF}
             \x{3105}-\x{312C}
             \x{3131}-\x{318E}
             \x{3190}-\x{31B7}
             \x{31F0}-\x{321C}
             \x{3220}-\x{3243}
             \x{3251}-\x{327B}
             \x{327F}-\x{32CB}
             \x{32D0}-\x{32FE}
             \x{3300}-\x{3376}
             \x{337B}-\x{33DD}
             \x{3400}-\x{4DB5}
             \x{4E00}-\x{9FA5}
             \x{33E0}-\x{33FE}
             \x{A000}-\x{A48C}
             \x{AC00}-\x{D7A3}
             \x{A490}-\x{A4C6}
             \x{F900}-\x{FA2D}
             \x{FA30}-\x{FA6A}
             \x{FE30}-\x{FE46}
             \x{FE49}-\x{FE52}
             \x{FE54}-\x{FE66}
             \x{FE68}-\x{FE6B}
             \x{FF01}-\x{FF60}
             \x{FFE0}-\x{FFE6}
             \x{20000}-\x{2A6D6}
            ]$/xo;

> Naturally, I can hack up a local patch to unicore/{Canonical,Exact}.pl
> and parse the yet-unused unicore/EastAsianWidth.txt to add the desired
> properties in, namely (better names welcome):
>
>     \p{En}   \p{EastAsianNeutral}
>     \p{Ea}   \p{EastAsianAmbiguous}
>     \p{Eh}   \p{EastAsianHalfwidth}
>     \p{Ew}   \p{EastAsianWide}
>     \p{Ef}   \p{EastAsianFullwidth}
>     \p{Ena}  \p{EastAsianNarrow}
>
> But as it overrides core modules's behaviours, I'd hesitate to release it
> as a CPAN module (Unicode::EastAsianWidth), but rather suggest it to
> be included in core perl.
>
> Are there any hidden drawbacks or other problems with this idea?

Ideally, full/half width was never supposed to be part of a character
encoding, but we all know we need it in practice, especially when you
have to render those characters nice and tidy in fixed-width fonts
(that's why I came up with the quick and dirty hack above; unidump is a
Unicode-savvy hexdump). So I second the idea of adding East Asian Width
properties SOMEHOW. I say "somehow" because I am not sure that it
requires tweaking the core.
I think we can reach the goal in the same manner as my humble
Encode::InCharset, a module I declined to add to Encode.

Dan the Man with Too Many Character Properties to Remember,
                 Too Few to Feel Practical
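
P.S. For the record, here is a minimal sketch of how the "no core tweaking" route might look, using Perl's user-defined character properties (documented in perlunicode). This may or may not be what the Encode::InCharset remark has in mind. The property name InFullwidthSketch and the helpers is_fullwidth and display_columns are hypothetical, the ranges are only a small subset of the list above, and counting two columns per fullwidth character is an assumption about fixed-width rendering. A property sub also avoids depending on how /x treats whitespace inside a bracketed class, which Perl considers significant.

```perl
use strict;
use warnings;

# Hypothetical user-defined property. Perl requires the sub name to
# begin with "In" or "Is"; it returns one "START\tEND" pair of hex
# code points per line. Only a few ranges from the list above are
# copied here, so this is NOT a complete East Asian Width table.
sub InFullwidthSketch {
    return join '', map { "$_\n" } (
        "1100\t1159",    # Hangul Jamo (partial)
        "3000\t303E",    # CJK symbols and punctuation
        "3041\t3096",    # Hiragana
        "4E00\t9FA5",    # CJK unified ideographs
        "AC00\tD7A3",    # Hangul syllables
        "FF01\tFF60",    # Fullwidth ASCII variants
    );
}

# Hypothetical helper: true if the single character $char falls in
# one of the ranges above.
sub is_fullwidth {
    my ($char) = @_;
    return $char =~ /\A\p{InFullwidthSketch}\z/ ? 1 : 0;
}

# Hypothetical helper: number of fixed-width columns needed for $str,
# assuming two columns per fullwidth character and one for the rest.
sub display_columns {
    my ($str) = @_;
    my $wide = () = $str =~ /\p{InFullwidthSketch}/g;
    return length($str) + $wide;
}

# Usage: one hiragana letter between two ASCII letters.
print display_columns("A\x{3042}B"), "\n";    # prints 4
```

Since the property lives in an ordinary sub, a module could generate the full table from EastAsianWidth.txt and export it without patching anything under unicore/.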