[FYI] locales resources and issues

2002-09-18 Thread Jarkko Hietaniemi

Found this nice resource, I especially like the list of issues
(which is much longer than the list of advantages...)

http://www.i18nguy.com/locales/locale-resources.html
http://www.i18nguy.com/locales/index.html

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen



Is \p{EastAsianFullwidth} worth implementing?

2002-09-18 Thread Autrijus Tang

Hi there.  Recently I needed to do some hacking based on the EastAsianWidth
property (cf. http://www.unicode.org/unicode/reports/tr11/) of Unicode
characters.  Naturally, I tried the regular expression \p{} and \P{} syntax,
to no avail.

Naturally, I can hack up a local patch to unicore/{Canonical,Exact}.pl
and parse the yet-unused unicore/EastAsianWidth.txt to add the desired
properties in, namely (better names welcome):

\p{En}  \p{EastAsianNeutral}
\p{Ea}  \p{EastAsianAmbiguous}
\p{Eh}  \p{EastAsianHalfwidth}
\p{Ew}  \p{EastAsianWide}
\p{Ef}  \p{EastAsianFullwidth}
\p{Ena} \p{EastAsianNarrow}

But as it overrides core modules' behaviour, I'd hesitate to release it
as a CPAN module (Unicode::EastAsianWidth), and would rather suggest that it
be included in core perl.

Are there any hidden drawbacks or other problems with this idea?
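
For illustration, the parsing step could look roughly like the sketch
below.  It assumes the data lines use a "start..end;class" layout (single
code points have no "..end" part) and reads a few inline sample lines
rather than the real unicore/EastAsianWidth.txt; parse_widths is a
hypothetical helper name, not an existing function.

```perl
use strict;
use warnings;

# Sample lines in an assumed "start..end;class" layout; the real file
# has many more rows and may differ in detail between Unicode versions.
my @sample = (
    "0020;Na",
    "1100..1159;W",
    "FF01..FF60;F",
    "# a comment line",
);

# Map each width class to a list of [start, end] code point pairs.
sub parse_widths {
    my %ranges;
    for (@_) {
        my $line = $_;              # copy, so the caller's data is untouched
        $line =~ s/\s*#.*//;        # drop trailing or whole-line comments
        next unless $line =~ /^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)/;
        my ($start, $end, $class) = ($1, $2, $3);
        $end = $start unless defined $end;
        push @{ $ranges{$class} }, [ hex $start, hex $end ];
    }
    return \%ranges;
}

my $r = parse_widths(@sample);
printf "%s: %d range(s)\n", $_, scalar @{ $r->{$_} } for sort keys %$r;
```

The resulting per-class range lists are what a property table would be
generated from.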

Thanks,
/Autrijus/





Re: Is \p{EastAsianFullwidth} worth implementing?

2002-09-18 Thread Dan Kogai

On Thursday, Sep 19, 2002, at 11:39 Asia/Tokyo, Autrijus Tang wrote:
> Hi there.  Recently I needed to do some hacking based on the EastAsianWidth
> property (cf. http://www.unicode.org/unicode/reports/tr11/) of Unicode
> characters.  Naturally, I tried the regular expression \p{} and \P{} syntax,
> to no avail.

Speaking of EastAsianWidth, I needed that property when I wrote 
unidump (under Encode/bin, not installed by default).  It looks like 
this:

 # Generated out of lib/unicore/EastAsianWidth.txt
 # will it work ?
 #
 our $IsFullWidth =
 qr/^[
  \x{1100}-\x{1159}
  \x{115F}-\x{115F}
  \x{2329}-\x{232A}
  \x{2E80}-\x{2E99}
  \x{2E9B}-\x{2EF3}
  \x{2F00}-\x{2FD5}
  \x{2FF0}-\x{2FFB}
  \x{3000}-\x{303E}
  \x{3041}-\x{3096}
  \x{3099}-\x{30FF}
  \x{3105}-\x{312C}
  \x{3131}-\x{318E}
  \x{3190}-\x{31B7}
  \x{31F0}-\x{321C}
  \x{3220}-\x{3243}
  \x{3251}-\x{327B}
  \x{327F}-\x{32CB}
  \x{32D0}-\x{32FE}
  \x{3300}-\x{3376}
  \x{337B}-\x{33DD}
  \x{3400}-\x{4DB5}
  \x{4E00}-\x{9FA5}
  \x{33E0}-\x{33FE}
  \x{A000}-\x{A48C}
  \x{AC00}-\x{D7A3}
  \x{A490}-\x{A4C6}
  \x{F900}-\x{FA2D}
  \x{FA30}-\x{FA6A}
  \x{FE30}-\x{FE46}
  \x{FE49}-\x{FE52}
  \x{FE54}-\x{FE66}
  \x{FE68}-\x{FE6B}
  \x{FF01}-\x{FF60}
  \x{FFE0}-\x{FFE6}
  \x{20000}-\x{2A6D6}
  ]$/xo;
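
A rough sketch of how such a class might be used to count display columns
in a fixed-width dump.  The class here is a deliberately abbreviated,
hypothetical stand-in with three ranges taken from the list above, and
columns() is an illustrative name.  It is written on one line because
whitespace inside a bracketed class is not ignored under /x, so a
multi-line class also matches spaces and newlines.

```perl
use strict;
use warnings;

# Abbreviated stand-in for the full-width class: three ranges only,
# with no whitespace inside the brackets.
my $is_wide = qr/^[\x{3041}-\x{3096}\x{4E00}-\x{9FA5}\x{FF01}-\x{FF60}]$/;

# Count display columns: 2 per wide character, 1 for everything else.
sub columns {
    my ($str) = @_;
    my $cols = 0;
    for my $ch (split //, $str) {
        $cols += ($ch =~ $is_wide) ? 2 : 1;
    }
    return $cols;
}

print columns("abc"), "\n";               # three narrow characters
print columns("\x{65E5}\x{672C}"), "\n";  # two CJK ideographs
```

Treating every non-wide character as one column is a simplification;
combining and control characters would need their own handling.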

> Naturally, I can hack up a local patch to unicore/{Canonical,Exact}.pl
> and parse the yet-unused unicore/EastAsianWidth.txt to add the desired
> properties in, namely (better names welcome):
>
>   \p{En}  \p{EastAsianNeutral}
>   \p{Ea}  \p{EastAsianAmbiguous}
>   \p{Eh}  \p{EastAsianHalfwidth}
>   \p{Ew}  \p{EastAsianWide}
>   \p{Ef}  \p{EastAsianFullwidth}
>   \p{Ena} \p{EastAsianNarrow}
>
> But as it overrides core modules' behaviour, I'd hesitate to release it
> as a CPAN module (Unicode::EastAsianWidth), and would rather suggest that it
> be included in core perl.
>
> Are there any hidden drawbacks or other problems with this idea?

Ideally, full/half width would not be part of a character encoding, 
but we all know we need it in practice, especially when you need to 
render those characters neatly in fixed-width fonts (that's why I came 
up with the quick and dirty hack above; unidump is a Unicode-savvy 
hexdump).  So I second the idea of adding East Asian Width properties 
SOMEHOW.

I say "somehow" because I am not sure that it requires tweaking the 
core.  I think we can reach the goal in the same manner as my humble 
Encode::InCharset, a module I declined to add to Encode.
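
One way this might work without touching the core is Perl's user-defined 
character properties (see perlunicode): a sub whose name starts with 
"In" or "Is" returns a list of hex ranges and can then be used inside 
\p{}.  The sketch below is only an illustration, not how Encode::InCharset 
is necessarily implemented; the name InFullwidthSketch is hypothetical, 
and the two ranges are copied from the $IsFullWidth list above.

```perl
use strict;
use warnings;

# Hypothetical user-defined property.  Each line of the returned string
# is "start<TAB>end" in hex, the layout perlunicode describes.
sub InFullwidthSketch {
    return join '',
        "FF01\tFF60\n",
        "FFE0\tFFE6\n";
}

# "A" followed by U+FF21 FULLWIDTH LATIN CAPITAL LETTER A.
my $s = "A\x{FF21}";

# Count the characters that carry the property.
my $n = () = $s =~ /\p{InFullwidthSketch}/g;
print "$n\n";
```

A module could export such subs, or callers could name them with their 
package, so \p{} and \P{} work in ordinary regular expressions.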

Dan the Man with Too Many Character Properties to Remember, Too Few to 
Feel Practical