On Mon, Mar 26, 2012 at 12:57 PM, Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 wrote:
> Let the regex engine help you advance the character counter.
>
> $ cat langs
> ΕλληνικάEnglish한국어日本語Русскийไทย
>
> $ cat langs.pl
> use 5.010;
> use strictures;
> use Unicode::UCD qw(charinfo);
>
> sub script
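The quoted script stops at `sub script`, so what follows is only a minimal sketch of the regex-driven idea under my own assumptions: Perl 5.10+, a source file saved as UTF-8, and `script_runs` as a hypothetical helper name (it is not Lars's `sub script`). It looks up the script of the character at the current position with `charinfo`, then lets a `\G`-anchored `\p{Script=...}+` match swallow the whole run, so the regex engine advances `pos()` instead of a hand-maintained counter.

```perl
use 5.010;
use strict;
use warnings;
use utf8;
use open qw(:std :encoding(UTF-8));
use Unicode::UCD qw(charinfo);

# Hypothetical helper: split a string into maximal runs of characters
# that share one Unicode Script property value.
sub script_runs {
    my ($text) = @_;
    my @runs;
    pos($text) = 0;
    while (pos($text) < length $text) {
        # Script of the character at the current position.
        my $ch   = substr $text, pos($text), 1;
        my $info = charinfo(ord $ch)
            or die "unassigned code point at position " . pos($text);
        my $script = $info->{script};

        # Build \p{Script=...} as a string, then let the regex engine
        # consume the whole run; \G anchors at pos(), /gc advances it.
        my $prop = "\\p{Script=$script}";
        $text =~ /\G((?:$prop)+)/gc
            or die "no $script run at position " . pos($text);
        push @runs, [ $script, $1 ];
    }
    return @runs;
}

# Usage: one line per run, script name then the text of the run.
for my $run (script_runs("ΕλληνικάEnglish한국어日本語Русскийไทย")) {
    say "$run->[0]\t$run->[1]";
}
```

Since every character matches its own script, each successful match advances `pos()` by at least one, so the loop always terminates.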
> How can I check what script a character belongs to?
$ perl -Mutf8 -MUnicode::UCD=charinfo -E'say charinfo(ord "为")->{script}'
Han
Sanity checks:
$ perl -Mutf8 -E'say "为" =~ /\p{Han}/'
1
$ uniprops -a1 为 | ack Script
Script=Han
Script=Hani
> check if it is the same
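For checking whether two characters belong to the same script, a hedged sketch that compares the `script` field `charinfo` returns for each one; `same_script` is a hypothetical helper name, and treating unassigned code points as "not the same" is my own choice.

```perl
use 5.010;
use strict;
use warnings;
use utf8;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: 1 if both characters have the same Unicode
# Script property value according to charinfo, otherwise 0.
sub same_script {
    my ($x, $y) = @_;
    # Called in scalar context so an undef result stays a single value.
    my $ix = charinfo(ord $x);
    my $iy = charinfo(ord $y);
    # charinfo gives undef for unassigned code points.
    return 0 unless $ix && $iy;
    return $ix->{script} eq $iy->{script} ? 1 : 0;
}

say same_script("为", "日");    # both Han
say same_script("为", "a");     # Han vs Latin
```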