Re: Word boundaries

2012-03-27 Thread Zbigniew Łukasiak
On Mon, Mar 26, 2012 at 12:57 PM, Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 wrote:
> Let the regex engine help you advance the character counter.
>
>    $ cat langs
>    ΕλληνικάEnglish한국어日本語Русскийไทย
>
>    $ cat langs.pl
>    use 5.010;
>    use strictures;
>    use Unicode::UCD qw(charinfo);
>
>    sub script
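
The quoted langs.pl is cut off in this archive view, so what follows is not the original script. It is only a minimal sketch of the idea described above: look up the script of the character at the current position with charinfo, then let a \G-anchored regex consume the whole run of that script and advance pos for you. The helper name script_runs and the 'Unknown' fallback label are assumptions made for illustration.

```perl
use strict;
use warnings;
use utf8;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: split $text into [script, substring] pairs.
sub script_runs {
    my ($text) = @_;
    my @runs;
    pos($text) = 0;
    while (pos($text) < length $text) {
        # Ask Unicode::UCD for the script of the character at the current position.
        my $char   = substr $text, pos($text), 1;
        my $info   = charinfo(ord $char);
        my $script = $info && $info->{script};
        if (defined $script && length $script) {
            # Let the regex engine consume every following character of that script.
            my $pattern = sprintf '\\G(\\p{Script=%s}+)', $script;
            if ($text =~ /$pattern/gc) {
                push @runs, [$script, $1];
                next;
            }
        }
        # Fallback (assumption): consume a single character so the loop always advances.
        $text =~ /\G(.)/gcs;
        push @runs, ['Unknown', $1];
    }
    return @runs;
}

binmode STDOUT, ':encoding(UTF-8)';
printf "%-10s %s\n", @$_ for script_runs('ΕλληνικάEnglish한국어日本語Русскийไทย');
```

The /c flag keeps pos intact when a match fails, which is what makes the single-character fallback safe.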

Re: Word boundaries

2012-03-26 Thread Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯
> How can I check what script a character belongs to?

    $ perl -Mutf8 -MUnicode::UCD=charinfo -E'say charinfo(ord "为")->{script}'
    Han

Sanity checks:

    $ perl -Mutf8 -E'say "为" =~ /\p{Han}/'
    1
    $ uniprops -a1 为 | ack Script
    Script=Han Script=Hani

> check if it is the same
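
For the "is it the same" part of the question, one possible approach is to compare the script field that charinfo returns for each character. This is a rough sketch only; script_of and same_script are made-up helper names, not part of Unicode::UCD.

```perl
use strict;
use warnings;
use utf8;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: script name of the first character of $char,
# or undef if Unicode::UCD has no information for that code point.
sub script_of {
    my ($char) = @_;
    my $info = charinfo(ord $char) or return;
    return $info->{script};
}

# Hypothetical helper: true if both characters report the same script.
sub same_script {
    my ($x, $y) = @_;
    my ($sx, $sy) = (script_of($x), script_of($y));
    return defined $sx && defined $sy && $sx eq $sy;
}

print same_script('为', '日') ? "same\n" : "different\n";
print same_script('为', 'a')  ? "same\n" : "different\n";
```

Note that digits and most punctuation report the script Common, so under this test they never count as the same script as a letter.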