Re: update ctype data to unicode 10

Ingo Schwarze Fri, 22 Feb 2019 04:29:08 -0800

Hi Andrew,

Lauri Tirkkonen wrote on Fri, Feb 22, 2019 at 01:57:01AM +0200:


> Hi, the recent perl-5.28.1 and related unicore update brought the
> unicode data from version 8.0.0 to version 10.0.0. That fixes some
> character classifications (eg. emoji characters gained East_Asian_Width
> value 'Wide', which causes them to correctly get a wcwidth() of 2). But
> the ctype source data needs to be regenerated with this new perl/unicore
> to gain the benefits.
> 
> So I've done just that:
>     cd /usr/src/share/locale/ctype && ./gen_ctype_utf8.pl > en_US.UTF-8.src
> and the resulting diff is below. You could obviously run this yourself -
> I'm only including the diff because it took quite a long time to run the
> script (177m08.01s real).
> 
> The resulting LC_CTYPE generated from this new source gives a wcwidth()
> of 2 to eg. U+1F3EE, as expected (it used to be 1 with unicode 8.0
> data).

I looked through the diff (not checking each and every line thoroughly,
but watching out for stuff that looks suspicious, and doing occasional
random sampling tests), and I did not spot anything that looks wrong.

Much of it is simply additions of new characters, which are unlikely
to cause regressions in the first place, and some parts change from
width one to width two, which appears to be intended by what Lauri says
above and which certainly isn't very dangerous either.
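
For anyone wanting to repeat that kind of spot check without waiting for
the generator script, here is a rough sketch in Python. It is not what
gen_ctype_utf8.pl does; it only reads East_Asian_Width from whatever
Unicode data the local Python ships and assumes that 'W' or 'F' means
two columns and everything else one. The helper name approx_width is
made up for this illustration, and combining and control characters are
deliberately ignored.

```python
import unicodedata


def approx_width(ch: str) -> int:
    """Rough column width for one character.

    Assumption for this sketch: East_Asian_Width 'W' (Wide) or
    'F' (Fullwidth) means 2 columns, anything else means 1.
    Combining and control characters are not handled.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


if __name__ == "__main__":
    # Shows which Unicode version the local Python data is based on.
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in (0x41, 0x1F3EE):
        ch = chr(cp)
        eaw = unicodedata.east_asian_width(ch)
        print(f"U+{cp:04X} East_Asian_Width={eaw} approx width={approx_width(ch)}")
```

With Unicode 9.0 or later data, U+1F3EE should be reported as Wide, which
matches the width 2 Lauri describes. The authoritative check remains
wcwidth() under the regenerated LC_CTYPE.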

So if the diff agrees with what you got, Andrew, it's OK schwarze@.

Yours,
  Ingo
