Re: [PATCH] libcpp: Update cpp_wcwidth() to Unicode 13.0.0

2020-11-07 Thread Lewis Hyatt via Gcc-patches
On Fri, Nov 6, 2020 at 12:46 PM Jeff Law wrote:
>
>
> On 10/23/20 9:01 AM, Lewis Hyatt via Gcc-patches wrote:
> > Hello-
> >
> > The attached patch updates cpp_wcwidth() (for computation of display
> > widths needed to calculate column numbers in diagnostics) from Unicode 12
> > to Unicode 13. The patch was purely mechanical, following the directions
> > in contrib/unicode/README without any unexpected hiccups. A couple
> > questions please:
> >
> > -Is it OK for master?
>
> Yes, it is OK for the trunk.  Please go ahead and commit it.
>
>
> >
> > -Unicode 13 actually came out just immediately before GCC 10 was
> >  released. Would it make sense to put this on GCC 10 branch as well?
>
> I wouldn't.  The general guidance is that we fix regressions on the
> release branches and this wouldn't qualify.
>

Got it, thanks! All set now.

-Lewis



Re: [PATCH] libcpp: Update cpp_wcwidth() to Unicode 13.0.0

2020-11-06 Thread Jeff Law via Gcc-patches


On 10/23/20 9:01 AM, Lewis Hyatt via Gcc-patches wrote:
> Hello-
>
> The attached patch updates cpp_wcwidth() (for computation of display
> widths needed to calculate column numbers in diagnostics) from Unicode 12
> to Unicode 13. The patch was purely mechanical, following the directions
> in contrib/unicode/README without any unexpected hiccups. A couple
> questions please:
>
> -Is it OK for master?

Yes, it is OK for the trunk.  Please go ahead and commit it.


>
> -Unicode 13 actually came out just immediately before GCC 10 was
>  released. Would it make sense to put this on GCC 10 branch as well?

I wouldn't.  The general guidance is that we fix regressions on the
release branches and this wouldn't qualify. 


Jeff




[PATCH] libcpp: Update cpp_wcwidth() to Unicode 13.0.0

2020-10-23 Thread Lewis Hyatt via Gcc-patches
Hello-

The attached patch updates cpp_wcwidth() (for computation of display
widths needed to calculate column numbers in diagnostics) from Unicode 12
to Unicode 13. The patch was purely mechanical, following the directions
in contrib/unicode/README without any unexpected hiccups. A couple
questions please:

-Is it OK for master?

-Unicode 13 actually came out just immediately before GCC 10 was
 released. Would it make sense to put this on GCC 10 branch as well?

Thanks!

-Lewis
generated_cpp_wcwidth.h was regenerated using Unicode 13.0.0 data files. No
material changes to the parsing scripts (either GCC- or glibc-sourced) were
necessary; glibc upstream made a small change to utf8_gen.py, and the copy
here was synced to match.
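cpp_wcwidth() maps a code point to the number of columns it occupies on screen. Diagnostics sum these widths to report column numbers. The sketch below is a rough illustration of how such a lookup could work. It assumes the generated header holds a sorted table of range end points with a parallel array of widths; that layout is an assumption, since the header's contents are not shown in this thread. All names are hypothetical, and the ranges are coarse placeholders, not real Unicode 13.0.0 data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately coarse width table.  Entry I covers code
   points range_ends[I-1] + 1 through range_ends[I]; every code point in
   it has display width widths[I].  ILLUSTRATIVE ONLY, not real
   Unicode 13.0.0 data (real data has many more ranges and exceptions,
   e.g. zero-width combining marks).  */
static const uint32_t range_ends[] = {
  0x1F,     /* C0 controls: width 0 in this sketch */
  0x7E,     /* printable ASCII: width 1 */
  0x9F,     /* DEL and C1 controls: width 0 in this sketch */
  0x10FF,   /* width 1 in this sketch */
  0x115F,   /* Hangul Jamo leading consonants (East Asian Wide): width 2 */
  0x2E7F,   /* width 1 in this sketch */
  0x9FFF,   /* CJK radicals .. CJK Unified Ideographs, coarsely: width 2 */
  0x10FFFF  /* everything else: width 1 in this sketch */
};
static const unsigned char widths[] = { 0, 1, 0, 1, 2, 1, 2, 1 };

#define N_RANGES (sizeof range_ends / sizeof range_ends[0])

/* Display width of code point C: binary-search for the first range
   whose end point is >= C.  */
static int
sketch_wcwidth (uint32_t c)
{
  size_t lo = 0, hi = N_RANGES;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c > range_ends[mid])
        lo = mid + 1;
      else
        hi = mid;
    }
  /* lo == N_RANGES only for values above U+10FFFF; treat as width 1.  */
  return lo < N_RANGES ? widths[lo] : 1;
}

/* 1-based display column at which the code point at index IDX of the
   line S starts: one plus the widths of everything before it.  */
static int
display_column (const uint32_t *s, size_t idx)
{
  int col = 1;
  for (size_t i = 0; i < idx; i++)
    col += sketch_wcwidth (s[i]);
  return col;
}
```

Take the line `a`, U+4E2D, `b`. Here `display_column` returns 1, 2 and 4 for the three characters. Counting UTF-8 bytes would instead put `b` at column 5, because U+4E2D is encoded in three bytes.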

contrib/ChangeLog:

	* unicode/EastAsianWidth.txt: Update to Unicode 13.0.0.
	* unicode/PropList.txt: Likewise.
	* unicode/README: Likewise.
	* unicode/UnicodeData.txt: Likewise.
	* unicode/from_glibc/unicode_utils.py: Update to latest glibc version.
	* unicode/from_glibc/utf8_gen.py: Likewise.

libcpp/ChangeLog:

	* generated_cpp_wcwidth.h: Regenerated from Unicode 13.0.0 data.
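The data files listed above are processed by the Python scripts also listed there, including glibc's unicode_utils.py and utf8_gen.py. The C sketch below is not a port of those scripts. It only illustrates the EastAsianWidth.txt data-line format that appears in the diff: a code point or an XXXX..YYYY range, a semicolon, the East_Asian_Width value, and a trailing `#` comment. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: an inclusive code-point range and its
   East_Asian_Width value (A, F, H, N, Na or W).  */
struct eaw_record
{
  uint32_t first, last;
  char prop[3];
};

/* Skip spaces and tabs.  */
static const char *
skip_blanks (const char *p)
{
  while (*p == ' ' || *p == '\t')
    p++;
  return p;
}

/* Parse one line such as
     0B55..0B56;N # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
   or a single code point such as "0D81;N   # ...".  Returns 1 and fills
   *REC on success; returns 0 for comment lines, blank lines and
   malformed input.  The trailing "#" comment is ignored.  */
static int
parse_eaw_line (const char *line, struct eaw_record *rec)
{
  const char *p = skip_blanks (line);
  char *end;

  if (*p == '#' || *p == '\0' || *p == '\n')
    return 0;

  /* First field: a hex code point, optionally followed by "..LAST".  */
  unsigned long first = strtoul (p, &end, 16);
  if (end == p)
    return 0;
  unsigned long last = first;
  if (end[0] == '.' && end[1] == '.')
    {
      p = end + 2;
      last = strtoul (p, &end, 16);
      if (end == p)
        return 0;
    }

  /* The two fields are separated by a semicolon.  */
  p = skip_blanks (end);
  if (*p != ';')
    return 0;
  p = skip_blanks (p + 1);

  /* Second field: the property value, up to whitespace, '#' or EOL.  */
  size_t n = strcspn (p, " \t#\r\n");
  if (n == 0 || n >= sizeof rec->prop)
    return 0;
  memcpy (rec->prop, p, n);
  rec->prop[n] = '\0';
  rec->first = (uint32_t) first;
  rec->last = (uint32_t) last;
  return 1;
}
```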

diff --git a/contrib/unicode/EastAsianWidth.txt b/contrib/unicode/EastAsianWidth.txt
index 94d55d6654a..b43aec92738 100644
--- a/contrib/unicode/EastAsianWidth.txt
+++ b/contrib/unicode/EastAsianWidth.txt
@@ -1,6 +1,6 @@
-# EastAsianWidth-12.1.0.txt
-# Date: 2019-03-31, 22:01:58 GMT [KW, LI]
-# © 2019 Unicode®, Inc.
+# EastAsianWidth-13.0.0.txt
+# Date: 2029-01-21, 18:14:00 GMT [KW, LI]
+# © 2020 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see http://www.unicode.org/terms_of_use.html
 #
@@ -9,7 +9,7 @@
 #
 # East_Asian_Width Property
 #
-# This file is an informative contributory data file in the
+# This file is a normative contributory data file in the
 # Unicode Character Database.
 #
 # The format is two fields separated by a semicolon.
@@ -332,7 +332,7 @@
 085E;N   # Po MANDAIC PUNCTUATION
 0860..086A;N # Lo[11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
 08A0..08B4;N # Lo[21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
-08B6..08BD;N # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
+08B6..08C7;N # Lo[18] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER LAM WITH SMALL ARABIC LETTER TAH ABOVE
 08D3..08E1;N # Mn[15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA
 08E2;N   # Cf ARABIC DISPUTED END OF AYAH
 08E3..08FF;N # Mn[29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
@@ -450,7 +450,7 @@
 0B47..0B48;N # Mc [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
 0B4B..0B4C;N # Mc [2] ORIYA VOWEL SIGN O..ORIYA VOWEL SIGN AU
 0B4D;N   # Mn ORIYA SIGN VIRAMA
-0B56;N   # Mn ORIYA AI LENGTH MARK
+0B55..0B56;N # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
 0B57;N   # Mc ORIYA AU LENGTH MARK
 0B5C..0B5D;N # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
 0B5F..0B61;N # Lo [3] ORIYA LETTER YYA..ORIYA LETTER VOCALIC LL
@@ -529,7 +529,7 @@
 0CF1..0CF2;N # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
 0D00..0D01;N # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
 0D02..0D03;N # Mc [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
-0D05..0D0C;N # Lo [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
+0D04..0D0C;N # Lo [9] MALAYALAM LETTER VEDIC ANUSVARA..MALAYALAM LETTER VOCALIC L
 0D0E..0D10;N # Lo [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
 0D12..0D3A;N # Lo[41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
 0D3B..0D3C;N # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
@@ -550,6 +550,7 @@
 0D70..0D78;N # No [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
 0D79;N   # So MALAYALAM DATE MARK
 0D7A..0D7F;N # Lo [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
+0D81;N   # Mn SINHALA SIGN CANDRABINDU
 0D82..0D83;N # Mc [2] SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARGAYA
 0D85..0D96;N # Lo[18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
 0D9A..0DB1;N # Lo[24] SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA LETTER DANTAJA NAYANNA
@@ -795,6 +796,7 @@
 1AA8..1AAD;N # Po [6] TAI THAM SIGN KAAN..TAI THAM SIGN CAANG
 1AB0..1ABD;N # Mn[14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW
 1ABE;N   # Me COMBINING PARENTHESES OVERLAY
+1ABF..1AC0;N # Mn [2] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER TURNED W BELOW
 1B00..1B03;N # Mn [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG
 1B04;N   # Mc BALINESE SIGN BISAH
 1B05.