Hello-
The attached patch updates cpp_wcwidth() (for computation of display
widths needed to calculate column numbers in diagnostics) from Unicode 12
to Unicode 13. The patch was purely mechanical, following the directions
in contrib/unicode/README without any unexpected hiccups. A couple of
questions, please:
-Is it OK for master?
-Unicode 13 came out shortly before GCC 10 was released. Would it make
sense to put this on the GCC 10 branch as well?
Thanks!
-Lewis
generated_cpp_wcwidth.h was regenerated using Unicode 13.0.0 data files. No
material changes to the parsing scripts (either GCC- or glibc-sourced) were
necessary; glibc's utf8_gen.py was tweaked slightly by glibc and matched here.
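
For readers unfamiliar with the data format, here is a minimal sketch of how
EastAsianWidth.txt records (two fields separated by a semicolon, with a
trailing # comment) can be turned into a width lookup. It is an illustration
only, not the code in contrib/unicode: the function names and the simplified
width rule (2 for W or F, otherwise 1) are assumptions, and the real table is
built from more inputs than this (the ChangeLog also lists UnicodeData.txt
and PropList.txt), so zero-width handling is left out. The first two sample
records are taken from this patch; the HANGUL record is from elsewhere in the
same data file.

```python
import bisect

# Sample records in EastAsianWidth.txt format.
SAMPLE_RECORDS = [
    "# comment-only lines are skipped",
    "08B6..08C7;N # Lo[18] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..",
    "0D81;N # Mn SINHALA SIGN CANDRABINDU",
    "1100..115F;W # Lo[96] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER",
]


def parse_east_asian_width(lines):
    """Return sorted (first, last, property) tuples from data-file lines."""
    ranges = []
    for line in lines:
        record = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not record:
            continue  # blank or comment-only line
        codepoints, prop = (field.strip() for field in record.split(";"))
        first, _, last = codepoints.partition("..")
        # A single code point such as "0D81" is a range of length one.
        ranges.append((int(first, 16), int(last or first, 16), prop))
    ranges.sort()
    return ranges


def display_width(codepoint, ranges):
    """Hypothetical width rule: 2 for Wide/Fullwidth, otherwise 1."""
    starts = [first for first, _, _ in ranges]
    i = bisect.bisect_right(starts, codepoint) - 1  # last range starting <= codepoint
    if i >= 0:
        first, last, prop = ranges[i]
        if first <= codepoint <= last and prop in ("W", "F"):
            return 2
    return 1  # narrow, neutral, ambiguous, or not listed in the sample


if __name__ == "__main__":
    table = parse_east_asian_width(SAMPLE_RECORDS)
    for cp in (0x08C7, 0x0D81, 0x1100):
        print(f"U+{cp:04X} -> width {display_width(cp, table)}")
```

The binary search mirrors the idea of a sorted range table; whether the
generated header uses the same lookup strategy is not something this sketch
claims.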
contrib/ChangeLog:
* unicode/EastAsianWidth.txt: Update to Unicode 13.0.0.
* unicode/PropList.txt: Likewise.
* unicode/README: Likewise.
* unicode/UnicodeData.txt: Likewise.
* unicode/from_glibc/unicode_utils.py: Update to latest glibc version.
* unicode/from_glibc/utf8_gen.py: Likewise.
libcpp/ChangeLog:
* generated_cpp_wcwidth.h: Regenerated from Unicode 13.0.0 data.
diff --git a/contrib/unicode/EastAsianWidth.txt b/contrib/unicode/EastAsianWidth.txt
index 94d55d6654a..b43aec92738 100644
--- a/contrib/unicode/EastAsianWidth.txt
+++ b/contrib/unicode/EastAsianWidth.txt
@@ -1,6 +1,6 @@
-# EastAsianWidth-12.1.0.txt
-# Date: 2019-03-31, 22:01:58 GMT [KW, LI]
-# © 2019 Unicode®, Inc.
+# EastAsianWidth-13.0.0.txt
+# Date: 2020-01-21, 18:14:00 GMT [KW, LI]
+# © 2020 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
@@ -9,7 +9,7 @@
#
# East_Asian_Width Property
#
-# This file is an informative contributory data file in the
+# This file is a normative contributory data file in the
# Unicode Character Database.
#
# The format is two fields separated by a semicolon.
@@ -332,7 +332,7 @@
085E;N # Po MANDAIC PUNCTUATION
0860..086A;N # Lo[11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
08A0..08B4;N # Lo[21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
-08B6..08BD;N # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
+08B6..08C7;N # Lo[18] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER LAM WITH SMALL ARABIC LETTER TAH ABOVE
08D3..08E1;N # Mn[15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA
08E2;N # Cf ARABIC DISPUTED END OF AYAH
08E3..08FF;N # Mn[29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
@@ -450,7 +450,7 @@
0B47..0B48;N # Mc [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
0B4B..0B4C;N # Mc [2] ORIYA VOWEL SIGN O..ORIYA VOWEL SIGN AU
0B4D;N # Mn ORIYA SIGN VIRAMA
-0B56;N # Mn ORIYA AI LENGTH MARK
+0B55..0B56;N # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
0B57;N # Mc ORIYA AU LENGTH MARK
0B5C..0B5D;N # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
0B5F..0B61;N # Lo [3] ORIYA LETTER YYA..ORIYA LETTER VOCALIC LL
@@ -529,7 +529,7 @@
0CF1..0CF2;N # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
0D00..0D01;N # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D02..0D03;N # Mc [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
-0D05..0D0C;N # Lo [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
+0D04..0D0C;N # Lo [9] MALAYALAM LETTER VEDIC ANUSVARA..MALAYALAM LETTER VOCALIC L
0D0E..0D10;N # Lo [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
0D12..0D3A;N # Lo[41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
0D3B..0D3C;N # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
@@ -550,6 +550,7 @@
0D70..0D78;N # No [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
0D79;N # So MALAYALAM DATE MARK
0D7A..0D7F;N # Lo [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
+0D81;N # Mn SINHALA SIGN CANDRABINDU
0D82..0D83;N # Mc [2] SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARGAYA
0D85..0D96;N # Lo[18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
0D9A..0DB1;N # Lo[24] SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA LETTER DANTAJA NAYANNA
@@ -795,6 +796,7 @@
1AA8..1AAD;N # Po [6] TAI THAM SIGN KAAN..TAI THAM SIGN CAANG
1AB0..1ABD;N # Mn[14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW
1ABE;N # Me COMBINING PARENTHESES OVERLAY
+1ABF..1AC0;N # Mn [2] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER TURNED W BELOW
1B00..1B03;N # Mn [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG
1B04;N # Mc BALINESE SIGN BISAH
1B05.