Title: [204605] trunk
Revision: 204605
Author: [email protected]
Date: 2016-08-18 12:53:35 -0700 (Thu, 18 Aug 2016)

Log Message

Align our encoding labels with the encoding specification
https://bugs.webkit.org/show_bug.cgi?id=160931

Reviewed by Darin Adler.

LayoutTests/imported/w3c:

Rebaseline W3C test now that we are passing a lot more checks. For reference,
Firefox 48 passes 624 out of 654, and Chrome 52 passes 651 out of 654.
Before this change, WebKit was only passing 501 out of 654 and is now passing
651. The only checks we are still failing are due to "Big5-HKSCS" not being an
alias of "Big5".

* web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt:

Source/WebCore:

Align our encoding labels with the encoding specification:
- https://encoding.spec.whatwg.org/#names-and-labels

This also aligns with Firefox and Chrome.
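
As a rough illustration of what aligning labels with the specification means,
the sketch below resolves a label to an encoding name the way the
specification describes: strip leading and trailing ASCII whitespace, then
match ASCII case-insensitively. The table is a hypothetical excerpt built only
from entries that appear in the rebaselined expectations of this change, and
the function name encodingNameForLabel is an assumption made for this example.
It is not the WebCore implementation.

```javascript
// Hypothetical excerpt of a label table. The entries come from the
// rebaselined test expectations in this changeset, not from WebCore sources.
const LABEL_TO_NAME = new Map([
  ["iso88592", "ISO-8859-2"],
  ["iso_8859-2", "ISO-8859-2"],
  ["csiso88596e", "ISO-8859-6"],
  ["koi8_r", "KOI8-R"],
  ["koi8-ru", "KOI8-U"],
  ["iso885911", "windows-874"],
  ["ansi_x3.4-1968", "windows-1252"],
  ["ascii", "windows-1252"],
  ["cp819", "windows-1252"],
  ["iso-8859-1", "windows-1252"],
]);

// Resolve a label to an encoding name, or null when the label is unknown.
function encodingNameForLabel(label) {
  const normalized = label
    // Strip leading and trailing ASCII whitespace only.
    .replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "")
    // Lowercase ASCII letters only, so non-ASCII look-alikes never match.
    .replace(/[A-Z]/g, (c) => c.toLowerCase());
  return LABEL_TO_NAME.get(normalized) ?? null;
}

console.log(encodingNameForLabel(" ISO-8859-1 "));
console.log(encodingNameForLabel("bocu-1"));
```

With this table, "iso-8859-1" and "ascii" both resolve to "windows-1252",
which is the name now reported by the updated baselines. A label that is not
in the table, such as "bocu-1", resolves to null, and the caller would then
fall back to its default encoding.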

No new tests, rebaselined existing test.

* platform/text/TextCodecICU.cpp:
(WebCore::TextCodecICU::registerEncodingNames):
* platform/text/TextCodecLatin1.cpp:
(WebCore::TextCodecLatin1::registerEncodingNames):
(WebCore::newStreamingTextDecoderWindowsLatin1): Deleted.
(WebCore::TextCodecLatin1::registerCodecs): Deleted.
* platform/text/TextCodecUTF8.cpp:
(WebCore::TextCodecUTF8::registerEncodingNames):

LayoutTests:

Update / rebaseline existing tests to reflect the code change.
The new baselines match Chrome and Firefox.

* fast/encoding/bracket-in-tag-expected.txt:
* fast/encoding/charset-invalid-expected.txt:
* fast/encoding/charset-replacement-expected.txt:
* fast/encoding/misplaced-xml-declaration-expected.txt:
* fast/encoding/pseudo-xml-expected.txt:
* http/tests/misc/char-encoding-bocu-1-blacklisted-expected.txt:
* http/tests/misc/char-encoding-bocu-1-blacklisted.html:
* http/tests/misc/char-encoding-in-hidden-charset-field-default-expected.txt:
* http/tests/misc/char-encoding-scsu-blacklisted-expected.txt:
* http/tests/misc/char-encoding-scsu-blacklisted.html:
* http/tests/misc/frame-default-enc-different-domain-expected.txt:

Diff

Modified: trunk/LayoutTests/ChangeLog (204604 => 204605)


--- trunk/LayoutTests/ChangeLog	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/ChangeLog	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,3 +1,25 @@
+2016-08-18  Chris Dumez  <[email protected]>
+
+        Align our encoding labels with the encoding specification
+        https://bugs.webkit.org/show_bug.cgi?id=160931
+
+        Reviewed by Darin Adler.
+
+        Update / rebaseline existing tests to reflect the code change.
+        The new baselines match Chrome and Firefox.
+
+        * fast/encoding/bracket-in-tag-expected.txt:
+        * fast/encoding/charset-invalid-expected.txt:
+        * fast/encoding/charset-replacement-expected.txt:
+        * fast/encoding/misplaced-xml-declaration-expected.txt:
+        * fast/encoding/pseudo-xml-expected.txt:
+        * http/tests/misc/char-encoding-bocu-1-blacklisted-expected.txt:
+        * http/tests/misc/char-encoding-bocu-1-blacklisted.html:
+        * http/tests/misc/char-encoding-in-hidden-charset-field-default-expected.txt:
+        * http/tests/misc/char-encoding-scsu-blacklisted-expected.txt:
+        * http/tests/misc/char-encoding-scsu-blacklisted.html:
+        * http/tests/misc/frame-default-enc-different-domain-expected.txt:
+
 2016-08-18  Ryan Haddad  <[email protected]>
 
         Land test expectations for rdar://problem/27723718.

Modified: trunk/LayoutTests/fast/encoding/bracket-in-tag-expected.txt (204604 => 204605)


--- trunk/LayoutTests/fast/encoding/bracket-in-tag-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/fast/encoding/bracket-in-tag-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,2 +1,2 @@
- PASS: ISO-8859-1
+ PASS: windows-1252
 This test baselines charset sniffer behavior where the opening bracket inside a tag is consumed as part of the tag data, causing the meta tag to be missed.

Modified: trunk/LayoutTests/fast/encoding/charset-invalid-expected.txt (204604 => 204605)


--- trunk/LayoutTests/fast/encoding/charset-invalid-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/fast/encoding/charset-invalid-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,3 +1,3 @@
-Should be your browser default encoding: ISO-8859-1
+Should be your browser default encoding: windows-1252
 
 If it's latin-1 (ISO-8859-1), this should be accented e: é

Modified: trunk/LayoutTests/fast/encoding/charset-replacement-expected.txt (204604 => 204605)


--- trunk/LayoutTests/fast/encoding/charset-replacement-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/fast/encoding/charset-replacement-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,4 +1,4 @@
-ALERT: ISO-8859-1
+ALERT: windows-1252
 Test PASSED if the encoding of this document is the default encoding.
 Test FAILED if you see a U+FFFD character in a dumped render tree.
 

Modified: trunk/LayoutTests/fast/encoding/misplaced-xml-declaration-expected.txt (204604 => 204605)


--- trunk/LayoutTests/fast/encoding/misplaced-xml-declaration-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/fast/encoding/misplaced-xml-declaration-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1 +1 @@
-Should be your browser default encoding: ISO-8859-1
+Should be your browser default encoding: windows-1252

Modified: trunk/LayoutTests/fast/encoding/pseudo-xml-expected.txt (204604 => 204605)


--- trunk/LayoutTests/fast/encoding/pseudo-xml-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/fast/encoding/pseudo-xml-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,3 +1,3 @@
 Test for bug 9783: An XML declaration without an explicit encoding incorrectly triggers UTF-8 encoding in an HTML document
 
-Charset: ISO-8859-1 (should be your browser default one)
+Charset: windows-1252 (should be your browser default one)

Modified: trunk/LayoutTests/http/tests/misc/char-encoding-bocu-1-blacklisted-expected.txt (204604 => 204605)


--- trunk/LayoutTests/http/tests/misc/char-encoding-bocu-1-blacklisted-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/http/tests/misc/char-encoding-bocu-1-blacklisted-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -3,7 +3,7 @@
 On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
 
 
-PASS charset is ISO-8859-1
+PASS charset is windows-1252
 PASS successfullyParsed is true
 
 TEST COMPLETE

Modified: trunk/LayoutTests/http/tests/misc/char-encoding-bocu-1-blacklisted.html (204604 => 204605)


--- trunk/LayoutTests/http/tests/misc/char-encoding-bocu-1-blacklisted.html	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/http/tests/misc/char-encoding-bocu-1-blacklisted.html	2016-08-18 19:53:35 UTC (rev 204605)
@@ -14,8 +14,8 @@
     }
     function run() {
         var bocu1Frame = document.getElementById("bocu1Frame");
-        if (bocu1Frame.contentDocument.charset == "ISO-8859-1") {
-            testPassed("charset is ISO-8859-1");
+        if (bocu1Frame.contentDocument.charset == "windows-1252") {
+            testPassed("charset is windows-1252");
         } else {
             testFailed("charset is " + bocu1Frame.contentDocument.charset);
         }

Modified: trunk/LayoutTests/http/tests/misc/char-encoding-in-hidden-charset-field-default-expected.txt (204604 => 204605)


--- trunk/LayoutTests/http/tests/misc/char-encoding-in-hidden-charset-field-default-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/http/tests/misc/char-encoding-in-hidden-charset-field-default-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,3 +1,3 @@
 This is a test for https://bugs.webkit.org/show_bug.cgi?id=19079, it send the submissions character encoding in hidden _charset_ field.
 
-PASSED: _charset_=ISO-8859-1
+PASSED: _charset_=windows-1252

Modified: trunk/LayoutTests/http/tests/misc/char-encoding-scsu-blacklisted-expected.txt (204604 => 204605)


--- trunk/LayoutTests/http/tests/misc/char-encoding-scsu-blacklisted-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/http/tests/misc/char-encoding-scsu-blacklisted-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -3,7 +3,7 @@
 On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
 
 
-PASS charset is ISO-8859-1
+PASS charset is windows-1252
 PASS successfullyParsed is true
 
 TEST COMPLETE

Modified: trunk/LayoutTests/http/tests/misc/char-encoding-scsu-blacklisted.html (204604 => 204605)


--- trunk/LayoutTests/http/tests/misc/char-encoding-scsu-blacklisted.html	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/http/tests/misc/char-encoding-scsu-blacklisted.html	2016-08-18 19:53:35 UTC (rev 204605)
@@ -14,8 +14,8 @@
     }
     function run() {
         var scsuFrame = document.getElementById("scsuFrame");
-        if (scsuFrame.contentDocument.charset === "ISO-8859-1") {
-            testPassed("charset is ISO-8859-1");
+        if (scsuFrame.contentDocument.charset === "windows-1252") {
+            testPassed("charset is windows-1252");
         } else {
             testFailed("charset is " + scsuFrame.contentDocument.charset);
         }

Modified: trunk/LayoutTests/http/tests/misc/frame-default-enc-different-domain-expected.txt (204604 => 204605)


--- trunk/LayoutTests/http/tests/misc/frame-default-enc-different-domain-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/http/tests/misc/frame-default-enc-different-domain-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,2 +1,2 @@
-ALERT: ISO-8859-1
+ALERT: windows-1252
 

Modified: trunk/LayoutTests/imported/w3c/ChangeLog (204604 => 204605)


--- trunk/LayoutTests/imported/w3c/ChangeLog	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/imported/w3c/ChangeLog	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,3 +1,18 @@
+2016-08-18  Chris Dumez  <[email protected]>
+
+        Align our encoding labels with the encoding specification
+        https://bugs.webkit.org/show_bug.cgi?id=160931
+
+        Reviewed by Darin Adler.
+
+        Rebaseline W3C test now that we are passing a lot more checks. For reference,
+        Firefox 48 passes 624 out of 654, and Chrome 52 passes 651 out of 654.
+        Before this change, WebKit was only passing 501 out of 654 and is now passing
+        651. The only checks we're failing is due to "Big5-HKSCS" not being an alias
+        to "Big5".
+
+        * web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt:
+
 2016-08-17  Benjamin Poulain  <[email protected]>
 
         [CSS] The parser should not get rid of empty namespace specification in front of element name selectors

Modified: trunk/LayoutTests/imported/w3c/web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt (204604 => 204605)


--- trunk/LayoutTests/imported/w3c/web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/LayoutTests/imported/w3c/web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt	2016-08-18 19:53:35 UTC (rev 204605)
@@ -41,12 +41,12 @@
 PASS Name "ISO-8859-2" has label "iso8859-2" (characterSet) 
 PASS Name "ISO-8859-2" has label "iso8859-2" (inputEncoding) 
 PASS Name "ISO-8859-2" has label "iso8859-2" (charset) 
-FAIL Name "ISO-8859-2" has label "iso88592" (characterSet) assert_equals: expected "ISO-8859-2" but got "ISO-8859-1"
-FAIL Name "ISO-8859-2" has label "iso88592" (inputEncoding) assert_equals: expected "ISO-8859-2" but got "ISO-8859-1"
-FAIL Name "ISO-8859-2" has label "iso88592" (charset) assert_equals: expected "ISO-8859-2" but got "ISO-8859-1"
-FAIL Name "ISO-8859-2" has label "iso_8859-2" (characterSet) assert_equals: expected "ISO-8859-2" but got "ISO-8859-1"
-FAIL Name "ISO-8859-2" has label "iso_8859-2" (inputEncoding) assert_equals: expected "ISO-8859-2" but got "ISO-8859-1"
-FAIL Name "ISO-8859-2" has label "iso_8859-2" (charset) assert_equals: expected "ISO-8859-2" but got "ISO-8859-1"
+PASS Name "ISO-8859-2" has label "iso88592" (characterSet) 
+PASS Name "ISO-8859-2" has label "iso88592" (inputEncoding) 
+PASS Name "ISO-8859-2" has label "iso88592" (charset) 
+PASS Name "ISO-8859-2" has label "iso_8859-2" (characterSet) 
+PASS Name "ISO-8859-2" has label "iso_8859-2" (inputEncoding) 
+PASS Name "ISO-8859-2" has label "iso_8859-2" (charset) 
 PASS Name "ISO-8859-2" has label "iso_8859-2:1987" (characterSet) 
 PASS Name "ISO-8859-2" has label "iso_8859-2:1987" (inputEncoding) 
 PASS Name "ISO-8859-2" has label "iso_8859-2:1987" (charset) 
@@ -68,12 +68,12 @@
 PASS Name "ISO-8859-3" has label "iso8859-3" (characterSet) 
 PASS Name "ISO-8859-3" has label "iso8859-3" (inputEncoding) 
 PASS Name "ISO-8859-3" has label "iso8859-3" (charset) 
-FAIL Name "ISO-8859-3" has label "iso88593" (characterSet) assert_equals: expected "ISO-8859-3" but got "ISO-8859-1"
-FAIL Name "ISO-8859-3" has label "iso88593" (inputEncoding) assert_equals: expected "ISO-8859-3" but got "ISO-8859-1"
-FAIL Name "ISO-8859-3" has label "iso88593" (charset) assert_equals: expected "ISO-8859-3" but got "ISO-8859-1"
-FAIL Name "ISO-8859-3" has label "iso_8859-3" (characterSet) assert_equals: expected "ISO-8859-3" but got "ISO-8859-1"
-FAIL Name "ISO-8859-3" has label "iso_8859-3" (inputEncoding) assert_equals: expected "ISO-8859-3" but got "ISO-8859-1"
-FAIL Name "ISO-8859-3" has label "iso_8859-3" (charset) assert_equals: expected "ISO-8859-3" but got "ISO-8859-1"
+PASS Name "ISO-8859-3" has label "iso88593" (characterSet) 
+PASS Name "ISO-8859-3" has label "iso88593" (inputEncoding) 
+PASS Name "ISO-8859-3" has label "iso88593" (charset) 
+PASS Name "ISO-8859-3" has label "iso_8859-3" (characterSet) 
+PASS Name "ISO-8859-3" has label "iso_8859-3" (inputEncoding) 
+PASS Name "ISO-8859-3" has label "iso_8859-3" (charset) 
 PASS Name "ISO-8859-3" has label "iso_8859-3:1988" (characterSet) 
 PASS Name "ISO-8859-3" has label "iso_8859-3:1988" (inputEncoding) 
 PASS Name "ISO-8859-3" has label "iso_8859-3:1988" (charset) 
@@ -95,12 +95,12 @@
 PASS Name "ISO-8859-4" has label "iso8859-4" (characterSet) 
 PASS Name "ISO-8859-4" has label "iso8859-4" (inputEncoding) 
 PASS Name "ISO-8859-4" has label "iso8859-4" (charset) 
-FAIL Name "ISO-8859-4" has label "iso88594" (characterSet) assert_equals: expected "ISO-8859-4" but got "ISO-8859-1"
-FAIL Name "ISO-8859-4" has label "iso88594" (inputEncoding) assert_equals: expected "ISO-8859-4" but got "ISO-8859-1"
-FAIL Name "ISO-8859-4" has label "iso88594" (charset) assert_equals: expected "ISO-8859-4" but got "ISO-8859-1"
-FAIL Name "ISO-8859-4" has label "iso_8859-4" (characterSet) assert_equals: expected "ISO-8859-4" but got "ISO-8859-1"
-FAIL Name "ISO-8859-4" has label "iso_8859-4" (inputEncoding) assert_equals: expected "ISO-8859-4" but got "ISO-8859-1"
-FAIL Name "ISO-8859-4" has label "iso_8859-4" (charset) assert_equals: expected "ISO-8859-4" but got "ISO-8859-1"
+PASS Name "ISO-8859-4" has label "iso88594" (characterSet) 
+PASS Name "ISO-8859-4" has label "iso88594" (inputEncoding) 
+PASS Name "ISO-8859-4" has label "iso88594" (charset) 
+PASS Name "ISO-8859-4" has label "iso_8859-4" (characterSet) 
+PASS Name "ISO-8859-4" has label "iso_8859-4" (inputEncoding) 
+PASS Name "ISO-8859-4" has label "iso_8859-4" (charset) 
 PASS Name "ISO-8859-4" has label "iso_8859-4:1988" (characterSet) 
 PASS Name "ISO-8859-4" has label "iso_8859-4:1988" (inputEncoding) 
 PASS Name "ISO-8859-4" has label "iso_8859-4:1988" (charset) 
@@ -125,12 +125,12 @@
 PASS Name "ISO-8859-5" has label "iso8859-5" (characterSet) 
 PASS Name "ISO-8859-5" has label "iso8859-5" (inputEncoding) 
 PASS Name "ISO-8859-5" has label "iso8859-5" (charset) 
-FAIL Name "ISO-8859-5" has label "iso88595" (characterSet) assert_equals: expected "ISO-8859-5" but got "ISO-8859-1"
-FAIL Name "ISO-8859-5" has label "iso88595" (inputEncoding) assert_equals: expected "ISO-8859-5" but got "ISO-8859-1"
-FAIL Name "ISO-8859-5" has label "iso88595" (charset) assert_equals: expected "ISO-8859-5" but got "ISO-8859-1"
-FAIL Name "ISO-8859-5" has label "iso_8859-5" (characterSet) assert_equals: expected "ISO-8859-5" but got "ISO-8859-1"
-FAIL Name "ISO-8859-5" has label "iso_8859-5" (inputEncoding) assert_equals: expected "ISO-8859-5" but got "ISO-8859-1"
-FAIL Name "ISO-8859-5" has label "iso_8859-5" (charset) assert_equals: expected "ISO-8859-5" but got "ISO-8859-1"
+PASS Name "ISO-8859-5" has label "iso88595" (characterSet) 
+PASS Name "ISO-8859-5" has label "iso88595" (inputEncoding) 
+PASS Name "ISO-8859-5" has label "iso88595" (charset) 
+PASS Name "ISO-8859-5" has label "iso_8859-5" (characterSet) 
+PASS Name "ISO-8859-5" has label "iso_8859-5" (inputEncoding) 
+PASS Name "ISO-8859-5" has label "iso_8859-5" (charset) 
 PASS Name "ISO-8859-5" has label "iso_8859-5:1988" (characterSet) 
 PASS Name "ISO-8859-5" has label "iso_8859-5:1988" (inputEncoding) 
 PASS Name "ISO-8859-5" has label "iso_8859-5:1988" (charset) 
@@ -140,12 +140,12 @@
 PASS Name "ISO-8859-6" has label "asmo-708" (characterSet) 
 PASS Name "ISO-8859-6" has label "asmo-708" (inputEncoding) 
 PASS Name "ISO-8859-6" has label "asmo-708" (charset) 
-FAIL Name "ISO-8859-6" has label "csiso88596e" (characterSet) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "csiso88596e" (inputEncoding) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "csiso88596e" (charset) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "csiso88596i" (characterSet) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "csiso88596i" (inputEncoding) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "csiso88596i" (charset) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
+PASS Name "ISO-8859-6" has label "csiso88596e" (characterSet) 
+PASS Name "ISO-8859-6" has label "csiso88596e" (inputEncoding) 
+PASS Name "ISO-8859-6" has label "csiso88596e" (charset) 
+PASS Name "ISO-8859-6" has label "csiso88596i" (characterSet) 
+PASS Name "ISO-8859-6" has label "csiso88596i" (inputEncoding) 
+PASS Name "ISO-8859-6" has label "csiso88596i" (charset) 
 PASS Name "ISO-8859-6" has label "csisolatinarabic" (characterSet) 
 PASS Name "ISO-8859-6" has label "csisolatinarabic" (inputEncoding) 
 PASS Name "ISO-8859-6" has label "csisolatinarabic" (charset) 
@@ -167,12 +167,12 @@
 PASS Name "ISO-8859-6" has label "iso8859-6" (characterSet) 
 PASS Name "ISO-8859-6" has label "iso8859-6" (inputEncoding) 
 PASS Name "ISO-8859-6" has label "iso8859-6" (charset) 
-FAIL Name "ISO-8859-6" has label "iso88596" (characterSet) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "iso88596" (inputEncoding) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "iso88596" (charset) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "iso_8859-6" (characterSet) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "iso_8859-6" (inputEncoding) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
-FAIL Name "ISO-8859-6" has label "iso_8859-6" (charset) assert_equals: expected "ISO-8859-6" but got "ISO-8859-1"
+PASS Name "ISO-8859-6" has label "iso88596" (characterSet) 
+PASS Name "ISO-8859-6" has label "iso88596" (inputEncoding) 
+PASS Name "ISO-8859-6" has label "iso88596" (charset) 
+PASS Name "ISO-8859-6" has label "iso_8859-6" (characterSet) 
+PASS Name "ISO-8859-6" has label "iso_8859-6" (inputEncoding) 
+PASS Name "ISO-8859-6" has label "iso_8859-6" (charset) 
 PASS Name "ISO-8859-6" has label "iso_8859-6:1987" (characterSet) 
 PASS Name "ISO-8859-6" has label "iso_8859-6:1987" (inputEncoding) 
 PASS Name "ISO-8859-6" has label "iso_8859-6:1987" (charset) 
@@ -200,12 +200,12 @@
 PASS Name "ISO-8859-7" has label "iso8859-7" (characterSet) 
 PASS Name "ISO-8859-7" has label "iso8859-7" (inputEncoding) 
 PASS Name "ISO-8859-7" has label "iso8859-7" (charset) 
-FAIL Name "ISO-8859-7" has label "iso88597" (characterSet) assert_equals: expected "ISO-8859-7" but got "ISO-8859-1"
-FAIL Name "ISO-8859-7" has label "iso88597" (inputEncoding) assert_equals: expected "ISO-8859-7" but got "ISO-8859-1"
-FAIL Name "ISO-8859-7" has label "iso88597" (charset) assert_equals: expected "ISO-8859-7" but got "ISO-8859-1"
-FAIL Name "ISO-8859-7" has label "iso_8859-7" (characterSet) assert_equals: expected "ISO-8859-7" but got "ISO-8859-1"
-FAIL Name "ISO-8859-7" has label "iso_8859-7" (inputEncoding) assert_equals: expected "ISO-8859-7" but got "ISO-8859-1"
-FAIL Name "ISO-8859-7" has label "iso_8859-7" (charset) assert_equals: expected "ISO-8859-7" but got "ISO-8859-1"
+PASS Name "ISO-8859-7" has label "iso88597" (characterSet) 
+PASS Name "ISO-8859-7" has label "iso88597" (inputEncoding) 
+PASS Name "ISO-8859-7" has label "iso88597" (charset) 
+PASS Name "ISO-8859-7" has label "iso_8859-7" (characterSet) 
+PASS Name "ISO-8859-7" has label "iso_8859-7" (inputEncoding) 
+PASS Name "ISO-8859-7" has label "iso_8859-7" (charset) 
 PASS Name "ISO-8859-7" has label "iso_8859-7:1987" (characterSet) 
 PASS Name "ISO-8859-7" has label "iso_8859-7:1987" (inputEncoding) 
 PASS Name "ISO-8859-7" has label "iso_8859-7:1987" (charset) 
@@ -212,9 +212,9 @@
 PASS Name "ISO-8859-7" has label "sun_eu_greek" (characterSet) 
 PASS Name "ISO-8859-7" has label "sun_eu_greek" (inputEncoding) 
 PASS Name "ISO-8859-7" has label "sun_eu_greek" (charset) 
-FAIL Name "ISO-8859-8" has label "csiso88598e" (characterSet) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
-FAIL Name "ISO-8859-8" has label "csiso88598e" (inputEncoding) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
-FAIL Name "ISO-8859-8" has label "csiso88598e" (charset) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
+PASS Name "ISO-8859-8" has label "csiso88598e" (characterSet) 
+PASS Name "ISO-8859-8" has label "csiso88598e" (inputEncoding) 
+PASS Name "ISO-8859-8" has label "csiso88598e" (charset) 
 PASS Name "ISO-8859-8" has label "csisolatinhebrew" (characterSet) 
 PASS Name "ISO-8859-8" has label "csisolatinhebrew" (inputEncoding) 
 PASS Name "ISO-8859-8" has label "csisolatinhebrew" (charset) 
@@ -233,12 +233,12 @@
 PASS Name "ISO-8859-8" has label "iso8859-8" (characterSet) 
 PASS Name "ISO-8859-8" has label "iso8859-8" (inputEncoding) 
 PASS Name "ISO-8859-8" has label "iso8859-8" (charset) 
-FAIL Name "ISO-8859-8" has label "iso88598" (characterSet) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
-FAIL Name "ISO-8859-8" has label "iso88598" (inputEncoding) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
-FAIL Name "ISO-8859-8" has label "iso88598" (charset) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
-FAIL Name "ISO-8859-8" has label "iso_8859-8" (characterSet) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
-FAIL Name "ISO-8859-8" has label "iso_8859-8" (inputEncoding) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
-FAIL Name "ISO-8859-8" has label "iso_8859-8" (charset) assert_equals: expected "ISO-8859-8" but got "ISO-8859-1"
+PASS Name "ISO-8859-8" has label "iso88598" (characterSet) 
+PASS Name "ISO-8859-8" has label "iso88598" (inputEncoding) 
+PASS Name "ISO-8859-8" has label "iso88598" (charset) 
+PASS Name "ISO-8859-8" has label "iso_8859-8" (characterSet) 
+PASS Name "ISO-8859-8" has label "iso_8859-8" (inputEncoding) 
+PASS Name "ISO-8859-8" has label "iso_8859-8" (charset) 
 PASS Name "ISO-8859-8" has label "iso_8859-8:1988" (characterSet) 
 PASS Name "ISO-8859-8" has label "iso_8859-8:1988" (inputEncoding) 
 PASS Name "ISO-8859-8" has label "iso_8859-8:1988" (charset) 
@@ -266,9 +266,9 @@
 PASS Name "ISO-8859-10" has label "iso8859-10" (characterSet) 
 PASS Name "ISO-8859-10" has label "iso8859-10" (inputEncoding) 
 PASS Name "ISO-8859-10" has label "iso8859-10" (charset) 
-FAIL Name "ISO-8859-10" has label "iso885910" (characterSet) assert_equals: expected "ISO-8859-10" but got "ISO-8859-1"
-FAIL Name "ISO-8859-10" has label "iso885910" (inputEncoding) assert_equals: expected "ISO-8859-10" but got "ISO-8859-1"
-FAIL Name "ISO-8859-10" has label "iso885910" (charset) assert_equals: expected "ISO-8859-10" but got "ISO-8859-1"
+PASS Name "ISO-8859-10" has label "iso885910" (characterSet) 
+PASS Name "ISO-8859-10" has label "iso885910" (inputEncoding) 
+PASS Name "ISO-8859-10" has label "iso885910" (charset) 
 PASS Name "ISO-8859-10" has label "l6" (characterSet) 
 PASS Name "ISO-8859-10" has label "l6" (inputEncoding) 
 PASS Name "ISO-8859-10" has label "l6" (charset) 
@@ -281,9 +281,9 @@
 PASS Name "ISO-8859-13" has label "iso8859-13" (characterSet) 
 PASS Name "ISO-8859-13" has label "iso8859-13" (inputEncoding) 
 PASS Name "ISO-8859-13" has label "iso8859-13" (charset) 
-FAIL Name "ISO-8859-13" has label "iso885913" (characterSet) assert_equals: expected "ISO-8859-13" but got "ISO-8859-1"
-FAIL Name "ISO-8859-13" has label "iso885913" (inputEncoding) assert_equals: expected "ISO-8859-13" but got "ISO-8859-1"
-FAIL Name "ISO-8859-13" has label "iso885913" (charset) assert_equals: expected "ISO-8859-13" but got "ISO-8859-1"
+PASS Name "ISO-8859-13" has label "iso885913" (characterSet) 
+PASS Name "ISO-8859-13" has label "iso885913" (inputEncoding) 
+PASS Name "ISO-8859-13" has label "iso885913" (charset) 
 PASS Name "ISO-8859-14" has label "iso-8859-14" (characterSet) 
 PASS Name "ISO-8859-14" has label "iso-8859-14" (inputEncoding) 
 PASS Name "ISO-8859-14" has label "iso-8859-14" (charset) 
@@ -290,9 +290,9 @@
 PASS Name "ISO-8859-14" has label "iso8859-14" (characterSet) 
 PASS Name "ISO-8859-14" has label "iso8859-14" (inputEncoding) 
 PASS Name "ISO-8859-14" has label "iso8859-14" (charset) 
-FAIL Name "ISO-8859-14" has label "iso885914" (characterSet) assert_equals: expected "ISO-8859-14" but got "ISO-8859-1"
-FAIL Name "ISO-8859-14" has label "iso885914" (inputEncoding) assert_equals: expected "ISO-8859-14" but got "ISO-8859-1"
-FAIL Name "ISO-8859-14" has label "iso885914" (charset) assert_equals: expected "ISO-8859-14" but got "ISO-8859-1"
+PASS Name "ISO-8859-14" has label "iso885914" (characterSet) 
+PASS Name "ISO-8859-14" has label "iso885914" (inputEncoding) 
+PASS Name "ISO-8859-14" has label "iso885914" (charset) 
 PASS Name "ISO-8859-15" has label "csisolatin9" (characterSet) 
 PASS Name "ISO-8859-15" has label "csisolatin9" (inputEncoding) 
 PASS Name "ISO-8859-15" has label "csisolatin9" (charset) 
@@ -302,12 +302,12 @@
 PASS Name "ISO-8859-15" has label "iso8859-15" (characterSet) 
 PASS Name "ISO-8859-15" has label "iso8859-15" (inputEncoding) 
 PASS Name "ISO-8859-15" has label "iso8859-15" (charset) 
-FAIL Name "ISO-8859-15" has label "iso885915" (characterSet) assert_equals: expected "ISO-8859-15" but got "ISO-8859-1"
-FAIL Name "ISO-8859-15" has label "iso885915" (inputEncoding) assert_equals: expected "ISO-8859-15" but got "ISO-8859-1"
-FAIL Name "ISO-8859-15" has label "iso885915" (charset) assert_equals: expected "ISO-8859-15" but got "ISO-8859-1"
-FAIL Name "ISO-8859-15" has label "iso_8859-15" (characterSet) assert_equals: expected "ISO-8859-15" but got "ISO-8859-1"
-FAIL Name "ISO-8859-15" has label "iso_8859-15" (inputEncoding) assert_equals: expected "ISO-8859-15" but got "ISO-8859-1"
-FAIL Name "ISO-8859-15" has label "iso_8859-15" (charset) assert_equals: expected "ISO-8859-15" but got "ISO-8859-1"
+PASS Name "ISO-8859-15" has label "iso885915" (characterSet) 
+PASS Name "ISO-8859-15" has label "iso885915" (inputEncoding) 
+PASS Name "ISO-8859-15" has label "iso885915" (charset) 
+PASS Name "ISO-8859-15" has label "iso_8859-15" (characterSet) 
+PASS Name "ISO-8859-15" has label "iso_8859-15" (inputEncoding) 
+PASS Name "ISO-8859-15" has label "iso_8859-15" (charset) 
 PASS Name "ISO-8859-15" has label "l9" (characterSet) 
 PASS Name "ISO-8859-15" has label "l9" (inputEncoding) 
 PASS Name "ISO-8859-15" has label "l9" (charset) 
@@ -326,12 +326,12 @@
 PASS Name "KOI8-R" has label "koi8-r" (characterSet) 
 PASS Name "KOI8-R" has label "koi8-r" (inputEncoding) 
 PASS Name "KOI8-R" has label "koi8-r" (charset) 
-FAIL Name "KOI8-R" has label "koi8_r" (characterSet) assert_equals: expected "KOI8-R" but got "ISO-8859-1"
-FAIL Name "KOI8-R" has label "koi8_r" (inputEncoding) assert_equals: expected "KOI8-R" but got "ISO-8859-1"
-FAIL Name "KOI8-R" has label "koi8_r" (charset) assert_equals: expected "KOI8-R" but got "ISO-8859-1"
-FAIL Name "KOI8-U" has label "koi8-ru" (characterSet) assert_equals: expected "KOI8-U" but got "ISO-8859-1"
-FAIL Name "KOI8-U" has label "koi8-ru" (inputEncoding) assert_equals: expected "KOI8-U" but got "ISO-8859-1"
-FAIL Name "KOI8-U" has label "koi8-ru" (charset) assert_equals: expected "KOI8-U" but got "ISO-8859-1"
+PASS Name "KOI8-R" has label "koi8_r" (characterSet) 
+PASS Name "KOI8-R" has label "koi8_r" (inputEncoding) 
+PASS Name "KOI8-R" has label "koi8_r" (charset) 
+PASS Name "KOI8-U" has label "koi8-ru" (characterSet) 
+PASS Name "KOI8-U" has label "koi8-ru" (inputEncoding) 
+PASS Name "KOI8-U" has label "koi8-ru" (charset) 
 PASS Name "KOI8-U" has label "koi8-u" (characterSet) 
 PASS Name "KOI8-U" has label "koi8-u" (inputEncoding) 
 PASS Name "KOI8-U" has label "koi8-u" (charset) 
@@ -356,9 +356,9 @@
 PASS Name "windows-874" has label "iso8859-11" (characterSet) 
 PASS Name "windows-874" has label "iso8859-11" (inputEncoding) 
 PASS Name "windows-874" has label "iso8859-11" (charset) 
-FAIL Name "windows-874" has label "iso885911" (characterSet) assert_equals: expected "windows-874" but got "ISO-8859-1"
-FAIL Name "windows-874" has label "iso885911" (inputEncoding) assert_equals: expected "windows-874" but got "ISO-8859-1"
-FAIL Name "windows-874" has label "iso885911" (charset) assert_equals: expected "windows-874" but got "ISO-8859-1"
+PASS Name "windows-874" has label "iso885911" (characterSet) 
+PASS Name "windows-874" has label "iso885911" (inputEncoding) 
+PASS Name "windows-874" has label "iso885911" (charset) 
 PASS Name "windows-874" has label "tis-620" (characterSet) 
 PASS Name "windows-874" has label "tis-620" (inputEncoding) 
 PASS Name "windows-874" has label "tis-620" (charset) 
@@ -383,57 +383,57 @@
 PASS Name "windows-1251" has label "x-cp1251" (characterSet) 
 PASS Name "windows-1251" has label "x-cp1251" (inputEncoding) 
 PASS Name "windows-1251" has label "x-cp1251" (charset) 
-FAIL Name "windows-1252" has label "ansi_x3.4-1968" (characterSet) assert_equals: expected "windows-1252" but got "US-ASCII"
-FAIL Name "windows-1252" has label "ansi_x3.4-1968" (inputEncoding) assert_equals: expected "windows-1252" but got "US-ASCII"
-FAIL Name "windows-1252" has label "ansi_x3.4-1968" (charset) assert_equals: expected "windows-1252" but got "US-ASCII"
-FAIL Name "windows-1252" has label "ascii" (characterSet) assert_equals: expected "windows-1252" but got "US-ASCII"
-FAIL Name "windows-1252" has label "ascii" (inputEncoding) assert_equals: expected "windows-1252" but got "US-ASCII"
-FAIL Name "windows-1252" has label "ascii" (charset) assert_equals: expected "windows-1252" but got "US-ASCII"
+PASS Name "windows-1252" has label "ansi_x3.4-1968" (characterSet) 
+PASS Name "windows-1252" has label "ansi_x3.4-1968" (inputEncoding) 
+PASS Name "windows-1252" has label "ansi_x3.4-1968" (charset) 
+PASS Name "windows-1252" has label "ascii" (characterSet) 
+PASS Name "windows-1252" has label "ascii" (inputEncoding) 
+PASS Name "windows-1252" has label "ascii" (charset) 
 PASS Name "windows-1252" has label "cp1252" (characterSet) 
 PASS Name "windows-1252" has label "cp1252" (inputEncoding) 
 PASS Name "windows-1252" has label "cp1252" (charset) 
-FAIL Name "windows-1252" has label "cp819" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "cp819" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "cp819" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "csisolatin1" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "csisolatin1" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "csisolatin1" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "ibm819" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "ibm819" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "ibm819" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso-8859-1" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso-8859-1" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso-8859-1" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso-ir-100" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso-ir-100" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso-ir-100" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso8859-1" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso8859-1" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso8859-1" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso88591" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso88591" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso88591" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso_8859-1" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso_8859-1" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso_8859-1" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso_8859-1:1987" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso_8859-1:1987" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "iso_8859-1:1987" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "l1" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "l1" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "l1" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "latin1" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "latin1" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "latin1" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "us-ascii" (characterSet) assert_equals: expected "windows-1252" but got "US-ASCII"
-FAIL Name "windows-1252" has label "us-ascii" (inputEncoding) assert_equals: expected "windows-1252" but got "US-ASCII"
-FAIL Name "windows-1252" has label "us-ascii" (charset) assert_equals: expected "windows-1252" but got "US-ASCII"
+PASS Name "windows-1252" has label "cp819" (characterSet) 
+PASS Name "windows-1252" has label "cp819" (inputEncoding) 
+PASS Name "windows-1252" has label "cp819" (charset) 
+PASS Name "windows-1252" has label "csisolatin1" (characterSet) 
+PASS Name "windows-1252" has label "csisolatin1" (inputEncoding) 
+PASS Name "windows-1252" has label "csisolatin1" (charset) 
+PASS Name "windows-1252" has label "ibm819" (characterSet) 
+PASS Name "windows-1252" has label "ibm819" (inputEncoding) 
+PASS Name "windows-1252" has label "ibm819" (charset) 
+PASS Name "windows-1252" has label "iso-8859-1" (characterSet) 
+PASS Name "windows-1252" has label "iso-8859-1" (inputEncoding) 
+PASS Name "windows-1252" has label "iso-8859-1" (charset) 
+PASS Name "windows-1252" has label "iso-ir-100" (characterSet) 
+PASS Name "windows-1252" has label "iso-ir-100" (inputEncoding) 
+PASS Name "windows-1252" has label "iso-ir-100" (charset) 
+PASS Name "windows-1252" has label "iso8859-1" (characterSet) 
+PASS Name "windows-1252" has label "iso8859-1" (inputEncoding) 
+PASS Name "windows-1252" has label "iso8859-1" (charset) 
+PASS Name "windows-1252" has label "iso88591" (characterSet) 
+PASS Name "windows-1252" has label "iso88591" (inputEncoding) 
+PASS Name "windows-1252" has label "iso88591" (charset) 
+PASS Name "windows-1252" has label "iso_8859-1" (characterSet) 
+PASS Name "windows-1252" has label "iso_8859-1" (inputEncoding) 
+PASS Name "windows-1252" has label "iso_8859-1" (charset) 
+PASS Name "windows-1252" has label "iso_8859-1:1987" (characterSet) 
+PASS Name "windows-1252" has label "iso_8859-1:1987" (inputEncoding) 
+PASS Name "windows-1252" has label "iso_8859-1:1987" (charset) 
+PASS Name "windows-1252" has label "l1" (characterSet) 
+PASS Name "windows-1252" has label "l1" (inputEncoding) 
+PASS Name "windows-1252" has label "l1" (charset) 
+PASS Name "windows-1252" has label "latin1" (characterSet) 
+PASS Name "windows-1252" has label "latin1" (inputEncoding) 
+PASS Name "windows-1252" has label "latin1" (charset) 
+PASS Name "windows-1252" has label "us-ascii" (characterSet) 
+PASS Name "windows-1252" has label "us-ascii" (inputEncoding) 
+PASS Name "windows-1252" has label "us-ascii" (charset) 
 PASS Name "windows-1252" has label "windows-1252" (characterSet) 
 PASS Name "windows-1252" has label "windows-1252" (inputEncoding) 
 PASS Name "windows-1252" has label "windows-1252" (charset) 
-FAIL Name "windows-1252" has label "x-cp1252" (characterSet) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "x-cp1252" (inputEncoding) assert_equals: expected "windows-1252" but got "ISO-8859-1"
-FAIL Name "windows-1252" has label "x-cp1252" (charset) assert_equals: expected "windows-1252" but got "ISO-8859-1"
+PASS Name "windows-1252" has label "x-cp1252" (characterSet) 
+PASS Name "windows-1252" has label "x-cp1252" (inputEncoding) 
+PASS Name "windows-1252" has label "x-cp1252" (charset) 
 PASS Name "windows-1252" has label "x-user-defined" (characterSet) 
 PASS Name "windows-1252" has label "x-user-defined" (inputEncoding) 
 PASS Name "windows-1252" has label "x-user-defined" (charset) 
@@ -443,9 +443,9 @@
 PASS Name "windows-1253" has label "windows-1253" (characterSet) 
 PASS Name "windows-1253" has label "windows-1253" (inputEncoding) 
 PASS Name "windows-1253" has label "windows-1253" (charset) 
-FAIL Name "windows-1253" has label "x-cp1253" (characterSet) assert_equals: expected "windows-1253" but got "ISO-8859-1"
-FAIL Name "windows-1253" has label "x-cp1253" (inputEncoding) assert_equals: expected "windows-1253" but got "ISO-8859-1"
-FAIL Name "windows-1253" has label "x-cp1253" (charset) assert_equals: expected "windows-1253" but got "ISO-8859-1"
+PASS Name "windows-1253" has label "x-cp1253" (characterSet) 
+PASS Name "windows-1253" has label "x-cp1253" (inputEncoding) 
+PASS Name "windows-1253" has label "x-cp1253" (charset) 
 PASS Name "windows-1254" has label "cp1254" (characterSet) 
 PASS Name "windows-1254" has label "cp1254" (inputEncoding) 
 PASS Name "windows-1254" has label "cp1254" (charset) 
@@ -461,12 +461,12 @@
 PASS Name "windows-1254" has label "iso8859-9" (characterSet) 
 PASS Name "windows-1254" has label "iso8859-9" (inputEncoding) 
 PASS Name "windows-1254" has label "iso8859-9" (charset) 
-FAIL Name "windows-1254" has label "iso88599" (characterSet) assert_equals: expected "windows-1254" but got "ISO-8859-1"
-FAIL Name "windows-1254" has label "iso88599" (inputEncoding) assert_equals: expected "windows-1254" but got "ISO-8859-1"
-FAIL Name "windows-1254" has label "iso88599" (charset) assert_equals: expected "windows-1254" but got "ISO-8859-1"
-FAIL Name "windows-1254" has label "iso_8859-9" (characterSet) assert_equals: expected "windows-1254" but got "ISO-8859-1"
-FAIL Name "windows-1254" has label "iso_8859-9" (inputEncoding) assert_equals: expected "windows-1254" but got "ISO-8859-1"
-FAIL Name "windows-1254" has label "iso_8859-9" (charset) assert_equals: expected "windows-1254" but got "ISO-8859-1"
+PASS Name "windows-1254" has label "iso88599" (characterSet) 
+PASS Name "windows-1254" has label "iso88599" (inputEncoding) 
+PASS Name "windows-1254" has label "iso88599" (charset) 
+PASS Name "windows-1254" has label "iso_8859-9" (characterSet) 
+PASS Name "windows-1254" has label "iso_8859-9" (inputEncoding) 
+PASS Name "windows-1254" has label "iso_8859-9" (charset) 
 PASS Name "windows-1254" has label "iso_8859-9:1989" (characterSet) 
 PASS Name "windows-1254" has label "iso_8859-9:1989" (inputEncoding) 
 PASS Name "windows-1254" has label "iso_8859-9:1989" (charset) 
@@ -479,9 +479,9 @@
 PASS Name "windows-1254" has label "windows-1254" (characterSet) 
 PASS Name "windows-1254" has label "windows-1254" (inputEncoding) 
 PASS Name "windows-1254" has label "windows-1254" (charset) 
-FAIL Name "windows-1254" has label "x-cp1254" (characterSet) assert_equals: expected "windows-1254" but got "ISO-8859-1"
-FAIL Name "windows-1254" has label "x-cp1254" (inputEncoding) assert_equals: expected "windows-1254" but got "ISO-8859-1"
-FAIL Name "windows-1254" has label "x-cp1254" (charset) assert_equals: expected "windows-1254" but got "ISO-8859-1"
+PASS Name "windows-1254" has label "x-cp1254" (characterSet) 
+PASS Name "windows-1254" has label "x-cp1254" (inputEncoding) 
+PASS Name "windows-1254" has label "x-cp1254" (charset) 
 PASS Name "windows-1255" has label "cp1255" (characterSet) 
 PASS Name "windows-1255" has label "cp1255" (inputEncoding) 
 PASS Name "windows-1255" has label "cp1255" (charset) 
@@ -488,9 +488,9 @@
 PASS Name "windows-1255" has label "windows-1255" (characterSet) 
 PASS Name "windows-1255" has label "windows-1255" (inputEncoding) 
 PASS Name "windows-1255" has label "windows-1255" (charset) 
-FAIL Name "windows-1255" has label "x-cp1255" (characterSet) assert_equals: expected "windows-1255" but got "ISO-8859-1"
-FAIL Name "windows-1255" has label "x-cp1255" (inputEncoding) assert_equals: expected "windows-1255" but got "ISO-8859-1"
-FAIL Name "windows-1255" has label "x-cp1255" (charset) assert_equals: expected "windows-1255" but got "ISO-8859-1"
+PASS Name "windows-1255" has label "x-cp1255" (characterSet) 
+PASS Name "windows-1255" has label "x-cp1255" (inputEncoding) 
+PASS Name "windows-1255" has label "x-cp1255" (charset) 
 PASS Name "windows-1256" has label "cp1256" (characterSet) 
 PASS Name "windows-1256" has label "cp1256" (inputEncoding) 
 PASS Name "windows-1256" has label "cp1256" (charset) 
@@ -497,9 +497,9 @@
 PASS Name "windows-1256" has label "windows-1256" (characterSet) 
 PASS Name "windows-1256" has label "windows-1256" (inputEncoding) 
 PASS Name "windows-1256" has label "windows-1256" (charset) 
-FAIL Name "windows-1256" has label "x-cp1256" (characterSet) assert_equals: expected "windows-1256" but got "ISO-8859-1"
-FAIL Name "windows-1256" has label "x-cp1256" (inputEncoding) assert_equals: expected "windows-1256" but got "ISO-8859-1"
-FAIL Name "windows-1256" has label "x-cp1256" (charset) assert_equals: expected "windows-1256" but got "ISO-8859-1"
+PASS Name "windows-1256" has label "x-cp1256" (characterSet) 
+PASS Name "windows-1256" has label "x-cp1256" (inputEncoding) 
+PASS Name "windows-1256" has label "x-cp1256" (charset) 
 PASS Name "windows-1257" has label "cp1257" (characterSet) 
 PASS Name "windows-1257" has label "cp1257" (inputEncoding) 
 PASS Name "windows-1257" has label "cp1257" (charset) 
@@ -506,9 +506,9 @@
 PASS Name "windows-1257" has label "windows-1257" (characterSet) 
 PASS Name "windows-1257" has label "windows-1257" (inputEncoding) 
 PASS Name "windows-1257" has label "windows-1257" (charset) 
-FAIL Name "windows-1257" has label "x-cp1257" (characterSet) assert_equals: expected "windows-1257" but got "ISO-8859-1"
-FAIL Name "windows-1257" has label "x-cp1257" (inputEncoding) assert_equals: expected "windows-1257" but got "ISO-8859-1"
-FAIL Name "windows-1257" has label "x-cp1257" (charset) assert_equals: expected "windows-1257" but got "ISO-8859-1"
+PASS Name "windows-1257" has label "x-cp1257" (characterSet) 
+PASS Name "windows-1257" has label "x-cp1257" (inputEncoding) 
+PASS Name "windows-1257" has label "x-cp1257" (charset) 
 PASS Name "windows-1258" has label "cp1258" (characterSet) 
 PASS Name "windows-1258" has label "cp1258" (inputEncoding) 
 PASS Name "windows-1258" has label "cp1258" (charset) 
@@ -515,9 +515,9 @@
 PASS Name "windows-1258" has label "windows-1258" (characterSet) 
 PASS Name "windows-1258" has label "windows-1258" (inputEncoding) 
 PASS Name "windows-1258" has label "windows-1258" (charset) 
-FAIL Name "windows-1258" has label "x-cp1258" (characterSet) assert_equals: expected "windows-1258" but got "ISO-8859-1"
-FAIL Name "windows-1258" has label "x-cp1258" (inputEncoding) assert_equals: expected "windows-1258" but got "ISO-8859-1"
-FAIL Name "windows-1258" has label "x-cp1258" (charset) assert_equals: expected "windows-1258" but got "ISO-8859-1"
+PASS Name "windows-1258" has label "x-cp1258" (characterSet) 
+PASS Name "windows-1258" has label "x-cp1258" (inputEncoding) 
+PASS Name "windows-1258" has label "x-cp1258" (charset) 
 PASS Name "x-mac-cyrillic" has label "x-mac-cyrillic" (characterSet) 
 PASS Name "x-mac-cyrillic" has label "x-mac-cyrillic" (inputEncoding) 
 PASS Name "x-mac-cyrillic" has label "x-mac-cyrillic" (charset) 
@@ -536,9 +536,9 @@
 PASS Name "GBK" has label "gb2312" (characterSet) 
 PASS Name "GBK" has label "gb2312" (inputEncoding) 
 PASS Name "GBK" has label "gb2312" (charset) 
-FAIL Name "GBK" has label "gb_2312" (characterSet) assert_equals: expected "GBK" but got "ISO-8859-1"
-FAIL Name "GBK" has label "gb_2312" (inputEncoding) assert_equals: expected "GBK" but got "ISO-8859-1"
-FAIL Name "GBK" has label "gb_2312" (charset) assert_equals: expected "GBK" but got "ISO-8859-1"
+PASS Name "GBK" has label "gb_2312" (characterSet) 
+PASS Name "GBK" has label "gb_2312" (inputEncoding) 
+PASS Name "GBK" has label "gb_2312" (charset) 
 PASS Name "GBK" has label "gb_2312-80" (characterSet) 
 PASS Name "GBK" has label "gb_2312-80" (inputEncoding) 
 PASS Name "GBK" has label "gb_2312-80" (charset) 
@@ -551,9 +551,9 @@
 PASS Name "GBK" has label "x-gbk" (characterSet) 
 PASS Name "GBK" has label "x-gbk" (inputEncoding) 
 PASS Name "GBK" has label "x-gbk" (charset) 
-FAIL Name "gb18030" has label "gb18030" (characterSet) assert_equals: expected "gb18030" but got "GB18030"
-FAIL Name "gb18030" has label "gb18030" (inputEncoding) assert_equals: expected "gb18030" but got "GB18030"
-FAIL Name "gb18030" has label "gb18030" (charset) assert_equals: expected "gb18030" but got "GB18030"
+PASS Name "gb18030" has label "gb18030" (characterSet) 
+PASS Name "gb18030" has label "gb18030" (inputEncoding) 
+PASS Name "gb18030" has label "gb18030" (charset) 
 PASS Name "Big5" has label "big5" (characterSet) 
 PASS Name "Big5" has label "big5" (inputEncoding) 
 PASS Name "Big5" has label "big5" (charset) 

Modified: trunk/Source/WebCore/ChangeLog (204604 => 204605)


--- trunk/Source/WebCore/ChangeLog	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/Source/WebCore/ChangeLog	2016-08-18 19:53:35 UTC (rev 204605)
@@ -1,3 +1,26 @@
+2016-08-18  Chris Dumez  <[email protected]>
+
+        Align our encoding labels with the encoding specification
+        https://bugs.webkit.org/show_bug.cgi?id=160931
+
+        Reviewed by Darin Adler.
+
+        Align our encoding labels with the encoding specification:
+        - https://encoding.spec.whatwg.org/#names-and-labels
+
+        This also aligns with Firefox and Chrome.
+
+        No new tests, rebaselined existing test.
+
+        * platform/text/TextCodecICU.cpp:
+        (WebCore::TextCodecICU::registerEncodingNames):
+        * platform/text/TextCodecLatin1.cpp:
+        (WebCore::TextCodecLatin1::registerEncodingNames):
+        (WebCore::newStreamingTextDecoderWindowsLatin1): Deleted.
+        (WebCore::TextCodecLatin1::registerCodecs): Deleted.
+        * platform/text/TextCodecUTF8.cpp:
+        (WebCore::TextCodecUTF8::registerEncodingNames):
+
 2016-08-18  Andy Estes  <[email protected]>
 
         [Cocoa] Add SPI to WKProcessPool for enabling cookie storage partitioning
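
Note: the label-to-name mapping described in the ChangeLog entry above can be illustrated with a minimal, self-contained sketch. This is not WebKit code. EncodingEntry, exampleEncodings, registerExampleEncodingNames, and LabelRegistry are hypothetical names invented for illustration, and the registrar callback is modelled with std::function as an assumption. The labels are a small excerpt of the alias lists in this change. ASCII case-insensitive lookup is assumed here because the encoding specification matches labels that way. The sketch omits the whitespace stripping the specification also performs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical table entry: one canonical encoding name plus its extra labels.
struct EncodingEntry {
    const char* name;
    std::vector<const char*> aliases;
};

// A small excerpt of the labels listed in this change (not the full table).
static const std::vector<EncodingEntry> exampleEncodings = {
    { "windows-874", { "dos-874", "iso-8859-11", "iso8859-11", "iso885911", "tis-620" } },
    { "windows-1254", { "cp1254", "iso-8859-9", "iso88599", "iso_8859-9", "latin5", "x-cp1254" } },
    { "GBK", { "gb2312", "gb_2312", "x-gbk" } },
    { "ISO-8859-16", { } }, // An encoding with no additional labels.
};

// Assumed callback shape: registrar(label, canonicalName).
using Registrar = std::function<void(const char* alias, const char* name)>;

// Register every name as a label for itself, then each alias for that name.
static void registerExampleEncodingNames(const Registrar& registrar)
{
    for (const auto& entry : exampleEncodings) {
        registrar(entry.name, entry.name);
        for (const char* alias : entry.aliases)
            registrar(alias, entry.name);
    }
}

// Lowercase A-Z only, so the result does not depend on the current locale.
static std::string toASCIILower(std::string text)
{
    for (char& character : text) {
        if (character >= 'A' && character <= 'Z')
            character = static_cast<char>(character - 'A' + 'a');
    }
    return text;
}

// Hypothetical registry that resolves a label to its canonical encoding name.
class LabelRegistry {
public:
    LabelRegistry()
    {
        registerExampleEncodingNames([this](const char* alias, const char* name) {
            assert(alias && name);
            m_names[toASCIILower(alias)] = name;
        });
    }

    // Returns the canonical name, or an empty string for an unknown label.
    std::string canonicalName(const std::string& label) const
    {
        auto it = m_names.find(toASCIILower(label));
        return it == m_names.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> m_names;
};
```

With a table like this, a label such as "iso885911" resolves to "windows-874" no matter which aliases the installed ICU version happens to know, which is the kind of result the rebaselined checks above expect.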

Modified: trunk/Source/WebCore/platform/text/TextCodecICU.cpp (204604 => 204605)


--- trunk/Source/WebCore/platform/text/TextCodecICU.cpp	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/Source/WebCore/platform/text/TextCodecICU.cpp	2016-08-18 19:53:35 UTC (rev 204605)
@@ -72,126 +72,115 @@
     return std::make_unique<TextCodecICU>(encoding.name(), static_cast<const char*>(additionalData));
 }
 
-void TextCodecICU::registerEncodingNames(EncodingNameRegistrar registrar)
-{
-    // We register Hebrew with logical ordering using a separate name.
-    // Otherwise, this would share the same canonical name as the
-    // visual ordering case, and then TextEncoding could not tell them
-    // apart; ICU treats these names as synonyms.
-    registrar("ISO-8859-8-I", "ISO-8859-8-I");
+#define DECLARE_ALIASES(encoding, ...) \
+    static const char* const encoding##_aliases[] { __VA_ARGS__ }
 
-    int32_t numConverters = ucnv_countAvailable();
-    for (int32_t i = 0; i < numConverters; ++i) {
-        const char* canonicalConverterName = ucnv_getAvailableName(i);
-        UErrorCode error = U_ZERO_ERROR;
-        // Try MIME before trying IANA to pick up commonly used names like
-        // 'EUC-JP' instead of horrendously long names like 
-        // 'Extended_UNIX_Code_Packed_Format_for_Japanese'. 
-        const char* webStandardName = ucnv_getStandardName(canonicalConverterName, "MIME", &error);
-        if (!U_SUCCESS(error) || !webStandardName) {
-            error = U_ZERO_ERROR;
-            // Try IANA to pick up 'windows-12xx' and other names
-            // which are not preferred MIME names but are widely used. 
-            webStandardName = ucnv_getStandardName(canonicalConverterName, "IANA", &error);
-            if (!U_SUCCESS(error) || !webStandardName)
-                continue;
-        }
+// From https://encoding.spec.whatwg.org.
+DECLARE_ALIASES(IBM866, "866", "cp866", "csibm866");
+DECLARE_ALIASES(ISO_8859_2, "csisolatin2", "iso-ir-101", "iso8859-2", "iso88592", "iso_8859-2", "iso_8859-2:1987", "l2", "latin2");
+DECLARE_ALIASES(ISO_8859_3, "csisolatin3", "iso-ir-109", "iso8859-3", "iso88593", "iso_8859-3", "iso_8859-3:1988", "l3", "latin3");
+DECLARE_ALIASES(ISO_8859_4, "csisolatin4", "iso-ir-110", "iso8859-4", "iso88594", "iso_8859-4", "iso_8859-4:1988", "l4", "latin4");
+DECLARE_ALIASES(ISO_8859_5, "csisolatincyrillic", "cyrillic", "iso-ir-144", "iso8859-5", "iso88595", "iso_8859-5", "iso_8859-5:1988");
+DECLARE_ALIASES(ISO_8859_6, "arabic", "asmo-708", "csiso88596e", "csiso88596i", "csisolatinarabic", "ecma-114", "iso-8859-6-e", "iso-8859-6-i", "iso-ir-127", "iso8859-6", "iso88596", "iso_8859-6", "iso_8859-6:1987");
+DECLARE_ALIASES(ISO_8859_7, "csisolatingreek", "ecma-118", "elot_928", "greek", "greek8", "iso-ir-126", "iso8859-7", "iso88597", "iso_8859-7", "iso_8859-7:1987", "sun_eu_greek");
+DECLARE_ALIASES(ISO_8859_8, "csiso88598e", "csisolatinhebrew", "hebrew", "iso-8859-8-e", "iso-ir-138", "iso8859-8", "iso88598", "iso_8859-8", "iso_8859-8:1988", "visual");
+DECLARE_ALIASES(ISO_8859_8_I, "csiso88598i", "logical");
+DECLARE_ALIASES(ISO_8859_10, "csisolatin6", "iso-ir-157", "iso8859-10", "iso885910", "l6", "latin6");
+DECLARE_ALIASES(ISO_8859_13, "iso8859-13", "iso885913");
+DECLARE_ALIASES(ISO_8859_14, "iso8859-14", "iso885914");
+DECLARE_ALIASES(ISO_8859_15, "csisolatin9", "iso8859-15", "iso885915", "iso_8859-15", "l9");
+DECLARE_ALIASES(KOI8_R, "cskoi8r", "koi", "koi8", "koi8_r");
+DECLARE_ALIASES(KOI8_U, "koi8-ru");
+DECLARE_ALIASES(macintosh, "csmacintosh", "mac", "x-mac-roman", "macroman", "x-macroman");
+DECLARE_ALIASES(windows_874, "dos-874", "iso-8859-11", "iso8859-11", "iso885911", "tis-620");
+DECLARE_ALIASES(windows_949, "euc-kr", "cseuckr", "csksc56011987", "iso-ir-149", "korean", "ks_c_5601-1987", "ks_c_5601-1989", "ksc5601", "ksc_5601", "ms949", "x-KSC5601", "x-windows-949", "x-uhc");
+DECLARE_ALIASES(windows_1250, "cp1250", "x-cp1250", "winlatin2");
+DECLARE_ALIASES(windows_1251, "cp1251", "wincyrillic", "x-cp1251");
+DECLARE_ALIASES(windows_1253, "wingreek", "cp1253", "x-cp1253");
+DECLARE_ALIASES(windows_1254, "winturkish", "cp1254", "csisolatin5", "iso-8859-9", "iso-ir-148", "iso8859-9", "iso88599", "iso_8859-9", "iso_8859-9:1989", "l5", "latin5", "x-cp1254");
+DECLARE_ALIASES(windows_1255, "winhebrew", "cp1255", "x-cp1255");
+DECLARE_ALIASES(windows_1256, "winarabic", "cp1256", "x-cp1256");
+DECLARE_ALIASES(windows_1257, "winbaltic", "cp1257", "x-cp1257");
+DECLARE_ALIASES(windows_1258, "winvietnamese", "cp1258", "x-cp1258");
+DECLARE_ALIASES(x_mac_cyrillic, "maccyrillic", "x-mac-ukrainian", "windows-10007", "mac-cyrillic", "maccy", "x-MacCyrillic", "x-MacUkraine");
+DECLARE_ALIASES(GBK, "cn-gb", "csgb231280", "x-euc-cn", "chinese", "csgb2312", "csiso58gb231280", "gb2312", "gb_2312", "gb_2312-80", "iso-ir-58", "x-gbk", "euc-cn", "cp936", "ms936", "gb2312-1980", "windows-936", "windows-936-2000");
+DECLARE_ALIASES(gb18030, "ibm-1392", "windows-54936");
+DECLARE_ALIASES(Big5, "cn-big5", "x-x-big5", "csbig5", "windows-950", "windows-950-2000", "ms950", "x-windows-950", "x-big5");
+DECLARE_ALIASES(EUC_JP, "x-euc", "cseucpkdfmtjapanese", "x-euc-jp");
+DECLARE_ALIASES(ISO_2022_JP, "jis7", "csiso2022jp");
+DECLARE_ALIASES(Shift_JIS, "shift-jis", "csshiftjis", "ms932", "ms_kanji", "sjis", "windows-31j", "x-sjis");
+// Encodings below are not in the standard.
+DECLARE_ALIASES(UTF_32, "ISO-10646-UCS-4", "ibm-1236", "ibm-1237", "csUCS4", "ucs-4");
+DECLARE_ALIASES(UTF_32LE, "UTF32_LittleEndian", "ibm-1234", "ibm-1235");
+DECLARE_ALIASES(UTF_32BE, "UTF32_BigEndian", "ibm-1232", "ibm-1233", "ibm-9424");
+DECLARE_ALIASES(x_mac_greek, "windows-10006", "macgr", "x-MacGreek");
+DECLARE_ALIASES(x_mac_centraleurroman, "windows-10029", "x-mac-ce", "macce", "maccentraleurope", "x-MacCentralEurope");
+DECLARE_ALIASES(x_mac_turkish, "windows-10081", "mactr", "x-MacTurkish");
+DECLARE_ALIASES(Big5_HKSCS, "big5hk", "HKSCS-BIG5", "ibm-1375", "ibm-1375_P100-2008");
 
-        // Any standard encoding overrides should match checks in registerCodecs() below.
+#define DECLARE_ENCODING_NAME(encoding, alias_array) \
+    { encoding, WTF_ARRAY_LENGTH(alias_array##_aliases), alias_array##_aliases }
 
-        // 1. Treat GB2312 encoding as GBK (its more modern superset), to match other browsers.
-        // 2. On the Web, GB2312 is encoded as EUC-CN or HZ, while ICU provides a native encoding
-        //    for encoding GB_2312-80 and several others. So, we need to override this behavior, too.
-        if (strcmp(webStandardName, "GB2312") == 0 || strcmp(webStandardName, "GB_2312-80") == 0)
-            webStandardName = "GBK";
-        // Similarly, EUC-KR encodings all map to an extended version.
-        else if (strcmp(webStandardName, "KSC_5601") == 0 || strcmp(webStandardName, "EUC-KR") == 0 || strcmp(webStandardName, "cp1363") == 0)
-            webStandardName = "windows-949";
-        // And so on.
-        // FIXME: strcasecmp is locale sensitive, we should not be using it.
-        else if (strcasecmp(webStandardName, "iso-8859-9") == 0) // This name is returned in different case by ICU 3.2 and 3.6.
-            webStandardName = "windows-1254";
-        else if (strcmp(webStandardName, "TIS-620") == 0)
-            webStandardName = "windows-874";
+#define DECLARE_ENCODING_NAME_NO_ALIASES(encoding) \
+    { encoding, 0, nullptr }
 
-        registrar(webStandardName, webStandardName);
+static const struct EncodingName {
+    const char* const name;
+    unsigned aliasCount;
+    const char* const * aliases;
+} encodingNames[] = {
+    DECLARE_ENCODING_NAME("IBM866", IBM866),
+    DECLARE_ENCODING_NAME("ISO-8859-2", ISO_8859_2),
+    DECLARE_ENCODING_NAME("ISO-8859-3", ISO_8859_3),
+    DECLARE_ENCODING_NAME("ISO-8859-4", ISO_8859_4),
+    DECLARE_ENCODING_NAME("ISO-8859-5", ISO_8859_5),
+    DECLARE_ENCODING_NAME("ISO-8859-6", ISO_8859_6),
+    DECLARE_ENCODING_NAME("ISO-8859-7", ISO_8859_7),
+    DECLARE_ENCODING_NAME("ISO-8859-8", ISO_8859_8),
+    DECLARE_ENCODING_NAME("ISO-8859-8-I", ISO_8859_8_I),
+    DECLARE_ENCODING_NAME("ISO-8859-10", ISO_8859_10),
+    DECLARE_ENCODING_NAME("ISO-8859-13", ISO_8859_13),
+    DECLARE_ENCODING_NAME("ISO-8859-14", ISO_8859_14),
+    DECLARE_ENCODING_NAME("ISO-8859-15", ISO_8859_15),
+    DECLARE_ENCODING_NAME_NO_ALIASES("ISO-8859-16"),
+    DECLARE_ENCODING_NAME("KOI8-R", KOI8_R),
+    DECLARE_ENCODING_NAME("KOI8-U", KOI8_U),
+    DECLARE_ENCODING_NAME("macintosh", macintosh),
+    DECLARE_ENCODING_NAME("windows-874", windows_874),
+    DECLARE_ENCODING_NAME("windows-949", windows_949),
+    DECLARE_ENCODING_NAME("windows-1250", windows_1250),
+    DECLARE_ENCODING_NAME("windows-1251", windows_1251),
+    DECLARE_ENCODING_NAME("windows-1253", windows_1253),
+    DECLARE_ENCODING_NAME("windows-1254", windows_1254),
+    DECLARE_ENCODING_NAME("windows-1255", windows_1255),
+    DECLARE_ENCODING_NAME("windows-1256", windows_1256),
+    DECLARE_ENCODING_NAME("windows-1257", windows_1257),
+    DECLARE_ENCODING_NAME("windows-1258", windows_1258),
+    DECLARE_ENCODING_NAME("x-mac-cyrillic", x_mac_cyrillic),
+    DECLARE_ENCODING_NAME("GBK", GBK),
+    DECLARE_ENCODING_NAME("gb18030", gb18030),
+    DECLARE_ENCODING_NAME("Big5", Big5),
+    DECLARE_ENCODING_NAME("EUC-JP", EUC_JP),
+    DECLARE_ENCODING_NAME("ISO-2022-JP", ISO_2022_JP),
+    DECLARE_ENCODING_NAME("Shift_JIS", Shift_JIS),
+    // Encodings below are not in the standard.
+    DECLARE_ENCODING_NAME("UTF-32", UTF_32),
+    DECLARE_ENCODING_NAME("UTF-32LE", UTF_32LE),
+    DECLARE_ENCODING_NAME("UTF-32BE", UTF_32BE),
+    DECLARE_ENCODING_NAME("x-mac-greek", x_mac_greek),
+    DECLARE_ENCODING_NAME("x-mac-centraleurroman", x_mac_centraleurroman),
+    DECLARE_ENCODING_NAME("x-mac-turkish", x_mac_turkish),
+    DECLARE_ENCODING_NAME("Big5-HKSCS", Big5_HKSCS),
+};
 
-        uint16_t numAliases = ucnv_countAliases(canonicalConverterName, &error);
-        ASSERT(U_SUCCESS(error));
-        if (U_SUCCESS(error))
-            for (uint16_t j = 0; j < numAliases; ++j) {
-                error = U_ZERO_ERROR;
-                const char* alias = ucnv_getAlias(canonicalConverterName, j, &error);
-                ASSERT(U_SUCCESS(error));
-                if (U_SUCCESS(error) && alias != webStandardName)
-                    registrar(alias, webStandardName);
-            }
+void TextCodecICU::registerEncodingNames(EncodingNameRegistrar registrar)
+{
+    for (auto& encodingName : encodingNames) {
+        registrar(encodingName.name, encodingName.name);
+        for (size_t i = 0; i < encodingName.aliasCount; ++i)
+            registrar(encodingName.aliases[i], encodingName.name);
     }
 
-    // Additional aliases.
-    // macroman is present in modern versions of ICU, but not in ICU 3.2 (shipped with Mac OS X 10.4).
-    // FIXME: Do any ports still use such old versions?
-    registrar("macroman", "macintosh");
-
-    // Additional aliases that historically were present in the encoding
-    // table in WebKit on Macintosh that don't seem to be present in ICU.
-    // Perhaps we can prove these are not used on the web and remove them.
-    // Or perhaps we can get them added to ICU.
-    registrar("x-mac-roman", "macintosh");
-    registrar("maccyrillic", "x-mac-cyrillic");
-    registrar("x-mac-ukrainian", "x-mac-cyrillic");
-    registrar("cn-big5", "Big5");
-    registrar("x-x-big5", "Big5");
-    registrar("cn-gb", "GBK");
-    registrar("csgb231280", "GBK");
-    registrar("x-euc-cn", "GBK");
-    registrar("x-gbk", "GBK");
-    registrar("csISO88598I", "ISO-8859-8-I");
-    registrar("koi", "KOI8-R");
-    registrar("logical", "ISO-8859-8-I");
-    registrar("visual", "ISO-8859-8");
-    registrar("winarabic", "windows-1256");
-    registrar("winbaltic", "windows-1257");
-    registrar("wincyrillic", "windows-1251");
-    registrar("iso-8859-11", "windows-874");
-    registrar("iso8859-11", "windows-874");
-    registrar("dos-874", "windows-874");
-    registrar("wingreek", "windows-1253");
-    registrar("winhebrew", "windows-1255");
-    registrar("winlatin2", "windows-1250");
-    registrar("winturkish", "windows-1254");
-    registrar("winvietnamese", "windows-1258");
-    registrar("x-cp1250", "windows-1250");
-    registrar("x-cp1251", "windows-1251");
-    registrar("x-euc", "EUC-JP");
-    registrar("x-windows-949", "windows-949");
-    registrar("KSC5601", "windows-949");
-    registrar("x-uhc", "windows-949");
-    registrar("shift-jis", "Shift_JIS");
-
-    // These aliases are present in modern versions of ICU, but use different codecs, and have no standard names.
-    // They are not present in ICU 3.2.
-    registrar("dos-720", "cp864");
-    registrar("jis7", "ISO-2022-JP");
-
-    // Alternative spelling of ISO encoding names.
-    registrar("ISO8859-1", "ISO-8859-1");
-    registrar("ISO8859-2", "ISO-8859-2");
-    registrar("ISO8859-3", "ISO-8859-3");
-    registrar("ISO8859-4", "ISO-8859-4");
-    registrar("ISO8859-5", "ISO-8859-5");
-    registrar("ISO8859-6", "ISO-8859-6");
-    registrar("ISO8859-7", "ISO-8859-7");
-    registrar("ISO8859-8", "ISO-8859-8");
-    registrar("ISO8859-8-I", "ISO-8859-8-I");
-    registrar("ISO8859-9", "windows-1254");
-    registrar("ISO8859-10", "ISO-8859-10");
-    registrar("ISO8859-13", "ISO-8859-13");
-    registrar("ISO8859-14", "ISO-8859-14");
-    registrar("ISO8859-15", "ISO-8859-15");
-    // Not registering ISO8859-16, because Firefox (as of version 3.6.6) doesn't know this particular alias,
-    // and because older versions of ICU don't support ISO-8859-16 encoding at all.
-
 #if PLATFORM(IOS)
     // A.B. adding a few more Mac encodings missing 'cause we don't have TextCodecMac right now
     // luckily, they are supported in ICU, just need to alias them.
@@ -218,40 +207,40 @@
 
 void TextCodecICU::registerCodecs(TextCodecRegistrar registrar)
 {
-    // See comment above in registerEncodingNames.
-    UErrorCode error = U_ZERO_ERROR;
-    const char* canonicalConverterName = ucnv_getCanonicalName("ISO-8859-8-I", "IANA", &error);
-    ASSERT(U_SUCCESS(error));
-    registrar("ISO-8859-8-I", create, canonicalConverterName);
-
-    int32_t numConverters = ucnv_countAvailable();
-    for (int32_t i = 0; i < numConverters; ++i) {
-        canonicalConverterName = ucnv_getAvailableName(i);
-        error = U_ZERO_ERROR;
-        const char* webStandardName = ucnv_getStandardName(canonicalConverterName, "MIME", &error);
-        if (!U_SUCCESS(error) || !webStandardName) {
-            error = U_ZERO_ERROR;
-            webStandardName = ucnv_getStandardName(canonicalConverterName, "IANA", &error);
-            if (!U_SUCCESS(error) || !webStandardName)
-                continue;
+    for (auto& encodingName : encodingNames) {
+        // These encodings currently don't have standard names, so we need to register encoders manually.
+        // http://demo.icu-project.org/icu-bin/convexp
+        if (!strcmp(encodingName.name, "windows-874")) {
+            registrar(encodingName.name, create, "windows-874-2000");
+            continue;
         }
-
-        // Don't register codecs for overridden encodings.
-        if (strcmp(webStandardName, "GB2312") == 0 || strcmp(webStandardName, "GB_2312-80") == 0
-            || strcmp(webStandardName, "KSC_5601") == 0 || strcmp(webStandardName, "EUC-KR") == 0
-            || strcmp(webStandardName, "cp1363") == 0
-            || strcasecmp(webStandardName, "iso-8859-9") == 0
-            || strcmp(webStandardName, "TIS-620") == 0)
+        if (!strcmp(encodingName.name, "windows-949")) {
+            registrar(encodingName.name, create, "windows-949-2000");
             continue;
+        }
+        if (!strcmp(encodingName.name, "x-mac-cyrillic")) {
+            registrar(encodingName.name, create, "macos-7_3-10.2");
+            continue;
+        }
+        if (!strcmp(encodingName.name, "x-mac-greek")) {
+            registrar(encodingName.name, create, "macos-6_2-10.4");
+            continue;
+        }
+        if (!strcmp(encodingName.name, "x-mac-centraleurroman")) {
+            registrar(encodingName.name, create, "macos-29-10.2");
+            continue;
+        }
+        if (!strcmp(encodingName.name, "x-mac-turkish")) {
+            registrar(encodingName.name, create, "macos-35-10.2");
+            continue;
+        }
 
-        registrar(webStandardName, create, fastStrDup(canonicalConverterName));
+        UErrorCode error = U_ZERO_ERROR;
+        const char* canonicalConverterName = ucnv_getCanonicalName(encodingName.name, "IANA", &error);
+        ASSERT(U_SUCCESS(error));
+        registrar(encodingName.name, create, canonicalConverterName);
     }
 
-    // These encodings currently don't have standard names, so we need to register encoders manually.
-    // FIXME: Is there a good way to determine the most up to date variant programmatically?
-    registrar("windows-874", create, "windows-874-2000");
-    registrar("windows-949", create, "windows-949-2000");
-
 #if PLATFORM(IOS)
     // See comment above in registerEncodingNames().
     int32_t i = 0;

Modified: trunk/Source/WebCore/platform/text/TextCodecLatin1.cpp (204604 => 204605)


--- trunk/Source/WebCore/platform/text/TextCodecLatin1.cpp	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/Source/WebCore/platform/text/TextCodecLatin1.cpp	2016-08-18 19:53:35 UTC (rev 204605)
@@ -72,35 +72,24 @@
 
 void TextCodecLatin1::registerEncodingNames(EncodingNameRegistrar registrar)
 {
+    // From https://encoding.spec.whatwg.org.
     registrar("windows-1252", "windows-1252");
-    registrar("ISO-8859-1", "ISO-8859-1");
-    registrar("US-ASCII", "US-ASCII");
-
-    registrar("WinLatin1", "windows-1252");
-    registrar("ibm-1252", "windows-1252");
-    registrar("ibm-1252_P100-2000", "windows-1252");
-
-    registrar("CP819", "ISO-8859-1");
-    registrar("IBM819", "ISO-8859-1");
-    registrar("csISOLatin1", "ISO-8859-1");
-    registrar("iso-ir-100", "ISO-8859-1");
-    registrar("iso_8859-1:1987", "ISO-8859-1");
-    registrar("l1", "ISO-8859-1");
-    registrar("latin1", "ISO-8859-1");
-
-    registrar("ANSI_X3.4-1968", "US-ASCII");
-    registrar("ANSI_X3.4-1986", "US-ASCII");
-    registrar("ASCII", "US-ASCII");
-    registrar("IBM367", "US-ASCII");
-    registrar("ISO646-US", "US-ASCII");
-    registrar("ISO_646.irv:1991", "US-ASCII");
-    registrar("cp367", "US-ASCII");
-    registrar("csASCII", "US-ASCII");
-    registrar("ibm-367_P100-1995", "US-ASCII");
-    registrar("iso-ir-6", "US-ASCII");
-    registrar("iso-ir-6-us", "US-ASCII");
-    registrar("us", "US-ASCII");
-    registrar("x-ansi", "US-ASCII");
+    registrar("ansi_x3.4-1968", "windows-1252");
+    registrar("ascii", "windows-1252");
+    registrar("cp1252", "windows-1252");
+    registrar("cp819", "windows-1252");
+    registrar("csisolatin1", "windows-1252");
+    registrar("ibm819", "windows-1252");
+    registrar("iso-8859-1", "windows-1252");
+    registrar("iso-ir-100", "windows-1252");
+    registrar("iso8859-1", "windows-1252");
+    registrar("iso88591", "windows-1252");
+    registrar("iso_8859-1", "windows-1252");
+    registrar("iso_8859-1:1987", "windows-1252");
+    registrar("l1", "windows-1252");
+    registrar("latin1", "windows-1252");
+    registrar("us-ascii", "windows-1252");
+    registrar("x-cp1252", "windows-1252");
 }
 
 static std::unique_ptr<TextCodec> newStreamingTextDecoderWindowsLatin1(const TextEncoding&, const void*)
@@ -111,10 +100,6 @@
 void TextCodecLatin1::registerCodecs(TextCodecRegistrar registrar)
 {
     registrar("windows-1252", newStreamingTextDecoderWindowsLatin1, 0);
-
-    // ASCII and Latin-1 both decode as Windows Latin-1 although they retain unique identities.
-    registrar("ISO-8859-1", newStreamingTextDecoderWindowsLatin1, 0);
-    registrar("US-ASCII", newStreamingTextDecoderWindowsLatin1, 0);
 }
 
 String TextCodecLatin1::decode(const char* bytes, size_t length, bool, bool, bool&)

Modified: trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp (204604 => 204605)


--- trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp	2016-08-18 19:04:28 UTC (rev 204604)
+++ trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp	2016-08-18 19:53:35 UTC (rev 204605)
@@ -45,7 +45,10 @@
 
 void TextCodecUTF8::registerEncodingNames(EncodingNameRegistrar registrar)
 {
+    // From https://encoding.spec.whatwg.org.
     registrar("UTF-8", "UTF-8");
+    registrar("utf8", "UTF-8");
+    registrar("unicode-1-1-utf-8", "UTF-8");
 
     // Additional aliases that originally were present in the encoding
     // table in WebKit on Macintosh, and subsequently added by
@@ -53,7 +56,6 @@
     // and remove them.
     registrar("unicode11utf8", "UTF-8");
     registrar("unicode20utf8", "UTF-8");
-    registrar("utf8", "UTF-8");
     registrar("x-unicode20utf8", "UTF-8");
 }
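
Note: the registrations above all follow one pattern, in which many labels collapse onto a single canonical encoding name (for example "latin1" and "us-ascii" now resolve to "windows-1252"). The sketch below illustrates that lookup outside of WebKit. It is not WebKit code: normalizeLabel, labelTable, and canonicalNameForLabel are hypothetical names, the table holds only a handful of the labels from this change, and the whitespace stripping and ASCII lowercasing are assumed from the Encoding Standard's label-matching rules rather than taken from this patch.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not a WebKit function): strip leading and trailing
// ASCII whitespace and lowercase ASCII letters, so that " US-ASCII\n" and
// "us-ascii" compare equal.
std::string normalizeLabel(const std::string& label)
{
    const char* whitespace = "\t\n\f\r ";
    std::string::size_type begin = label.find_first_not_of(whitespace);
    if (begin == std::string::npos)
        return std::string();
    std::string::size_type end = label.find_last_not_of(whitespace);
    std::string result = label.substr(begin, end - begin + 1);
    for (char& c : result) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return result;
}

// Illustrative subset of the label table, keyed by lowercase label.
// The real lists live in the registerEncodingNames() functions.
const std::map<std::string, std::string>& labelTable()
{
    static const std::map<std::string, std::string> table = {
        { "windows-1252", "windows-1252" },
        { "ascii", "windows-1252" },
        { "iso-8859-1", "windows-1252" },
        { "latin1", "windows-1252" },
        { "us-ascii", "windows-1252" },
        { "utf-8", "UTF-8" },
        { "utf8", "UTF-8" },
        { "unicode-1-1-utf-8", "UTF-8" },
    };
    return table;
}

// Returns the canonical encoding name for a label, or an empty string
// when the label is not in the table.
std::string canonicalNameForLabel(const std::string& label)
{
    const auto& table = labelTable();
    auto it = table.find(normalizeLabel(label));
    return it == table.end() ? std::string() : it->second;
}
```

With this table, canonicalNameForLabel("latin1") and canonicalNameForLabel(" US-ASCII\n") both yield "windows-1252", while a label that is not listed yields an empty string.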
 
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes
