Title: [105691] trunk
Revision: 105691
Author: [email protected]
Date: 2012-01-23 21:08:14 -0800 (Mon, 23 Jan 2012)

Log Message

decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).
https://bugs.webkit.org/show_bug.cgi?id=71316

Reviewed by Daniel Bates.

Source/WebCore:

Pass trailing unescaped bytes into the character set decoder to get correct
results in the presence of encodings which re-use ASCII values in sequences.

Tests: http/tests/navigation/anchor-frames-gbk.html
       http/tests/security/xssAuditor/iframe-onload-GBK-char.html
       http/tests/security/xssAuditor/img-onerror-GBK-char.html
       http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html
       http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html
       http/tests/security/xssAuditor/script-tag-Big5-char.html
       http/tests/security/xssAuditor/script-tag-Big5-char2.html

* platform/text/DecodeEscapeSequences.h:
(WebCore::Unicode16BitEscapeSequence::findInString):
(WebCore::Unicode16BitEscapeSequence::findEndOfRun):
(WebCore::Unicode16BitEscapeSequence::decodeRun):
(WebCore::URLEscapeSequence::findInString):
(WebCore::URLEscapeSequence::findEndOfRun):
(WebCore::URLEscapeSequence::decodeRun):
(WebCore::decodeEscapeSequences):
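
The run-finding rule described in the Source/WebCore note above can be sketched in isolation. This is a minimal illustration only, not the committed code: it assumes `std::string` and `std::isxdigit` as stand-ins for WebCore's `String` and `isASCIIHexDigit`, and the free function `findEndOfRun` here is a hypothetical analogue of `URLEscapeSequence::findEndOfRun`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Length of a URL escape sequence such as %41.
static const size_t sequenceSize = 3;

// Hypothetical stand-in for URLEscapeSequence::findEndOfRun(), written against
// std::string (an assumption for this sketch) rather than WebCore's String.
// A run is a series of valid %XX escapes, each optionally followed by up to two
// unescaped bytes in the range 0x40 - 0x7F (possible trail bytes in GBK, Big5, ...).
size_t findEndOfRun(const std::string& s, size_t start, size_t end)
{
    assert(start <= end && end <= s.size());
    size_t runEnd = start;
    int trailingCharacters = 0;
    while (runEnd < end) {
        unsigned char c = static_cast<unsigned char>(s[runEnd]);
        if (c == '%') {
            // A %-sign must introduce a complete, valid escape or the run ends here.
            if (end - runEnd >= sequenceSize
                && std::isxdigit(static_cast<unsigned char>(s[runEnd + 1]))
                && std::isxdigit(static_cast<unsigned char>(s[runEnd + 2]))) {
                runEnd += sequenceSize;
                trailingCharacters = 0;
            } else
                break;
        } else if (c >= 0x40 && c <= 0x7F && trailingCharacters < 2) {
            // Keep a possible trail byte inside the run so the decoder sees it.
            ++runEnd;
            ++trailingCharacters;
        } else
            break;
    }
    return runEnd;
}
```

For the GBK test input `%a9g`, the run under this sketch covers all four characters, so the escaped lead byte and the unescaped `g` would reach the character set decoder together.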

LayoutTests:

* http/tests/navigation/anchor-frames-gbk-expected.txt: Added.
* http/tests/navigation/anchor-frames-gbk.html: Added.
* http/tests/navigation/resources/frame-with-anchor-gbk.html: Added.
* http/tests/security/xssAuditor/iframe-onload-GBK-char-expected.txt: Added.
* http/tests/security/xssAuditor/iframe-onload-GBK-char.html: Added.
* http/tests/security/xssAuditor/img-onerror-GBK-char-expected.txt: Added.
* http/tests/security/xssAuditor/img-onerror-GBK-char.html: Added.
* http/tests/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl:
* http/tests/security/xssAuditor/script-tag-Big5-char-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char2-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char2.html: Added.
* platform/chromium/test_expectations.txt:

Diff

Modified: trunk/LayoutTests/ChangeLog (105690 => 105691)


--- trunk/LayoutTests/ChangeLog	2012-01-24 05:04:53 UTC (rev 105690)
+++ trunk/LayoutTests/ChangeLog	2012-01-24 05:08:14 UTC (rev 105691)
@@ -1,3 +1,28 @@
+2012-01-23  Tom Sepez  <[email protected]>
+
+        decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).
+        https://bugs.webkit.org/show_bug.cgi?id=71316
+
+        Reviewed by Daniel Bates.
+
+        * http/tests/navigation/anchor-frames-gbk-expected.txt: Added.
+        * http/tests/navigation/anchor-frames-gbk.html: Added.
+        * http/tests/navigation/resources/frame-with-anchor-gbk.html: Added.
+        * http/tests/security/xssAuditor/iframe-onload-GBK-char-expected.txt: Added.
+        * http/tests/security/xssAuditor/iframe-onload-GBK-char.html: Added.
+        * http/tests/security/xssAuditor/img-onerror-GBK-char-expected.txt: Added.
+        * http/tests/security/xssAuditor/img-onerror-GBK-char.html: Added.
+        * http/tests/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl:
+        * http/tests/security/xssAuditor/script-tag-Big5-char-expected.txt: Added.
+        * http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode-expected.txt: Added.
+        * http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html: Added.
+        * http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-expected.txt: Added.
+        * http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html: Added.
+        * http/tests/security/xssAuditor/script-tag-Big5-char.html: Added.
+        * http/tests/security/xssAuditor/script-tag-Big5-char2-expected.txt: Added.
+        * http/tests/security/xssAuditor/script-tag-Big5-char2.html: Added.
+        * platform/chromium/test_expectations.txt:
+
 2012-01-23  Zan Dobersek  <[email protected]>
 
         [GTK] editing/deleting/5408255.html results are incorrect

Added: trunk/LayoutTests/http/tests/navigation/anchor-frames-gbk-expected.txt (0 => 105691)


--- trunk/LayoutTests/http/tests/navigation/anchor-frames-gbk-expected.txt	                        (rev 0)
+++ trunk/LayoutTests/http/tests/navigation/anchor-frames-gbk-expected.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,22 @@
+
+
+--------
+Frame: 'main'
+--------
+Tests that loading a frame with a URL that contains a fragment pointed at a named anchor actually scrolls to that anchor.
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS document.body.offsetHeight > document.documentElement.clientHeight is true
+PASS document.body.scrollTop > 0 is true
+PASS document.body.scrollTop + document.documentElement.clientHeight > 2000 is true
+PASS successfullyParsed is true
+
+TEST COMPLETE
+This is an anchor point named as the Unicode equivalent of the GBK sequence %a9g (test trailing low byte).
+
+--------
+Frame: 'footer'
+--------
+

Added: trunk/LayoutTests/http/tests/navigation/anchor-frames-gbk.html (0 => 105691)


--- trunk/LayoutTests/http/tests/navigation/anchor-frames-gbk.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/navigation/anchor-frames-gbk.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,14 @@
+<!DOCTYPE html>
+<html>
+  <meta http-equiv="Content-Type" content="text/html; charset=gbk"/>
+  <!-- See resources/frame-with-anchor-gbk.html for description of test -->
+  <!-- See also https://bugs.webkit.org/show_bug.cgi?id=71316 -->
+  <script>
+    if (window.layoutTestController)
+        layoutTestController.dumpChildFramesAsText();
+  </script>
+  <frameset rows="90%,10%">
+    <frame src="" name="main">
+    <frame src="" name="footer">
+  </frameset>
+</html>

Added: trunk/LayoutTests/http/tests/navigation/resources/frame-with-anchor-gbk.html (0 => 105691)


--- trunk/LayoutTests/http/tests/navigation/resources/frame-with-anchor-gbk.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/navigation/resources/frame-with-anchor-gbk.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,41 @@
+<!DOCTYPE html>
+<html>
+<head>
+  <meta http-equiv="Content-Type" content="text/html; charset=gbk"/>
+  <script src=""
+  <script>
+    function runTest() {
+      description('Tests that loading a frame with a URL that contains a fragment pointed at a named anchor actually scrolls to that anchor.');
+
+      // Check scroll position in a timeout to make sure that the anchor has
+      // been scrolled to.
+      setTimeout(function() {
+          // Make sure that the body is taller than the viewport (i.e. scrolling is
+          // required).
+          shouldBeTrue('document.body.offsetHeight > document.documentElement.clientHeight');
+          
+          // We should be scrolled at least a little bit
+          shouldBeTrue('document.body.scrollTop > 0');
+          
+          // And the bottom of the viewable area should be at least 2000 pixels from the top, due to the spacer element above.
+          shouldBeTrue('document.body.scrollTop + document.documentElement.clientHeight > 2000');
+
+          finishJSTest();          
+      }, 0);
+    }
+    
+    var jsTestIsAsync = true;
+  </script>  
+</head>
+<body onload="runTest()">
+<p id="description"></p>
+<div id="console"></div>
+
+<div style="height: 2000px">
+  <!-- Spacer to make sure that the named anchor below requires scrolling -->
+</div>
+
+<a name="&#x586f">This is an anchor point named as the Unicode equivalent of the GBK sequence %a9g (test trailing low byte)</a>.
+<script src=""
+</body>
+</html>

Added: trunk/LayoutTests/http/tests/security/xssAuditor/iframe-onload-GBK-char-expected.txt (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/iframe-onload-GBK-char-expected.txt	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/iframe-onload-GBK-char-expected.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,3 @@
+CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
+
+

Added: trunk/LayoutTests/http/tests/security/xssAuditor/iframe-onload-GBK-char.html (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/iframe-onload-GBK-char.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/iframe-onload-GBK-char.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,15 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script>
+if (window.layoutTestController) {
+    layoutTestController.dumpAsText();
+    layoutTestController.setXSSAuditorEnabled(true);
+}
+</script>
+</head>
+<body>
+<iframe src=""
+</iframe>
+</body>
+</html>

Added: trunk/LayoutTests/http/tests/security/xssAuditor/img-onerror-GBK-char-expected.txt (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/img-onerror-GBK-char-expected.txt	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/img-onerror-GBK-char-expected.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,3 @@
+CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
+
+

Added: trunk/LayoutTests/http/tests/security/xssAuditor/img-onerror-GBK-char.html (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/img-onerror-GBK-char.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/img-onerror-GBK-char.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,15 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script>
+if (window.layoutTestController) {
+  layoutTestController.dumpAsText();
+  layoutTestController.setXSSAuditorEnabled(true);
+}
+</script>
+</head>
+<body>
+<iframe src=""
+</iframe>
+</body>
+</html>

Modified: trunk/LayoutTests/http/tests/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl (105690 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl	2012-01-24 05:04:53 UTC (rev 105690)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl	2012-01-24 05:08:14 UTC (rev 105691)
@@ -67,7 +67,8 @@
     return $result;
 }
 
-print "Content-Type: text/html; charset=UTF-8\n\n";
+my $charsetToUse = $cgi->param('charset') ? $cgi->param('charset') : "UTF-8";
+print "Content-Type: text/html; charset=$charsetToUse\n\n";
 
 print "<!DOCTYPE html>\n";
 print "<html>\n";

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-expected.txt (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-expected.txt	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-expected.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,3 @@
+CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
+
+

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode-expected.txt (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode-expected.txt	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode-expected.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,3 @@
+CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
+
+

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,15 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script>
+if (window.layoutTestController) {
+  layoutTestController.dumpAsText();
+  layoutTestController.setXSSAuditorEnabled(true);
+}
+</script>
+</head>
+<body>
+<iframe src=""
+</iframe>
+</body>
+</html>

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-expected.txt (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-expected.txt	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-expected.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,3 @@
+CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
+
+

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,15 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script>
+if (window.layoutTestController) {
+  layoutTestController.dumpAsText();
+  layoutTestController.setXSSAuditorEnabled(true);
+}
+</script>
+</head>
+<body>
+<iframe src=""
+</iframe>
+</body>
+</html>

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char.html (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,15 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script>
+if (window.layoutTestController) {
+  layoutTestController.dumpAsText();
+  layoutTestController.setXSSAuditorEnabled(true);
+}
+</script>
+</head>
+<body>
+<iframe src=""
+</iframe>
+</body>
+</html>

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char2-expected.txt (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char2-expected.txt	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char2-expected.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,3 @@
+CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
+
+

Added: trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char2.html (0 => 105691)


--- trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char2.html	                        (rev 0)
+++ trunk/LayoutTests/http/tests/security/xssAuditor/script-tag-Big5-char2.html	2012-01-24 05:08:14 UTC (rev 105691)
@@ -0,0 +1,15 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script>
+if (window.layoutTestController) {
+  layoutTestController.dumpAsText();
+  layoutTestController.setXSSAuditorEnabled(true);
+}
+</script>
+</head>
+<body>
+<iframe src=""
+</iframe>
+</body>
+</html>

Modified: trunk/LayoutTests/platform/chromium/test_expectations.txt (105690 => 105691)


--- trunk/LayoutTests/platform/chromium/test_expectations.txt	2012-01-24 05:04:53 UTC (rev 105690)
+++ trunk/LayoutTests/platform/chromium/test_expectations.txt	2012-01-24 05:08:14 UTC (rev 105691)
@@ -1930,6 +1930,9 @@
 // Note: this test was also marked as flaky on WIN RELEASE above, BUGCR31342.
 BUGCR39423 : security/block-test.html = TIMEOUT
 
+// Due to the differences in handling text encodings in KURL and googleurl.
+BUGWK20559 : http/tests/navigation/anchor-frames-gbk.html = TEXT
+
 BUGWK36666 : storage/open-database-over-quota.html = TEXT
 
 BUGWK37283 : fast/overflow/scrollbar-restored-and-then-locked.html = TEXT

Modified: trunk/Source/WebCore/ChangeLog (105690 => 105691)


--- trunk/Source/WebCore/ChangeLog	2012-01-24 05:04:53 UTC (rev 105690)
+++ trunk/Source/WebCore/ChangeLog	2012-01-24 05:08:14 UTC (rev 105691)
@@ -1,3 +1,30 @@
+2012-01-23  Tom Sepez  <[email protected]>
+
+        decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).
+        https://bugs.webkit.org/show_bug.cgi?id=71316
+
+        Reviewed by Daniel Bates.
+
+        Pass trailing unescaped bytes into the character set decoder to get correct
+        results in the presence of encodings which re-use ASCII values in sequences.
+        
+        Tests: http/tests/navigation/anchor-frames-gbk.html
+               http/tests/security/xssAuditor/iframe-onload-GBK-char.html
+               http/tests/security/xssAuditor/img-onerror-GBK-char.html
+               http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html
+               http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html
+               http/tests/security/xssAuditor/script-tag-Big5-char.html
+               http/tests/security/xssAuditor/script-tag-Big5-char2.html
+
+        * platform/text/DecodeEscapeSequences.h:
+        (WebCore::Unicode16BitEscapeSequence::findInString):
+        (WebCore::Unicode16BitEscapeSequence::findEndOfRun):
+        (WebCore::Unicode16BitEscapeSequence::decodeRun):
+        (WebCore::URLEscapeSequence::findInString):
+        (WebCore::URLEscapeSequence::findEndOfRun):
+        (WebCore::URLEscapeSequence::decodeRun):
+        (WebCore::decodeEscapeSequences):
+
 2012-01-23  Adam Barth  <[email protected]>
 
         Fix a build break in a clean compile of the Chromium port (at least

Modified: trunk/Source/WebCore/platform/text/DecodeEscapeSequences.h (105690 => 105691)


--- trunk/Source/WebCore/platform/text/DecodeEscapeSequences.h	2012-01-24 05:04:53 UTC (rev 105690)
+++ trunk/Source/WebCore/platform/text/DecodeEscapeSequences.h	2012-01-24 05:08:14 UTC (rev 105691)
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2011 Daniel Bates ([email protected]). All Rights Reserved.
+ * Copyright (c) 2012 Google, inc.  All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -9,6 +10,9 @@
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Google Inc. nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
@@ -36,52 +40,81 @@
 
 // See <http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations>.
 struct Unicode16BitEscapeSequence {
-    enum { size = 6 }; // e.g. %u26C4
-    static size_t findInString(const String& string, unsigned start = 0) { return string.find("%u", start); }
-    static bool matchStringPrefix(const String& string, unsigned start = 0)
+    enum { sequenceSize = 6 }; // e.g. %u26C4
+    static size_t findInString(const String& string, size_t startPosition) { return string.find("%u", startPosition); }
+    static size_t findEndOfRun(const String& string, size_t startPosition, size_t endPosition)
     {
-        if (string.length() - start < size)
-            return false;
-        return string[start] == '%' && string[start + 1] == 'u'
-            && isASCIIHexDigit(string[start + 2]) && isASCIIHexDigit(string[start + 3])
-            && isASCIIHexDigit(string[start + 4]) && isASCIIHexDigit(string[start + 5]);
+        size_t runEnd = startPosition;
+        while (endPosition - runEnd >= sequenceSize && string[runEnd] == '%' && string[runEnd + 1] == 'u'
+               && isASCIIHexDigit(string[runEnd + 2]) && isASCIIHexDigit(string[runEnd + 3])
+               && isASCIIHexDigit(string[runEnd + 4]) && isASCIIHexDigit(string[runEnd + 5])) {
+            runEnd += sequenceSize;
+        }
+        return runEnd;
     }
     static String decodeRun(const UChar* run, size_t runLength, const TextEncoding&)
     {
         // Each %u-escape sequence represents a UTF-16 code unit.
         // See <http://www.w3.org/International/iri-edit/draft-duerst-iri.html#anchor29>.
-        size_t numberOfSequences = runLength / size;
+        // For 16-bit escape sequences, we know that findEndOfRun() has given us a contiguous run of sequences
+        // without any intervening characters, so decode the run without additional checks.
+        size_t numberOfSequences = runLength / sequenceSize;
         StringBuilder builder;
         builder.reserveCapacity(numberOfSequences);
         while (numberOfSequences--) {
             UChar codeUnit = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]);
             builder.append(codeUnit);
-            run += size;
+            run += sequenceSize;
         }
         return builder.toString();
     }
 };
 
 struct URLEscapeSequence {
-    enum { size = 3 }; // e.g. %41
-    static size_t findInString(const String& string, unsigned start = 0) { return string.find('%', start); }
-    static bool matchStringPrefix(const String& string, unsigned start = 0)
+    enum { sequenceSize = 3 }; // e.g. %41
+    static size_t findInString(const String& string, size_t startPosition) { return string.find('%', startPosition); }
+    static size_t findEndOfRun(const String& string, size_t startPosition, size_t endPosition)
     {
-        if (string.length() - start < size)
-            return false;
-        return string[start] == '%' && isASCIIHexDigit(string[start + 1]) && isASCIIHexDigit(string[start + 2]);
+        // Make the simplifying assumption that supported encodings may have up to two unescaped characters
+        // in the range 0x40 - 0x7F as the trailing bytes of their sequences which need to be passed into the
+        // decoder as part of the run. In other words, we end the run at the first value outside of the
+        // 0x40 - 0x7F range, after two values in this range, or at a %-sign that does not introduce a valid
+        // escape sequence.
+        size_t runEnd = startPosition;
+        int numberOfTrailingCharacters = 0;
+        while (runEnd < endPosition) {
+            if (string[runEnd] == '%') {
+                if (endPosition - runEnd >= sequenceSize && isASCIIHexDigit(string[runEnd + 1]) && isASCIIHexDigit(string[runEnd + 2])) {
+                    runEnd += sequenceSize;
+                    numberOfTrailingCharacters = 0;
+                } else
+                    break;
+            } else if (string[runEnd] >= 0x40 && string[runEnd] <= 0x7F && numberOfTrailingCharacters < 2) {
+                runEnd += 1;
+                numberOfTrailingCharacters += 1;
+            } else
+                break;
+        }
+        return runEnd;
     }
     static String decodeRun(const UChar* run, size_t runLength, const TextEncoding& encoding)
     {
-        size_t numberOfSequences = runLength / size;
+        // For URL escape sequences, we know that findEndOfRun() has given us a run where every %-sign introduces
+        // a valid escape sequence, but there may be characters between the sequences.
         Vector<char, 512> buffer;
-        buffer.resize(numberOfSequences);
+        buffer.resize(runLength); // Unescaping hex sequences only makes the length smaller.
         char* p = buffer.data();
-        while (numberOfSequences--) {
-            *p++ = (toASCIIHexValue(run[1]) << 4) | toASCIIHexValue(run[2]);
-            run += size;
+        const UChar* runEnd = run + runLength;
+        while (run < runEnd) {
+            if (run[0] == '%') {
+                *p++ = (toASCIIHexValue(run[1]) << 4) | toASCIIHexValue(run[2]);
+                run += sequenceSize;
+            } else {
+                *p++ = run[0];
+                run += 1;
+            }
         }
-        ASSERT(buffer.size() == static_cast<size_t>(p - buffer.data()));
+        ASSERT(buffer.size() >= static_cast<size_t>(p - buffer.data())); // Prove buffer not overrun.
         return (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), p - buffer.data());
     }
 };
@@ -95,9 +128,7 @@
     size_t searchPosition = 0;
     size_t encodedRunPosition;
     while ((encodedRunPosition = EscapeSequence::findInString(string, searchPosition)) != notFound) {
-        unsigned encodedRunEnd = encodedRunPosition;
-        while (length - encodedRunEnd >= EscapeSequence::size && EscapeSequence::matchStringPrefix(string, encodedRunEnd))
-            encodedRunEnd += EscapeSequence::size;
+        size_t encodedRunEnd = EscapeSequence::findEndOfRun(string, encodedRunPosition, length);
         searchPosition = encodedRunEnd;
         if (encodedRunEnd == encodedRunPosition) {
             ++searchPosition;
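
The byte-unescaping step of the new `URLEscapeSequence::decodeRun` can be illustrated on its own. The following is a rough sketch under stated assumptions, not the WebCore implementation: `unescapeRun` and `hexValue` are hypothetical helpers, `std::string` replaces `UChar*` and `Vector<char, 512>`, and the final hand-off to the text encoding's decoder is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit. The caller is assumed to have
// validated the digit already, as findEndOfRun() does in the patch.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return c - 'A' + 10;
}

// Hypothetical analogue of the loop in URLEscapeSequence::decodeRun():
// turn a run of %XX escapes and interleaved literal characters into raw bytes.
// The real code then passes these bytes to the character set decoder.
std::string unescapeRun(const std::string& run)
{
    std::string bytes;
    bytes.reserve(run.size()); // Unescaping only makes the length smaller.
    size_t i = 0;
    while (i < run.size()) {
        if (run[i] == '%') {
            assert(i + 2 < run.size()); // The run is assumed to hold only complete escapes.
            bytes.push_back(static_cast<char>((hexValue(run[i + 1]) << 4) | hexValue(run[i + 2])));
            i += 3;
        } else {
            // Literal trailing byte: copy it through unchanged.
            bytes.push_back(run[i]);
            ++i;
        }
    }
    return bytes;
}
```

With the run `%a9g` from the GBK anchor test, this sketch yields the two bytes 0xA9 0x67, which the test description says correspond to U+586F once decoded as GBK.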