Reviewers: Yang,

Message:
yangguo, ptal

Description:
Script streaming: UTF-8 handling fix.

The problem was that several multi-byte UTF-8 characters can sit next to the
split point between two data chunks, and the code didn't handle that case
properly: the backward scan for a split character did not stop at that
character's first byte.

This was also the source of crbug.com/417891. I thought the crash could only
happen when V8 is passed invalid UTF-8 data, but it can also happen in the
above-mentioned case. After the fix, we handle the valid UTF-8 case and also
guard against invalid UTF-8 data.
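
As a rough illustration of the scan described above, here is a minimal
standalone sketch. It is not V8's actual code: TrailingBytesToHoldBack and its
constants are hypothetical names, and it only mirrors the loop condition and
the early exit shown in the diff below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration; not part of V8. Returns how many
// bytes at the end of data[0..length) are set aside until the next chunk.
std::size_t TrailingBytesToHoldBack(const std::uint8_t* data,
                                    std::size_t length) {
  assert(data != nullptr || length == 0);
  const std::uint8_t kMaxOneByteChar = 0x7F;   // Largest ASCII value.
  const std::uint8_t kFirstLeadByte = 3 << 6;  // 0b11000000.
  std::size_t held = 0;
  // Walk backwards over non-ASCII bytes. The limit of 4 bounds the scan even
  // when the input is invalid UTF-8.
  while (length > 0 && data[length - 1] > kMaxOneByteChar && held < 4) {
    const std::uint8_t c = data[length - 1];
    --length;
    ++held;
    // A lead byte starts the trailing character, so earlier bytes belong to
    // other characters and stay in the chunk.
    if (c >= kFirstLeadByte) break;
  }
  return held;
}
```

For a chunk ending in the bytes eb 91 80 eb, this sketch holds back one byte;
without the early exit it would hold back four. Like the loop in the diff, it
does not check whether the trailing character is already complete.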

[email protected]
BUG=chromium:417891
LOG=N

Please review this at https://codereview.chromium.org/654503002/

SVN Base: https://v8.googlecode.com/svn/branches/bleeding_edge

Affected files (+27, -0 lines):
  M src/scanner-character-streams.cc
  M test/cctest/test-api.cc


Index: src/scanner-character-streams.cc
diff --git a/src/scanner-character-streams.cc b/src/scanner-character-streams.cc
index d06f479f94bef5e4d6507d0018406955bdc80360..732b2b43f6469ee8ce9dcb9e0373678cbf62cdce 100644
--- a/src/scanner-character-streams.cc
+++ b/src/scanner-character-streams.cc
@@ -420,6 +420,12 @@ void ExternalStreamingStream::HandleUtf8SplitCharacters(
          utf8_split_char_buffer_length_ < 4) {
     --current_data_length_;
     ++utf8_split_char_buffer_length_;
+    if (c >= (3 << 6)) {
+      // 3 << 6 = 0b11000000; this is the first byte of the multi-byte
+      // character. No need to copy the previous characters into the conversion
+      // buffer (even if they're multi-byte).
+      break;
+    }
   }
   CHECK(utf8_split_char_buffer_length_ <= 4);
   for (unsigned i = 0; i < utf8_split_char_buffer_length_; ++i) {
Index: test/cctest/test-api.cc
diff --git a/test/cctest/test-api.cc b/test/cctest/test-api.cc
index 7de465d8c12986d6e1a30fba221cb0c6d3e67d72..6b182d0845f7a2b3f3aab35a410ba91f6b67ad15 100644
--- a/test/cctest/test-api.cc
+++ b/test/cctest/test-api.cc
@@ -23813,3 +23813,24 @@ TEST(StreamingScriptWithInvalidUtf8) {
   const char* chunks[] = {chunk1, chunk2, "foo();", NULL};
   RunStreamingTest(chunks, v8::ScriptCompiler::StreamedSource::UTF8, false);
 }
+
+
+TEST(StreamingUtf8ScriptWithMultipleMultibyteCharactersSomeSplit) {
+  // Regression test: Stream data where there are several multi-byte UTF-8
+  // characters in a sequence and one of them is split between two data chunks.
+  const char* reference = "\xeb\x91\x80";
+  char chunk1[] =
+      "function foo() {\n"
+      "  // This function will contain an UTF-8 character which is not in\n"
+      "  // ASCII.\n"
+      "  var foob\xeb\x91\x80X";
+  char chunk2[] =
+      "XXr = 13;\n"
+      "  return foob\xeb\x91\x80\xeb\x91\x80r;\n"
+      "}\n";
+  chunk1[strlen(chunk1) - 1] = reference[0];
+  chunk2[0] = reference[1];
+  chunk2[1] = reference[2];
+  const char* chunks[] = {chunk1, chunk2, "foo();", NULL};
+  RunStreamingTest(chunks, v8::ScriptCompiler::StreamedSource::UTF8);
+}

