Revision: 24547
Author: [email protected]
Date: Mon Oct 13 09:01:54 2014 UTC
Log: Script streaming: UTF-8 handling fix.
The problem was that there can be several multi-byte UTF-8 characters near
the splitting point of the data chunks, and the code didn't handle that
properly: while moving the split character into the conversion buffer, it
also moved bytes of the preceding, already complete, characters.
This was also the source of crbug.com/417891. I thought the crash could only
happen when V8 is passed invalid UTF-8 data, but it can also happen in the
above-mentioned case. After the fix, we handle the valid UTF-8 case and also
guard against invalid UTF-8 data.
[email protected]
BUG=chromium:417891
LOG=N
Review URL: https://codereview.chromium.org/654503002
https://code.google.com/p/v8/source/detail?r=24547
Modified:
/branches/bleeding_edge/src/scanner-character-streams.cc
/branches/bleeding_edge/test/cctest/test-api.cc
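To make the failure mode in the log concrete: only the last character in a chunk can be incomplete, and any multi-byte characters before it are already whole. The sketch below, assuming a chunk is available as raw bytes, computes how many trailing bytes belong to an incomplete character by reading the lead byte's declared sequence length. This is a swapped-in technique for illustration, not what the patch does (the patch simply stops at the first lead byte it moves), and `Utf8SequenceLength` and `IncompleteTailLength` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of V8): length of the UTF-8 sequence that
// byte `b` starts, judged by bit pattern only. Returns 0 for a continuation
// byte (10xxxxxx) or for 0xF8..0xFF, which can never start a sequence.
int Utf8SequenceLength(unsigned char b) {
  if (b < 0x80) return 1;  // 0xxxxxxx: ASCII
  if (b < 0xC0) return 0;  // 10xxxxxx: continuation byte
  if (b < 0xE0) return 2;  // 110xxxxx
  if (b < 0xF0) return 3;  // 1110xxxx
  if (b < 0xF8) return 4;  // 11110xxx
  return 0;                // 11111xxx: never valid in UTF-8
}

// Hypothetical helper: number of bytes at the end of a chunk that belong to
// a character continuing in the next chunk, or 0 if the chunk ends on a
// character boundary. Multi-byte characters before the last one are whole
// and are not counted. Malformed tails also yield 0 (left to the decoder).
std::size_t IncompleteTailLength(const char* data, std::size_t length) {
  // Step back over at most 3 trailing continuation bytes.
  std::size_t trailing = 0;
  while (trailing < 3 && trailing < length &&
         (static_cast<unsigned char>(data[length - 1 - trailing]) & 0xC0) ==
             0x80) {
    ++trailing;
  }
  if (trailing == length) return 0;  // no lead byte within reach
  int expected = Utf8SequenceLength(
      static_cast<unsigned char>(data[length - 1 - trailing]));
  std::size_t present = trailing + 1;  // lead byte plus continuations seen
  if (expected <= 1 || static_cast<std::size_t>(expected) <= present) {
    return 0;  // ASCII, complete character, or malformed tail
  }
  return present;
}
```

For the byte layout used by the new regression test (a chunk ending in `\xeb\x91\x80\xeb`), this reports 1: only the lead byte of the last character has to be carried over to the next chunk.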
=======================================
--- /branches/bleeding_edge/src/scanner-character-streams.cc Fri Sep 26 11:17:31 2014 UTC
+++ /branches/bleeding_edge/src/scanner-character-streams.cc Mon Oct 13 09:01:54 2014 UTC
@@ -420,6 +420,12 @@
utf8_split_char_buffer_length_ < 4) {
--current_data_length_;
++utf8_split_char_buffer_length_;
+ if (c >= (3 << 6)) {
+ // 3 << 6 = 0b11000000; this is the first byte of the multi-byte
+ // character. No need to copy the previous characters into the conversion
+ // buffer (even if they're multi-byte).
+ break;
+ }
}
CHECK(utf8_split_char_buffer_length_ <= 4);
for (unsigned i = 0; i < utf8_split_char_buffer_length_; ++i) {
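The hunk only shows the tail of the loop condition, so the following is a hedged stand-in rather than V8's code. It assumes the loop moves bytes that are >= 0x80 from the end of the chunk into a split-character buffer of at most 4 bytes; `MoveSplitTail` and its `stop_at_lead_byte` flag are hypothetical, with the flag toggling the `c >= (3 << 6)` check added in this revision.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the patched loop (not V8's code). Assumption:
// trailing bytes >= 0x80 are moved from the end of the chunk into a split
// character buffer of at most 4 bytes. `stop_at_lead_byte` toggles the fix.
// Returns the number of bytes moved; *data_length shrinks accordingly.
std::size_t MoveSplitTail(const char* data, std::size_t* data_length,
                          unsigned char split_buffer[4],
                          bool stop_at_lead_byte) {
  std::size_t moved = 0;
  while (*data_length > 0 && moved < 4) {
    unsigned char c = static_cast<unsigned char>(data[*data_length - 1]);
    if (c < 0x80) break;  // ASCII: not part of a multi-byte character
    --*data_length;
    ++moved;
    if (stop_at_lead_byte && c >= (3 << 6)) {
      // 3 << 6 == 0xC0: c is a lead byte, so everything before it consists
      // of complete characters and stays in the chunk.
      break;
    }
  }
  // The moved bytes are the last `moved` bytes of the original chunk.
  for (std::size_t i = 0; i < moved; ++i) {
    split_buffer[i] = static_cast<unsigned char>(data[*data_length + i]);
  }
  return moved;
}
```

With the check enabled, a chunk ending in `\xeb\x91\x80\xeb` gives up only its last byte. With it disabled, the loop keeps going until the 4-byte limit and also takes the complete character `\xeb\x91\x80`, which is the situation the log describes. The sketch does not model what the scanner does with the buffer afterwards.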
=======================================
--- /branches/bleeding_edge/test/cctest/test-api.cc Fri Oct 10 10:40:29 2014 UTC
+++ /branches/bleeding_edge/test/cctest/test-api.cc Mon Oct 13 09:01:54 2014 UTC
@@ -23813,3 +23813,24 @@
const char* chunks[] = {chunk1, chunk2, "foo();", NULL};
RunStreamingTest(chunks, v8::ScriptCompiler::StreamedSource::UTF8,
false);
}
+
+
+TEST(StreamingUtf8ScriptWithMultipleMultibyteCharactersSomeSplit) {
+ // Regression test: Stream data where there are several multi-byte UTF-8
+ // characters in a sequence and one of them is split between two data chunks.
+ const char* reference = "\xeb\x91\x80";
+ char chunk1[] =
+ "function foo() {\n"
+ " // This function will contain an UTF-8 character which is not in\n"
+ " // ASCII.\n"
+ " var foob\xeb\x91\x80X";
+ char chunk2[] =
+ "XXr = 13;\n"
+ " return foob\xeb\x91\x80\xeb\x91\x80r;\n"
+ "}\n";
+ chunk1[strlen(chunk1) - 1] = reference[0];
+ chunk2[0] = reference[1];
+ chunk2[1] = reference[2];
+ const char* chunks[] = {chunk1, chunk2, "foo();", NULL};
+ RunStreamingTest(chunks, v8::ScriptCompiler::StreamedSource::UTF8);
+}
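The placeholder trick in the test is easy to misread, so here is a minimal sketch that rebuilds shortened copies of `chunk1` and `chunk2` the same way and exposes the bytes on each side of the split. `SplitChunks` and `BuildChunks` are hypothetical; the JavaScript text is trimmed, but the bytes at the split point match the test.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct SplitChunks {
  std::string first;   // ends with the lead byte of the split character
  std::string second;  // starts with its two continuation bytes
};

// Hypothetical helper: shortened copy of the test's chunk setup. Each 'X'
// is a placeholder overwritten with one byte of the three-byte reference
// character U+B440 ("\xeb\x91\x80").
SplitChunks BuildChunks() {
  const char* reference = "\xeb\x91\x80";
  char chunk1[] = "var foob\xeb\x91\x80X";  // one whole copy, then a placeholder
  char chunk2[] = "XXr = 13;";
  chunk1[std::strlen(chunk1) - 1] = reference[0];  // lead byte ends chunk 1
  chunk2[0] = reference[1];  // continuation bytes start chunk 2
  chunk2[1] = reference[2];
  return SplitChunks{chunk1, chunk2};
}
```

Joined, the two chunks spell `foob`, two copies of the character, and `r`: the same identifier the test's `return` statement references.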
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev