[v8-dev] Convert Unicode code points outside the basic multilingual plane to the replacement character. (issue2832050)

lrn Mon, 05 Jul 2010 05:42:52 -0700

Reviewers: Erik Corry,

Description:
Convert Unicode code points outside the basic multilingual plane to the
replacement character.
Previous behavior was to silently truncate the value to 16 bits.
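
To illustrate the difference between the two behaviors, here is a minimal standalone sketch. It is not V8 code: the typedef, the constant kReplacementChar (assumed to match unibrow::Utf8::kBadChar, i.e. U+FFFD), and the helper names TruncateTo16Bits and ClampToBmp are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for V8's uc32 type.
typedef uint32_t uc32;

// Highest code point in the Basic Multilingual Plane.
static const uc32 kMaxSupportedChar = 0xFFFF;
// Assumed value of unibrow::Utf8::kBadChar (U+FFFD REPLACEMENT CHARACTER).
static const uc32 kReplacementChar = 0xFFFD;

// Old behavior: storing a code point in a 16-bit unit keeps only the
// low 16 bits, so the result is an unrelated BMP code point.
static uint16_t TruncateTo16Bits(uc32 c) {
  return static_cast<uint16_t>(c);
}

// New behavior: any code point above the BMP is replaced before storing.
static uint16_t ClampToBmp(uc32 c) {
  if (c > kMaxSupportedChar) c = kReplacementChar;
  return static_cast<uint16_t>(c);
}
```

For example, U+1F600 truncates to U+F600 (an unrelated private-use code point), whereas the clamped version yields U+FFFD. Code points at or below U+FFFF pass through both helpers unchanged.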


Please review this at http://codereview.chromium.org/2832050/show

Affected files:
  M src/heap.cc


Index: src/heap.cc
diff --git a/src/heap.cc b/src/heap.cc
index 6ae46f2a6ee743e4bac99446ca61e08288657b91..fa6344556fbcc43e633a1d9c5c67eb7cca7ae3a7 100644
--- a/src/heap.cc
+++ b/src/heap.cc
@@ -2866,6 +2866,8 @@ Object* Heap::AllocateStringFromAscii(Vector<const char> string,


 Object* Heap::AllocateStringFromUtf8(Vector<const char> string,
                                      PretenureFlag pretenure) {
+  // V8 only supports characters in the Basic Multilingual Plane.
+  const uc32 kMaxSupportedChar = 0xFFFF;
   // Count the number of characters in the UTF-8 string and check if
   // it is an ASCII string.
   Access<Scanner::Utf8Decoder> decoder(Scanner::utf8_decoder());

@@ -2890,6 +2892,7 @@ Object* Heap::AllocateStringFromUtf8(Vector<const char> string,

   decoder->Reset(string.start(), string.length());
   for (int i = 0; i < chars; i++) {
     uc32 r = decoder->GetNext();
+    if (r > kMaxSupportedChar) { r = unibrow::Utf8::kBadChar; }
     string_result->Set(i, r);
   }
   return result;


--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
