I feel slightly uncomfortable making this change while the ongoing
discussion is
still in flight, but I don't fully understand all the issues surrounding
Unicode
anyway, so I trust you that moving from UC16 to UTF16 is the right thing to
do
at this point. :)
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/debug-agent.cc
File src/debug-agent.cc (right):
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/debug-agent.cc#newcode405
src/debug-agent.cc:405: int kEncodedSurrogateLength = 3;
Maybe move this declaration up and use it instead of literal 3 in the
above condition. Also, make it const for clarity.
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.cc
File src/objects.cc (right):
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.cc#newcode6389
src/objects.cc:6389: // flatten the string and retry. Failures are
caused by surrogate pairs in deep
Not sure I understand the failure mode. Why does the function need to
fail on these, and not on other recursions?
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.cc#newcode6446
src/objects.cc:6446: } else if (max_recursion > 0) {
Why is this the only recursive path actually checking max_recursion?
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.cc#newcode6486
src/objects.cc:6486: total -= 2;
I wouldn't mind a comment here, why -2? Is that
kSizeOfUnmatchedSurrogate-1?
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.h
File src/objects.h (right):
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.h#newcode6772
src/objects.h:6772: inline int Utf8Length() {
Even with the inline definition here, I think a comment is warranted
that a negative result signals failure, and how to react to that (maybe
combine with the comment below).
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.h#newcode6780
src/objects.h:6780: static int Utf8Length(String* input,
Please specify the result of these functions in the case where partial
surrogates appear at either end of the string.
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/objects.h#newcode6784
src/objects.h:6784: int max_recursion,
Is this a useful interface? How is a caller supposed to pick a suitable
recursion limit? It seems to me that you could as well make it an
internal constant.
In fact, why not abstract away the flattening and retry internally,
instead of making it the responsibility of every call site? Is there a
reason a caller would want to know about this happening, or do something
special? (I didn't see anything like that in the CL.)
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/scanner-character-streams.cc
File src/scanner-character-streams.cc (right):
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/scanner-character-streams.cc#newcode191
src/scanner-character-streams.cc:191: while (i < length - 1) {
When does this ever consume the last character in the stream if the
previous one does not happen to be a lead surrogate?
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode-inl.h
File src/unicode-inl.h (right):
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode-inl.h#newcode92
src/unicode-inl.h:92: previous != kNoPreviousCharacter &&
Isn't that implied by Utf16::IsLeadSurrogate(previous)?
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode-inl.h#newcode96
src/unicode-inl.h:96: Utf8::kNoPreviousCharacter) - 3;
kSizeOfUnmatchedSurrogate instead of 3 perhaps (here and above).
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode-inl.h#newcode130
src/unicode-inl.h:130: previous != kNoPreviousCharacter &&
See above.
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode-inl.h#newcode132
src/unicode-inl.h:132: return 1;
This is 4 - 3 already counted, I suppose.
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode.h
File src/unicode.h (right):
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode.h#newcode118
src/unicode.h:118: class Utf16 {
Nit: this doesn't quite fit into the above Utf8 section comment.
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode.h#newcode126
src/unicode.h:126: static inline int CombineSurrogatePair(uchar lead,
uchar trail) {
Isn't int32_t more accurate as result type?
https://chromiumcodereview.appspot.com/9600009/diff/14003/src/unicode.h#newcode130
src/unicode.h:130: static inline uchar LeadSurrogate(int char_code) {
Similar here (and below), isn't char_code an int32?
https://chromiumcodereview.appspot.com/9600009/
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev