Thank you for the CL. Below are some comments.

http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/break-iterator.cc
File src/extensions/experimental/break-iterator.cc (right):

http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/break-iterator.cc#newcode63
src/extensions/experimental/break-iterator.cc:63:
text_value->Utf8Length());
Chrome's ICU didn't define a macro (U_CHARSET_IS_UTF8) to get ICU to
assume that |char *| is UTF-8. Perhaps, we'll in the future.

At least for now (and perhaps in the future, too because we don't
control how other embedders will build ICU),  you should explicitly
specify the encoding for UnicodeString ctor. You didn't get caught with
this because you tested it on Mac (or Linux under UTF-8 locale). Under
non-UTF-8 locale on Linux or Windows, your test would have failed.

You have three alternatives:

1. use UnicodeString:: fromUTF8
2. use a ctor accepting the codepage name : pass "UTF-8"
3. use a ctor accepting UChar* instead of char*.

#1 is faster than #2.

http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/break-iterator.cc#newcode81
src/extensions/experimental/break-iterator.cc:81: return
v8::Int32::New(break_iterator->next());
When there's no more item, ICU's next() will returb UBRK_DONE (or sth
like that). How would you signal that to your API users? If you want to
use '-1' for that, perhaps you need to check the return value from
bi->next() and return '-1' explicitly for UBRK_DONE.

http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/break-iterator.cc#newcode110
src/extensions/experimental/break-iterator.cc:110: return
v8::Int32::New(UBRK_WORD_IDEO);
We need to give a bit more thought on what types to expose to
Javascript.

BTW, line/setnence breakers can also have rule_status (
http://icu-project.org/apiref/icu4c/ubrk_8h.html#ad03d8e27f121bcf11eaed0a288786a71
) although they may not be very useful.

http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/break-iterator.cc#newcode154
src/extensions/experimental/break-iterator.cc:154:
raw_template->Set(v8::String::New("assign"),
How about 'adopt' or 'adoptText'? 'assign' is kinda misleading
(assigning 'string' to 'break iterator' ?)

http://codereview.chromium.org/6598014/diff/1/src/extensions/experimental/break-iterator.cc#newcode163
src/extensions/experimental/break-iterator.cc:163:
v8::FunctionTemplate::New(BreakIteratorBreakType));
One  method is missing;  a method that returns the word at the current
position.  A customer of this API can certainly find that out, but
wouldn't it be nice to offer ourselves?

Yeah, we discussed this and I said that next() is sufficient without
nextWord().

Alternatives:

1. Do nothing. Customers can figure that out
2. Add currentWord() method (either in C++ or JavaScript).
3. Add nextWord() (and remove next())

What do you say?

http://codereview.chromium.org/6598014/

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

Reply via email to