Having looked at the non-ASCII case-independent pairs, I found we needed
another macro assembler instruction to cover a good chunk of the
remaining equivalence classes.  This version includes the new
instruction.


http://codereview.chromium.org/11352/diff/1/9
File src/jsregexp.cc (right):

http://codereview.chromium.org/11352/diff/1/9#newcode1132
Line 1132: macro_assembler->CheckCharacter(chars[0], &ok);
On 2008/11/24 09:49:21, plesner wrote:
> We could consider using the OrThenCheckNotCharacter trick for pairs of
> characters in this and the previous case.

There are 837 sets of canonical characters in the current Unicode spec.
With the 'or' trick we cover over 500, leaving 319.  If we add a
subtract-then-or trick we cover another 90, leaving 229.  Of these, only
15 are triplets and one is a quad.  I don't think it's worth spending
much time on such rare cases.
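
For reference, here is a rough sketch of the two tricks as I understand
them, written as plain C++ rather than macro assembler calls.  The
helper names are made up for illustration and are not part of the
macro assembler interface; characters are assumed to be 16-bit code
units.  The 'or' trick works when the two characters differ in exactly
one bit; the subtract-then-or trick works when their numeric difference
is a power of two, even if more than one bit differs.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not V8's actual API.

// True if exactly one bit is set in x.
inline bool IsPowerOfTwo(uint32_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// 'Or' trick: usable when a and b differ in exactly one bit.
inline bool OrTrickApplies(uint16_t a, uint16_t b) {
  return IsPowerOfTwo(static_cast<uint32_t>(a ^ b));
}

// One compare instead of two: force the differing bit on in both the
// input and the expected value, then compare.
inline bool MatchesWithOr(uint16_t c, uint16_t a, uint16_t b) {
  uint16_t bit = static_cast<uint16_t>(a ^ b);
  return static_cast<uint16_t>(c | bit) == static_cast<uint16_t>(a | bit);
}

// Subtract-then-or trick: usable when the numeric difference between
// the two characters is a power of two.
inline bool SubtractThenOrApplies(uint16_t a, uint16_t b) {
  uint16_t lo = a < b ? a : b;
  uint16_t hi = a < b ? b : a;
  return IsPowerOfTwo(static_cast<uint32_t>(hi - lo));
}

// Subtract the smaller character so the pair maps to {0, diff}, then
// or in diff; only those two values give exactly diff.  The
// subtraction wraps modulo 2^16 for inputs below lo.
inline bool MatchesWithSubtractThenOr(uint16_t c, uint16_t a, uint16_t b) {
  uint16_t lo = a < b ? a : b;
  uint16_t hi = a < b ? b : a;
  uint16_t diff = static_cast<uint16_t>(hi - lo);
  uint16_t shifted = static_cast<uint16_t>(c - lo);
  return static_cast<uint16_t>(shifted | diff) == diff;
}
```

For example, 'A' (0x41) and 'a' (0x61) differ only in bit 0x20, so the
'or' trick applies.  U+0420 and U+0440 (Cyrillic er) differ by 0x20
numerically but in two bits (0x60), so they need the subtract-then-or
form.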

http://codereview.chromium.org/11352/diff/1/9#newcode1198
Line 1198: if (!cc->is_negated()) {
On 2008/11/24 09:49:21, plesner wrote:
> Is there a reason to not use char_is_in_class here?

Yes, the sense is reversed.

http://codereview.chromium.org/11352

--~--~---------~--~----~------------~-------~--~----~
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
-~----------~----~----~----~------~----~------~--~---
