I recently found two bugs in regexp. I'm new to the list, so I
apologize if these are known issues.
I wanted to notify the list of the problems I found, ensure they're
actually problems, and make sure I'm going about solving them in the
correct manner.
1) RECompiler dies when compiling regular expressions that contain the
'*?(' sequence of characters. Sometimes the next offset of a node has
not been set to zero, so when next = node + instruction[node +
offsetNext] is computed, next is very large, and you get an
ArrayIndexOutOfBoundsException. I added a check for the out-of-bounds
case and return -1 when it occurs. It appears to work, but there may be
a more correct way to fix this bug.
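
To make the check in (1) concrete, here is a minimal sketch of the idea.
It is not the actual RECompiler source: the class name, the method names,
the three-slot node layout, and the use of a char[] program are all
assumptions made for illustration. Only the names instruction and
offsetNext come from the description above.

```java
// Hypothetical sketch of a bounds-checked "next node" lookup.
// Assumed layout: each node occupies 3 slots [opcode, operand, next offset],
// and a next offset of 0 marks the end of a chain.
class NextOffsetCheck {
    static final int OFFSET_NEXT = 2; // assumed position of the next-offset slot

    /**
     * Returns the index of the node that follows `node`, or -1 if the stored
     * offset would lead outside the first `length` slots of `instruction`.
     */
    static int nextNode(char[] instruction, int length, int node) {
        if (node < 0 || node + OFFSET_NEXT >= length) {
            return -1; // the node itself is not inside the program
        }
        int next = node + instruction[node + OFFSET_NEXT];
        if (next + OFFSET_NEXT >= length) {
            return -1; // uninitialized or corrupt offset: report it instead of throwing
        }
        return next;
    }

    /**
     * Follows next offsets from `node` and returns the last node of the chain,
     * or -1 if a bad offset is met on the way.
     */
    static int endOfChain(char[] instruction, int length, int node) {
        if (node < 0 || node + OFFSET_NEXT >= length) {
            return -1;
        }
        while (instruction[node + OFFSET_NEXT] != 0) {
            node = nextNode(instruction, length, node);
            if (node == -1) {
                return -1;
            }
        }
        return node;
    }

    public static void main(String[] args) {
        char[] good = {'A', 0, 3, 'B', 0, 0};     // node 0 points to node 3
        char[] bad  = {'A', 0, 60000, 'B', 0, 0}; // node 0 holds a garbage offset
        System.out.println(endOfChain(good, good.length, 0));
        System.out.println(endOfChain(bad, bad.length, 0));
    }
}
```

Returning -1 hides the symptom rather than the cause. If the real problem
is that the next offset was never zeroed when the node was created,
initializing it there may be the cleaner fix.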
2) The other problem is with reluctant closures. Because reluctant
closures are not recursive, cases like the following fail:
b(aaa|aaaaa)*?b does not accept baaaaaaaaaab (10 a's), when it should,
since 10 a's is two repetitions of aaaaa. I have tried to change
reluctant closures so they're implemented more like greedy ones (with
recursive ORs), but I don't have it working yet.
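
As a reference point for (2), the same pattern can be run through a
different engine. The sketch below uses the JDK's java.util.regex rather
than this package, purely to show the expected answers. The class and
method names are made up for the example. It compares the regex result
with a direct arithmetic check of whether the number of a's can be
written as 3x + 5y.

```java
import java.util.regex.Pattern;

// Hypothetical reference check using java.util.regex, not this package's RE class.
class ReluctantClosureCheck {
    static final String PATTERN = "b(aaa|aaaaa)*?b";

    // Builds "b", then `count` copies of 'a', then "b".
    static String input(int count) {
        StringBuilder sb = new StringBuilder("b");
        for (int i = 0; i < count; i++) {
            sb.append('a');
        }
        return sb.append('b').toString();
    }

    // True if the whole input matches the pattern.
    static boolean accepts(int count) {
        return Pattern.matches(PATTERN, input(count));
    }

    // Independent check: can `count` be written as 3*x + 5*y with x, y >= 0?
    static boolean expected(int count) {
        for (int fives = 0; 5 * fives <= count; fives++) {
            if ((count - 5 * fives) % 3 == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 12; n++) {
            System.out.println(n + " a's: regex=" + accepts(n) + " expected=" + expected(n));
        }
    }
}
```

For 10 a's both columns are true, so a backtracking implementation of the
reluctant closure should accept the string once it is willing to retry
the alternation inside the group.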
Ian Swett