Hello,

The regex-tdfa package has had a series of bug-fix releases (0.97.1, 0.97.2, 0.97.3, and now 0.97.4). The 0.97.4 release finishes fixing the bug that the 0.97.1 release only mostly fixed.

An example of the fixed bug: apply the regex pattern (BB(B?))+(B?) to the text BBBB. The "BB" in the pattern should be used twice, and both "B?" should match nothing. My code grouped the "+" wrong: it matched the "BB" only once, and then each "B?" matched a "B".
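To see the grouping directly, here is a minimal sketch that asks regex-tdfa for the text of each capture group. It assumes a regex-tdfa release that includes this fix, and it uses the "AllTextSubmatches" result type that Text.Regex.TDFA re-exports from regex-base. The names "submatches" and "bugExample" are hypothetical, chosen only for illustration.

```haskell
import Text.Regex.TDFA

-- Text of the whole match (index 0) followed by each parenthesised
-- group (indices 1..3 for the pattern below), for the leftmost-longest match.
submatches :: String -> String -> [String]
submatches text pat =
  getAllTextSubmatches (text =~ pat :: AllTextSubmatches [] String)

-- The pattern and text from the bug report above.
bugExample :: [String]
bugExample = submatches "BBBB" "(BB(B?))+(B?)"
```

Load this in GHCi and evaluate "bugExample". Per the description above, the POSIX answer has the whole match "BBBB", group 1 equal to the second "BB" (the last iteration of the "+"), and groups 2 and 3 empty, i.e. ["BBBB","BB","",""]. The mis-grouped match described above would instead correspond to ["BBBB","BBB","B","B"].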

The case fixed here was not caught at first because of how I search for unknown bugs: I use "Arbitrary" from QuickCheck to generate random patterns and strings to search, and I compare the results of regex-tdfa against those of another POSIX engine.
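A minimal sketch of this kind of randomized differential test might look like the following. It assumes QuickCheck and uses the regex-posix package (which wraps the system's native POSIX library) as the second engine. The "Re" type, "render", and "prop_sameSubmatches" are hypothetical names for illustration; they are not the generators actually used for regex-tdfa.

```haskell
import Test.QuickCheck
import qualified Text.Regex.TDFA as TDFA
import qualified Text.Regex.Posix as Posix
import Text.Regex.Base (AllSubmatches (..), MatchLength, MatchOffset)

-- Hypothetical tiny regex AST over the alphabet "AB".
data Re
  = Lit Char
  | Opt Re
  | Plus Re
  | Star Re
  | Cat Re Re
  | Alt Re Re
  deriving Show

-- Render to POSIX ERE syntax. Quantified and alternated parts are
-- always wrapped in parentheses, which also creates capture groups.
render :: Re -> String
render (Lit c)   = [c]
render (Opt r)   = "(" ++ render r ++ ")?"
render (Plus r)  = "(" ++ render r ++ ")+"
render (Star r)  = "(" ++ render r ++ ")*"
render (Cat a b) = render a ++ render b
render (Alt a b) = "(" ++ render a ++ "|" ++ render b ++ ")"

instance Arbitrary Re where
  -- Cap the size so generated patterns stay small and readable.
  arbitrary = sized (gen . min 6)
    where
      leaf = Lit <$> elements "AB"
      gen :: Int -> Gen Re
      gen 0 = leaf
      gen n =
        oneof
          [ leaf
          , Opt <$> sub, Plus <$> sub, Star <$> sub
          , Cat <$> sub <*> sub
          , Alt <$> sub <*> sub
          ]
        where
          sub = gen (n `div` 2)

-- Both engines must report the same (offset, length) for the whole
-- match and for every capture group.
prop_sameSubmatches :: Re -> Property
prop_sameSubmatches re =
  forAll (listOf (elements "AB")) $ \s ->
    let pat = render re
        viaTDFA, viaPosix :: [(MatchOffset, MatchLength)]
        viaTDFA = getAllSubmatches (s TDFA.=~ pat)
        viaPosix = getAllSubmatches (s Posix.=~ pat)
     in counterexample ("pattern " ++ show pat ++ " on " ++ show s)
          (viaTDFA === viaPosix)
```

In GHCi, "quickCheck (withMaxSuccess 1000 prop_sameSubmatches)" runs the comparison. A failure only shows that the two engines disagree, not which one is wrong, so every counterexample still has to be checked by hand against the POSIX rules.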

Because I am on OS X, I am limited by the bugs in the native POSIX library: this bug in regex-tdfa was triggered only in cases where the native POSIX engine was also buggy.

However, most of my unit tests come from AT&T Research [1], and they have a "libast" library with its own POSIX implementation. I have adapted my regex-* wrapper packages to make a "regex-ast" Haskell interface to it, but difficulties with the AT&T headers prevent me from releasing this on Hackage. This "regex-ast" has given me access to a less buggy POSIX back-end, and randomized testing against it caught the bug fixed here (and also led to a few bug reports back to AT&T).

So while regex-tdfa will not win many speed contests, it is the only POSIX regular expression library I have running that passes all the unit tests.

[1] http://www.research.att.com/sw/download/
    http://www.research.att.com/~gsf/testregex/
    http://www.research.att.com/~gsf/testregex/re-interpretation.html

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
