Re: Re: Regular Expressions and Canonical Equivalence

Stephen E Slevinski Jr Thu, 14 May 2015 10:28:27 -0700

On 5/14/15 5:58 AM, Philippe Verdy wrote:

Yes, it is problematic: (ab)* is not the same as (a|b)*, because the first expression requires matching pairs of letters "ab" in that order, while the second accepts arbitrary strings of "a" and "b" (so the second matches *more* input samples).
Even if you consider canonical equivalence (where the relative order of "a" and "b" does not matter, for example because they have distinct non-zero canonical combining classes), this does not mean that "a" alone will match the first expression "(ab)*", even though it MUST match "(a|b)*".
So the elegant solution is simply to do the first level of analysis of "(ab)*" using "(a|b)*" instead. But then you need to perform a second pass on the match to make sure it contains only complete sequences "ab" in that order (or in any other order if they are all combining characters with a non-zero combining class) and no unpaired "a" or "b".
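A minimal Python sketch of the two-pass idea, using the literal letters "a" and "b" as placeholders (an assumption for illustration; the thread does not fix particular characters). The helper name two_pass_match is hypothetical, and the sketch only checks the strict "ab" order, not the canonical-reordering variant.

```python
import re

# First pass: the looser pattern (a|b)*, written here as a character class.
LOOSE = re.compile(r"[ab]*")

def two_pass_match(s: str) -> bool:
    """Hypothetical helper: accept s only if it is a run of complete "ab" pairs."""
    # Pass 1: cheap filter; anything outside (a|b)* is rejected outright.
    if LOOSE.fullmatch(s) is None:
        return False
    # Pass 2: the filtered string must split into "ab" pairs, in that order,
    # with no unpaired "a" or "b" left over.
    if len(s) % 2 != 0:
        return False
    return all(s[i:i + 2] == "ab" for i in range(0, len(s), 2))

for sample in ["", "ab", "abab", "a", "ba", "aabb", "abc"]:
    print(repr(sample),
          bool(re.fullmatch(r"(ab)*", sample)),  # strict pattern
          bool(LOOSE.fullmatch(sample)),         # first pass only
          two_pass_match(sample))                # both passes
```

The first pass alone accepts "a", "ba" and "aabb"; the second pass rejects all three, so the two passes together agree with (ab)* on these samples.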

If you always want to find "a" and "b" in a pair without regard to the order, how about the regex:

((ab)|(ba))*
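A small Python check of the suggested pattern against (ab)*, again with plain "a" and "b" as placeholders. The second half swaps in two real combining marks, U+0323 COMBINING DOT BELOW (class 220) and U+0307 COMBINING DOT ABOVE (class 230), as a hypothetical choice of "a" and "b" with distinct non-zero combining classes. It shows that both orders are canonically equivalent under NFD and that the pattern accepts either raw order.

```python
import re
import unicodedata

STRICT = re.compile(r"(ab)*")         # pairs only in the order "ab"
EITHER = re.compile(r"((ab)|(ba))*")  # suggested pattern: "ab" or "ba" pairs

def accepts(pattern: re.Pattern, s: str) -> bool:
    # fullmatch anchors the pattern to the whole input string
    return pattern.fullmatch(s) is not None

for s in ["ab", "ba", "abba", "a", "aabb"]:
    print(repr(s), "(ab)*:", accepts(STRICT, s), "((ab)|(ba))*:", accepts(EITHER, s))

# Hypothetical concrete choice of "a" and "b": two combining marks
# with distinct non-zero canonical combining classes.
A = "\u0323"  # COMBINING DOT BELOW, class 220
B = "\u0307"  # COMBINING DOT ABOVE, class 230
assert unicodedata.combining(A) == 220 and unicodedata.combining(B) == 230

# After a base letter, both orders normalize to the same NFD string,
# i.e. they are canonically equivalent.
nfd_ab = unicodedata.normalize("NFD", "x" + A + B)
nfd_ba = unicodedata.normalize("NFD", "x" + B + A)
print("canonically equivalent:", nfd_ab == nfd_ba)

# The same pattern built from the marks accepts either raw order.
MARKS = re.compile(f"(({A}{B})|({B}{A}))*")
print(accepts(MARKS, A + B), accepts(MARKS, B + A))
```

The pattern still works pair by pair: "aabb" contains the same letters as "abba" but is rejected, because neither "aa" nor "bb" is an accepted pair.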

~Steve
