On Thu, 20 Feb 2014, Go L. Elijah wrote:

> I believe regular expressions lack a construct (ignore the spaces):
>
> (?@ aaa | bbb(d*) | ccc(a*) | ddd)
>
> which, when applied to
>
> "cccaaaddd"
>
> results in this (PHP-alike notation):
>
> array("cccaaa", 2, "aaa")
>
> meaning the (?@ ... ) captures as an integer, and the clauses merely enumerate
> options.

I am not sure what you mean by "captures as an integer", nor quite how you are
doing the matching (I'm not familiar with PHP). I work only at the C-level
code of PCRE, where the result of a match is a list of strings.

> this may be an attractive substitute (with identical output):
>
> ( aaa (?0)| bbb(d*) (?1)| ccc(a*) (?2)| ddd (?3))
>
> however, the numbers are now optional, which requires a dynamic typing scheme
> as in PHP. but it would also allow for:
>
> ( aaa (?"one")| bbb (?"two")| ccc (?"three")| ddd (?"four"))
>
> which means that certain patterns can be replaced by a canonical pattern.
>
> anyway, you can see where this is going.

No, I'm sorry, I can't. I have a feeling that this may relate to the existing
feature for duplicate subpattern numbers or names, but I am not sure. Are you
familiar with those features?

In any event, this suggestion looks like a major non-Perl-compatible change,
which makes it unlikely to be attractive to PCRE developers.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
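
For illustration only: one possible reading of "captures as an integer" is that
the match should report which alternative succeeded, alongside the usual
substrings. The sketch below approximates that reading with features that
already exist, using Python's standard `re` module rather than PCRE or PHP
(an assumption made purely so the example is self-contained). The group names
`alt0` to `alt3` and the helper `match_with_index` are hypothetical and are
not part of the proposal or of any library.

```python
import re

# One named wrapper group per alternative. The names alt0..alt3 are
# hypothetical labels chosen for this sketch; the digit is the index
# that the proposed (?@ ...) construct would report.
PATTERN = re.compile(
    r"(?P<alt0>aaa)|(?P<alt1>bbb(d*))|(?P<alt2>ccc(a*))|(?P<alt3>ddd)"
)


def match_with_index(subject):
    """Return [matched_text, alternative_index, inner captures...] or None."""
    m = PATTERN.match(subject)
    if m is None:
        return None
    # lastgroup is the name of the last capturing group to close, which
    # here is the wrapper group of the alternative that matched.
    name = m.lastgroup
    index = int(name[len("alt"):])
    outer = PATTERN.groupindex[name]
    # Keep the groups that took part in the match, except the wrapper itself.
    inner = [
        text
        for number, text in enumerate(m.groups(), start=1)
        if text is not None and number != outer
    ]
    return [m.group(0), index] + inner


print(match_with_index("cccaaaddd"))
```

Under this reading, the subject "cccaaaddd" gives the matched text "cccaaa",
the index 2 and the inner capture "aaa", which corresponds to the array in the
quoted proposal. Whether this is what the proposal intends is not settled in
the thread.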