On Wed, 1 Sep 2010, Praet Paul wrote:

> Given the following regular expressions:
>  - al...@[a-z]+\.[a-z]+
>  - [a-...@[a-z]+\.[a-z]+
>  - .*
> 
> The string [email protected] will obviously match all three regular
> expressions. In the application I am developing, we are only
> interested in the 'most specific' match. In this case this is
> obviously the first one. Unfortunately there seems no way to do this.

You could proceed as follows: 

Write one long pattern with each of yours as a named alternative:

  (?<nameA>patternA) | (<?nameB>)patternB | ...
  
Make patternA the most specific, patternB the next most, etc. Then, 
after a match, see which of nameA, nameB, etc is set.  

> A possible way would be to keep the regular expressions sorted on
> descending specificity and then simply take the first match. Of course
> then the next question would be how to sort the array of regular
> expressions. It is not an option to give the responsibility to the
> end-user to ensure that the array is sorted. 

Ah. That is a much more difficult question. I suspect that specificity 
is in the eye of the beholder, that is, it may not be something that can 
easily be turned into an algorithm. The only thing I can think of is 
perhaps to count up the number of literal matches versus character 
classes such as [a-z] versus complete wild cards such as .* and generate 
some kind of measure from that. Suppose you count 2 for each literal, 1 
for each class, and 0 for anything else. Then your three patterns above 
come out at 16, 7, and 0. (I haven't considered repetition.) That's just 
an idea off the top of my head; it may not fly in practice.

How to do this for an arbitrary pattern? You *could* make use of 
pcretest, which has an option -b for showing the compiled bytecode. This 
might be easier to parse with something like a Perl script than 
re-interpreting the regex yourself. Or maybe not...

Philip

PS The speed of this response is pure luck. I am NOT online all the 
time!

-- 
Philip Hazel
-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 

Reply via email to