On 2014-06-11 13:23, BrJohan wrote:
For some genealogical purposes I consider using Python's re module.

Rather many names can be spelled in a number of similar ways, and in order to
match names even if they are spelled differently, I will build regular
expressions, each of which is supposed to match a number of similar names.

I guess that there will be a few hundred such regular expressions covering most
popular names.

Now, my problem: Is there a way to decide whether any two - or more - of those
regular expressions will match the same string?

Or, stated a little differently:

Can it, for a pair of regular expressions, be decided whether at least one string
matching both of those regular expressions can be constructed?

If it is possible to make such a decision, then how? Anyone aware of an
algorithm for this?

And if that isn't the best straight line for the old saying, I don't know what is.


Anyways, to your new problem: yes, it's possible. Search for "regular expression intersection" for possible approaches. Python's re module has no intersection operation of its own, so you will probably have to translate the regular expressions to a different formalism, or at least to a different library, to implement this.
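
To make "a different formalism" concrete, here is a minimal sketch that decides whether two patterns share a match by walking both of them in lockstep using Brzozowski derivatives. This is one technique among several, not necessarily what a search will turn up first. It assumes the patterns have been translated by hand into a small tuple-based form; the helper names (lit, cat, alt, star, word, any_of, opt, common_match) are made up for this illustration and do not come from the re module or any library.

```python
from collections import deque

EMPTY = ("empty",)  # matches no string at all
EPS = ("eps",)      # matches only the empty string

def lit(c):
    return ("lit", c)

def cat(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)

def alt(a, b):
    # Alternatives are kept in a flat frozenset so that equal states compare equal.
    items = set()
    for r in (a, b):
        if r[0] == "alt":
            items |= r[1]
        elif r != EMPTY:
            items.add(r)
    if not items:
        return EMPTY
    if len(items) == 1:
        return next(iter(items))
    return ("alt", frozenset(items))

def star(a):
    if a in (EMPTY, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)

def word(s):
    r = EPS
    for c in s:
        r = cat(r, lit(c))
    return r

def any_of(s):
    r = EMPTY
    for c in s:
        r = alt(r, lit(c))
    return r

def opt(a):
    return alt(a, EPS)

def nullable(r):
    """True if r matches the empty string."""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "lit"):
        return False
    if kind == "cat":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(x) for x in r[1])  # alt

def deriv(r, c):
    """The pattern matching what may follow c in a match of r."""
    kind = r[0]
    if kind in ("empty", "eps"):
        return EMPTY
    if kind == "lit":
        return EPS if r[1] == c else EMPTY
    if kind == "cat":
        d = cat(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
    if kind == "star":
        return cat(deriv(r[1], c), r)
    out = EMPTY  # alt
    for x in r[1]:
        out = alt(out, deriv(x, c))
    return out

def chars(r):
    kind = r[0]
    if kind == "lit":
        return {r[1]}
    if kind == "cat":
        return chars(r[1]) | chars(r[2])
    if kind == "star":
        return chars(r[1])
    if kind == "alt":
        return set().union(*(chars(x) for x in r[1]))
    return set()

def common_match(a, b):
    """Return a shortest string matched by both a and b, or None if there is none."""
    alphabet = sorted(chars(a) & chars(b))  # other characters kill one side
    queue, seen = deque([(a, b, "")]), {(a, b)}
    while queue:
        ra, rb, s = queue.popleft()
        if nullable(ra) and nullable(rb):
            return s
        for c in alphabet:
            da, db = deriv(ra, c), deriv(rb, c)
            if da != EMPTY and db != EMPTY and (da, db) not in seen:
                seen.add((da, db))
                queue.append((da, db, s + c))
    return None

# Hand translations of two hypothetical name patterns.
strict = cat(word("joh"), cat(any_of("ao"), word("n")))                  # joh[ao]n
loose = cat(word("jo"), cat(opt(lit("h")), cat(opt(lit("a")), word("n"))))  # joh?a?n
witness = common_match(strict, loose)
```

Here witness comes back as a string that both patterns accept, while a call such as common_match(strict, word("jon")) returns None because no string satisfies both. The translation from re syntax into this form is the part left undone, and it is where most of the real work would be.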

Consider just listing out the different possibilities. Given the constraints of the domain, all of your regexes should be "well-behaved" (tightly bounded, at least), so each one matches only a finite and fairly small set of strings. There are tools that help generate matching strings from a Python regex; once you have those sets, two regexes overlap exactly when their sets share a string. This will help you QA your regexes, too: you can check that they match what you expect them to and do not match non-names.
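
As a rough illustration of the listing approach, the sketch below uses only the standard library and brute force rather than a dedicated generator tool: it tries every string over a small alphabet up to a maximum length and keeps the ones a pattern fully matches. The three patterns, the alphabet, the length limit, and the function name matching_strings are all assumptions chosen for the example.

```python
import re
from itertools import combinations, product

def matching_strings(pattern, alphabet, max_len):
    """Every string over `alphabet`, up to `max_len` long, that `pattern` fully matches."""
    rx = re.compile(pattern)
    found = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if rx.fullmatch(s):
                found.add(s)
    return found

# Hypothetical name patterns, keyed by a label for each name group.
patterns = {
    "johan": r"joh?an",
    "john": r"joh?n",
    "joan": r"jo[ah]n",
}

# List out what each pattern matches, then intersect the lists pairwise.
listed = {name: matching_strings(p, "ahjno", 5) for name, p in patterns.items()}
overlaps = {}
for (n1, s1), (n2, s2) in combinations(listed.items(), 2):
    common = s1 & s2
    if common:
        overlaps[(n1, n2)] = common
```

After this runs, overlaps holds each pair of labels whose patterns share at least one spelling, together with the shared spellings, and listed doubles as a QA report of what each pattern actually accepts. Brute force grows exponentially with the length limit, so for longer names a real string generator would be the better choice.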


Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

