On 2014-06-11 13:23, BrJohan wrote:
For some genealogical purposes I am considering using Python's re module.
Quite a few names can be spelled in a number of similar ways, and in order to
match names even if they are spelled differently, I will build regular
expressions, each of which is supposed to match a number of similar names.
I guess that there will be a few hundred such regular expressions covering most
Now, my problem: Is there a way to decide whether any two - or more - of those
regular expressions will match the same string?
Or, stated a little differently:
Can it be decided, for a pair of regular expressions, whether at least one
string matching both of those regular expressions can be constructed?
If it is possible to make such a decision, then how? Anyone aware of an
algorithm for this?
And if that isn't the best straight line for the old saying, I don't know what
Anyways, to your new problem, yes it's possible. Search for "regular expression
intersection" for possible approaches. You will probably have to translate the
regular expression to a different formalism or at least a different library to
Consider just listing out the different possibilities. All of your regexes
should be "well-behaved" given the constraints of the domain (tightly bounded,
at least). There are tools that help generate matching strings from a Python
regex. This will help you QA your regexes, too, to be sure that they match what
you expect them to and do not match non-names.
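To make the "just list them out" idea concrete, here is a rough sketch. It
assumes (my assumption, not something from your post) that each name family
can be written as a sequence of slots, each slot being a small set of
alternative spellings. The family names, the spellings, and the helpers
to_regex, expand and overlaps are all made up for illustration. From one such
description you can build both the regex and the full list of strings it
matches, and then overlap detection is plain set intersection.

```python
import itertools
import re

# Hypothetical name families: each is a list of slots, and each slot is a
# tuple of alternative spellings for that part of the name.
FAMILIES = {
    "johansson": [("Joh", "Jo"), ("an", "ann"), ("s", "ss"), ("on", "en")],
    "jonsson": [("Jo", "Joh"), ("n", "an"), ("s", "ss"), ("on",)],
}


def to_regex(slots):
    """Build a regex string that matches exactly the spellings in `slots`."""
    return "".join(
        "(?:" + "|".join(re.escape(alt) for alt in slot) + ")" for slot in slots
    )


def expand(slots):
    """List every string the family matches (feasible because it is bounded)."""
    return {"".join(parts) for parts in itertools.product(*slots)}


def overlaps(families):
    """Map each pair of family names to the strings both families match."""
    expanded = {name: expand(slots) for name, slots in families.items()}
    found = {}
    for (a, set_a), (b, set_b) in itertools.combinations(expanded.items(), 2):
        common = set_a & set_b
        if common:
            found[(a, b)] = common
    return found


if __name__ == "__main__":
    for pair, names in overlaps(FAMILIES).items():
        print(pair, sorted(names))
```

With a few hundred families this is a few tens of thousands of pairwise set
intersections, which should be cheap as long as each family expands to a
modest number of strings. It does not handle arbitrary regexes (no unbounded
repetition, for instance); for those you would still need a real intersection
algorithm.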
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco