Re: Python's re module and genealogy problem

Robert Kern Wed, 11 Jun 2014 06:29:25 -0700

On 2014-06-11 13:23, BrJohan wrote:

For some genealogical purposes I consider using Python's re module.


Rather many names can be spelled in a number of similar ways, and in order to
match names even if they are spelled differently, I will build regular
expressions, each of which is supposed to match a number of similar names.

I guess that there will be a few hundred such regular expressions covering most
popular names.

Now, my problem: Is there a way to decide whether any two - or more - of those
regular expressions will match the same string?

Or, stated a little differently:

Can it, for a pair of regular expressions be decided whether at least one string
matching both of those regular expressions, can be constructed?

If it is possible to make such a decision, then how? Anyone aware of an
algorithm for this?


And if that isn't the best straight line for the old saying, I don't know what 
is.

  http://en.wikiquote.org/wiki/Jamie_Zawinski

Anyways, to your new problem, yes it's possible. Search for "regular expression
intersection" for possible approaches. You will probably have to translate the
regular expression to a different formalism or at least a different library to
implement this.
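
To give a flavour of what "a different formalism" can look like, here is a
minimal sketch of the classic product construction on two deterministic finite
automata. It assumes the regexes have already been translated to DFAs by some
other means (the re module does not do that for you); the tuple layout, the
function name shared_string and the hand-built automata for Joh?an, Johann?
and Jon are all made up for illustration.

```python
from collections import deque

def shared_string(dfa_a, dfa_b):
    """Return a shortest string accepted by both DFAs, or None if none exists.

    Each DFA is a hypothetical (start, accepting_states, transitions) tuple,
    where transitions maps (state, character) -> next state.
    """
    start_a, accept_a, delta_a = dfa_a
    start_b, accept_b, delta_b = dfa_b
    start = (start_a, start_b)
    queue = deque([(start, "")])
    seen = {start}
    # Breadth-first search over pairs of states (the product automaton).
    while queue:
        (state_a, state_b), text = queue.popleft()
        if state_a in accept_a and state_b in accept_b:
            return text
        for (source, char), next_a in delta_a.items():
            if source != state_a:
                continue
            next_b = delta_b.get((state_b, char))
            if next_b is None:
                continue  # the second automaton cannot follow this character
            pair = (next_a, next_b)
            if pair not in seen:
                seen.add(pair)
                queue.append((pair, text + char))
    return None

# Hand-built DFA for the regex Joh?an (illustrative only).
JOHAN = (0, {5}, {(0, "J"): 1, (1, "o"): 2, (2, "h"): 3,
                  (2, "a"): 4, (3, "a"): 4, (4, "n"): 5})
# Hand-built DFA for the regex Johann? (illustrative only).
JOHANN = (0, {5, 6}, {(0, "J"): 1, (1, "o"): 2, (2, "h"): 3,
                      (3, "a"): 4, (4, "n"): 5, (5, "n"): 6})
# Hand-built DFA for the regex Jon (illustrative only).
JON = (0, {3}, {(0, "J"): 1, (1, "o"): 2, (2, "n"): 3})

print(shared_string(JOHAN, JOHANN))  # a string both automata accept
print(shared_string(JOHAN, JON))     # None: the two languages are disjoint
```

The search visits at most one pair of states per combination, so it terminates
even when the individual languages are infinite.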

Consider just listing out the different possibilities. All of your regexes
should be "well-behaved" given the constraints of the domain (tightly bounded,
at least). There are tools that help generate matching strings from a Python
regex. This will help you QA your regexes, too, to be sure that they match what
you expect them to and not match non-names.


  https://github.com/asciimoo/exrex
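
Once the possibilities are listed out, checking for overlaps needs nothing
beyond the standard library. The following is a rough sketch under the
assumption that every candidate spelling is available as a plain list; here the
list is typed in by hand, but it could equally come from a generator tool such
as the one linked above. The pattern labels, the patterns themselves and the
function name find_overlaps are invented for the example.

```python
import re
from itertools import combinations

def find_overlaps(patterns, candidates):
    """Map each pair of pattern labels to the candidate names both match.

    patterns:   dict of label -> regex source (hypothetical name groups)
    candidates: iterable of spellings to test against every pattern
    """
    compiled = {label: re.compile(source) for label, source in patterns.items()}
    overlaps = {}
    for name in candidates:
        # fullmatch, so that a pattern has to cover the whole name
        hits = [label for label, rx in compiled.items() if rx.fullmatch(name)]
        for pair in combinations(hits, 2):
            overlaps.setdefault(pair, []).append(name)
    return overlaps

# Illustrative patterns and spellings, not taken from any real name list.
patterns = {
    "Johan": r"Joh?an",
    "Johann": r"Johann?",
    "Jon": r"Jon",
}
candidates = ["Joan", "Johan", "Johann", "Jon"]

overlaps = find_overlaps(patterns, candidates)
for (first, second), names in overlaps.items():
    print(first, "and", second, "both match:", names)
```

The same loop doubles as the QA pass: any candidate with an empty hit list, or
any non-name that does get a hit, points at a regex that needs another look.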

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
https://mail.python.org/mailman/listinfo/python-list
