[EMAIL PROTECTED] wrote:
> Hello
>
> I am looking for python code that takes as input a list of strings
> (most similar, but not necessarily, and rather short: say not longer
> than 50 chars) and that computes and outputs the python regular
> expression that matches these string values (not necessarily strictly,
> perhaps the code is able to determine patterns, i.e. families of
> strings...).
>
> Thanks for any idea
>
I'm not sure what your application is, but genomicists and proteomicists
have found that Hidden Markov Models (HMMs) can be very powerful for
developing pattern models. Perhaps have a look at "Biological Sequence
Analysis" by Durbin et al.

Also, a very cool regex-based algorithm was developed at IBM:

http://cbcsrv.watson.ibm.com/Tspd.html

But I think HMMs are the way to go. Check out HMMER, developed at WUSTL
by Sean Eddy and colleagues:

http://hmmer.janelia.org/
http://selab.janelia.org/people/eddys/

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
-- 
http://mail.python.org/mailman/listinfo/python-list
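
For the narrowest reading of the original request (a regular expression
that matches the given strings), a minimal sketch using only the standard
`re` module might look like the following. It is not the HMM approach or
the IBM algorithm mentioned above; the function names `regex_from_strings`
and `positional_regex` are hypothetical, and the second one assumes all
inputs have the same length.

```python
import re


def regex_from_strings(strings):
    """Return a pattern that matches exactly the given strings (hypothetical helper)."""
    # Deduplicate, then sort longest-first so a shorter string never
    # shadows a longer one that shares its prefix.
    unique = sorted(set(strings), key=lambda s: (-len(s), s))
    return "(?:" + "|".join(re.escape(s) for s in unique) + ")"


def positional_regex(strings):
    """Generalise equal-length strings into one character class per position.

    Assumption: all inputs have the same length. The result may match
    strings that were not in the input, i.e. a small "family" of strings.
    """
    if len({len(s) for s in strings}) != 1:
        raise ValueError("positional_regex needs strings of equal length")
    parts = []
    for column in zip(*strings):
        chars = sorted(set(column))
        if len(chars) == 1:
            # Every string agrees at this position: keep the literal.
            parts.append(re.escape(chars[0]))
        else:
            # Strings differ here: accept any character that was seen.
            parts.append("[" + "".join(re.escape(c) for c in chars) + "]")
    return "".join(parts)
```

The first function only memorises its inputs; the second trades exactness
for generality, which is closer to the "families of strings" idea. Neither
handles insertions or deletions between strings, which is where profile
HMMs are usually the better tool.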