Terry J. Reedy added the comment:

DNA matching can be done with difflib. Serious high-volume work should use compiled specialized matchers and aligners.
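As a rough illustration of the difflib remark, here is a minimal sketch that compares two short base strings with difflib.SequenceMatcher. The sequences and variable names are made up for this example and do not come from the benchmark; autojunk is disabled on the assumption that the automatic "popular element" heuristic is unwanted with a four-letter alphabet.

```python
import difflib

# Hypothetical short sequences, chosen only for illustration.
seq_a = "acgtacgtgacg"
seq_b = "acgtcgtgaacg"

# autojunk=False turns off the heuristic that treats very frequent
# elements as junk in long sequences; with only four distinct bases
# that heuristic would otherwise discard every base.
matcher = difflib.SequenceMatcher(None, seq_a, seq_b, autojunk=False)

# Similarity score in the range [0.0, 1.0].
ratio = matcher.ratio()

# Longest contiguous block shared by both sequences.
longest = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
shared = seq_a[longest.a:longest.a + longest.size]

print(f"similarity: {ratio:.2f}")
print(f"longest shared run: {shared!r}")
```

This is pure Python and only suited to small inputs, which is the point of the caveat about compiled matchers and aligners.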
This particular benchmark, explained a bit at https://benchmarksgame.alioth.debian.org/u64q/regexdna-description.html#regexdna, manipulates and searches standard FASTA format representations of sequences with the regex engine available in each language. (The site has another Python implementation at https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexdna&lang=python3&id=1. It uses Unicode strings rather than bytes, and multiprocessing.Pool to run re.findall in parallel.)

FASTA uses lowercase a, c, g, t for known bases and at least 11 uppercase letters for subsets of bases, representing partially known bases. The third task is to expand the uppercase letters into subsets of lowercase letters. Since the rules require the use of re and one substitution at a time, the 2 Python programs run re.sub over the current sequence 11 times. More idiomatic for Python, and probably faster, would be to use seq.replace(old, new) instead. Perhaps even more idiomatic, and probably faster still, would be to use str.translate, as in this reduced example.

>>> table = {ord('B') : '(c|g|t)', ord('D') : '(a|g|t)'}
>>> 'aBcDg'.translate(table)
'a(c|g|t)c(a|g|t)g'

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26436>
_______________________________________
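The three substitution approaches compared in the message (repeated re.sub, repeated str.replace, and a single str.translate) can be sketched side by side. This is only an illustration on the reduced two-entry table from the message, not the benchmark program; the function names are hypothetical, and no timing claim is made here.

```python
import re

# Reduced two-entry table from the message; the real task has 11 codes.
subs = {"B": "(c|g|t)", "D": "(a|g|t)"}
seq = "aBcDg"

def expand_with_re(s):
    # One re.sub pass over the whole sequence per code, as the
    # benchmark rules require.
    for old, new in subs.items():
        s = re.sub(old, new, s)
    return s

def expand_with_replace(s):
    # Same loop, but with plain string replacement instead of a regex.
    for old, new in subs.items():
        s = s.replace(old, new)
    return s

def expand_with_translate(s):
    # A single pass: str.maketrans accepts a dict that maps
    # single characters to replacement strings of any length.
    return s.translate(str.maketrans(subs))

print(expand_with_re(seq))
print(expand_with_replace(seq))
print(expand_with_translate(seq))
```

The loop-based versions give the same result as the single-pass version here because the replacement strings contain no uppercase letters, so an earlier substitution can never create a match for a later one.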