On 2011-11-11 16:53, Jerry Hill wrote:
There's nothing wrong with writing your own code to find the longest common
substring, but are you aware that python has a module in the standard
library that already does this? In the difflib module, the SequenceMatcher
class can compare two sequences and extract the longest common sequence of
elements from it, like this:
Thanks for the tip. I've played around with it, but I think it doesn't
help in the OP's situation. "SequenceMatcher.find_longest_match()" just
finds the first common block:
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import difflib
>>> first = [0, 1, 2, 3, 0, 4, 5, 6, 0]
>>> second = [1, 2, 3, 4, 5, 6]
>>> match = difflib.SequenceMatcher(None, first, second)
>>> match.find_longest_match(0, len(first), 0, len(second))
Match(a=1, b=0, size=3)
Here it returns just [1, 2, 3] but misses [4, 5, 6]. So you would have
to adjust the lower limits to get it.
"SequenceMatcher.get_matching_blocks()" seems to be a better choice:
>>> match.get_matching_blocks()
[Match(a=1, b=0, size=3), Match(a=5, b=3, size=3), Match(a=9, b=6, size=0)]
Now you get [1, 2, 3] and [4, 5, 6]. But if the two blocks are in the
reversed order, there is no longest common subsequence [1, 2, 3, 4, 5,
6] any more and "SequenceMatcher" only finds one part (apparently it
chooses the first it comes across in the first list if both have the
same length):
>>> first = [0, 1, 2, 3, 0, 4, 5, 6, 0]
>>> second = [4, 5, 6, 1, 2, 3]
>>> match = difflib.SequenceMatcher(None, first, second)
>>> match.find_longest_match(0, len(first), 0, len(second))
Match(a=1, b=3, size=3)
>>> match.get_matching_blocks()
[Match(a=1, b=3, size=3), Match(a=9, b=6, size=0)]
From both methods you get [1, 2, 3].
As I've learnt during this tests, there is a difference between
subsequences and substrings:
http://en.wikipedia.org/wiki/Subsequence#Substring_vs._subsequence
If I've understood the OP right, he/she wants to find all common
substrings with a minimum length regardless of their order in the strings.
Bye, Andreas
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor