koara wrote:
> Hello, it might be too late or too hot, but i cannot work out this
> behaviour of find_longest_match() in difflib.SequenceMatcher:
>
> string1:
> [snipped 500-byte string]
>
> string2:
> [snipped 500-byte string]
>
> find_longest_match(0,500,0,500)=(24,43,10)="version01t"
>
> What? O_o Clearly there is a longer match, right at the beginning!
> And then, after removal of the last character from each string (i found
> the limit of 500 by trial and error -- and it looks suspiciously
> rounded):

What limit? (a) My results [see below] (b) my inspection of the Python
version 2.4 source for the difflib module (c) what I know of the author
-- all tend to indicate that there is no hidden undocumented length limit.

>
> find_longest_match(0,499,0,499)=(0,0,32)="releasenotesforwildmagicversion0"
>
>
> Is this the expected behaviour? What's going on?

My code: (koara.py)
8<---
strg1 = r"""releasenotesforwildmagicversion01thiscdromcontainstheinitialreleaseofthesourcecodethataccompaniesthebook"3dgameenginedesign:apracticalapproachtorealtimecomputergraphics"thereareanumberofknownissuesaboutthecodeastheseissuesareaddressedtheupdatedcodewillbeavailableatthewebsitehttp://wwwmagicsoftwarecom/[EMAIL PROTECTED]"""
strg2 = r"""releasenotesforwildmagicversion02updatefromversion01toversion02ifyourcopyofthebookhasversion01andifyoudownloadedversion02fromthewebsitethenapplythefollowingdirectionsforinstallingtheupdateforalinuxinstallationseethesectionattheendofthisdocumentupdatedirectionsassumingthatthetopleveldirectoryiscalledmagicreplacebyyourtoplevelnameyoushouldhavetheversion01contentsinthislocation1deletethecontentsofmagic\include2deletethesubdirectorymagic\source\mgcapplication3deletetheobsoletefiles:amagic\source\mgc"""
import sys
print sys.version
from difflib import SequenceMatcher as SM
smo = SM(None, strg1, strg2)
print len(strg1), len(strg2)
print smo.find_longest_match(0, 500, 0, 500)
print smo.find_longest_match(0, 499, 0, 499)
print smo.find_longest_match(0, 100, 0, 100)
print smo.find_longest_match(1, 101, 1, 101)
print smo.find_longest_match(2, 102, 2, 102)
8<---

The results on 4 python versions:

C:\junk>c:\python24\python koara.py
2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)]
500 500
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)

C:\junk>c:\python23\python koara.py
2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)]
500 500
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)

C:\junk>c:\python22\python koara.py
2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)]
500 500
(0, 0, 32)
(0, 0, 32)
(0, 0, 32)
(1, 1, 31)
(2, 2, 30)

C:\junk>c:\python21\python koara.py
2.1.3 (#35, Apr 8 2002, 17:47:50) [MSC 32 bit (Intel)]
500 500
(0, 0, 32)
(0, 0, 32)
(0, 0, 32)
(1, 1, 31)
(2, 2, 30)

Looks to me like the problem has nothing at all to do with the length of
the searched strings, but a bug appeared in 2.3. What version(s) were you
using? Can you reproduce your results (500 & 499 giving different answers)
with the same version?

Anyway, as they say in the classics, "Take a number; the timbot will be
with you shortly."

Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list
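
On the 2.2 versus 2.3 difference shown in the results: one possible explanation, which nothing in this thread confirms, is the automatic junk heuristic described in the difflib documentation. Under that heuristic, an item that accounts for more than about 1% of a second sequence at least 200 items long is treated as "popular" and is no longer used to start a match. Later Python versions (3.2 onward) let you switch it off with the autojunk argument of SequenceMatcher. The sketch below is Python 3, not the Python 2 of koara.py, and the strings a and b are made up for illustration (they are not koara's data). It reproduces the same kind of symptom: a long match at the very beginning is missed with the heuristic on and found with it off.

```python
from difflib import SequenceMatcher

# Illustrative data only (an assumption, not the strings from the thread).
# 'a' and 'b' occur far more often than 1% of the 245 items in b, so the
# heuristic marks them as popular; 'X', 'Y' and 'c' occur once each.
a = "ab" * 20 + "X"
b = "ab" * 20 + "Y" + "cab" + "X" + "ab" * 100

# Default behaviour (autojunk=True): only the rare 'X' can anchor a match,
# which is then extended backwards over the neighbouring popular items.
sm_default = SequenceMatcher(None, a, b)
with_heuristic = sm_default.find_longest_match(0, len(a), 0, len(b))

# Heuristic disabled: the 40-item common prefix is found.
sm_plain = SequenceMatcher(None, a, b, autojunk=False)
without_heuristic = sm_plain.find_longest_match(0, len(a), 0, len(b))

print(len(a), len(b))     # 41 245
print(with_heuristic)     # Match(a=38, b=42, size=3)
print(without_heuristic)  # Match(a=0, b=0, size=40)
```

If the heuristic really is what changed between 2.2 and 2.3, the same experiment on the original strings (autojunk=False on a version that supports it) should give back the 32-character match at the start. That is a guess to be checked, not a diagnosis.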