New submission from Marien Zwart <m_zw...@123mail.org>:

SequenceMatcher caches the result of get_matching_blocks and get_opcodes. There 
are some problems with this:

What get_matching_blocks caches is a list of tuples. The first call does not 
return that list: it returns map(Match._make, self.matching_blocks) (converting 
the tuples to namedtuples). Subsequent calls just return self.matching_blocks 
directly, so they yield plain tuples rather than namedtuples. This is 
especially odd in Python 3, where the first call returns a map object (an 
iterator that can only be consumed once) while later calls return a list.
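
The pattern described above can be reproduced without difflib. What follows 
is only a minimal sketch, not the difflib source: BlockCache and its two 
getters are hypothetical names, and the hard-coded triples stand in for the 
real matching computation. get_consistent shows one possible fix (convert 
once, cache the converted list); it is an assumption, not necessarily the fix 
the maintainers would choose.

```python
from collections import namedtuple

Match = namedtuple("Match", "a b size")


class BlockCache:
    """Hypothetical stand-in for the caching pattern; not difflib code."""

    def __init__(self):
        self.matching_blocks = None

    def _compute(self):
        # Placeholder for the real matching work: a list of plain tuples.
        return [(0, 0, 8), (9, 9, 1), (10, 10, 0)]

    def get_inconsistent(self):
        # Mirrors the reported behavior: the raw tuples are cached, but
        # only the first call wraps them in a lazy map of namedtuples.
        if self.matching_blocks is not None:
            return self.matching_blocks
        self.matching_blocks = self._compute()
        return map(Match._make, self.matching_blocks)

    def get_consistent(self):
        # One possible fix: convert once and cache the converted list,
        # so every call returns the same kind of object.
        if self.matching_blocks is None:
            self.matching_blocks = list(map(Match._make, self._compute()))
        return self.matching_blocks


buggy = BlockCache()
first = buggy.get_inconsistent()
second = buggy.get_inconsistent()
print(type(first).__name__, type(second).__name__)  # map list (Python 3)

fixed = BlockCache()
print(type(fixed.get_consistent()) is type(fixed.get_consistent()))  # True
```

With the consistent getter, every call returns a list of Match namedtuples, 
so callers can index the result or iterate over it more than once.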

This caching behavior is not documented, so calling code may mutate the 
returned list. One example of calling code is difflib itself: 
get_grouped_opcodes mutates the result of get_opcodes (a cached list). I am not 
sure if the right fix is to have get_grouped_opcodes copy before it mutates or 
to have get_opcodes return a copy.
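
The two candidate fixes can be compared on a small model. This is a sketch 
under stated assumptions: OpcodeCache, get_opcodes_copy and 
trim_leading_equal are hypothetical names invented for illustration, and the 
in-place edit merely stands in for whatever get_grouped_opcodes does to the 
list it receives.

```python
class OpcodeCache:
    """Hypothetical stand-in for a getter that caches a list."""

    def __init__(self):
        self._opcodes = None

    def get_opcodes(self):
        # Returns the cached list object itself, as in the report.
        if self._opcodes is None:
            self._opcodes = [("equal", 0, 8, 0, 8),
                             ("replace", 8, 9, 8, 9),
                             ("equal", 9, 10, 9, 10)]
        return self._opcodes

    def get_opcodes_copy(self):
        # Option A: the getter hands out a fresh shallow copy each time.
        return list(self.get_opcodes())


def trim_leading_equal(codes, n=3):
    """Hypothetical helper that edits codes[0] in place."""
    tag, i1, i2, j1, j2 = codes[0]
    if tag == "equal":
        codes[0] = (tag, max(i1, i2 - n), i2, max(j1, j2 - n), j2)
    return codes


shared = OpcodeCache()
trim_leading_equal(shared.get_opcodes())      # mutates the cached list
print(shared.get_opcodes()[0])                # ('equal', 5, 8, 5, 8)

safe = OpcodeCache()
trim_leading_equal(safe.get_opcodes_copy())   # Option A: getter copies
trim_leading_equal(list(safe.get_opcodes()))  # Option B: caller copies
print(safe.get_opcodes()[0])                  # ('equal', 0, 8, 0, 8)
```

A shallow copy is enough in this model because the elements are immutable 
tuples. Copying in the getter protects every caller at the cost of one list 
copy per call; copying in the caller keeps the getter cheap but leaves other 
callers exposed unless the caching is documented.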

Snippet demonstrating both bugs:

import difflib

matcher = difflib.SequenceMatcher(a='aaaaaaaabc', b='aaaaaaaadc')
print(list(matcher.get_matching_blocks()))
# This should print the same thing, but it does not:
print(list(matcher.get_matching_blocks()))

print(matcher.get_opcodes())
print(list(matcher.get_grouped_opcodes()))
# This should print the same thing as the previous get_opcodes()
# list, but it does not:
print(matcher.get_opcodes())

----------
components: Library (Lib)
messages: 117612
nosy: marienz
priority: normal
severity: normal
status: open
title: difflib.SequenceMatcher has slightly buggy and undocumented caching behavior
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9985>
_______________________________________
_______________________________________________
Python-bugs-list mailing list