[issue11802] filecmp.cmp needs a documented way to clear cache
Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

After more thought, will just close this report. If a new project emerges to improve the design of filecmp, it can be done in a separate tracker entry.

----------
resolution: later -> fixed
status: open -> closed

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11802>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue11802] filecmp.cmp needs a documented way to clear cache
Roundup Robot <devnull@devnull> added the comment:

New changeset 11568c59d9d4 by Raymond Hettinger in branch '2.7':
Issue 11802: filecmp cache was growing without bound.
http://hg.python.org/cpython/rev/11568c59d9d4

----------
nosy: +python-dev
[issue11802] filecmp.cmp needs a documented way to clear cache
Roundup Robot <devnull@devnull> added the comment:

New changeset 2bacaf6a80c4 by Raymond Hettinger in branch '3.2':
Issue 11802: filecmp cache was growing without bound.
http://hg.python.org/cpython/rev/2bacaf6a80c4

New changeset 8f4466619e1c by Raymond Hettinger in branch 'default':
Issue 11802: filecmp cache was growing without bound.
http://hg.python.org/cpython/rev/8f4466619e1c
[issue11802] filecmp.cmp needs a documented way to clear cache
Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

Made a simple fix to keep the cache from growing without bound. Leaving this open for 3.3 as a feature request to implement a more sophisticated strategy using file hashes or somesuch.
[issue11802] filecmp.cmp needs a documented way to clear cache
Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

Nadeem, I want to review this but won't have a chance to do it right away.

Offhand, it seems like we could use the existing functools.lru_cache() for this if the stats were included as part of the key: cache[f1, f2, s1, s2] = outcome.

Also, I want to take a fresh look at the cache strategy (saving diffs of two files vs saving file contents individually) and think about whether that makes any sense at all for real world use cases (is there a common need to compare the same file pairs over and over again, or is the typical use the comparison of many different file pairs?).

There may even be a better way to approach the underlying problem using hashes of entire files (md5, sha1, etc).

----------
assignee: nadeem.vawda -> rhettinger
resolution:  -> later
[issue11802] filecmp.cmp needs a documented way to clear cache
jeff deifik <j...@jeffunit.com> added the comment:

There are many possible solutions to this problem. Personally, I think mine is the simplest, though it changes the API. However, there have been several suggestions for simple fixes that don't change the API, all of which fix the resource leak. Doing nothing will not fix the resource leak.

How about a simple fix right now, using an LRU cache, applied to all versions of Python, and perhaps a superior solution at a later date?
[issue11802] filecmp.cmp needs a documented way to clear cache
Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

We will do something. The next release isn't for a while, so there is time to put thought into it rather than making an immediate check-in.
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

> Also, I want to take a fresh look at the cache strategy (saving diffs of two files vs saving file contents individually) and think about whether that makes any sense at all for real world use cases (is there a common need to compare the same file pairs over and over again, or is the typical use the comparison of many different file pairs?). There may even be a better way to approach the underlying problem using hashes of entire files (md5, sha1, etc).

I like that idea. A hash-based approach could speed up the detection of non-equal files quite a bit.
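A rough sketch of what a hash-based comparison could look like, assuming whole-file digests computed with hashlib. The helper names `file_digest` and `same_digest` are hypothetical, and SHA-256 is an arbitrary choice for the example (the thread mentions md5 and sha1).

```python
import hashlib


def file_digest(path, algorithm="sha256", bufsize=65536):
    """Return the hex digest of a file's contents, read in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def same_digest(f1, f2):
    """Different digests prove the files differ; equal digests strongly
    suggest, but do not strictly prove, that the contents are identical."""
    return file_digest(f1) == file_digest(f2)
```

The potential gain comes from storing one digest per file rather than one outcome per file pair: once each file has been hashed, comparing any two of them needs no further reads.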
[issue11802] filecmp.cmp needs a documented way to clear cache
Georg Brandl <ge...@python.org> added the comment:

-1 on backporting.
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

OK. I'll try to put together something cleaner just for 3.3, then.

----------
assignee:  -> nadeem.vawda
stage: patch review -> needs patch
versions: -Python 2.7, Python 3.1, Python 3.2
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

Georg? Benjamin? Do you think this fix should be backported?

----------
nosy: +benjamin.peterson, georg.brandl
[issue11802] filecmp.cmp needs a documented way to clear cache
Changes by Éric Araujo <mer...@netwok.org>:

Removed file: http://bugs.python.org/file21585/filecmp-lru-cache-2.7.diff
[issue11802] filecmp.cmp needs a documented way to clear cache
Éric Araujo <mer...@netwok.org> added the comment:

Why use an ordered dict instead of functools.lru_cache?

----------
versions: +Python 3.1
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

Because the lru_cache decorator doesn't provide any way to invalidate stale cache entries.

Perhaps I should factor out the duplicated code into a separate class that can then also be exposed to users of the stdlib. But that would only apply to 3.3, so the uglier fix is still necessary for older versions.
[issue11802] filecmp.cmp needs a documented way to clear cache
Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

I question whether this should be backported. Please discuss with the RM.
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

> I question whether this should be backported. Please discuss with the RM.

Will do. Are you referring specifically to 2.7, or to 3.1 and 3.2 as well?
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

I've looked at the code for Python 3, and there isn't anything there that prevents this from happening there, either. So the fix should be applied to 3.2 and 3.3 as well.

An alternative approach would be to limit the size of the cache, so that the caller doesn't need to explicitly clear the cache. Something along the lines of functools.lru_cache() should do the trick. I don't think it'll be possible to use lru_cache() itself, though - it doesn't provide a mechanism to invalidate cache entries when they become stale (and in any case, it doesn't exist in 2.7).

----------
nosy: +nadeem.vawda
stage:  -> needs patch
type:  -> resource usage
versions: +Python 3.2, Python 3.3
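A size-limited cache that can also drop stale entries might be sketched along these lines, using collections.OrderedDict. This is an assumption-laden illustration and not the attached patch: the class name `BoundedCache`, its methods, and the default size are all hypothetical.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU-style cache whose entries carry a signature.

    An entry is treated as stale, and discarded, when the signature passed
    to get() no longer matches the one it was stored with.
    """

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, signature):
        entry = self._data.get(key)
        if entry is None:
            return None
        cached_sig, value = entry
        if cached_sig != signature:
            # The underlying files changed since the entry was stored.
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, signature, value):
        self._data[key] = (signature, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)
```

Here the key would be the pair of file names and the signature the pair of stat results, so a changed file replaces its old entry instead of adding a second one.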
[issue11802] filecmp.cmp needs a documented way to clear cache
R. David Murray <rdmur...@bitdance.com> added the comment:

Putting in a size limit is reasonable. We did this for fnmatch not that long ago (issue 7846). That was in fact the inspiration for lru_cache.

----------
nosy: +r.david.murray
[issue11802] filecmp.cmp needs a documented way to clear cache
Changes by Éric Araujo <mer...@netwok.org>:

----------
nosy: +eric.araujo, rhettinger
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

Patch for 3.3 and 3.2.

----------
keywords: +patch
Added file: http://bugs.python.org/file21584/filecmp-lru-cache-3.3.diff
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

Patch for 2.7.

----------
Added file: http://bugs.python.org/file21585/filecmp-lru-cache-2.7.diff
[issue11802] filecmp.cmp needs a documented way to clear cache
Changes by Nadeem Vawda <nadeem.va...@gmail.com>:

----------
stage: needs patch -> patch review
[issue11802] filecmp.cmp needs a documented way to clear cache
Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

Oops, there was a typo in the 2.7 patch (import _thread instead of import thread). Corrected patch attached.

----------
Added file: http://bugs.python.org/file21586/filecmp-lru-cache-2.7.diff
[issue11802] filecmp.cmp needs a documented way to clear cache
New submission from jeff deifik <j...@jeffunit.com>:

I have a program which calls filecmp.cmp a lot. It runs out of memory. I read the source to filecmp, and then I periodically set filecmp._cache = {}. Without doing this, filecmp's cache uses up all the memory in the computer.

There needs to be a documented interface to clear the cache. I suggest a function

    def clear_cache():
        global _cache
        _cache = {}

Without a documented interface, there is no standard way to clear the cache. It is possible different versions of python will require different methods to clear the cache, which will reduce python code portability and is a bad idea.

Alternatively, one might disable the caching code.

One shouldn't have to look at the source code of a library function to see why it is consuming memory.

----------
messages: 133290
nosy: lopgok
priority: normal
severity: normal
status: open
title: filecmp.cmp needs a documented way to clear cache
versions: Python 2.7
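A hedged sketch of how calling code can clear the cache portably. Later Python releases (3.4 and newer) document a `filecmp.clear_cache()` function; on older interpreters the only option is the private `_cache` dict described in the report. The wrapper name `clear_filecmp_cache` is hypothetical.

```python
import filecmp


def clear_filecmp_cache():
    """Empty filecmp's comparison cache, preferring the public API if present."""
    public = getattr(filecmp, "clear_cache", None)
    if callable(public):
        public()
    else:
        # Fallback for older interpreters: this touches a private attribute,
        # an implementation detail that may change between versions.
        filecmp._cache.clear()
```

A long-running program that compares many distinct file pairs could call this helper periodically, which is the documented equivalent of resetting `filecmp._cache` by hand.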