[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-26 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

After more thought, will just close this report.  If a new project emerges to 
improve the design of filecmp, it can be done in a separate tracker entry.

--
resolution: later -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11802
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-25 Thread Roundup Robot

Roundup Robot devnull@devnull added the comment:

New changeset 11568c59d9d4 by Raymond Hettinger in branch '2.7':
Issue 11802:  filecmp cache was growing without bound.
http://hg.python.org/cpython/rev/11568c59d9d4

--
nosy: +python-dev




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-25 Thread Roundup Robot

Roundup Robot devnull@devnull added the comment:

New changeset 2bacaf6a80c4 by Raymond Hettinger in branch '3.2':
Issue 11802:  filecmp cache was growing without bound.
http://hg.python.org/cpython/rev/2bacaf6a80c4

New changeset 8f4466619e1c by Raymond Hettinger in branch 'default':
Issue 11802:  filecmp cache was growing without bound.
http://hg.python.org/cpython/rev/8f4466619e1c

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-25 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

Made a simple fix to keep the cache from growing without bound.
Leaving this open for 3.3 as a feature request to implement a more 
sophisticated strategy using file hashes or somesuch.

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-06 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

Nadeem, I want to review this but won't have a chance to do it right away.  
Offhand, it seems like we could use the existing functools.lru_cache() for this 
if the stats were included as part of the key:  cache[f1, f2, s1, s2]=outcome.

Also, I want to take a fresh look at the cache strategy (saving diffs of two 
files vs saving file contents individually) and think about whether that makes 
any sense at all for real world use cases (is there a common need to compare 
the same file pairs over and over again, or is the typical use the comparison 
of many different file pairs).  There may even be a better way to approach the 
underlying problem using hashes of entire files (md5, sha1, etc).

--
assignee: nadeem.vawda -> rhettinger
resolution:  -> later




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-06 Thread jeff deifik

jeff deifik j...@jeffunit.com added the comment:

There are many possible solutions to this problem.
Personally, I think mine is the simplest, though it changes the API.

However, there have been several suggestions on simple fixes that don't change 
the API, all of which fix the resource leak.

Doing nothing will not fix the resource leak.

How about a simple fix right now, using an LRU cache, fixing all versions of 
Python, and perhaps coming up with a superior solution at a later date?

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-06 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

We will do something.  The next release isn't for a while, so there is time to 
put thought into it rather than making an immediate check-in.

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-06-06 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

> Also, I want to take a fresh look at the cache strategy (saving diffs
> of two files vs saving file contents individually) and think about
> whether that makes any sense at all for real world use cases
> (is there a common need to compare the same file pairs over and over
> again or is the typical use the comparison of many different file
> pairs).  There may even be a better way to approach the underlying
> problem using hashes of entire files (md5, sha1, etc).

I like that idea. A hash-based approach could speed up the detection of
non-equal files quite a bit.

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-05-15 Thread Georg Brandl

Georg Brandl ge...@python.org added the comment:

-1 on backporting.

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-05-15 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

OK. I'll try to put together something cleaner just for 3.3, then.

--
assignee:  -> nadeem.vawda
stage: patch review -> needs patch
versions:  -Python 2.7, Python 3.1, Python 3.2




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-10 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Georg? Benjamin? Do you think this fix should be backported?

--
nosy: +benjamin.peterson, georg.brandl




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-09 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


Removed file: http://bugs.python.org/file21585/filecmp-lru-cache-2.7.diff




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-09 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Why use an ordered dict instead of functools.lru_cache?

--
versions: +Python 3.1




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-09 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Because the lru_cache decorator doesn't provide any way to invalidate
stale cache entries.
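
To illustrate the limitation, here is a small self-contained sketch. The function lookup and the list calls are hypothetical names used only for the demonstration. The decorated function exposes cache_info() and cache_clear(), but the latter empties the whole cache rather than dropping one entry.

```python
import functools

calls = []  # records every time the function body actually runs


@functools.lru_cache(maxsize=32)
def lookup(key):
    calls.append(key)
    return len(calls)


lookup("a")
lookup("b")
lookup("a")  # served from the cache, so the body does not run again
assert calls == ["a", "b"]

# There is no call for invalidating only the "a" entry; cache_clear()
# discards every cached result at once.
lookup.cache_clear()
assert lookup.cache_info().currsize == 0

lookup("a")  # recomputed because the cache is now empty
assert calls == ["a", "b", "a"]
```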

Perhaps I should factor out the duplicated code into a separate class
that can then also be exposed to users of the stdlib. But that would only
apply to 3.3, so the uglier fix is still necessary for older versions.

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-09 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

I question whether this should be backported.  Please discuss with the RM.

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-09 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

> I question whether this should be backported.  Please discuss with the RM.

Will do. Are you referring specifically to 2.7, or to 3.1 and 3.2 as well?

--




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-08 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

I've looked at the code for Python 3, and there isn't anything there that
prevents this from happening there, either. So the fix should be applied
to 3.2 and 3.3 as well.

An alternative approach would be to limit the size of the cache, so that
the caller doesn't need to explicitly clear the cache. Something along
the lines of functools.lru_cache() should do the trick. I don't think
it'll be possible to use lru_cache() itself, though - it doesn't provide
a mechanism to invalidate cache entries when they become stale (and in
any case, it doesn't exist in 2.7).
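
A minimal sketch of such a size-limited cache with stale-entry invalidation, assuming Python 3.2+ for OrderedDict.move_to_end(). The class name StatLRUCache and its methods are hypothetical and are not taken from the attached patches.

```python
import os
from collections import OrderedDict


class StatLRUCache:
    """Hypothetical bounded cache of comparison outcomes, keyed by file pair."""

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._data = OrderedDict()  # (f1, f2) -> (signatures, outcome)

    @staticmethod
    def _sig(path):
        st = os.stat(path)
        return (st.st_size, st.st_mtime)

    def get(self, f1, f2):
        """Return the cached outcome, or None if it is missing or stale."""
        key = (f1, f2)
        entry = self._data.get(key)
        if entry is None:
            return None
        sigs, outcome = entry
        if sigs != (self._sig(f1), self._sig(f2)):
            del self._data[key]  # stale: a file changed since it was cached
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return outcome

    def put(self, f1, f2, outcome):
        """Store an outcome and evict least recently used entries if over the limit."""
        key = (f1, f2)
        self._data[key] = ((self._sig(f1), self._sig(f2)), outcome)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)
```

Because the key is only the file pair and the signatures are stored in the value, a changed file replaces its old entry instead of leaving it behind. On 2.7, where move_to_end() is not available, the same effect would need a delete followed by a re-insert.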

--
nosy: +nadeem.vawda
stage:  -> needs patch
type:  -> resource usage
versions: +Python 3.2, Python 3.3




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-08 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Putting in a size limit is reasonable.  We did this for fnmatch not that long 
ago (issue 7846).  That was in fact the inspiration for lru_cache.

--
nosy: +r.david.murray




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-08 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
nosy: +eric.araujo, rhettinger




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-08 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Patch for 3.3 and 3.2

--
keywords: +patch
Added file: http://bugs.python.org/file21584/filecmp-lru-cache-3.3.diff




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-08 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Patch for 2.7.

--
Added file: http://bugs.python.org/file21585/filecmp-lru-cache-2.7.diff




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-08 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
stage: needs patch -> patch review




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-08 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Oops, there was a typo in the 2.7 patch (import _thread instead of
import thread). Corrected patch attached.

--
Added file: http://bugs.python.org/file21586/filecmp-lru-cache-2.7.diff




[issue11802] filecmp.cmp needs a documented way to clear cache

2011-04-07 Thread jeff deifik

New submission from jeff deifik j...@jeffunit.com:

I have a program which calls filecmp.cmp a lot.
It runs out of memory.
I read the source to filecmp, and then I periodically set
filecmp._cache = {}

Without doing this, filecmp's cache uses up all the memory in the computer.

There needs to be a documented interface to clear the cache.

I suggest a function
def clear_cache():
    _cache.clear()

Without a documented interface, there is no standard way to clear the
cache. It is possible different versions of python will require
different methods to clear the cache, which will reduce python code
portability and is a bad idea.

Alternatively, one might disable the caching code.

One shouldn't have to look at the source code of a library function
to see why it is consuming memory.
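
For reference, a hedged sketch of a version-tolerant workaround. The wrapper name clear_filecmp_cache is hypothetical. It prefers filecmp.clear_cache(), which is documented from Python 3.4 onward, and otherwise falls back to the private _cache dict described above.

```python
import filecmp


def clear_filecmp_cache():
    """Empty filecmp's comparison cache on old and new Python versions."""
    clear = getattr(filecmp, "clear_cache", None)
    if clear is not None:
        clear()  # documented API, available since Python 3.4
    else:
        # Fallback for older interpreters: relies on the private,
        # undocumented module-level dict, which may change between versions.
        filecmp._cache.clear()
```

A long-running program could call this helper every few thousand comparisons, which keeps the version check in one place.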

--
messages: 133290
nosy: lopgok
priority: normal
severity: normal
status: open
title: filecmp.cmp needs a documented way to clear cache
versions: Python 2.7
