[EMAIL PROTECTED] wrote:
> In an application dealing with very large text files, I need to create
> dictionaries indexed by tuples of words (bi-, tri-, n-grams) or nested
> dictionaries. The number of distinct data structures in memory grows
> to well beyond 1E7.
>
> It turns out that the default behaviour of Python is not well suited
> to such a situation, as the cyclic garbage collector periodically
> traverses all tracked objects in memory (including all the tuples) to
> find out which can be collected. This leads to the situation where
> creating N objects effectively takes O(N*N) time.
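
One common workaround for this pattern, though not raised in the quoted
post, is to suspend the cyclic collector while the structures are being
built: reference counting still reclaims unreachable objects immediately,
and only the cycle detector (the part that traverses every tracked object)
is paused. A minimal sketch, with build_ngram_index and its arguments
purely illustrative:

import gc

def build_ngram_index(tokens, n=2):
    """Count n-grams with the cyclic collector paused during the build."""
    index = {}
    gc.disable()          # stop full-heap traversals during bulk creation
    try:
        for i in range(len(tokens) - n + 1):
            key = tuple(tokens[i:i + n])      # tuple-of-words key
            index[key] = index.get(key, 0) + 1
    finally:
        gc.enable()       # restore normal collection afterwards
    return index

counts = build_ngram_index("the cat sat on the mat".split())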
Do your data structures actually need garbage collection? CPython is
a reference-counted system, with the cyclic garbage collector only as
a backup to catch reference cycles. Try using weak back-pointers in
your data structures to avoid creating cycles in the first place.
John Nagle
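
A minimal sketch of the weak back-pointer idea, using the standard
weakref module; the Node class here is illustrative, not something
from the thread:

import weakref

class Node:
    """Tree node whose parent link is weak, so a parent and its
    children never form a reference cycle for the collector to find."""
    def __init__(self, value, parent=None):
        self.value = value
        self.children = []
        # Keep only a weak reference back to the parent.
        self._parent = weakref.ref(parent) if parent is not None else None

    @property
    def parent(self):
        # Dereference the weak ref; None once the parent has been freed.
        return self._parent() if self._parent is not None else None

root = Node("root")
child = Node(("the", "cat"), parent=root)   # e.g. a bigram payload
root.children.append(child)
assert child.parent is root

Because the back-pointer is weak, dropping the last strong reference to
a subtree frees it through reference counting alone, with no cycle
detection required.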