sum(map(len, l)) => 99999100 for the 1st case and 99998200 for the 2nd case. Roughly 100MB in both, as I mentioned.
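
For anyone who wants to check those totals without allocating 100MB of strings, here is a minimal sketch. It is written in Python 3 syntax, so `//` and `range` stand in for the Python 2 `/` and `xrange` used in the session, and `total_chars` is just a name made up for illustration.

```python
def total_chars(n_values, target_len):
    """Sum of len(str(i) * (target_len // len(str(i)))) for i in range(n_values)."""
    total = 0
    for i in range(n_values):
        digits = len(str(i))
        # Same length as str(i) * (target_len // digits), computed
        # arithmetically so the string itself is never built.
        total += digits * (target_len // digits)
    return total


first = total_chars(100000, 1000)   # first loop in the session
second = total_chars(20000, 5000)   # second loop in the session
print(first, second)
```

This gives 99999100 for the first loop and 99998200 for the second, so the two lists hold essentially the same number of characters. The tables quoted below come out much smaller because they multiply the number of values by the repeat count but leave out the number of digits in each value.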

On Wed, Mar 16, 2011 at 11:21 PM, John Gordon <gor...@panix.com> wrote:
> In <mailman.988.1300289897.1189.python-l...@python.org> Amit Dev
> <amit...@gmail.com> writes:
>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. Idea is to create a list which holds some
>> strings so that cumulative characters in the list is 100MB.
>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>
>> If I reduce the string size, it remains high till it reaches around
>> 1000. In that case it is back to 100MB usage.
>
> I don't know anything about the internals of python storage -- overhead,
> possible merging of like strings, etc. but some simple character counting
> shows that these two loops do not produce the same number of characters.
>
> The first loop produces:
>
> Ten single-digit values of i which are repeated 1000 times for a total of
> 10000 characters;
>
> Ninety two-digit values of i which are repeated 500 times for a total of
> 45000 characters;
>
> Nine hundred three-digit values of i which are repeated 333 times for a
> total of 299700 characters;
>
> Nine thousand four-digit values of i which are repeated 250 times for a
> total of 2250000 characters;
>
> Ninety thousand five-digit values of i which are repeated 200 times for
> a total of 18000000 characters.
>
> All that adds up to a grand total of 20604700 characters.
>
> Or, to condense the above long-winded text in table form:
>
> range         num    digits  1000/len(str(i))  total chars
> 0-9            10    1       1000                    10000
> 10-99          90    2        500                    45000
> 100-999       900    3        333                   299700
> 1000-9999    9000    4        250                  2250000
> 10000-99999 90000    5        200                 18000000
>                                                   ========
> grand total chars                                 20604700
>
> The second loop yields this table:
>
> range         num    digits  5000/len(str(i))  total bytes
> 0-9            10    1       5000                    50000
> 10-99          90    2       2500                   225000
> 100-999       900    3       1666                  1499400
> 1000-9999    9000    4       1250                 11250000
> 10000-19999 10000    5       1000                 10000000
>                                                   ========
> grand total chars                                 23024400
>
> The two loops do not produce the same numbers of characters, so I'm not
> surprised they do not consume the same amount of storage.
>
> P.S.: Please forgive me if I've made some basic math error somewhere.
>
> --
> John Gordon                   A is for Amy, who fell down the stairs
> gor...@panix.com              B is for Basil, assaulted by bears
>                                 -- Edward Gorey, "The Gashlycrumb Tinies"
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
--
http://mail.python.org/mailman/listinfo/python-list