On 22 April 2013 17:38, Blind Anagram <blindanag...@nowhere.org> wrote: > On 22/04/2013 17:06, Oscar Benjamin wrote: > >> I don't know what your application is but I would say that my first >> port of call here would be to consider a different algorithmic >> approach. An obvious question would be about the sparsity of this data >> structure. How frequent are the values that you are trying to count? >> Would it make more sense to store a list of their indices? > > Actually it is no more than a simple prime sieve implemented as a Python > class (and, yes, I realize that there are plenty of these around).
If I understand correctly, you have a list of roughly a billion True/False values indicating which integers are prime and which are not. You would like to discover how many prime numbers there are between two numbers a and b. You currently do this by counting the number of True values in your list between the indices a and b. If my description is correct then I would definitely consider using a different algorithmic approach. The density of primes from 1 to 1 billlion is about 5%. Storing the prime numbers themselves in a sorted list would save memory and allow a potentially more efficient way of counting the number of primes within some interval. To see how it saves memory (on a 64 bit system): $ python Python 2.7.3 (default, Sep 26 2012, 21:51:14) [GCC 4.7.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> a = ([True] + [False]*19) * 50000 >>> len(a) 1000000 >>> sys.getsizeof(a) 8000072 >>> a = list(range(50000)) >>> sys.getsizeof(a) 450120 >>> sum(sys.getsizeof(x) for x in a) 1200000 So you're using about 1/5th of the memory with a list of primes compared to a list of True/False values. Further savings would be possible if you used an array to store the primes as 64 bit integers. In this case it would take about 400MB to store all the primes up to 1 billion. The more efficient way of counting the primes would then be to use the bisect module. This gives you a way of counting the primes between a and b with a cost that is logarithmic in the total number of primes stored rather than linear in the size of the range (e.g. b-a). For large enough primes/ranges this is certain to be faster. Whether it actually works that way for your numbers I can't say. Oscar -- http://mail.python.org/mailman/listinfo/python-list