Greetings -- I am doing sampling without replacement, picking random elements out of an array; each picked element is then removed from the array. When my arrays were on the order of 10,000 elements, everything was fast, but when I increased them to 1,000,000 it suddenly took hours. I tracked it down to the line I first threw in to cut out element i:
    a = a[:i] + a[i+1:]

It was indeed the weakest link: rebuilding the list this way copies all of the remaining elements on every deletion. But when I replace it with, e.g., a.pop(i), it is still slow, since pop(i) has to shift everything after position i. I wonder what approach can be taken to speed this up?

Basically, the algorithm picks an element of the array at random and looks up its size in a separate hashtable mapping i => size. If the size fits into an existing accumulator, i.e. is below a threshold, we take the element and delete it from the array as above. So random.sample alone is not enough: we may not be able to use an element we pick, and have to keep drawing, element by element, accumulating statistics as we go, until we find one that fits.

Using an array is natural here, as it represents "without replacement" -- we take an element by removing it from the array. But in Python this is very slow... What approaches are there to implement a shrinking array with random deletions on the order of millions of elements?

Cheers,
Alexy
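
P.S. One approach that looks promising is the swap-with-last trick: overwrite the chosen slot with the last element and pop() from the end, so each removal is O(1) instead of O(n). Below is a minimal sketch; the sizes table, the threshold, and the max_tries cap are made up for illustration, and what happens to rejected picks would need to follow the real accumulation logic.

    import random

    def pick_fitting(a, sizes, acc, threshold, max_tries=10000):
        """Draw random candidates from the list a until one whose size
        still fits under the threshold is found, then remove it in O(1)
        by overwriting its slot with the last element and popping from
        the end.  Rejected candidates stay in the pool; max_tries is a
        made-up safeguard so the sketch always terminates."""
        for _ in range(max_tries):
            if not a:
                return None
            i = random.randrange(len(a))
            elem = a[i]
            if acc + sizes[elem] <= threshold:
                a[i] = a[-1]   # move the last element into the vacated slot
                a.pop()        # shrink from the end -- no O(n) shift
                return elem
        return None

    # Made-up driver: keep drawing until nothing fits any more.
    sizes = dict((k, random.randint(1, 100)) for k in range(1000000))
    pool = list(sizes)
    acc, threshold, taken = 0, 50000, []
    while True:
        elem = pick_fitting(pool, sizes, acc, threshold)
        if elem is None:
            break
        acc += sizes[elem]
        taken.append(elem)

The swaps scramble the pool's order, but since elements are drawn uniformly at random anyway, the order does not matter.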