On 04/11/12 00:04, Albert-Jan Roskam wrote:
Hello,

I want to make a get() method that uses a binary search. This requires
 the data to be sorted which is an expensive operation, so I would like
to do this no more often than needed.

Reset your intuition. Sorting an almost-sorted list with Timsort (Python's
sort algorithm) is blindingly fast, approaching O(N), but running at full
C speed rather than pure Python. You may find that the easiest, fastest,
most efficient way to add data to a list and keep it sorted is to simply
append at the end and then sort.

class SortedSequence(object):
    def __init__(self, sequence):
        self.data = sorted(sequence)
    def insert(self, item):
        self.data.append(item)
        self.data.sort()


Another approach is to sort only on demand: tag the instance as "dirty",
and then whenever you do a lookup, if the list is dirty, sort first.

class SortedSequence(object):
    def __init__(self, sequence):
        self.data = sorted(sequence)
        self.dirty = False
    def insert(self, item):
        self.data.append(item)
        self.dirty = True
    def get(self, key):
        if self.dirty:
            self.data.sort()
            self.dirty = False
        # now use bisect to find the key


A third approach: always use the bisect algorithm to insert the item in
the right place.


class SortedSequence(object):
    def __init__(self, sequence):
        self.data = sorted(sequence)
    def insert(self, item):
        # use bisect in here...
    def get(self, key):
        # and in here too...


You may find that a hybrid approach is best, say using version 1 for
data lists with less than a million items, and version 3 for bigger
ones. Don't guess, measure and find out.

Whatever you do, make sure you test with BIG lists -- for small lists,
*any* approach will likely be fast enough.


Now that you have a SortedSequence type, you can easily deal with many
of them:

mapping = {}  # map param to data

mapping['x'] = SortedSequence(lots of data)
mapping['y'] = SortedSequence(more data)

mapping['x'].insert(item)
mapping['y'].get(key)


sort of thing.


One final thought... why are you using binary search instead of a dict?
Dict lookups will generally be much faster than binary search, O(1)
compared to O(log N), and simpler too. For large N, the difference
between constant-time lookups and those proportional to log N can be
substantial.


Which of the to classes below is the best way to keep track of the
 sorted lists?


Neither.




--
Steven
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to