Using sort() with a user compare function is not recommended when you care about performance. The problem is that the sort function has to call back into Python code for every compare, of which there are many. The decorate - sort - undecorate idiom is the preferred way to do this in Python < 2.4. Python 2.4 adds the key= parameter to sort() and does the DSU internally.

Ironically, the best way to speed up Python code is often to eliminate it - the more work you can do in the C runtime, the faster your code will run. Sometimes an approach that seems to do more work - like DSU, which builds two extra lists - is actually faster because it does more in C code.

Here is a program to time the three methods. On my machine, the ratio of times of sort1 : sort2 : sort3 is pretty consistently about 1 : 2 : 6-7, i.e. sort with key parameter is twice as fast as DSU which is three times faster than sort with cmp.

A couple of interesting observations:
- The ratio of sort times is independent of the length of the list. I expected that the performance of sort3 would get progressively worse as the list got longer because I expect that the cmp function would be called a number of times that is O(n*logn). Apparently this is not the case.


- If the list is large and already sorted, sort3 is significantly faster than sort2 - it takes about 2/3 the time of sort2. You can try this by taking out the call to random.shuffle().

Kent


import operator, random, timeit

a = range(10000)
random.shuffle(a)
dlist = [ {'name':str(i), 'size':i} for i in a ]


def sort1(ds): ''' Sort using Python 2.4 key argument ''' ds.sort(key=operator.itemgetter('size'))


def sort2(ds): ''' Sort using DSU ''' d2 = [ (d['size'], d) for d in ds ] d2.sort() ds[:] = [ d for (name, d) in d2 ]

def sort3(ds):
    ''' Sort using cmp argument to sort '''
    ds.sort(lambda m, n: cmp(m['size'], n['size']))


def timeOne(fn): setup = "from __main__ import " + fn.__name__ + ',dlist' stmt = fn.__name__ + '(dlist[:])'

    t = timeit.Timer(stmt, setup)
    secs = min(t.repeat(3, 10))
    print '%s: %f secs' % (fn.__name__, secs)


# Make sure they all give the same answer a1=dlist[:] sort1(a1)

a2=dlist[:]
sort2(a2)

a3=dlist[:]
sort3(a3)


assert a1 == a2 assert a3 == a2

timeOne(sort1)
timeOne(sort2)
timeOne(sort3)


Karl Pflästerer wrote:
On  9 Dez 2004, [EMAIL PROTECTED] wrote:


I have a list of dictionaries, each representing info about a file,
something like:

[{'name':'foo.txt','size':35}, {'name':'bar.txt','size':35}, ...]

I want to present a sorted list of all the files' data, sorting on the
keys 'name' or 'size'. The file 'name' s should be unique (I'm hoping)
across all the dictionaries. Can someone point me towards an efficient
solution for accomplishing the sort? (The list has 1000s of files).


That's easy to achieve, since sort takes a custom sort function as
optional argument.  Now you need only a function which takes the values
of the fileds and compares them.

E.g.

lst.sort(lambda m, n: cmp(m.get(field), n.get(field)))
where field is either 'name' or 'size'.

As a function:

def sort_it (lst, field):
    lst.sort(lambda m, n: cmp(m.get(field), n.get(field)))


Karl
_______________________________________________
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor

Reply via email to