Ironically, the best way to speed up Python code is often to eliminate it - the more work you can do in the C runtime, the faster your code will run. Sometimes an approach that seems to do more work - like DSU, which builds two extra lists - is actually faster because it does more in C code.
Here is a program to time the three methods. On my machine, the ratio of times of sort1 : sort2 : sort3 is pretty consistently about 1 : 2 : 6-7, i.e. sort with key parameter is twice as fast as DSU which is three times faster than sort with cmp.
A couple of interesting observations:
- The ratio of sort times is independent of the length of the list. I expected that the performance of sort3 would get progressively worse as the list got longer because I expect that the cmp function would be called a number of times that is O(n*logn). Apparently this is not the case.
- If the list is large and already sorted, sort3 is significantly faster than sort2 - it takes about 2/3 the time of sort2. You can try this by taking out the call to random.shuffle().
Kent
import operator, random, timeit
a = range(10000) random.shuffle(a) dlist = [ {'name':str(i), 'size':i} for i in a ]
def sort1(ds): ''' Sort using Python 2.4 key argument ''' ds.sort(key=operator.itemgetter('size'))
def sort2(ds): ''' Sort using DSU ''' d2 = [ (d['size'], d) for d in ds ] d2.sort() ds[:] = [ d for (name, d) in d2 ]
def sort3(ds): ''' Sort using cmp argument to sort ''' ds.sort(lambda m, n: cmp(m['size'], n['size']))
def timeOne(fn): setup = "from __main__ import " + fn.__name__ + ',dlist' stmt = fn.__name__ + '(dlist[:])'
t = timeit.Timer(stmt, setup) secs = min(t.repeat(3, 10)) print '%s: %f secs' % (fn.__name__, secs)
# Make sure they all give the same answer a1=dlist[:] sort1(a1)
a2=dlist[:] sort2(a2)
a3=dlist[:] sort3(a3)
assert a1 == a2 assert a3 == a2
timeOne(sort1) timeOne(sort2) timeOne(sort3)
Karl Pflästerer wrote:
On 9 Dez 2004, [EMAIL PROTECTED] wrote:
I have a list of dictionaries, each representing info about a file, something like:
[{'name':'foo.txt','size':35}, {'name':'bar.txt','size':35}, ...]
I want to present a sorted list of all the files' data, sorting on the keys 'name' or 'size'. The file 'name' s should be unique (I'm hoping) across all the dictionaries. Can someone point me towards an efficient solution for accomplishing the sort? (The list has 1000s of files).
That's easy to achieve, since sort takes a custom sort function as optional argument. Now you need only a function which takes the values of the fileds and compares them.
E.g.
lst.sort(lambda m, n: cmp(m.get(field), n.get(field))) where field is either 'name' or 'size'.
As a function:
def sort_it (lst, field): lst.sort(lambda m, n: cmp(m.get(field), n.get(field)))
Karl
_______________________________________________ Tutor maillist - [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/tutor