Hi Kent,

The code is very simple:
    dict_long_lists = defaultdict(list)
    for long_list in dict_long_lists.itervalues():
        for element in long_list:
            array_a[element] = m + n + p   # m, n, p are numbers

The long_lists are read from a defaultdict(list) dictionary and so don't need initializing. The elements of each long_list are integers and ordered (sorted before being placed in the dictionary). There are more than 20,000 long_lists, each with a variable number of elements (> 5,000). The elements of a long_list are immutable (i.e. they don't change). I've tried set() using defaultdict(set), but the elements are not ordered.

The problem is: what is the fastest way to traverse a long_list sequentially from beginning to end? Maybe there is another data structure that could be used instead of a list.

Hth
Dinesh

--------------------------------------------------------------------------------

Message: 1
Date: Thu, 30 Oct 2008 22:19:52 -0400
From: "Kent Johnson" <[EMAIL PROTECTED]>
Subject: Re: [Tutor] fast list traversal
To: "Shawn Milochik" <[EMAIL PROTECTED]>
Cc: tutor@python.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

On Thu, Oct 30, 2008 at 4:55 PM, Shawn Milochik <[EMAIL PROTECTED]> wrote:
> I just ran a very crude test.
>
> Results: Lists load more quickly but iterate more slowly. Dictionaries
> take longer to load but iteration takes about half the time.
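[Editor's aside, not part of the thread: a small runnable sketch of Dinesh's loop, with hypothetical sample data standing in for the real > 20,000 lists. Note that .itervalues() is Python 2; the Python 3 equivalent is .values(). A plain nested for loop is already close to the fastest pure-Python way to traverse each list sequentially.]

```python
from collections import defaultdict

# Hypothetical small-scale stand-in for the poster's data;
# the real dict has > 20,000 lists of > 5,000 sorted ints each.
dict_long_lists = defaultdict(list)
dict_long_lists["a"] = [0, 2, 5]
dict_long_lists["b"] = [1, 3, 4]

m, n, p = 1, 2, 3
array_a = [0] * 6  # sized to hold the largest element index

# Python 2's .itervalues() becomes .values() in Python 3.
for long_list in dict_long_lists.values():
    for element in long_list:       # plain sequential traversal
        array_a[element] = m + n + p

print(array_a)  # -> [6, 6, 6, 6, 6, 6]
```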
Here are my results using timeit and Python 2.6.

Initializing a list is faster than a dict:

kent $ py -m timeit -n 1 -r 3 -s "l=[]" "for i in range(7654321): l.append(i)"
1 loops, best of 3: 1.77 sec per loop
kent $ py -m timeit -n 1 -r 3 -s "d={}" "for i in range(7654321): d[i] = 0"
1 loops, best of 3: 2.27 sec per loop

The list version can be sped up considerably by hoisting the lookup of
the append method out of the loop:

kent $ py -m timeit -n 1 -r 3 -s "l=[];append=l.append" "for i in range(7654321): append(i)"
1 loops, best of 3: 1.02 sec per loop

However this is not the fastest way to create either the list or the
dict from a range:

kent $ py -m timeit -n 1 -r 3 "l = range(7654321)"
1 loops, best of 3: 167 msec per loop
kent $ py -m timeit -n 1 -r 3 "d=dict.fromkeys(range(7654321), 0)"
1 loops, best of 3: 1.63 sec per loop

xrange() is a little faster for the dict:

kent $ py -m timeit -n 1 -r 3 "d=dict.fromkeys(xrange(7654321), 0)"
1 loops, best of 3: 1.43 sec per loop

That is why I asked the OP for some real code; you have to optimize
for a specific case, not a generality.

For iteration, I get nearly identical results for the list and the dict:

kent $ py -m timeit -n 1 -r 3 -s "l = range(7654321)" "for i in l: i*i"
1 loops, best of 3: 2.74 sec per loop
kent $ py -m timeit -n 1 -r 3 -s "d = dict.fromkeys(range(7654321), 0)" "for i in d: i*i"
1 loops, best of 3: 2.81 sec per loop

Using a list comprehension is slower, which surprised me; probably
because it has to actually create the result list:

kent $ py -m timeit -n 1 -r 3 -s "l = range(7654321)" "[i*i for i in l]"
1 loops, best of 3: 3.65 sec per loop
kent $ py -m timeit -n 1 -r 3 -s "d = dict.fromkeys(range(7654321), 0)" "[i*i for i in d]"
1 loops, best of 3: 3.66 sec per loop

The moral of the story, as usual with optimization, is that you have
to measure, and you have to use real code to get meaningful results.
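[Editor's aside, not part of the thread: Kent's shell commands can also be run from a script via the timeit module, which makes the comparison reproducible. A minimal sketch, with a much smaller N than Kent's 7,654,321 so it runs quickly; note that in Python 3 range() no longer builds a list eagerly, so list(range(N)) is used in the setup, and absolute numbers will differ from the 2008 figures.]

```python
import timeit

N = 100_000  # hypothetical size, far smaller than Kent's 7,654,321

# Time iteration over a list vs. over a dict's keys, as Kent did.
list_time = timeit.timeit("for i in l: i*i",
                          setup=f"l = list(range({N}))", number=10)
dict_time = timeit.timeit("for i in d: i*i",
                          setup=f"d = dict.fromkeys(range({N}), 0)", number=10)

print(f"list: {list_time:.3f}s  dict: {dict_time:.3f}s")
```

As Kent's figures suggest, on most builds the two times come out close; the structures differ mainly in build cost, not iteration cost.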
> fill_dict_time = time.time()
>
> for x in range(iter_num):
>     the_list.append(x)

Not sure why you have the list append in this loop?

>     the_dict[x] = 0
>
> fill_dict_time = time.time() - fill_dict_time

the_list now has twice as many entries as the_dict, which would
account for it taking twice as long to iterate. That is another
pitfall of timing tests; it is remarkably easy to make a mistake and
time something other than what you think you are timing.

Kent
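[Editor's aside, not part of the thread: Kent's point is that each fill must be timed in its own loop, so neither measurement contaminates the other. A minimal corrected harness, assuming the original test looked roughly like the quoted fragment; iter_num is a hypothetical value since the original isn't shown.]

```python
import time

iter_num = 100_000  # hypothetical; the original post's value isn't shown

# Time the list fill on its own.
the_list = []
t0 = time.time()
for x in range(iter_num):
    the_list.append(x)
fill_list_time = time.time() - t0

# Time the dict fill on its own -- no stray list appends in this loop.
the_dict = {}
t0 = time.time()
for x in range(iter_num):
    the_dict[x] = 0
fill_dict_time = time.time() - t0

# Both structures now hold exactly iter_num entries, so the
# two timings are comparing like with like.
print(f"list fill: {fill_list_time:.3f}s  dict fill: {fill_dict_time:.3f}s")
```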
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor