Hi Kent,

The code is very simple:
    dict_long_lists = defaultdict(list)
    for long_list in dict_long_lists.itervalues():
        for element in long_list:
            array_a[element] = m + n + p   # m, n, p are numbers

The long_lists are read from a defaultdict(list) dictionary and so don't need initializing. The elements of each long_list are integers and ordered (sorted before being placed in the dictionary). There are more than 20,000 long_lists, each with a variable number of elements (> 5,000). The elements of a long_list are immutable (i.e. they don't change). I've tried set() using defaultdict(set), but the elements are not ordered.

The problem is: what is the fastest way to traverse a long_list sequentially from beginning to end? Maybe there is another data structure that could be used instead of a list.

Hth
Dinesh

--------------------------------------------------------------------------------

Message: 1
Date: Thu, 30 Oct 2008 22:19:52 -0400
From: "Kent Johnson" <[EMAIL PROTECTED]>
Subject: Re: [Tutor] fast list traversal
To: "Shawn Milochik" <[EMAIL PROTECTED]>
Cc: tutor@python.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

On Thu, Oct 30, 2008 at 4:55 PM, Shawn Milochik <[EMAIL PROTECTED]> wrote:
> I just ran a very crude test.
>
> Results: Lists load more quickly but iterate more slowly. Dictionaries
> take longer to load but iteration takes about half the time.
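[Editor's aside, not part of the thread: a small runnable sketch of Dinesh's loop, with hypothetical sample data standing in for the real > 20,000 lists. Note that .itervalues() is Python 2; the Python 3 equivalent is .values(). A plain nested for loop is already close to the fastest pure-Python way to traverse each list sequentially.]

```python
from collections import defaultdict

# Hypothetical small-scale stand-in for the poster's data;
# the real dict has > 20,000 lists of > 5,000 sorted ints each.
dict_long_lists = defaultdict(list)
dict_long_lists["a"] = [0, 2, 5]
dict_long_lists["b"] = [1, 3, 4]

m, n, p = 1, 2, 3
array_a = [0] * 6  # sized to hold the largest element index

# Python 2's .itervalues() becomes .values() in Python 3.
for long_list in dict_long_lists.values():
    for element in long_list:       # plain sequential traversal
        array_a[element] = m + n + p

print(array_a)  # -> [6, 6, 6, 6, 6, 6]
```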
Here are my results using timeit and Python 2.6.

Initializing a list is faster than a dict:

kent $ py -m timeit -n 1 -r 3 -s "l=[]" "for i in range(7654321): l.append(i)"
1 loops, best of 3: 1.77 sec per loop
kent $ py -m timeit -n 1 -r 3 -s "d={}" "for i in range(7654321): d[i] = 0"
1 loops, best of 3: 2.27 sec per loop

The list version can be sped up considerably by hoisting the lookup of
the append method out of the loop:

kent $ py -m timeit -n 1 -r 3 -s "l=[];append=l.append" "for i in range(7654321): append(i)"
1 loops, best of 3: 1.02 sec per loop

However this is not the fastest way to create either the list or the
dict from a range:

kent $ py -m timeit -n 1 -r 3 "l = range(7654321)"
1 loops, best of 3: 167 msec per loop
kent $ py -m timeit -n 1 -r 3 "d=dict.fromkeys(range(7654321), 0)"
1 loops, best of 3: 1.63 sec per loop

xrange() is a little faster for the dict:

kent $ py -m timeit -n 1 -r 3 "d=dict.fromkeys(xrange(7654321), 0)"
1 loops, best of 3: 1.43 sec per loop

That is why I asked the OP for some real code; you have to optimize
for a specific case, not a generality.

For iteration, I get nearly identical results for the list and the dict:

kent $ py -m timeit -n 1 -r 3 -s "l = range(7654321)" "for i in l: i*i"
1 loops, best of 3: 2.74 sec per loop
kent $ py -m timeit -n 1 -r 3 -s "d = dict.fromkeys(range(7654321), 0)" "for i in d: i*i"
1 loops, best of 3: 2.81 sec per loop

Using a list comprehension is slower, which surprised me; probably
because it has to actually create the result list:

kent $ py -m timeit -n 1 -r 3 -s "l = range(7654321)" "[i*i for i in l]"
1 loops, best of 3: 3.65 sec per loop
kent $ py -m timeit -n 1 -r 3 -s "d = dict.fromkeys(range(7654321), 0)" "[i*i for i in d]"
1 loops, best of 3: 3.66 sec per loop

The moral of the story, as usual with optimization, is that you have
to measure, and you have to use real code to get meaningful results.
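[Editor's aside, not part of the thread: Kent's shell commands can also be run from a script via the timeit module, which makes the comparison reproducible. A minimal sketch, with a much smaller N than Kent's 7,654,321 so it runs quickly; note that in Python 3 range() no longer builds a list eagerly, so list(range(N)) is used in the setup, and absolute numbers will differ from the 2008 figures.]

```python
import timeit

N = 100_000  # hypothetical size, far smaller than Kent's 7,654,321

# Time iteration over a list vs. over a dict's keys, as Kent did.
list_time = timeit.timeit("for i in l: i*i",
                          setup=f"l = list(range({N}))", number=10)
dict_time = timeit.timeit("for i in d: i*i",
                          setup=f"d = dict.fromkeys(range({N}), 0)", number=10)

print(f"list: {list_time:.3f}s  dict: {dict_time:.3f}s")
```

As Kent's figures suggest, on most builds the two times come out close; the structures differ mainly in build cost, not iteration cost.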
> fill_dict_time = time.time()
>
> for x in range(iter_num):
>     the_list.append(x)

Not sure why you have the list append in this loop?

>     the_dict[x] = 0
>
> fill_dict_time = time.time() - fill_dict_time

the_list now has twice as many entries as the_dict, which would
account for it taking twice as long to iterate. That is another
pitfall of timing tests; it is remarkably easy to make a mistake and
time something other than what you think you are timing.

Kent
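[Editor's aside, not part of the thread: Kent's point is that each fill must be timed in its own loop, so neither measurement contaminates the other. A minimal corrected harness, assuming the original test looked roughly like the quoted fragment; iter_num is a hypothetical value since the original isn't shown.]

```python
import time

iter_num = 100_000  # hypothetical; the original post's value isn't shown

# Time the list fill on its own.
the_list = []
t0 = time.time()
for x in range(iter_num):
    the_list.append(x)
fill_list_time = time.time() - t0

# Time the dict fill on its own -- no stray list appends in this loop.
the_dict = {}
t0 = time.time()
for x in range(iter_num):
    the_dict[x] = 0
fill_dict_time = time.time() - t0

# Both structures now hold exactly iter_num entries, so the
# two timings are comparing like with like.
print(f"list fill: {fill_list_time:.3f}s  dict fill: {fill_dict_time:.3f}s")
```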
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor