On Fri, Apr 23, 2010 at 11:33 PM, Emile van Sebille <em...@fenx.com> wrote:
> On 4/23/2010 2:21 PM Alan Gauld said...
>>
>> "Emile van Sebille" <em...@fenx.com> wrote
>>>
>>> It's expensive enough that for a list this size I'd convert it to a
>>> dict and use 'in' on that. E.g.,
>>>
>>> a = range(100000)
>>> d = dict(zip(a,a))
>>>
>>
>> Surely that would depend on how often you do the search? If it's a
>> one-off occurrence I'd expect the overhead of zipping and converting
>> to a dict would outweigh the savings?
>
> Oh sure, but in practical terms, if it's a one-off situation who cares
> which you use? For a one-off I'd just use 'in' and not be concerned.
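(For concreteness, a minimal sketch of the conversion Emile describes above. A plain set gives the same average O(1) membership test as dict(zip(a, a)), with less ceremony:)

    # Build the lookup structure once, then test membership against it
    # instead of scanning the list each time.
    a = list(range(100000))

    d = dict(zip(a, a))  # the dict approach from the quote above
    s = set(a)           # a plain set is the simpler equivalent

    print(99999 in d)    # True; average O(1) hash lookup
    print(99999 in s)    # True; same, without the dummy values
    print(99999 in a)    # True, but O(n): may scan the whole list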
I guess Alan missed my second e-mail with the micro-benchmarks, but that one nuances my first answer quite a bit and makes all the points he is making. That doesn't make him any less correct, of course. A set (I used frozenset, but either works) is faster than a dict, and for a one-off you're best off using bisect.

For completeness' sake: on a 10,000-item list, the 'in' operator takes, *in the worst case*, around 7 seconds. bisect, again in the worst case, takes only around 0.01 seconds (measured on a Core 2 Duo T5300 @ 1.7 GHz with 2 GB RAM). That's quite a saving: those 7 seconds can easily be 99% of the execution time of a typical script. So for a sufficiently large data set it can definitely pay off to be concerned, even with a one-off.

Of course, profiling will immediately catch that kind of performance bottleneck. So even if you care about performance, you can start off using 'in' and easily optimize later with bisect or a set: the whole "don't do it yet" thing and all.

Hugo
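(For reference, a hedged sketch of the bisect approach mentioned above; the helper name 'contains' is made up here, and the list must already be sorted for this to be correct. The timeit comparison is in the spirit of the numbers quoted; exact timings will vary by machine:)

    import bisect
    import timeit

    def contains(sorted_list, value):
        # O(log n) membership test via binary search; only valid
        # if sorted_list really is sorted.
        i = bisect.bisect_left(sorted_list, value)
        return i < len(sorted_list) and sorted_list[i] == value

    a = list(range(10000))
    missing = 10001  # absent value: worst case for a linear scan

    # Repeated worst-case lookups, roughly mirroring the
    # micro-benchmark described above.
    print(timeit.timeit(lambda: missing in a, number=10000))          # linear scan
    print(timeit.timeit(lambda: contains(a, missing), number=10000))  # binary search

(A set, built once with s = set(a), pays an O(n) construction cost and is then O(1) per lookup, which is why it wins as soon as you do more than a handful of searches.)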