From: tutor-bounces+bermanrl=cfl.rr....@python.org On Behalf Of Ricardo Aráoz
Sent: Friday, April 23, 2010 6:33 PM
To: Hugo Arts
Cc: tutor@python.org; Emile van Sebille
Subject: Re: [Tutor] Binary search question
Ricardo Aráoz wrote:
> Hugo Arts wrote:
>> On Fri, Apr 23, 2010 at 11:33 PM, Emile van Sebille <em...@fenx.com> wrote:
>>> On 4/23/2010 2:21 PM Alan Gauld said...
>>>> "Emile van Sebille" <em...@fenx.com> wrote
>>>>> It's expensive enough that for a list this size I'd convert it to a
>>>>> dict and use in on that. E.g.,
>>>>>
>>>>>     a = range(100000)
>>>>>     d = dict(zip(a, a))
>>>>
>>>> Surely that would depend on how often you do the search? If it's a
>>>> one-off occurrence I'd expect the overhead of zipping and converting
>>>> to a dict would outweigh the savings?
>>>
>>> Oh sure, but in practical terms, if it's a one-off situation, who
>>> cares which you use? For a one-off I'd just use 'in' and not be
>>> concerned.
>>
>> I guess Alan missed my second e-mail with the micro-benchmarks, which
>> nuances my first answer quite a bit and makes all the points he is
>> making. That does not make him less correct, of course. A set (I used
>> frozenset, but either way) is faster than a dict, and in a one-off
>> you're best off using bisect.
>>
>> For completeness' sake: on a 10,000-item list, using the in operator
>> takes, in the worst case, around 7 seconds. bisect, again in the worst
>> case, takes only around 0.01 seconds (that's on a Core 2 Duo T5300 @
>> 1.7 GHz, 2 GB RAM). That's quite a saving. Those 7 seconds can easily
>> be 99% of the execution time of a typical script, so for a
>> sufficiently large data set it can definitely pay off to be concerned,
>> even with a one-off.
>>
>> Of course, profiling will immediately catch that kind of performance
>> bottleneck. So even if you care about performance, you can start off
>> using 'in' and easily optimize later with bisect or a set: the whole
>> "don't do it yet" thing and all.
>>
>> Hugo
>
> In the same vein of completeness' sake, and since this is the *tutor*
> list, we should stress that the 'right' approach is to use 'in'
> whatever the case, and only if the system is too slow and profiling
> shows that 'in' is the culprit should we seek alternatives.

Wow. I feel I have just been b h slapped across the face.
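For reference, the three approaches discussed above (a linear `in` scan, binary search with `bisect`, and a hash-based set built once up front) can be sketched as follows. `contains_bisect` is a helper name of my own, not something from the thread:

```python
import bisect

def contains_bisect(sorted_seq, x):
    """Membership test on a sorted sequence via binary search, O(log n)."""
    i = bisect.bisect_left(sorted_seq, x)
    return i < len(sorted_seq) and sorted_seq[i] == x

data = list(range(100000))   # already sorted, as in the thread's example
target = 99999               # last element: worst case for a linear scan

# O(n): linear scan over the list
assert target in data

# O(log n): binary search on the already-sorted list, no conversion needed
assert contains_bisect(data, target)
assert not contains_bisect(data, -1)

# O(1) average case: hash lookup, but it costs one O(n) conversion up front
s = set(data)
assert target in s
```

A plain set is a simpler choice than the `dict(zip(a, a))` suggestion when only membership matters, since the dict's values are never used.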
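Hugo's worst-case numbers can be reproduced in spirit with the standard `timeit` module. A rough sketch (repeat counts are my own choice and absolute timings are machine-dependent, so only the relative gap is meaningful):

```python
import timeit

# Shared setup: a 10,000-item list, its set equivalent, and the
# worst-case target for a linear scan (the last element).
setup = "data = list(range(10000)); s = set(data); target = 9999"

# O(n) linear scan, repeated 1000 times
t_list = timeit.timeit("target in data", setup=setup, number=1000)

# O(1) average-case hash lookup, repeated 1000 times
t_set = timeit.timeit("target in s", setup=setup, number=1000)

print("list 'in': %.4f s" % t_list)
print("set  'in': %.4f s" % t_set)
```

On any ordinary machine the set lookup should be orders of magnitude faster, which is the gap Hugo's benchmark illustrates; profiling a real script with `cProfile` would surface the same bottleneck.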
I think Hugo's test results pretty much confirmed that 'in' is not the way to go, although it is one of the methods I am trying, even though my tendency, since the number of elements is always less than 1,000,000, is to use the dictionary approach. But even though my years of experience using Python are fewer than four, I would be reluctant to use 'in' just based on what I have been reading from those who took the time to answer my post.

Just my $0.02 worth.

Robert Berman

What you don't see with your eyes, don't invent with your mouth.

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor