On Fri, Oct 19, 2018 at 09:12:54AM -0700, Pat Martin wrote: > Sorry my first email didn't have a subject line > > TLDR; How do you figure out if code is inefficient (if it isn't necessarily > obvious) and how do you find a more efficient solution?
You know it is inefficient if it takes longer to run than it ought to. The tricky questions are: - if you don't *measure* the time, how do you know how long it takes? - how do you know how long it ought to take? other than wishful thinking? - how do you find a more efficient solution? That's comes from experience, practice, and thinking about the problem you are trying to solve. The last one is especially tricky, because it may involve coming up with a solution to a problem that has never been solved before. That requires creativity and insight. But before you do any of that, remember the two critical rules of optimization, from the classic 1975 book "Principles of Program Design" by Michael A. Jackson: Rule 1: Don’t do it. Rule 2 (for experts only): Don’t do it yet. Donald Knuth wrote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." And to quote W.A. Wulf: "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity." > I use code wars sometimes to get some practice with Python, there was a > challenge to compare two strings and if string1 had enough characters to be > rearranged to make string2 return True, otherwise return False. Think about the problem to be solved. How do we know that one string could be rearranged to the other? Each string much have: - the same individual letters, e.g. "binary" and "brainy"; - the same total number of letters; - the same number of *each* letter. We don't need to check that the individual letters are the same, because checking the counts will suffice. If they are not the same, one string will have (let's say) two A's while the other will have none, and the counts will be different. If the two strings have different length, we know they can't be anagrams of each other: if len(string1) != len(string2): return False Only then do we need to compare the counts of each letter: for c in string1: if string1.count(c) != string2.count(c): return False Can that be improved? Perhaps -- for short strings, that ought to be pretty speedy, but for very long strings it has to inspect every letter over and over again... for the first letter in string1: count how many times it appears: - compare it to the first letter; - compare it to the second letter; - compare it to the third letter; etc for the second letter in string1: count how many times it appears: - compare it to the first letter; - compare it to the second letter; - compare it to the third letter; etc etc. So you can see, each letter gets inspected over and over again. If the strings are long, that does a lot of work. True, it is done at the speed of optimized C code (the string.count() method is pretty fast), but we can do better by using the Counter, which only needs to inspect each letter once: if collections.Counter(string1) != collections.Counter(string2): return False Can we do better? Maybe... building the counters only needs to inspect each letter once. But Counters have a bit of overhead, they're not trivial code. There's a conceptually simpler way which *might* (or might not!) be faster in practice: if sorted(string1) != sorted(string2): return False Think about why that test works. If you have truly huge strings, then the Counter will be better, for sure. But for moderate-sized, or small, strings, I wouldn't want to guess which is faster. Even though technically sorted does more work, it probably has less overhead. It might very well win for some cases. > Sorry about the long email, not sure if there is even an answer to this > question except maybe write more code and get more experience? Indeed. Nothing beats experience. But you can also stand on the shoulders of those who came before you, in the words of Sir Isaac Newton, one of the greatest mathematicians and physicists of any time: "If I have seen further, it is because I have stood on the shoulders of giants" which was a not-at-all veiled dig against his rival and equal genius, Sir Robert Hooke, who was short and bent. (Newton was a genius, but he was not a nice guy, and he had many rivalries and feuds. In fairness, so did Hooke.) You should start here: https://www.joelonsoftware.com/2001/12/11/back-to-basics/ and think about how that relates to your original algorithm. As far as programming in Python goes, some general advice: - whenever you can do something by calling a built-in function or method, it will probably be faster than doing it in Python code you wrote yourself; - so if you want fast code, try to find a way to use built-ins whenever possible; - looking up values in a set or dict is optimized to be very fast; - appending to the end of a list is fast; - but inserting at the beginning is slow; - "fast" and "slow" is relative: don't worry about "slow" operations until you know that they actually make a difference. (Remember the quote from W.A. Wulf.) - and watch out for those Shlemiel the painter’s algorithms! -- Steve _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor