On Mon, 25 Jul 2011 10:07 am Billy Mays wrote:

> On 7/24/2011 2:27 PM, SigmundV wrote:
>> list_of_integers = map(string_to_int, list_of_strings)
>>
>> Of course, this will be horribly slow if you have thousands of
>> strings. In such a case you should use an iterator (assuming you use
>> python 2.7):
>>
>> import itertools as it
>> iterator = it.imap(string_to_int, list_of_strings)
>
> if the goal is speed, then you should use generator expressions:
>
> list_of_integers = (int(float(s)) for s in list_of_strings)

I'm not intending to pick on Billy or Sigmund here, but for the
beginners out there, there are a lot of myths about the relative speed
of map, list comprehensions, generator expressions, etc. The usual
optimization rules apply:

    We should forget about small efficiencies, say about 97% of the
    time: premature optimization is the root of all evil.
    -- Donald Knuth

    More computing sins are committed in the name of efficiency
    (without necessarily achieving it) than for any other single
    reason - including blind stupidity.
    -- W.A. Wulf

and of course:

    If you haven't measured it, you're only guessing whether it is
    faster or slower.

(And unless you're named Raymond Hettinger, I give little or no
credibility to your guesses except for the most obvious cases. *wink*)

Generators (including itertools.imap) carry some overhead which list
comprehensions don't have (at least in some versions of Python), so
for small data sets, creating the generator may cost more than
evaluating it all the way through. For large data sets that overhead
is insignificant, but in *total* generators aren't any faster than
building the list up front. They can't be: they end up doing the same
amount of work. If you have to process one million strings, then
whether you use a list comp or a generator expression, you still end
up processing one million strings.

The only advantage of the generator expression (and it is a HUGE
advantage, don't get me wrong!) is that you can do the processing
lazily, on demand, possibly bailing out early, rather than all up
front. But if you end up processing the entire data set anyway, there
is no advantage to a generator expression over a list comp or map. So
which is faster depends on how you end up using the data.

One other important proviso: if your map function is just a wrapper
around a Python expression:

    map(lambda x: x+1, data)
    [x+1 for x in data]

then the list comp will be much faster, due to the overhead of the
function call. List comps and gen exprs can inline the expression x+1,
performing it in fast C rather than slow Python. But if you're calling
a named function in both cases:

    map(int, data)
    [int(x) for x in data]

then the overhead of the function call is identical for the map and
the list comp, and they should be equally fast. Or slow, as the case
may be.

But don't take my word for it! Measure, measure, measure! Performance
is subject to change without notice, and I could be mistaken.

(And don't forget that everything changes in Python 3. Whatever you
think you know about speed in Python 2 will be different in Python 3:
generator expressions become more efficient, itertools.imap
disappears, and the built-in map returns a lazy iterator instead of a
list.)
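For concreteness, here is the sort of quick-and-dirty comparison I
mean. Treat it as an untested sketch for Python 2.7 (to match the
thread): the data set is made up, I've used string concatenation in
place of x+1 so that one data set serves every test, and the actual
numbers will vary with your interpreter and your machine.

    # Python 2.7 -- under Python 3, map() is lazy, so the bare map()
    # timings below would not be comparing like with like.
    from timeit import Timer

    # One setup shared by every test: ten thousand numeric strings.
    setup = "data = [str(n) for n in range(10000)]"

    statements = [
        "map(int, data)",                # named function via map
        "[int(x) for x in data]",        # named function via list comp
        "list(int(x) for x in data)",    # named function via gen expr
        "map(lambda x: x + '!', data)",  # expression wrapped in a lambda
        "[x + '!' for x in data]",       # the same expression inlined
    ]

    for stmt in statements:
        timer = Timer(stmt, setup)
        # Best of three runs of 100 loops each; the minimum is the
        # least noisy figure.
        print stmt, min(timer.repeat(repeat=3, number=100))

If the claims above hold, the lambda version should come in last, the
inlined expression first, and map(int, data) and the int() list comp
should be neck and neck. But that is exactly the sort of thing to
verify rather than assume.

--
Steven

--
http://mail.python.org/mailman/listinfo/python-list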