On Fri, Jun 13, 2014 at 05:10:28AM -0700, Albert-Jan Roskam wrote:

> The other day I used collections.namedtuple and I re-initialized
> Record (see below) with every function*) call. Bad idea! It looks
> nicer because I did not need a global (and globals are baaad, mkay?),
> but it was *much* slower. I processed a log of a few million lines, I
> think.
>
> # bad --> time-consuming
> import collections
>
> def do_something_with(raw_record):
>     Record = collections.namedtuple("_", " ".join("v%03d" % i for i in range(100)))
>     return Record(*raw_record.split())
Look at how much work you do here. First, you create a long string of
the form:

    "v000 v001 v002 v003 ... v099"

representing 100 v-digits names. Then you create a brand new Record
class that takes those 100 v-digits names as arguments. Creating that
class requires building a string, parsing it as Python code, and then
running it. (You're not expected to know that, but if you read the
source code for namedtuple you will see that's how it works.) So
creating that class is slow.

Every time you call the function, it builds a new "v000 ... v099"
string, from scratch, then builds a new class, also from scratch, and
finally populates an instance of that class with 100 values from the
raw_record. Only that last step needs to be done inside the function.

> # better --> even though it uses a global variable
> import collections
>
> Record = collections.namedtuple("_", " ".join("v%03d" % i for i in range(100)))

[Aside: you may find it easier to debug problems with this if you give
the namedtuple class a sensible name, like "Record", rather than "_".]

How is that a global *variable*? It's a global name, "Record", but it
is no more a "variable" than it would be if you did:

    class Record(tuple):
        def __new__(cls, v000, v001, v002, ... , v099):
            # code goes here
        @property
        def v000(self):
            return self[0]
        # likewise for v001, v002, ... v099
        # plus additional methods

namedtuple is a factory function which creates a class. Buried deep
within it is a class statement, just as if you had written the class
yourself. Normally, when you create a class, you don't treat it as a
variable, you treat it as a constant, like functions. A class built by
namedtuple is no different from a class you create with the class
keyword. So "global variables are bad" doesn't apply because it's not
a variable.

Even if it were a variable, what really matters is not that it gets
stored in the global scope, but whether it gets explicitly passed to
functions as arguments, or implicitly modified behind the scenes.
For example:

    # Not bad
    data = ["put", "stuff", "here"]
    process(data)
    do_things_with(data)

    # Bad, for various reasons
    data = ["put", "stuff", "here"]
    process()  # process what?
    do_things_with()  # What are we doing things with?

In the first case, "data" may be stored in the global scope, but inside
each function it is treated as a regular local variable. Let's contrast
how one might operate on a second set of data in each case:

    # Too easy
    process(some_other_data)

    # Ouch, this is painful
    save_data = data
    data = some_other_data
    process()
    data = save_data  # restore the previous value

Global variables aren't bad because Moses came down from the mountains
with a stone tablet that declares that they are bad. They're bad
because they cause excessive coupling, they operate by side-effect,
they spoil idempotent code, and they are implicit instead of explicit.

> def do_something_with(raw_record):
>     return Record(*raw_record.split())

Much more sensible!

--
Steven
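
P.S. If you would rather measure the difference than take it on trust,
something along these lines should do. It is only a rough sketch: the
names parse_slow and parse_fast and the sample line are made up for
illustration, and the timings will vary from machine to machine.

```python
import collections
import timeit

# Built once, at import time, and then used like any other class.
Record = collections.namedtuple(
    "Record", " ".join("v%03d" % i for i in range(100)))


def parse_slow(raw_record):
    # Rebuilds the field-name string AND a brand new class on every call.
    record_cls = collections.namedtuple(
        "Record", " ".join("v%03d" % i for i in range(100)))
    return record_cls(*raw_record.split())


def parse_fast(raw_record):
    # Only the per-record work happens inside the function.
    return Record(*raw_record.split())


# Hypothetical sample line with 100 whitespace-separated fields.
raw = " ".join(str(i) for i in range(100))

slow_a, slow_b = parse_slow(raw), parse_slow(raw)
fast_a, fast_b = parse_fast(raw), parse_fast(raw)

# Both versions produce the same field values...
same_values = tuple(slow_a) == tuple(fast_a)
# ...but the slow one manufactures a distinct class for each call,
distinct_classes = type(slow_a) is not type(slow_b)
# while the fast one reuses the single module-level class.
shared_class = type(fast_a) is type(fast_b)

# Rough timings for 100 calls each; no particular numbers are promised.
print("per-call class:", timeit.timeit(lambda: parse_slow(raw), number=100))
print("shared class:  ", timeit.timeit(lambda: parse_fast(raw), number=100))
```

Note that parse_fast still receives its data explicitly as an argument;
the only thing it picks up from the module scope is the Record class
itself.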