On Feb 17, 11:23 pm, greg <[EMAIL PROTECTED]> wrote:
> Wolfgang Draxinger wrote:
> > Somehow you seem to think that a lookup table will require more
> > resources (memory, I guess you thought) than a sequence of
> > comparisons. However, you didn't take into account that the
> > program code itself requires memory, too (for the operation
> > codes).
>
> In Python, there's the additional twist that for lookup tables based
> on a mutable type, such as a dictionary, you need to execute code to
> build the dictionary. So you end up using very roughly twice the
> memory -- for the code to build the dictionary, and for the
> dictionary itself.
>
> If the dictionary isn't too huge, and it's looked up many times
> during each run of the program, the extra speed of the dict lookup
> is probably worth the overhead of creating it.
>
> If the dict is very large, and is consulted relatively few times in
> a given run, then it might well be faster, and not use too much more
> memory, to use a tree (NOT a linear sequence!) of if-else statements
> in the code.
>
> You could also consider loading the dict from some other form, such
> as a pickle, instead of creating it with code. This would use less
> memory, although it would probably take roughly the same time to set
> up the table.
>
> For extremely large, infrequently-used tables, it's probably better
> not to load the whole table into memory at all, and to keep it in
> some kind of external indexed structure such as a b-tree, relational
> database, etc.
>
> In other languages, the tradeoffs will be completely different. E.g.
> in C, if you can describe the table entirely with static data, it
> will be very fast to load and incur no overhead for code to create
> it at all. Also, with demand-paging out of the executable file and
> infrequent lookups, only parts of the table will actually get loaded
> anyway.
>
> --
> Greg
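The two approaches greg contrasts can be sketched like this (the keys and values here are made up for illustration, not from the thread):

```python
import timeit

# Dict-based lookup table: built once, costs memory for the dict
# plus the code that builds it, but each lookup is one hash probe.
TABLE = {"red": 0xFF0000, "green": 0x00FF00, "blue": 0x0000FF}

def lookup_dict(name):
    return TABLE[name]

# Equivalent if/elif chain: no separate table object in memory,
# but the comparisons are re-executed on every call.
def lookup_if(name):
    if name == "red":
        return 0xFF0000
    elif name == "green":
        return 0x00FF00
    elif name == "blue":
        return 0x0000FF
    raise KeyError(name)

# Rough timing comparison; exact numbers vary by machine and version.
print(timeit.timeit('lookup_dict("blue")', globals=globals()))
print(timeit.timeit('lookup_if("blue")', globals=globals()))
```

With only three keys the chain can be competitive; the dict pulls ahead as the table grows, since the chain's cost scales with the number of branches tested while the dict's stays roughly constant.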
Past a certain point in the number of hash tables ("many small ones", if that's the right way to put it) in a program, or in an interpreter process on a machine, would it be more time-efficient to allocate one 2**32-byte table and index into it directly? Are "modulo" and "resize-by-doubling" the only steps of the lookup process that this would eliminate, and are they expensive ones? If so, where is that point? If not, what's the tighter bottleneck?
--
http://mail.python.org/mailman/listinfo/python-list
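For what it's worth, one crude way to see what the hashing/probing machinery costs relative to a directly indexed table is to compare a dict lookup against a plain list index (note that CPython sizes its dicts to powers of two and reduces the hash with a bitmask, not an actual modulo; the sizes below are arbitrary, not an answer to the 2**32 question):

```python
import timeit

N = 1_000_000
d = {i: i for i in range(N)}   # hash table: hash, mask, probe per lookup
lst = list(range(N))           # flat array: plain index arithmetic only

# Direct indexing skips hashing and probing entirely.
print("dict:", timeit.timeit('d[617283]', globals=globals()))
print("list:", timeit.timeit('lst[617283]', globals=globals()))
```

Both lookups are O(1); the difference measured here is the constant factor of hashing and probing, which is small per lookup, so whether a giant preallocated table pays off would depend mostly on cache behavior and memory pressure rather than on eliminating those steps.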