The straightforward way to solve this problem is to create a dictionary. Like so:
[...]
a, b = get_information(line) if a in dict.keys(): dict[a].append(b) else: dict[a] = [b]
So I timed the three suggestions with a few different datasets:
> cat builddict.py
def askpermission(d, k, v):
if k in d:
d[k].append(v)
else:
d[k] = [v]def askforgiveness(d, k, v):
try:
d[k].append(v)
except KeyError:
d[k] = [v]def default(d, k, v):
d.setdefault(k, []).append(v)def test(items, func):
d = {}
for k, v in items:
func(d, k, v)Dataset where every value causes a collision:
> python -m timeit -s "import builddict as bd" "bd.test([(100, i) for i in range(1000)], bd.askpermission)"
1000 loops, best of 3: 1.62 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(100, i) for i in range(1000)], bd.askforgiveness)"
1000 loops, best of 3: 1.58 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(100, i) for i in range(1000)], bd.default)"
100 loops, best of 3: 2.03 msec per loop
Dataset where no value causes a collision:
> python -m timeit -s "import builddict as bd" "bd.test([(i, i) for i in range(1000)], bd.askpermission)"
100 loops, best of 3: 2.29 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i, i) for i in range(1000)], bd.askforgiveness)"
100 loops, best of 3: 9.96 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i, i) for i in range(1000)], bd.default)"
100 loops, best of 3: 2.98 msec per loop
Datset where one of every 5 values causes a collision:
> python -m timeit -s "import builddict as bd" "bd.test([(i % 5, i) for i in range(1000)], bd.askpermission)"
1000 loops, best of 3: 1.82 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i % 5, i) for i in range(1000)], bd.askforgiveness)"
1000 loops, best of 3: 1.79 msec per loop
> python -m timeit -s "import builddict as bd" "bd.test([(i % 5, i) for i in range(1000)], bd.default)"
100 loops, best of 3: 2.2 msec per loop
So, when there are lots of collisions, you may get a small benefit from the try/except solution. If there are very few collisions, you probably would rather the if/else solution. The setdefault solution patterns about like the if/else solution, but is somewhat slower.
I will probably continue to use setdefault, because I think it's prettier =) but if you're running into a speed bottleneck in this sort of situation, you might consider one of the other solutions.
Steve -- http://mail.python.org/mailman/listinfo/python-list
