On Tue, Oct 27, 2015 at 2:32 PM, jarod_v6--- via Tutor <tutor@python.org> wrote:
> Hi!
> I want to reads two files and create simple dictionary. Input file contain
> more than 10000 rows
>
> diz5 = {}
> with open("tmp1.txt") as p:
>     for i in p:
>         lines = i.rstrip("\n").split("\t")
>         diz5.setdefault(lines[0],set()).add(lines[1])
>
> diz3 = {}
> with open("tmp2.txt") as p:
>     for i in p:
>         lines = i.rstrip("\n").split("\t")
>         diz3.setdefault(lines[0],set()).add(lines[1])
10000 rows today is not a lot of data: typical computer memories have grown
quite a bit, and I get the feeling your program should be able to handle this
all in-memory.

But let's assume, for the moment, that you really do have more data than you
can hold in memory.  Ideally, you'd still like key/value access to its
contents, because that feels most like a Python dict.  If that's the case,
then what you're looking for is an on-disk database.  There are several out
there; one that comes standard in Python 3 is the "dbm" module:

    https://docs.python.org/3.5/library/dbm.html

Instead of doing:

    diz5 = {}
    ...

we'd do something like this:

    with dbm.open('diz5', 'c') as diz5:
        ...

and otherwise your code will look very similar!  This dictionary-like object
stores its data on disk rather than in memory, so it can grow fairly large.

The other nice thing is that you can do the dbm creation up front.  If you
run your program again, you can add a bit of logic to *reuse* the dbm that's
already on disk, so you don't have to process your input files all over again.

Databases have capacity limits too, but you're unlikely to hit them unless
you're really doing something hefty, and that's out of scope for
tutor@python.org. :P
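For concreteness, here's a minimal sketch of what your first loop might look
like on top of dbm (the database name 'diz5' and the input file 'tmp1.txt'
are just taken from your example).  One wrinkle: dbm values must be bytes or
str, not sets, so this sketch stores each key's set of values as a tab-joined
string:

    import dbm

    with dbm.open("diz5", "c") as diz5, open("tmp1.txt") as p:
        for line in p:
            fields = line.rstrip("\n").split("\t")
            key, value = fields[0], fields[1]
            # dbm hands values back as bytes; decode and rebuild the set.
            stored = diz5.get(key, b"").decode()
            values = set(stored.split("\t")) if stored else set()
            values.add(value)
            # dbm can only store bytes/str, so re-join the set before writing.
            diz5[key] = "\t".join(values)

The 'c' flag opens an existing database if one is already there and creates
it otherwise, so on a later run you could check whether the database is
already populated (say, by looking for a known key) and skip re-reading the
input files entirely.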