It is probably better to store your key file in memory and then
loop over the large data file, checking each line against the keys.
It is faster to test 2,000 keys held in memory for each line of the
data file: that way you read the key file and the data file only
once each - 502,000 reads instead of a billion.
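
A minimal sketch of that approach (the file names here are
placeholders; I'm assuming one key per line in the key file and that
the key is the first whitespace-separated field of each data line):

    # Read all keys into a set once; membership tests are then O(1).
    with open("keys.txt") as key_file:
        keys = {line.strip() for line in key_file}

    # Stream the large file one line at a time, keeping matching rows.
    with open("data.txt") as data_file, open("matches.txt", "w") as out:
        for row in data_file:
            fields = row.split()
            if fields and fields[0] in keys:
                out.write(row)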
I replaced the inner loop (over the key file) with a membership test
on a *list*, and it's much faster :)
my_list = open("list_file.txt")
file_list = [line.rstrip("\n") for line in my_list]  # one key per line
my_list.close()

file_large = open("large_file.txt")
save_file = open("output.txt", "w")
for row in file_large:
    split_row = row.split()
    # keep rows whose first field is one of the keys
    if split_row and split_row[0] in file_list:
        save_file.write(row)
file_large.close()
save_file.close()
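
One small change should make this faster still: an `in` test on a
list scans up to all 2,000 entries for every row, while a set does
the same lookup in constant time. Building `file_list` as a set
instead is a one-line swap:

    file_list = {line.rstrip("\n") for line in my_list}  # set, not list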
Thanks for all the help.