Hi all,

I'm parsing a 4.1 GB Apache log to get stats on how many times each IP
requests something from the server.

The first design of the algorithm was:

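import fileinput
import sys

match_counter = {}  # maps ip -> number of requests seen
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]  # client ip is the first field of each log line
    if match_counter.has_key(ip):
        match_counter[ip] += 1
    else:
        match_counter[ip] = 1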

It took 3 min 58 s to give me the stats.

Then I tried a generator solution like:

def generateit():
    for line in fileinput.input(sys.argv[1:]):
        yield line.split()[0]

for ip in generateit():
    # ... same if/else counting block as above

Instead of being faster, it took 4 min 20 s.

Should I leave fileinput behind?
Am I using generators with the wrong approach?
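
To make the first question concrete, here is a minimal sketch of what
leaving fileinput behind could look like. It assumes plain open() and
collections.Counter, neither of which appears in the code above, and the
name count_ips is hypothetical. It has not been timed against the two
versions above, so treat it as something to measure rather than a result.

```python
from collections import Counter


def count_ips(paths):
    """Count requests per client ip across the given log files (hypothetical helper)."""
    counts = Counter()
    for path in paths:
        # Iterate over the file object directly instead of going through fileinput.
        with open(path, "r", errors="replace") as log:
            for line in log:
                # Split off only the first whitespace-separated field (the ip).
                fields = line.split(None, 1)
                if fields:  # skip blank lines
                    counts[fields[0]] += 1
    return counts
```

It would be called as count_ips(sys.argv[1:]), and the returned Counter
supports most_common(n) to list the busiest ips.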

Thanks in advance,

Federico.
--
http://mail.python.org/mailman/listinfo/python-list
