I am working on a reducer that needs to output a list of files sorted by
their overall bandwidth use. I build a dictionary with the file name as the
key (it is always unique) and, as the value, a list holding two numbers:
bytes and bytes sent.
Each entry looks like {filename: [bytes, bytes_sent]}.
How would I sort on bytes sent?
How would I make this more efficient?
Code:
# Expect as input (tab-separated fields, listed by index):
# 0=URI, 1=1, 2=return_code, 3=bytes, 4=referer, 5=ip,
# 6=time_taken, 7=bytes_sent, 8=ref_dom
import sys

# Maps filename -> [bytes, total bytes_sent]
totals = {}

def update_dict(filename, nbytes, bytes_sent):
    # Build and update our dictionary, adding to the total bytes sent.
    if filename in totals:
        bytes_sent += totals[filename][1]
    totals[filename] = [nbytes, bytes_sent]

# Input comes from STDIN.
for line in sys.stdin:
    # Strip the trailing newline only, then split on tab.
    words = line.rstrip('\n').split('\t')
    filename = words[0]
    nbytes = words[3]  # left as a string; only bytes_sent is summed
    bytes_sent = int(words[7])
    update_dict(filename, nbytes, bytes_sent)
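
To make the sorting question concrete, here is a minimal sketch of one possible approach, assuming the dictionary has the {filename: [bytes, bytes_sent]} shape described above. The names totals, add_record and sorted_by_bytes_sent are hypothetical, and the sample records are invented purely for illustration. It uses collections.defaultdict so the "is the key already there?" check goes away, and sorted() with a key function that picks out the second element of each value.

```python
from collections import defaultdict


def add_record(totals, filename, nbytes, bytes_sent):
    """Accumulate bytes_sent per file and keep the most recent bytes value."""
    entry = totals[filename]  # defaultdict creates [0, 0] for a new filename
    entry[0] = nbytes
    entry[1] += bytes_sent


def sorted_by_bytes_sent(totals):
    """Return (filename, [bytes, bytes_sent]) pairs, largest bytes_sent first."""
    return sorted(totals.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical sample records, not taken from any real log.
totals = defaultdict(lambda: [0, 0])
add_record(totals, "/a.html", 100, 50)
add_record(totals, "/b.html", 200, 500)
add_record(totals, "/a.html", 100, 70)

for filename, (nbytes, bytes_sent) in sorted_by_bytes_sent(totals):
    print("%s\t%s\t%d" % (filename, nbytes, bytes_sent))
```

In this sketch the per-line work is a single dictionary lookup, and the sort happens only once, after all input has been read, so the cost is dominated by that one sort over the number of distinct files.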