Apache log munging
I have written a generator for an Apache log which returns two pieces of information: the hostname and the filename requested.

The 'log' generator can be 'consumed' like this:

    for r in log:
        print r['host'], r['filename']

I want to find the top '100' hosts (sorted in descending order of total requests), as follows:

    host   filename1  filename2  filename3  Total
    hostA  6          9          45         110
    hostC  4          4343       98
    hostB  344        45         83

and so on.

Is there a fast way to do this without scanning the log file many times?

Thanks in advance.
- Jo
--
http://mail.python.org/mailman/listinfo/python-list
Re: Apache log munging
On Wed, Oct 8, 2008 at 1:55 PM, Joe Python [EMAIL PROTECTED] wrote:
> I want to find the top '100' hosts (sorted in descending order of
> total requests) like follows:
> Is there a fast way to this without scanning the log file many times?

As you encounter a new host, add it to a dict (or another type of collection). If you encounter it again, use the host as the key to retrieve the dict entry and increment its request count. You should only have to read the file once.
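A minimal sketch of that single-pass idea, written for Python 3. The `count_requests` helper and the `sample_log` list are hypothetical stand-ins for illustration; the real `log` generator is assumed to yield dicts with a 'host' key, as in the original post.

```python
from collections import defaultdict

def count_requests(log):
    """Return {host: total requests} after one pass over `log`."""
    counts = defaultdict(int)
    for r in log:
        # A new host starts at 0; a host seen before is simply incremented.
        counts[r['host']] += 1
    return counts

# Hypothetical sample data standing in for the real log generator.
sample_log = [
    {'host': 'hostA', 'filename': 'a.html'},
    {'host': 'hostB', 'filename': 'a.html'},
    {'host': 'hostA', 'filename': 'b.html'},
]

counts = count_requests(sample_log)
# Sort hosts by request count, highest first, and keep at most 100.
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:100]
```

This only gives per-host totals; the per-file breakdown would need the filename to be part of the key or a nested collection.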
Re: Apache log munging
I am currently using the following technique to get the info above:

    from collections import defaultdict

    all = defaultdict(int)       # (host, filename) -> request count
    hosts = defaultdict(int)     # host -> total request count
    filename = defaultdict(int)  # set of filenames seen

    for r in log:
        all[r['host'], r['filename']] += 1
        hosts[r['host']] += 1
        filename[r['filename']] = 1

    # Hosts in descending order of total requests
    for host in sorted(hosts, key=hosts.get, reverse=True):
        for file in filename:
            print host, all[host, file]
        print hosts[host]

I was looking for a better option than building 'three' collections, to improve performance.

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel [EMAIL PROTECTED] wrote:
> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python [EMAIL PROTECTED] wrote:
>> I want to find the top '100' hosts (sorted in descending order of
>> total requests) like follows:
>> Is there a fast way to this without scanning the log file many times?
>
> As you encounter a new host, add it to a dict (or another type of
> collection). If you encounter it again, use the host as the key to
> retrieve the dict entry and increment its request count. You should
> only have to read the file once.
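One possible way to get by with a single collection, sketched for Python 3 and not benchmarked: keep one Counter of filenames per host, then derive the totals and the filename list from it after the pass. The `summarize` function and `sample_log` are hypothetical names for illustration, and the records are assumed to carry 'host' and 'filename' keys as in the first message.

```python
from collections import Counter, defaultdict
import heapq

def summarize(log, n=100):
    """One pass over `log`; returns (filenames, rows) for the top `n` hosts."""
    per_host = defaultdict(Counter)  # host -> Counter(filename -> hits)
    for r in log:
        per_host[r['host']][r['filename']] += 1

    # Totals and the filename list are derived, not tracked separately.
    totals = {host: sum(c.values()) for host, c in per_host.items()}
    filenames = sorted({f for c in per_host.values() for f in c})

    # nlargest avoids sorting every host when only the top n are needed.
    top_hosts = heapq.nlargest(n, totals, key=totals.get)
    rows = [(host, [per_host[host][f] for f in filenames], totals[host])
            for host in top_hosts]
    return filenames, rows

# Hypothetical sample data standing in for the real log generator.
sample_log = [
    {'host': 'hostA', 'filename': 'a.html'},
    {'host': 'hostB', 'filename': 'a.html'},
    {'host': 'hostA', 'filename': 'b.html'},
    {'host': 'hostA', 'filename': 'a.html'},
]

filenames, rows = summarize(sample_log)
for host, per_file, total in rows:
    print(host, *per_file, total)
```

Looking up a missing filename in a Counter returns 0 without creating an entry, so hosts that never requested a file do not grow the structure while the table is printed.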