I am currently using the following technique to get the info above:
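from collections import defaultdict

all = defaultdict(int)       # (host, file) -> request count
hosts = defaultdict(int)     # host -> total request count
filename = defaultdict(int)  # files seen so far (the value is unused)
for r in log:
    all[r['host'], r['file']] += 1
    hosts[r['host']] += 1
    filename[r['file']] = 1
# Walk the hosts in descending order of total requests
for host in sorted(hosts, key=hosts.get, reverse=True):
    for file in filename:
        print host, all[host, file]
    print hosts[host]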
I am looking for a better option than building three collections,
to improve performance.
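
One possible direction, as a rough sketch only: keep a single nested
mapping (host -> file -> count) and derive the per-host totals from it.
This assumes, as in the code above, that each record in log is a dict
with 'host' and 'file' keys. The function name top_hosts and the n
parameter are made up for illustration, and a small totals dict is
still built at the end, so this is not strictly a single collection.

```python
from collections import defaultdict
import heapq


def top_hosts(log, n=100):
    """Return the n busiest hosts as (host, total, per_file_counts) tuples.

    Hypothetical helper: assumes each record is a dict with 'host' and
    'file' keys, as in the original loop.
    """
    # Single pass over the log: host -> {file -> request count}
    per_host = defaultdict(lambda: defaultdict(int))
    for r in log:
        per_host[r['host']][r['file']] += 1

    # Per-host totals are derived from the per-file counts
    totals = dict((host, sum(files.values()))
                  for host, files in per_host.items())

    # nlargest avoids sorting every host when only the top n are needed
    best = heapq.nlargest(n, totals, key=totals.get)
    return [(host, totals[host], dict(per_host[host])) for host in best]
```

Something like top_hosts(log, 100) would then give the hosts in
descending order of total requests, each with its own per-file counts,
without creating zero entries for files a host never requested.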
- Jo
On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python <[EMAIL PROTECTED]> wrote:
> > I want to find the top '100' hosts (sorted in descending order of total
> > requests) as follows:
> > Is there a fast way to do this without scanning the log file many times?
>
> As you encounter a new "host" add it to a dict (or another type of
> collection), and if encountered again, use that "host" as the key to
> retrieve the dict entry and increment its request count. You should
> only have to read the file once.
>
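
The single-pass counting described in the quoted reply might look
roughly like the sketch below. It assumes the host names have already
been extracted from the log lines (the log format is not given in this
thread), and the names count_requests and top_n are hypothetical.

```python
def count_requests(hosts):
    """Count requests per host in one pass over an iterable of host names."""
    counts = {}
    for host in hosts:
        # A new host starts from 0; a host seen before has its count incremented
        counts[host] = counts.get(host, 0) + 1
    return counts


def top_n(counts, n=100):
    """Return up to n (host, count) pairs, highest count first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

The file is read only once to build counts; the sort afterwards runs
over the distinct hosts, not over the log lines.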
--
http://mail.python.org/mailman/listinfo/python-list