splitting a string into an array using a time value
I want to find a way to split a string into an array using a time value.

s = r"""8/25/2008 11:10:08 AM Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed imperdiet luctus nisl. ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum tortor non lacus.
0/3/2008 5:10:23 PM ras quis ante id lacus sodales accumsan. Morbi bibendum iaculis purus
10/6/2008 4:39:55 PM Maecenas lectus libero, tincidunt sed"""

I am looking for output in the form of an array, as follows:

resulting_array = [
    "8/25/2008 11:10:08 AM Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed imperdiet luctus nisl. ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum tortor non lacus.",
    "0/3/2008 5:10:23 PM ras quis ante id lacus sodales accumsan. Morbi bibendum iaculis purus",
    "10/6/2008 4:39:55 PM Maecenas lectus libero, tincidunt sed",
]

Note: there is one element in the array for each time entry.

I tried the following pattern, but it's not working:

import re

pattern = r'(\d+/\d+/\d+ \d+:\d+:\d+ .+)'
pat = re.compile(pattern)
result = re.split(pat, s)

- Joe Python
-- http://mail.python.org/mailman/listinfo/python-list
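A sketch of one way to do this, not taken from the thread. The attempted pattern describes a whole entry rather than the boundary between entries: because the group captures, `re.split` returns the matched text interleaved with the pieces between matches, and because `.+` is greedy and stops only at a newline, one match can swallow several entries on a single line or cut off an entry that wraps. Splitting on the whitespace that precedes a timestamp, with a lookahead so the timestamp stays with its entry, avoids both problems. This assumes every entry starts with a timestamp in the `M/D/YYYY H:MM:SS AM|PM` form shown (the AM/PM suffix is required) and that no timestamp appears inside an entry's text; the helper name `split_on_timestamps` is hypothetical, and the code is written for Python 3.

```python
import re

# Assumption: each entry begins with a timestamp like "8/25/2008 11:10:08 AM",
# and timestamps never appear inside the free text of an entry.
TIMESTAMP = r"\d+/\d+/\d+ \d+:\d+:\d+ [AP]M"


def split_on_timestamps(text):
    """Split text into entries, each starting with its own timestamp."""
    # Consume the whitespace *before* a timestamp; the lookahead checks for the
    # timestamp without consuming it, so it stays at the start of the next piece.
    pieces = re.split(r"\s+(?=" + TIMESTAMP + ")", text.strip())
    # Drop the single empty piece produced when the input is empty.
    return [p for p in pieces if p]


s = ("8/25/2008 11:10:08 AM Lorem ipsum dolor sit amet, consectetuer adipiscing elit. "
     "Sed imperdiet luctus nisl. ipsum vel arcu gravida mattis. In mattis dolor id sem. "
     "Praesent dictum tortor non lacus.\n"
     "0/3/2008 5:10:23 PM ras quis ante id lacus sodales accumsan. Morbi bibendum iaculis purus\n"
     "10/6/2008 4:39:55 PM Maecenas lectus libero, tincidunt sed")

for entry in split_on_timestamps(s):
    print(entry)
```

Requiring whitespace before the timestamp also keeps `10/6/2008` from being split a second time at `0/6/2008`. A bare zero-width lookahead would work with `re.split` on Python 3.7 and later, but it would need a `\b` to avoid that mid-number split.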
Apache log munging
I have written a generator for an Apache log which yields two pieces of information per request: the hostname and the filename requested. The 'log' generator can be consumed like this:

for r in log:
    print r['host'], r['filename']

I want to find the top 100 hosts (sorted in descending order of total requests), with a per-file breakdown, like this:

host filename1 filename2 filename3 Total
hostA 6 9 45 110
hostC 4 4343 98
hostB 344 45 83

and so on. Is there a fast way to do this without scanning the log file many times?

Thanks in advance.

- Jo
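A minimal sketch of the single-pass idea, not taken from the thread: count requests per host while consuming the generator once, then take the busiest hosts. It uses `collections.Counter`, which postdates this 2008 thread (it arrived in Python 2.7 and 3.1), and it is written for Python 3. It assumes each record is a dict with a `'host'` key, as in the post; `top_hosts` and the sample records are hypothetical.

```python
from collections import Counter


def top_hosts(records, n=100):
    """Return the n busiest hosts as (host, request_count) pairs, busiest first.

    Assumes each record is a dict with a 'host' key, like the items yielded
    by the log generator in the post. The records are consumed in one pass.
    """
    counts = Counter(r['host'] for r in records)
    # most_common(n) orders hosts by request count, highest first.
    return counts.most_common(n)


# Hypothetical sample records standing in for the real log generator.
sample = [
    {'host': 'hostA', 'filename': 'index.html'},
    {'host': 'hostB', 'filename': 'index.html'},
    {'host': 'hostA', 'filename': 'about.html'},
    {'host': 'hostC', 'filename': 'index.html'},
    {'host': 'hostA', 'filename': 'index.html'},
    {'host': 'hostC', 'filename': 'logo.png'},
]

print(top_hosts(sample, n=2))  # [('hostA', 3), ('hostC', 2)]
```

This covers the "descending order of total requests" part; the per-file columns need a second level of counting per host.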
Re: Apache log munging
I am currently using the following technique to get the info above:

from collections import defaultdict

counts = defaultdict(int)    # (host, file) -> number of requests
hosts = defaultdict(int)     # host -> total requests
filenames = set()            # every file seen in the log
for r in log:
    counts[r['host'], r['file']] += 1
    hosts[r['host']] += 1
    filenames.add(r['file'])
for host in sorted(hosts, key=hosts.get, reverse=True)[:100]:
    for fname in filenames:
        print host, counts[host, fname]
    print hosts[host]

I was looking for a better option than building three collections, to improve performance.

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel [EMAIL PROTECTED] wrote:
> On Wed, Oct 8, 2008 at 1:55 PM, Joe Python [EMAIL PROTECTED] wrote:
>> I want to find the top 100 hosts (sorted in descending order of total requests) like follows:
>> Is there a fast way to do this without scanning the log file many times?
>
> As you encounter a new host, add it to a dict (or another type of collection), and if you encounter it again, use that host as the key to retrieve the dict entry and increment its request count. You should only have to read the file once.
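One way to get down to a single collection, sketched under assumptions rather than taken from the thread: map each host to a `Counter` of the files it requested. A host's total is then the sum of its per-file counts, and the table columns can be computed after the pass from just the hosts being reported. It assumes records carry `'host'` and `'file'` keys as in the code above (the first post used `'filename'`); the function names and sample data are hypothetical, `heapq.nlargest` picks the top `n` without sorting every host, and the code is written for Python 3.

```python
import heapq
from collections import Counter, defaultdict


def per_host_file_counts(records):
    """Single pass, single collection: host -> Counter of requested files."""
    by_host = defaultdict(Counter)
    for r in records:
        by_host[r['host']][r['file']] += 1
    return by_host


def report(by_host, n=100):
    """Yield table rows [host, per-file counts..., total] for the n busiest hosts."""
    # A host's total is the sum of its per-file counts, so no separate
    # per-host counter is needed.
    totals = {host: sum(files.values()) for host, files in by_host.items()}
    top = heapq.nlargest(n, totals, key=totals.get)
    # Columns: only the files requested by the hosts actually reported.
    columns = sorted({f for host in top for f in by_host[host]})
    yield ['host'] + columns + ['Total']
    for host in top:
        files = by_host[host]
        # Indexing a Counter with a missing key returns 0 without inserting it.
        yield [host] + [files[f] for f in columns] + [totals[host]]


# Hypothetical sample records standing in for the real log generator.
records = [
    {'host': 'hostA', 'file': 'a.html'},
    {'host': 'hostA', 'file': 'b.html'},
    {'host': 'hostA', 'file': 'a.html'},
    {'host': 'hostB', 'file': 'c.png'},
    {'host': 'hostC', 'file': 'a.html'},
    {'host': 'hostC', 'file': 'c.png'},
]

for row in report(per_host_file_counts(records), n=2):
    print('\t'.join(str(cell) for cell in row))
```

Whether this beats three flat dicts is worth measuring on a real log: it does fewer updates per record, but each one goes through a nested lookup.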