splitting a string into an array using a time value

2008-10-14 Thread Joe Python
I want to find a way to split a string into an array using a time value.
s = r"""
  8/25/2008 11:10:08 AM  Lorem ipsum dolor sit amet, consectetuer
adipiscing elit. Sed imperdiet luctus nisl.
  ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum
tortor non lacus.  0/3/2008 5:10:23 PM
  ras quis ante id lacus sodales accumsan. Morbi bibendum iaculis purus
10/6/2008 4:39:55 PM Maecenas lectus libero,
  tincidunt sed
"""
I am looking for an output in the form of an array as follows:

resulting-array = [ 8/25/2008 11:10:08 AM  Lorem ipsum dolor sit amet,
consectetuer adipiscing elit. Sed imperdiet luctus nisl.
  ipsum vel arcu gravida mattis. In mattis dolor id sem. Praesent dictum
tortor non lacus.,

 0/3/2008 5:10:23 PM   ras quis ante id lacus sodales accumsan.
Morbi bibendum iaculis purus,

 10/6/2008 4:39:55 PM Maecenas lectus libero,   tincidunt sed ]

 Note: there is an element corresponding to each time entry in the array

I tried the following pattern, but it's not working:
 import re
 pattern = r'(\d+/\d+/\d+ \d+:\d+:\d+ .+)'
 pat = re.compile(pattern)
 result = re.split(pat,s)
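
One likely reason the attempt above misbehaves: `.+` swallows the rest of the line after each timestamp, and because the pattern is wrapped in a capturing group, re.split also returns the matched text as separate list items. A possible alternative, offered only as a sketch, is to locate where each timestamp starts and slice the string between those positions. The names STAMP and split_on_timestamps are made up for illustration, the sample text is abbreviated from the post, and the sketch assumes every timestamp ends in AM or PM.

```python
import re

# Assumed timestamp shape, e.g. "8/25/2008 11:10:08 AM".
STAMP = re.compile(r'\d+/\d+/\d+ \d+:\d+:\d+ [AP]M')

def split_on_timestamps(text):
    """Return one chunk per timestamp, each chunk starting at its timestamp."""
    starts = [m.start() for m in STAMP.finditer(text)]
    # Each chunk runs up to the next timestamp, or to the end of the text.
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]

# Abbreviated sample based on the text in the post.
sample = (
    "8/25/2008 11:10:08 AM  Lorem ipsum dolor sit amet.\n"
    "tortor non lacus.  0/3/2008 5:10:23 PM\n"
    "  ras quis ante id lacus\n"
    "10/6/2008 4:39:55 PM Maecenas lectus libero,\n"
    "  tincidunt sed\n"
)
chunks = split_on_timestamps(sample)
```

Any text that appears before the first timestamp is dropped by this sketch, which matches the requested output of one element per time entry.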

- Joe Python
--
http://mail.python.org/mailman/listinfo/python-list


Apache log munging

2008-10-08 Thread Joe Python
I have written a generator for an Apache log which returns two pieces of
information for each request:
the hostname and the filename requested.

The 'log' generator can be 'consumed' like this:

for r in log:
  print r['host'], r['filename']

I want to find the top 100 hosts (sorted in descending order of total
requests), as follows:

host  filename1  filename2 filename3 Total

hostA   6  9 45 110
hostC   4 4343  98
hostB   344 45  83

and so on.
Is there a fast way to do this without scanning the log file many times?
Thanks in advance.
- Jo


Re: Apache log munging

2008-10-08 Thread Joe Python
I am currently using the following technique to get the info above:

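from collections import defaultdict

all = defaultdict(int)       # requests per (host, file) pair
hosts = defaultdict(int)     # total requests per host
filename = defaultdict(int)  # used only as a set of the file names seen

for r in log:
   all[r['host'],r['filename']] += 1
   hosts[r['host']] += 1
   filename[r['filename']] = 1


for host in sorted(hosts,key=hosts.get, reverse=True):
   for file in filename:
      print host, all[host,file]
   print hosts[host]
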
I was looking for a better-performing option that does not require
building three collections.
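
One way this might be collapsed into a single collection, offered as a sketch rather than a measured improvement: keep one dict keyed by host whose values are per-file counters, and derive the totals from it. This assumes a Python version that provides collections.Counter (newer than what was current when this was posted), and the names top_hosts and sample_log are hypothetical stand-ins, not part of the original code.

```python
from collections import Counter, defaultdict
import heapq

def top_hosts(log, n=100):
    """Return [(host, total, per_file_counts), ...] for the n busiest hosts."""
    per_host = defaultdict(Counter)  # host -> Counter mapping file name -> hits
    for r in log:                    # single pass over the log
        per_host[r['host']][r['filename']] += 1
    # A host's total is the sum of its per-file counts.
    totals = {host: sum(files.values()) for host, files in per_host.items()}
    busiest = heapq.nlargest(n, totals, key=totals.get)
    return [(host, totals[host], per_host[host]) for host in busiest]

# Hypothetical stand-in for the 'log' generator described in the post.
sample_log = [
    {'host': 'hostA', 'filename': 'a.html'},
    {'host': 'hostB', 'filename': 'a.html'},
    {'host': 'hostA', 'filename': 'b.html'},
    {'host': 'hostA', 'filename': 'a.html'},
]
report = top_hosts(sample_log, n=2)
```

heapq.nlargest avoids sorting every host when only the top n are wanted; whether that matters in practice depends on how many distinct hosts the log contains.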

- Jo

On Wed, Oct 8, 2008 at 2:15 PM, Joe Riopel [EMAIL PROTECTED] wrote:

 On Wed, Oct 8, 2008 at 1:55 PM, Joe Python [EMAIL PROTECTED] wrote:
  I want to find the top 100 hosts (sorted in descending order of total
  requests), as follows:
  Is there a fast way to do this without scanning the log file many times?

 As you encounter a new host, add it to a dict (or another type of
 collection), and if it is encountered again, use that host as the key to
 retrieve the dict entry and increment its request count. You should
 only have to read the file once.
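
The suggestion quoted above could look roughly like the following. It is only a sketch: count_requests, busiest, and the entries list are hypothetical names, and the entries stand in for the real log generator.

```python
def count_requests(log):
    """Count requests per host in one pass over the log."""
    counts = {}
    for r in log:
        host = r['host']
        # First sighting starts at 0; later sightings increment the count.
        counts[host] = counts.get(host, 0) + 1
    return counts

def busiest(counts, n=100):
    """Return the n hosts with the most requests, highest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical stand-in for the real log generator.
entries = [{'host': 'hostA'}, {'host': 'hostB'}, {'host': 'hostA'}]
ranking = busiest(count_requests(entries))
```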
