On Tue, Sep 29, 2009 at 11:11 AM, Scooter <slbent...@gmail.com> wrote:
> I'm attempting to reformat an apache log file that was written with a
> custom output format. I'm attempting to get it to w3c format using a
> python script. The problem I'm having is the field-to-field matching.
> In my python code I'm using split with spaces as my delimiter. But it
> fails when it reaches the user agent because that field itself
> contains spaces. But that user agent is enclosed with double quotes.
> So is there a way to split on a certain delimiter but not to split
> within quoted words?
>
> i.e. a line might look like
>
> 2009-09-29 12:00:00 - GET / "Mozilla/4.0 (compatible; MSIE 7.0;
> Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC
> 5.0; .NET CLR 3.0.04506; .NET CLR 3.5.21022)" http://somehost.com 200
> 1923 1360 31715 -
> --
> http://mail.python.org/mailman/listinfo/python-list
>

s = '''2009-09-29 12:00:00 - GET / "Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0;
.NET CLR 3.0.04506; .NET CLR 3.5.21022)" http://somehost.com 200 1923
1360 31715 -'''


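# Splitting on the double quote gives exactly three pieces, provided the
# line contains a single quoted field (here, the user agent).
initial, user_agent, trailing = s.split('"')

# Then depending on what you want to do with them...
# (str.split() with no argument splits on any run of whitespace.)
foo = initial.split() + [user_agent] + trailing.split()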
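
If a line can have more than one quoted field, unpacking the result of split('"') into three names will raise a ValueError. A more general option might be shlex.split from the standard library, which splits on whitespace but keeps quoted runs together. The sketch below is only that: the helper name split_log_line is made up for illustration, and the sample line is the one from the question, joined onto a single line as it would presumably appear in the real log.

```python
import shlex

# Sample line from the question, written as one logical line
# (the wrapping in the email is assumed to be a mail-client artifact).
line = ('2009-09-29 12:00:00 - GET / "Mozilla/4.0 (compatible; MSIE 7.0; '
        'Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; '
        '.NET CLR 3.0.04506; .NET CLR 3.5.21022)" http://somehost.com 200 1923 '
        '1360 31715 -')


def split_log_line(text):
    """Hypothetical helper: split on whitespace, keeping quoted runs whole."""
    # shlex.split applies shell-like rules: the double quotes group the
    # user agent into one field and are removed from the result.
    return shlex.split(text)


fields = split_log_line(line)
for index, field in enumerate(fields):
    print(index, field)
```

One caveat: shlex follows shell quoting rules, so an unbalanced apostrophe or a backslash somewhere in a log line would be treated as quoting or escaping too, and may raise ValueError or alter the field. If your logs can contain those, the csv module with delimiter=' ' and quotechar='"' may be worth a look instead.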
