A slightly faster version that makes better use of str_to_date()
(about a 10% speedup in my tests):

load data infile '/tmp/access.log' into table apache_log
  fields terminated by ' ' optionally enclosed by '"'
  (@ip, @not_used, @not_used, @ts_str, @not_used, @url_str,
   status_code, content_length, ref_url, user_agent)
  set ip = inet_aton(@ip),
      ts = str_to_date(@ts_str, '[%d/%b/%Y:%H:%i:%s'),
      file = substring(@url_str, locate('/', @url_str),
                       length(@url_str) - 8 - locate('/', @url_str));

To make it go even faster, you could write a couple of UDFs
(user-defined functions) for parsing the input. I would not mess with
inet_aton(), since it is about as fast as it gets. However,
str_to_date() could be replaced with a UDF that is specific to the
Apache date format, and you could make it very tight and quick. Still,
I would expect the most improvement from the UDF that extracts the
file name.
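
As a rough illustration of the parsing such UDFs might do, here is a
minimal sketch in plain C. It is not the MySQL UDF glue itself, only the
core logic, and the names apache_ts_parse() and request_path() are
hypothetical. It assumes the timestamp field arrives exactly as in the
statement above (for example "[10/Oct/2000:13:55:36") and that the
request field looks like "GET /path HTTP/1.x". Unlike the substring()
expression, the path extractor stops at the first space after the path
rather than counting characters from the end.

```c
#include <stddef.h>
#include <string.h>

/* Two ASCII digits at p as a number 0..99, or -1 if either is not a digit. */
static int two_digits(const char *p)
{
    if (p[0] < '0' || p[0] > '9' || p[1] < '0' || p[1] > '9')
        return -1;
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/*
 * Hypothetical name. Parses the timestamp field as it looks after the log
 * line is split on spaces, e.g. "[10/Oct/2000:13:55:36", and writes
 * "YYYY-MM-DD HH:MM:SS" plus a NUL into out (at least 20 bytes).
 * Returns 0 on success, -1 on malformed input.
 * Days are only range-checked (1..31), not checked against the month.
 */
int apache_ts_parse(const char *s, char *out)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    int mon, day, hh, mm, ss;

    /* The layout is fixed-width, so check length and separators up front. */
    if (strlen(s) < 21 || s[0] != '[' || s[3] != '/' || s[7] != '/' ||
        s[12] != ':' || s[15] != ':' || s[18] != ':')
        return -1;

    /* Look up the three-letter month name; mon == 12 means "not found". */
    for (mon = 0; mon < 12; mon++)
        if (memcmp(s + 4, months + 3 * mon, 3) == 0)
            break;
    day = two_digits(s + 1);
    hh = two_digits(s + 13);
    mm = two_digits(s + 16);
    ss = two_digits(s + 19);
    if (mon == 12 || day < 1 || day > 31 || hh < 0 || hh > 23 ||
        mm < 0 || mm > 59 || ss < 0 || ss > 59 ||
        two_digits(s + 8) < 0 || two_digits(s + 10) < 0)
        return -1;

    /* Everything is validated, so the digits can be copied straight over. */
    memcpy(out, s + 8, 4);                 /* YYYY */
    out[4] = '-';
    out[5] = (char)('0' + (mon + 1) / 10); /* MM */
    out[6] = (char)('0' + (mon + 1) % 10);
    out[7] = '-';
    memcpy(out + 8, s + 1, 2);             /* DD */
    out[10] = ' ';
    memcpy(out + 11, s + 13, 8);           /* HH:MM:SS */
    out[19] = '\0';
    return 0;
}

/*
 * Hypothetical name. Copies the path out of a request field such as
 * "GET /apache_pb.gif HTTP/1.0" into out (capacity cap, including the NUL).
 * Returns the path length, or -1 if there is no '/' or the path does not fit.
 */
int request_path(const char *req, char *out, size_t cap)
{
    const char *start = strchr(req, '/');
    size_t n;

    if (start == NULL)
        return -1;
    n = strcspn(start, " ");   /* stop at the space before "HTTP/1.x" */
    if (n + 1 > cap)
        return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return (int)n;
}
```

To call these from SQL they would still need to be wrapped in the entry
points that the MySQL UDF interface expects, which is not shown here.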
And in case this is not obvious: the IP address is stored as an
integer, so you will need inet_ntoa() to convert it back to
human-readable format.
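
For anyone curious what that integer looks like, here is a small C
sketch of the usual dotted-quad mapping (a*2^24 + b*2^16 + c*2^8 + d).
The function names are hypothetical, and the sketch only handles the
full four-part form; it is meant to show the arithmetic, not to
reproduce every detail of MySQL's inet_aton() and inet_ntoa().

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical name. Converts "a.b.c.d" to a 32-bit integer with a in the
 * top byte. Returns 0 on success, -1 on malformed input.
 */
int quad_to_u32(const char *s, uint32_t *out)
{
    unsigned a, b, c, d;
    char extra;

    /* Exactly four numbers and nothing after them. */
    if (sscanf(s, "%3u.%3u.%3u.%3u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
    return 0;
}

/* Hypothetical name. Converts the integer back to "a.b.c.d" (out: 16 bytes). */
void u32_to_quad(uint32_t ip, char *out)
{
    snprintf(out, 16, "%u.%u.%u.%u",
             (unsigned)((ip >> 24) & 0xff), (unsigned)((ip >> 16) & 0xff),
             (unsigned)((ip >> 8) & 0xff), (unsigned)(ip & 0xff));
}
```

With this mapping, "10.0.5.9" becomes 167773449, and converting that
number back gives "10.0.5.9" again.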
--
Sasha Pachev
AskSasha Linux Consulting
http://asksasha.com
Fast Running Blog.
http://fastrunningblog.com
Run. Blog. Improve. Repeat.