Note that LOAD DATA INFILE can deal with duplicate key errors. It has
IGNORE/REPLACE option.
Also, you can capture an import column into a user variable and then
use the SET syntax to massage it.
I just did a little exercise with a default apache log. First, I
created the following table:
CREATE TABLE `apache_log` (
`ip` int(10) unsigned NOT NULL,
`ts` datetime NOT NULL,
`file` text NOT NULL,
`status_code` smallint(6) NOT NULL,
`content_length` int(10) unsigned NOT NULL,
`ref_url` text NOT NULL,
`user_agent` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Then I loaded a default-formatted Apache access log like this:
load data infile '/tmp/access.log' into table apache_log fields
terminated by ' ' optionally enclosed by '"'
(@ip,@not_used,@not_used,@ts_str,@not_used,@url_str,status_code,content_length,ref_url,user_agent)
set ip = inet_aton(@ip), ts =
str_to_date(substring(@ts_str,2),'%d/%b/%Y:%H:%i:%s'), file =
substring(@url_str,locate('/',@url_str),
length(@url_str)-8-locate('/',@url_str));
The log had 98 M of data with 374726 rows. The load took 10 seconds on
Pentium 4 2.4 GHz, 1 GB RAM, a regular IDE drive, running Mysql
5.0.67.
--
Sasha Pachev
AskSasha Linux Consulting
http://asksasha.com
Fast Running Blog.
http://fastrunningblog.com
Run. Blog. Improve. Repeat.
/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/