Note that LOAD DATA INFILE can deal with duplicate key errors. It has
IGNORE/REPLACE option.

Also, you can capture an import column into a user variable and then
use the SET syntax to massage it.

I just did a little exercise with a default apache log. First, I
created the following table:


CREATE TABLE `apache_log` (
  `ip` int(10) unsigned NOT NULL,
  `ts` datetime NOT NULL,
  `file` text NOT NULL,
  `status_code` smallint(6) NOT NULL,
  `content_length` int(10) unsigned NOT NULL,
  `ref_url` text NOT NULL,
  `user_agent` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Then I loaded a default-formatted Apache access log like this:

load data infile '/tmp/access.log' into table apache_log fields
terminated by ' ' optionally enclosed by '"'
(@ip,@not_used,@not_used,@ts_str,@not_used,@url_str,status_code,content_length,ref_url,user_agent)
set ip = inet_aton(@ip), ts =
str_to_date(substring(@ts_str,2),'%d/%b/%Y:%H:%i:%s'), file =
substring(@url_str,locate('/',@url_str),
length(@url_str)-8-locate('/',@url_str));

The log had 98 M of data with 374726 rows. The load took 10 seconds on
Pentium 4 2.4 GHz, 1 GB RAM, a regular IDE drive, running Mysql
5.0.67.


-- 
Sasha Pachev
AskSasha Linux Consulting
http://asksasha.com

Fast Running Blog.
http://fastrunningblog.com
Run. Blog. Improve. Repeat.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/

Reply via email to