>Gotcha.  So the bulk insert is where the speed comes from, not the >file
>processing.  Does MySQL support transactions yet?  Sounds like a >great
>use for them.  Start a transaction, add the queries one at a time, >then
>post them in bulk (well, every 8000).

MySQL has supported transactions for a very long time - since late
2001 if I remember right. However, transactions are not related to the
bulk insert. Bulk insert can happen just as fast, in fact faster for
the same type of row, on a non-transactional (MyISAM) table.

If you ever have a large file, even if it does need some
pre-processing, the fastest way to get it into your tables regardless
of which engine they are in is LOAD DATA INFILE. In the last 5 or so
years I do not recall ever needing to use a scripting language to load
MySQL data from some reasonably delimited file even when it was quite
dirty and some clean-up was needed. General rule of thumb is that if
you have record per line (even if the line delimiter is some weird
character or set of characters instead of the standard linebreak) you
can do it with SQL commands alone.

Generally in a difficult case the steps are like this:

- Create a temporary table  that will store the data in the import
file layout. It may benefit from keys if you are going to be filtering
or matching.  Keys will slightly slow down LOAD DATA INFILE, but then
may speed up the filtering or matching. Note that LOAD DATA INFILE
allows you to skip first N lines of input, or skip unnecessary
columns. It is also capable of dealing with weird delimiters (even
multi-character or even blank), and equally weird line terminators,
quoting, and escapes.
- Execute a clever INSERT INTO SELECT that will fix whatever is wrong
in the original data on the file. This will give you the benefits of
the bulk insert as well.
- Sometimes you may need to use a multi-table UPDATE
- Sometimes you may need INSERT IGNORE or REPLACE
- Sometimes INSERT ... ON DUPLICATE KEY UPDATE is helpful

SQL language, or I should say its MySQL variant to avoid angering the
purists and pedants, is more powerful than we often give it credit
for. One time for the purpose of testing job applicants creativity I
gave them a challenge to extract word frequency counts from a text
file into a MySQL table only using MySQL command-line client and
nothing else. Just about everybody said that it either could not be
done or they had no clue how to do it, or they thought I must have
meant something else (sure this cannot be right!) and used forbidden
techniques. Except for the one that got the job.

Aside from the challenge aspect, if you are able to do the load with
SQL alone it  usually runs a lot faster than if you involve scripting.

-- 
Sasha Pachev
AskSasha Linux Consulting
http://asksasha.com

Fast Running Blog.
http://fastrunningblog.com
Run. Blog. Improve. Repeat.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/

Reply via email to