>Gotcha. So the bulk insert is where the speed comes from, not the >file >processing. Does MySQL support transactions yet? Sounds like a >great >use for them. Start a transaction, add the queries one at a time, >then >post them in bulk (well, every 8000).
MySQL has supported transactions for a very long time - since late 2001 if I remember right. However, transactions are not related to the bulk insert. Bulk insert can happen just as fast, in fact faster for the same type of row, on a non-transactional (MyISAM) table. If you ever have a large file, even if it does need some pre-processing, the fastest way to get it into your tables regardless of which engine they are in is LOAD DATA INFILE. In the last 5 or so years I do not recall ever needing to use a scripting language to load MySQL data from some reasonably delimited file even when it was quite dirty and some clean-up was needed. General rule of thumb is that if you have record per line (even if the line delimiter is some weird character or set of characters instead of the standard linebreak) you can do it with SQL commands alone. Generally in a difficult case the steps are like this: - Create a temporary table that will store the data in the import file layout. It may benefit from keys if you are going to be filtering or matching. Keys will slightly slow down LOAD DATA INFILE, but then may speed up the filtering or matching. Note that LOAD DATA INFILE allows you to skip first N lines of input, or skip unnecessary columns. It is also capable of dealing with weird delimiters (even multi-character or even blank), and equally weird line terminators, quoting, and escapes. - Execute a clever INSERT INTO SELECT that will fix whatever is wrong in the original data on the file. This will give you the benefits of the bulk insert as well. - Sometimes you may need to use a multi-table UPDATE - Sometimes you may need INSERT IGNORE or REPLACE - Sometimes INSERT ... ON DUPLICATE KEY UPDATE is helpful SQL language, or I should say its MySQL variant to avoid angering the purists and pedants, is more powerful than we often give it credit for. One time for the purpose of testing job applicants creativity I gave them a challenge to extract word frequency counts from a text file into a MySQL table only using MySQL command-line client and nothing else. Just about everybody said that it either could not be done or they had no clue how to do it, or they thought I must have meant something else (sure this cannot be right!) and used forbidden techniques. Except for the one that got the job. Aside from the challenge aspect, if you are able to do the load with SQL alone it usually runs a lot faster than if you involve scripting. -- Sasha Pachev AskSasha Linux Consulting http://asksasha.com Fast Running Blog. http://fastrunningblog.com Run. Blog. Improve. Repeat. /* PLUG: http://plug.org, #utah on irc.freenode.net Unsubscribe: http://plug.org/mailman/options/plug Don't fear the penguin. */
