Also, as Sasha noted about keys, I would personally recommend always using a MyISAM table with no keys as the first table: it will be the fastest way to just get the data into the database. Then, as you massage the data and work it into the final set of tables (MyISAM, InnoDB, or whatever they need to be), you can add indexes as necessary. This solves at least two problems: 1) keys slow the load down, because MySQL checks and updates every key for each row on insert; 2) unique keys can make the import fail if the raw data violates a key constraint, and you don't want that to happen on the last record of an 80GB file...
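A minimal sketch of that staging approach (the table name, column list, and file path here are made up for illustration):

```sql
-- Keyless MyISAM staging table: no index maintenance and no
-- unique-constraint failures during the bulk load.
CREATE TABLE import_raw (
    id      INT NOT NULL,
    name    VARCHAR(255),
    payload TEXT
) ENGINE=MyISAM;

LOAD DATA INFILE '/tmp/dump.csv'
    INTO TABLE import_raw
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

-- Only after the load, add (or attempt to add) the indexes; a
-- failed unique index here is cheap compared to a failed load.
ALTER TABLE import_raw ADD UNIQUE INDEX idx_id (id);
```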
Once the data is loaded, adding (or attempting to add) a unique index is relatively cheap if it fails, compared to having to load the full data set again. If you have variable-length data, my next step after loading is usually to split it into two tables: one for the fixed-length fields, and a second for all the variable-length fields. This can make many types of queries much faster, since the optimizer can eliminate rows on a fixed-record-length table much better (or at all, in some cases). Also, in my experience, one rarely does anything optimal on variable-length data anyway, while the converse is usually true of fixed-length fields. So letting the fixed-length fields live in a table that gives the optimizer freedom, while keeping the fields that would usually kill the optimizer's chances in a separate table, just makes sense to me.
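To illustrate that split with a hypothetical schema (the table and column names are invented): keeping only fixed-size types in the first table lets MyISAM use its fixed row format, while the second table carries everything variable-length, joined back by the same key when needed.

```sql
-- Fixed-length columns only: every row is the same size, so the
-- optimizer (and MyISAM's fixed row format) can work efficiently.
CREATE TABLE orders_fixed (
    order_id    INT NOT NULL,
    customer_id INT NOT NULL,
    created_at  DATETIME NOT NULL,
    amount      DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id)
) ENGINE=MyISAM;

-- Variable-length columns, fetched via a join only when needed.
CREATE TABLE orders_text (
    order_id INT NOT NULL,
    notes    TEXT,
    address  VARCHAR(255),
    PRIMARY KEY (order_id)
) ENGINE=MyISAM;
```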
