Also, as Sasha noted about keys: personally, I would recommend always using a 
MyISAM table with no keys as the first table--this will be the fastest way to 
just get the data into the database. Then, as you massage the data and work it 
into the final set of tables (whatever they need to be--MyISAM, InnoDB, or 
other), you can add indexes as necessary. This solves at least two problems:
1) keys slow the load down (the keys are checked/updated for each row on 
insert)
2) unique keys can make the import fail if the key constraints are not met in 
the raw data--you don't want that to happen on the last record of an 80GB 
file...
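As a rough sketch of that workflow (the table name, columns, and file path 
here are all made up for illustration):

```sql
-- Staging table: MyISAM, no keys at all, so the load is pure appends
-- with no per-row index maintenance and no constraints to trip over.
CREATE TABLE staging_raw (
    id      INT NOT NULL,
    email   VARCHAR(255) NOT NULL,
    payload TEXT
) ENGINE=MyISAM;

-- Bulk-load the raw file; nothing can fail on a key constraint here.
LOAD DATA INFILE '/tmp/dump.csv'
INTO TABLE staging_raw
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```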

Once the data is loaded, adding (or attempting to add) a unique index is 
relatively cheap--even if it fails--compared to having to load the full data 
set again.
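Continuing the sketch (same hypothetical staging table and columns as above), 
the attempt-after-load step might look like:

```sql
-- Attempt the unique index only after the data is in. If this fails on
-- a duplicate, you've lost one ALTER, not the whole 80GB load.
ALTER TABLE staging_raw ADD UNIQUE KEY uq_email (email);

-- If it does fail, one way to find the offending rows before retrying:
SELECT email, COUNT(*) AS dups
FROM staging_raw
GROUP BY email
HAVING COUNT(*) > 1;
```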

If you have variable-length data, once it's loaded into a table, the next step 
I usually take is to split it into two tables--one for the fixed-length 
fields, and a second for all the variable-length fields. This strategy can 
make many types of queries much faster, since the optimizer can eliminate 
rows on a fixed-record-length table much better (or at all, in some cases). 
Also, in my experience, one rarely does anything that is optimal on 
variable-length data anyway, while the converse is usually true of 
fixed-length fields--so letting the fixed-length fields live in a table that 
gives the optimizer freedom, while keeping the fields that would usually kill 
the optimizer's chances in a separate table, just makes sense to me.
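A minimal sketch of that split (table and column names invented for the 
example): only fixed-width types in the first table, so MyISAM stores it with 
fixed-length rows, while anything VARCHAR/TEXT goes in the second, joined on 
the same id.

```sql
-- Fixed-length fields only: every column has a fixed width, so MyISAM
-- uses its fixed row format for this table.
CREATE TABLE widget_fixed (
    id     INT NOT NULL PRIMARY KEY,
    price  DECIMAL(10,2) NOT NULL,
    status CHAR(1) NOT NULL
) ENGINE=MyISAM;

-- Variable-length fields live in their own table, joined on the same id,
-- so they can't force the first table into the dynamic row format.
CREATE TABLE widget_var (
    id          INT NOT NULL PRIMARY KEY,
    description TEXT,
    notes       TEXT
) ENGINE=MyISAM;
```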



/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
