Beware of multiple threads and/or processes writing to the file at the
same time. It may make sense to use a mutex.
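For example, here's a minimal sketch of serializing appends across
processes with an advisory lock (this assumes Unix, since fcntl isn't
available on Windows; the helper name and log path are made up):

import fcntl

def append_line(path, line):
    """Append a line while holding an exclusive lock so that
    concurrent writers can't interleave."""
    f = open(path, 'a')
    try:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until we hold the lock
        f.write(line + '\n')
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
        f.close()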
I agree with the earlier recommendations to use MySQL's LOAD DATA
INFILE. It really is a lot faster.
You can completely remove Python from the picture when importing the
file. Here's an example (note that mysqlimport derives the table name
from the file name minus its extension, so mylocalfile.tsv loads into
the mylocalfile table):
mysqlimport \
--user=user \
--password=password \
--columns=id,name \
--compress \
--fields-optionally-enclosed-by='"' \
--fields-terminated-by='\t' \
--lines-terminated-by='\n' \
--local \
--lock-tables \
--replace \
--verbose \
mydatabase mylocalfile.tsv
To write that tsv file, I used the csv module with:

import csv

DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')

def create_default_reader(iterable):
    """Return a csv.reader with our default options."""
    return csv.reader(iterable, **DEFAULT_KARGS)

def create_default_writer(iterable):
    """Return a csv.writer with our default options."""
    return csv.writer(iterable, **DEFAULT_KARGS)
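For example, writing a couple of rows with that writer (the rows here
are made up):

f = open('mylocalfile.tsv', 'wb')  # binary mode keeps the csv module happy
writer = create_default_writer(f)
writer.writerow([1, 'alice'])
writer.writerow([2, 'bob'])
f.close()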
There's one more thing to keep in mind. If you do the LOAD DATA LOCAL
INFILE from Python instead, make sure you pass local_infile=True to
MySQLdb's connect function.
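Something like this (the connection parameters are placeholders, and
the table name follows mysqlimport's convention of stripping the
file's extension):

import MySQLdb

# local_infile=True lets the client send a local file to the server.
conn = MySQLdb.connect(user='user', passwd='password',
                       db='mydatabase', local_infile=True)
cursor = conn.cursor()
cursor.execute("""
    LOAD DATA LOCAL INFILE 'mylocalfile.tsv'
    REPLACE INTO TABLE mylocalfile
    FIELDS TERMINATED BY '\\t' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    (id, name)
""")
conn.commit()
conn.close()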
Happy Hacking!
-jj
On Wed, Jun 11, 2008 at 6:15 PM, Pete Wright <[EMAIL PROTECTED]> wrote:
>
> That's an awesome tip - thanks so much guys for pointing me in a far
> less painful direction. My only concern now is speed in working with
> the file and minimizing wait states. For example, I presumably don't
> want anything writing to the text file while I'm loading it into the
> database, and similarly I don't want to take time on every single
> request to check the size of the text file to see if I should run a
> LOAD DATA on it.
>
> So, I think what I'm going to end up doing here is converting the time
> into seconds since midnight and then taking it mod 60 to come up with a
> file name suffix. I'll also have another process running separately,
> processing and deleting unnecessary files. Do you guys see any problems
> with that method?
>
>
> On Jun 11, 6:58 pm, Jonathan Vanasco <[EMAIL PROTECTED]> wrote:
>> write to a .txt log in some sort of standardized format; when it hits
>> 10k lines, run a batch query
>>
>> as mentioned above, mysql has its own insert format; postgres does
>> too. both insert formats are FAR faster than going through individual
>> inserts.
>>
>> when i still used mysql about 4 years ago, i remember benching an
>> insert of geographic data. it was something like 6 hours via inserts,
>> 4 hours via grouped inserts, and 25 seconds using the native import
>> format.
> >
>
--
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/