Going back to Ben's original question, this page
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html says that Anand's
suggestion of using the LOAD DATA INFILE command is usually 20 times faster
than doing individual INSERTs.
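
For reference, here's a rough sketch of what that could look like from
Python with MySQLdb, assuming a tab-separated file of edition records
(the table and column names here are made up for illustration):

    import MySQLdb

    # local_infile=1 lets the client send a local file to the server
    # (the server must also have local_infile enabled).
    conn = MySQLdb.connect(host="localhost", user="ol", passwd="secret",
                           db="openlibrary", local_infile=1)
    cur = conn.cursor()

    # One bulk load instead of millions of individual INSERTs.
    cur.execute("""
        LOAD DATA LOCAL INFILE 'editions.tsv'
        INTO TABLE editions
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'
        (edition_key, title, publish_date)
    """)
    conn.commit()

The dump2csv.py script mentioned below could probably be adapted to
produce the tab-separated input.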

If you do want to continue doing INSERTs, that page has suggestions for how
to speed them up (including the LOCK TABLES <tables> WRITE command I
suggested earlier).
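
Reusing the hypothetical connection from the sketch above, the combined
pattern would look roughly like this (batched multi-row INSERTs under a
write lock; again, table and column names are made up):

    records = [("/books/OL1M", "Example title")]  # stand-in for parsed dump records

    cur.execute("LOCK TABLES editions WRITE")
    batch = []
    for key, title in records:
        batch.append((key, title))
        if len(batch) >= 1000:
            # executemany() collapses this into one multi-row INSERT,
            # so it's one round trip per 1000 records.
            cur.executemany(
                "INSERT INTO editions (edition_key, title) VALUES (%s, %s)",
                batch)
            batch = []
    if batch:
        cur.executemany(
            "INSERT INTO editions (edition_key, title) VALUES (%s, %s)",
            batch)
    conn.commit()
    cur.execute("UNLOCK TABLES")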

Some less impactful suggestions:
- On the reading side, given the CPU/IO balance of today's systems, it's
faster to keep the dump compressed and decompress it on the fly (less data
to read from the slow I/O subsystem + fast CPU == win); see the sketch
after this list.
- I'd consider dropping the column containing the position for multivalued
items. I don't think position is significant in most cases. If you want to
keep it, store it as a short integer rather than a string.
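
For the on-the-fly decompression, Python's gzip module makes it about a
one-line change (sketch; process_record() is a stand-in for whatever the
script already does per line):

    import gzip

    def process_record(line):
        pass  # stand-in for the script's per-record parsing/INSERT logic

    # Iterating over the GzipFile decompresses as it reads, so only the
    # compressed bytes ever come off the slow I/O subsystem.
    for line in gzip.open("ol_dump_editions.txt.gz"):
        process_record(line)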

I'd be curious how the performance has improved from the original 10k
records/hr with the various optimizations you've tried.

Tom


On Thu, Sep 12, 2013 at 6:10 PM, Ben Companjen <[email protected]> wrote:

> Thanks Anand, Bryan.
>
> My guess was right: not committing every record reduces the load on the
> hard disk and speeds things up, even though it's still pretty slow.
>
> Anyway, I put the script in the OL Dump scripts collection:
> https://github.com/bencomp/oldumpscripts/blob/master/dump2mysql.py
> next to the existing dump2csv.py:
> https://github.com/bencomp/oldumpscripts/blob/master/dump2csv.py
>
> Neither converts everything in every record, but it's all copy/paste from
> here.
>
> Ben
>
>
> On 1 September 2013 23:45, Bryan Fordham <[email protected]> wrote:
>
>> If you want to post your script, perhaps someone can speed it up.
>>
>> Thanks,
>> --B
>>
>> On Sep 1, 2013 5:42 PM, "Ben Companjen" <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I created a Python script that reads a dump file and puts the edition
>>> records in a MySQL database.
>>>
>>> It works (when you manually create the tables), but it's very slow:
>>> 10000 records in about an hour, which means all editions will take
>>> about 10 days of continuous operation.
>>>
>>> Does anybody have a faster way? Is there some script for this in the
>>> repository?
>>>
>>> Regards,
>>>
>>> Ben
>>
>
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
Archives: http://www.mail-archive.com/[email protected]/
To unsubscribe from this mailing list, send email to 
[email protected]
