Dear all,

I recently tried to import one of the latest SQL dumps of English Wikipedia titles into MySQL:

 * 2016-04-11 20:07:38 done Base per-page data (id, title, old
   restrictions, etc).
     o enwiki-20160407-page.sql.gz
       <https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-page.sql.gz>
       1.3 GB

I think there is a performance issue here that could largely be avoided.
Even though I disabled foreign key checks, increased the maximum allowed packet size, and applied some other optimizations, the import took around 2 days on a laptop with 8 GB of RAM. I don't think I can disable autocommit given the limited RAM, but even when I tried without autocommit, modifying some columns afterwards took ages.
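For reference, the session settings I applied before sourcing the dump were roughly along these lines (a sketch, not the exact commands from my run; the packet size value is just an example):

    SET GLOBAL max_allowed_packet = 1073741824;  -- 1 GB; needs the SUPER privilege
    SET foreign_key_checks = 0;
    SET unique_checks = 0;
    SOURCE enwiki-20160407-page.sql;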

I did a test: I edited the SQL dump file and removed the index creation from the CREATE TABLE statement. This was not easy, because very few text editors can open such a large file. I ran the import again, without the indices, key constraints, etc., and it took less than two hours. Creating the indices and key constraints after importing the data took around 10-15 minutes per index. So I think the original dump has a problem: the indices are maintained on every single insertion, which is very inefficient. I would like to ask whether the dump files could be created in a more optimized way, skipping the index creation at the start and leaving it for the end of the script, as sketched below.
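To illustrate, a reordered dump could look roughly like this (abbreviated; the column and index names follow MediaWiki's page table, but the exact DDL here is illustrative, not copied from the real dump, and the index set varies by MediaWiki version):

    CREATE TABLE page (
      page_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      page_namespace INT NOT NULL,
      page_title VARBINARY(255) NOT NULL,
      page_random DOUBLE UNSIGNED NOT NULL,
      page_len INT UNSIGNED NOT NULL,
      PRIMARY KEY (page_id)
    ) ENGINE=InnoDB;

    -- ... all the extended INSERT INTO page VALUES (...) statements ...

    -- Secondary indexes built once, after the bulk load:
    ALTER TABLE page
      ADD UNIQUE INDEX name_title (page_namespace, page_title),
      ADD INDEX page_random (page_random),
      ADD INDEX page_len (page_len);

That way each row is inserted with only the primary key to maintain, and the secondary indexes are built in a single pass at the end.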

Best regards,

Rafael
