Re: [OSM-dev] Faster loading with scabies
Most modern databases have some kind of block-load API for efficiently importing large chunks of data into their tables. It's really only the most simple stuff that __doesn't__ slow down when importing using INSERT. Although it should plateau to a constant time/INSERT pretty quickly. Brett Henderson wrote: Joac MyISAM tables are very fast to import regardless of number of rows, but InnoDB seems to slow down as the number of rows increase. I'm surprised loading with LOAD DATA INFILE fixes that. LDI is probably the fastes way to get data into mysql. Disclaimer: http://www.peralex.com/disclaimer.html ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Faster loading with scabies
Sascha Silbe wrote: On Thu, Oct 16, 2008 at 09:21:54AM +1100, Brett Henderson wrote: The biggest problem I found wasn't the actual processing of INSERT statements, it was MySQL scaling non-linearly with the number of rows. MyISAM tables are very fast to import regardless of number of rows, but InnoDB seems to slow down as the number of rows increase. I'm surprised loading with LOAD DATA INFILE fixes that. Just a quick stab in the dark as I do know almost nothing about osmosis and MySQL: If you create the indices before loading data, they will be constantly rebuilt as you add each row = would explain the exponential slow-down. The PostgreSQL manual has a section about optimizing database imports giving some useful tips tricks; I'd guess the same is true for the MySQL one. If osmosis already defers index creation (including referential constraints which usually build indices implicitly), just ignore this mail. :) It does some index disabling prior to import using the DISABLE KEYS statements. But that doesn't disable all indices (and I can't remember if it works on InnoDB tables ...). So this would appear to explain the exponential slow down. I seem to remember manually dropping indexes completely and still having trouble though, but I have no numbers to back this up. If somebody wishes to run some benchmarks with and without indexes for several different dataset sizes I'd be extremely grateful. Writing code to do this stuff is the easy bit, it's testing and benchmarking that takes the time. Better yet if somebody can send me patches for osmosis I'll be ecstatic. Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Faster loading with scabies
Joachim Zobel wrote: Am Donnerstag, den 16.10.2008, 09:21 +1100 schrieb Brett Henderson: Do you have any numbers to show what sort of performance increase can be expected. I can load europe.osm into the current_* tables in less than 12 hours (on my laptop - overpowered CPU, 5400 disk). Do you have an equivalent time for osmosis to do the same import? If it can provide a measurable performance improvements then I should definitely look into it. I just don't want to spend a lot of time writing something to find it only saves 5% of time on import ... If it's significant I should add similar functionality into osmosis. I did hope you would say that. I'm happy to add it to the TODO list (ie. TRAC). I probably won't get to it for a while though. Of course if somebody sends me a patch it will get done sooner :-) The biggest problem I found wasn't the actual processing of INSERT statements, it was MySQL scaling non-linearly with the number of rows. I missed that one. I loaded europe osm with osmosis and found it too slow. I then did all the speed tests with europe.osm. Do you have any numbers concerning the scalability issue? No I don't actually. I did have a text file with all the timings I'd done after making various tweaks but I think I've deleted it. My current code just runs a DISABLE KEYS statement on each of the tables prior to import. I believe that leaves some indexes still intact which would explain a non-linear import time. But I think I tried deleting all indexes and it didn't make much difference. To be honest I can't remember exactly what happened now, I think I'd burnt a lot of time on it and started to lose interest ;-) MyISAM tables are very fast to import regardless of number of rows, but InnoDB seems to slow down as the number of rows increase. I'm surprised loading with LOAD DATA INFILE fixes that. LDI is probably the fastes way to get data into mysql. Yep, agree. I'd love to see some hard numbers for comparison though if possible. Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Faster loading with scabies
Am Donnerstag, den 16.10.2008, 09:21 +1100 schrieb Brett Henderson: Do you have any numbers to show what sort of performance increase can be expected. I can load europe.osm into the current_* tables in less than 12 hours (on my laptop - overpowered CPU, 5400 disk). If it's significant I should add similar functionality into osmosis. I did hope you would say that. The biggest problem I found wasn't the actual processing of INSERT statements, it was MySQL scaling non-linearly with the number of rows. I missed that one. I loaded europe osm with osmosis and found it too slow. I then did all the speed tests with europe.osm. Do you have any numbers concerning the scalability issue? MyISAM tables are very fast to import regardless of number of rows, but InnoDB seems to slow down as the number of rows increase. I'm surprised loading with LOAD DATA INFILE fixes that. LDI is probably the fastes way to get data into mysql. Sincerely, Joachim ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Faster loading with scabies
On Thu, Oct 16, 2008 at 09:21:54AM +1100, Brett Henderson wrote: The biggest problem I found wasn't the actual processing of INSERT statements, it was MySQL scaling non-linearly with the number of rows. MyISAM tables are very fast to import regardless of number of rows, but InnoDB seems to slow down as the number of rows increase. I'm surprised loading with LOAD DATA INFILE fixes that. Just a quick stab in the dark as I do know almost nothing about osmosis and MySQL: If you create the indices before loading data, they will be constantly rebuilt as you add each row = would explain the exponential slow-down. The PostgreSQL manual has a section about optimizing database imports giving some useful tips tricks; I'd guess the same is true for the MySQL one. If osmosis already defers index creation (including referential constraints which usually build indices implicitly), just ignore this mail. :) CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Faster loading with scabies
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 Joachim Zobel schreef: Hi. I have written a utility named scabies to load data from osm files into a mysql database. See http://www.heute-morgen.de/scabies/ Usage is rather straightforward, but you need lots of disk space. scabies_parse.pl parses the osm file and creates files that are then loaded by scabies_load.sql with LOAD DATA INFILE. The current version only loads the current_* tables, but its easy to extend. It is early beta. The motivation for this was that osmosis is too slow with the InnoDb tables. Checking what osmosis does I found no way to incrementally improve it. Loading data can probably not be done much better as long as INSERTs are used. http://repo.or.cz/w/handlerosm.git?a=blob;f=osmparser.c;h=5357c2b394a75ab7390be0ce1d231123a6a3e5b7;hb=HEAD If someone things it needs to be even faster. Stefan -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEAREKAAYFAkj2UQQACgkQYH1+F2Rqwn0pqQCfcH2cswRHlCnbSvCA6Bc59qpF HXgAn3oE2CxH6eU3dciEP5N/JReC94qs =dKm4 -END PGP SIGNATURE- ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Faster loading with scabies
Hi Joachim, Nice work, any info on how to improve load speed is much appreciated. On Thu, Oct 16, 2008 at 6:15 AM, Joachim Zobel [EMAIL PROTECTED]wrote: Hi. I have written a utility named scabies to load data from osm files into a mysql database. See http://www.heute-morgen.de/scabies/ Usage is rather straightforward, but you need lots of disk space. scabies_parse.pl parses the osm file and creates files that are then loaded by scabies_load.sql with LOAD DATA INFILE. The current version only loads the current_* tables, but its easy to extend. It is early beta. The motivation for this was that osmosis is too slow with the InnoDb tables. Checking what osmosis does I found no way to incrementally improve it. Loading data can probably not be done much better as long as INSERTs are used. Do you have any numbers to show what sort of performance increase can be expected. If it's significant I should add similar functionality into osmosis. I already have something similar for the PostgreSQL schema supported by osmosis, it wouldn't be difficult to create a similar thing for MySQL. The biggest problem I found wasn't the actual processing of INSERT statements, it was MySQL scaling non-linearly with the number of rows. MyISAM tables are very fast to import regardless of number of rows, but InnoDB seems to slow down as the number of rows increase. I'm surprised loading with LOAD DATA INFILE fixes that. Brett ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev