Re: [OSM-dev] Faster loading with scabies

2008-10-20 Thread Noel Grandin

Most modern databases have some kind of block-load API for efficiently
importing large chunks of data into their tables.
It's really only the most simple stuff that __doesn't__ slow down when
importing using INSERT.
Although it should plateau to a constant time/INSERT pretty quickly.

Brett Henderson wrote:
 Joac

 MyISAM tables are very fast to import regardless of number of rows,
 but InnoDB seems to slow down as the number of rows increase.  I'm
 surprised loading with LOAD DATA INFILE fixes that.
 
   
 LDI is probably the fastes way to get data into mysql. 
 


Disclaimer: http://www.peralex.com/disclaimer.html



___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Faster loading with scabies

2008-10-17 Thread Brett Henderson
Sascha Silbe wrote:
 On Thu, Oct 16, 2008 at 09:21:54AM +1100, Brett Henderson wrote:

 The biggest problem I found wasn't the actual processing of INSERT
 statements, it was MySQL scaling non-linearly with the number of rows.
 MyISAM tables are very fast to import regardless of number of rows, but
 InnoDB seems to slow down as the number of rows increase.  I'm surprised
 loading with LOAD DATA INFILE fixes that.
 Just a quick stab in the dark as I do know almost nothing about 
 osmosis and MySQL:
 If you create the indices before loading data, they will be constantly 
 rebuilt as you add each row = would explain the exponential 
 slow-down. The PostgreSQL manual has a section about optimizing 
 database imports giving some useful tips  tricks; I'd guess the same 
 is true for the MySQL one.

 If osmosis already defers index creation (including referential 
 constraints which usually build indices implicitly), just ignore this 
 mail. :)
It does some index disabling prior to import using the DISABLE KEYS 
statements.  But that doesn't disable all indices (and I can't remember 
if it works on InnoDB tables ...).  So this would appear to explain the 
exponential slow down.  I seem to remember manually dropping indexes 
completely and still having trouble though, but I have no numbers to 
back this up.  If somebody wishes to run some benchmarks with and 
without indexes for several different dataset sizes I'd be extremely 
grateful.  Writing code to do this stuff is the easy bit, it's testing 
and benchmarking that takes the time.  Better yet if somebody can send 
me patches for osmosis I'll be ecstatic.

Brett


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Faster loading with scabies

2008-10-17 Thread Brett Henderson
Joachim Zobel wrote:
 Am Donnerstag, den 16.10.2008, 09:21 +1100 schrieb Brett Henderson:
   
 Do you have any numbers to show what sort of performance increase can
 be expected.  
 

 I can load europe.osm into the current_* tables in less than 12 hours
 (on my laptop - overpowered CPU, 5400 disk). 
   
Do you have an equivalent time for osmosis to do the same import?  If it 
can provide a measurable performance improvements then I should 
definitely look into it.  I just don't want to spend a lot of time 
writing something to find it only saves 5% of time on import ...
   
 If it's significant I should add similar functionality into osmosis.
 

 I did hope you would say that.
   
I'm happy to add it to the TODO list (ie. TRAC).  I probably won't get 
to it for a while though.  Of course if somebody sends me a patch it 
will get done sooner :-)
   
 The biggest problem I found wasn't the actual processing of INSERT
 statements, it was MySQL scaling non-linearly with the number of
 rows.  
 

 I missed that one. I loaded europe osm with osmosis and found it too
 slow. I then did all the speed tests with europe.osm.

 Do you have any numbers concerning the scalability issue?
   
No I don't actually.  I did have a text file with all the timings I'd 
done after making various tweaks but I think I've deleted it.  My 
current code just runs a DISABLE KEYS statement on each of the tables 
prior to import.  I believe that leaves some indexes still intact which 
would explain a non-linear import time.  But I think I tried deleting 
all indexes and it didn't make much difference.  To be honest I can't 
remember exactly what happened now, I think I'd burnt a lot of time on 
it and started to lose interest ;-)
   
 MyISAM tables are very fast to import regardless of number of rows,
 but InnoDB seems to slow down as the number of rows increase.  I'm
 surprised loading with LOAD DATA INFILE fixes that.
 

 LDI is probably the fastes way to get data into mysql. 
   
Yep, agree.  I'd love to see some hard numbers for comparison though if 
possible.

Brett


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Faster loading with scabies

2008-10-16 Thread Joachim Zobel
Am Donnerstag, den 16.10.2008, 09:21 +1100 schrieb Brett Henderson:
 Do you have any numbers to show what sort of performance increase can
 be expected.  

I can load europe.osm into the current_* tables in less than 12 hours
(on my laptop - overpowered CPU, 5400 disk). 

 If it's significant I should add similar functionality into osmosis.

I did hope you would say that.

 The biggest problem I found wasn't the actual processing of INSERT
 statements, it was MySQL scaling non-linearly with the number of
 rows.  

I missed that one. I loaded europe osm with osmosis and found it too
slow. I then did all the speed tests with europe.osm.

Do you have any numbers concerning the scalability issue?

 MyISAM tables are very fast to import regardless of number of rows,
 but InnoDB seems to slow down as the number of rows increase.  I'm
 surprised loading with LOAD DATA INFILE fixes that.

LDI is probably the fastes way to get data into mysql. 

Sincerely,
Joachim



___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Faster loading with scabies

2008-10-16 Thread Sascha Silbe

On Thu, Oct 16, 2008 at 09:21:54AM +1100, Brett Henderson wrote:


The biggest problem I found wasn't the actual processing of INSERT
statements, it was MySQL scaling non-linearly with the number of rows.
MyISAM tables are very fast to import regardless of number of rows, 
but
InnoDB seems to slow down as the number of rows increase.  I'm 
surprised

loading with LOAD DATA INFILE fixes that.
Just a quick stab in the dark as I do know almost nothing about osmosis 
and MySQL:
If you create the indices before loading data, they will be constantly 
rebuilt as you add each row = would explain the exponential slow-down. 
The PostgreSQL manual has a section about optimizing database imports 
giving some useful tips  tricks; I'd guess the same is true for the 
MySQL one.


If osmosis already defers index creation (including referential 
constraints which usually build indices implicitly), just ignore this 
mail. :)


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Faster loading with scabies

2008-10-15 Thread Stefan de Konink
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Joachim Zobel schreef:
 Hi.
 
 I have written a utility named scabies to load data from osm files into
 a mysql database. See
 
 http://www.heute-morgen.de/scabies/
 
 Usage is rather straightforward, but you need lots of disk space.
 scabies_parse.pl parses the osm file and creates files that are then
 loaded by scabies_load.sql with LOAD DATA INFILE. The current version
 only loads the current_* tables, but its easy to extend. It is early
 beta.
 
 The motivation for this was that osmosis is too slow with the InnoDb
 tables. Checking what osmosis does I found no way to incrementally
 improve it. Loading data can probably not be done much better as long as
 INSERTs are used. 


http://repo.or.cz/w/handlerosm.git?a=blob;f=osmparser.c;h=5357c2b394a75ab7390be0ce1d231123a6a3e5b7;hb=HEAD


If someone things it needs to be even faster.


Stefan
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEAREKAAYFAkj2UQQACgkQYH1+F2Rqwn0pqQCfcH2cswRHlCnbSvCA6Bc59qpF
HXgAn3oE2CxH6eU3dciEP5N/JReC94qs
=dKm4
-END PGP SIGNATURE-

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Faster loading with scabies

2008-10-15 Thread Brett Henderson
Hi Joachim,

Nice work, any info on how to improve load speed is much appreciated.

On Thu, Oct 16, 2008 at 6:15 AM, Joachim Zobel [EMAIL PROTECTED]wrote:

 Hi.

 I have written a utility named scabies to load data from osm files into
 a mysql database. See

 http://www.heute-morgen.de/scabies/

 Usage is rather straightforward, but you need lots of disk space.
 scabies_parse.pl parses the osm file and creates files that are then
 loaded by scabies_load.sql with LOAD DATA INFILE. The current version
 only loads the current_* tables, but its easy to extend. It is early
 beta.

 The motivation for this was that osmosis is too slow with the InnoDb
 tables. Checking what osmosis does I found no way to incrementally
 improve it. Loading data can probably not be done much better as long as
 INSERTs are used.


Do you have any numbers to show what sort of performance increase can be
expected.  If it's significant I should add similar functionality into
osmosis.  I already have something similar for the PostgreSQL schema
supported by osmosis, it wouldn't be difficult to create a similar thing for
MySQL.

The biggest problem I found wasn't the actual processing of INSERT
statements, it was MySQL scaling non-linearly with the number of rows.
MyISAM tables are very fast to import regardless of number of rows, but
InnoDB seems to slow down as the number of rows increase.  I'm surprised
loading with LOAD DATA INFILE fixes that.

Brett
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev