Hello,

Importing large quantities of data (e.g. a country extract or the whole 
planet) into the postgres api schema is quite slow. The import of a 
planet on the dev server has been running for nearly a week.

I have tried to see whether performance can be improved, and the two 
main things I noticed were that the postgres DatabaseContext doesn't 
support disabling the indices and doesn't use the COPY command, which 
is supposed to be faster for bulk imports.

As a proof of concept, I added statements to disableIndexes to 
manually drop each index, and to recreate them in enableIndexes. 
Together with using the COPY command (supported by the postgres 8.4 
JDBC driver), my initial experiments show a speedup of 3-4x on the 
initial population of the tables (i.e. without populating the current 
tables, but I suspect that step can be sped up similarly). These 
numbers were obtained with small country extracts (e.g. 1-20 MB in 
bz2 size), but I would guess they hold up for full planet imports 
too.
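To make the idea concrete, here is a rough sketch of what the two hooks could look like. The index names and columns below are hypothetical placeholders, not the actual api schema, and the copySql() helper is just an illustration of the statement one would hand to the JDBC driver's CopyManager.copyIn() together with a tab-separated row stream:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkLoadSketch {

    // Hypothetical index DDL; a real patch would carry the actual
    // indices of the api schema (or discover them via pg_indexes).
    static final String[] DROP_INDEXES = {
        "DROP INDEX idx_nodes_location"
    };
    static final String[] CREATE_INDEXES = {
        "CREATE INDEX idx_nodes_location ON nodes (latitude, longitude)"
    };

    static void disableIndexes(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String sql : DROP_INDEXES) {
                st.execute(sql);
            }
        }
    }

    static void enableIndexes(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String sql : CREATE_INDEXES) {
                st.execute(sql);
            }
        }
    }

    /**
     * Builds the statement that would be passed to the postgres JDBC
     * driver's CopyManager.copyIn(sql, reader), which streams rows in
     * bulk and avoids per-row INSERT overhead.
     */
    static String copySql(String table, String... columns) {
        return "COPY " + table + " (" + String.join(", ", columns)
                + ") FROM STDIN";
    }

    public static void main(String[] args) {
        System.out.println(copySql("nodes", "id", "latitude", "longitude"));
        // prints: COPY nodes (id, latitude, longitude) FROM STDIN
    }
}
```

The drop/recreate pair is what makes the hard-coding problem visible: the CREATE statements have to mirror the schema exactly, which is why a cleaner mechanism is needed.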

The main benefit comes from disabling the indices; the COPY command 
seems less important.

The patch I have is quite ugly (and untested for correctness), as it 
breaks the levels of abstraction and has to hard-code all the 
available indices. So my question is: what would be the best way to 
do this cleanly? Given the speedups obtained and the time imports 
currently take, it seems like it might be worth it.

Kai

P.S. Would there be any objections to a patch that prints some 
progress information in the XmlReader class?

_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev