Hello,

Importing large quantities of data (e.g. a country extract or the whole planet) into the postgres api schema is quite slow. The import of a planet on the dev server has been running for nearly a week.
I have tried to see whether performance can be improved, and the two main things I noticed are that the postgres DatabaseContext doesn't support disabling indexes, and doesn't use the COPY command that is supposed to be faster for bulk imports.

As a proof of concept, I added statements to disableIndexes to manually drop each index, and then recreate them in enableIndexes. Together with using the COPY command (supported in the postgres 8.4 JDBC driver), my initial experiments show a speedup of 3-4 times on the initial population of the tables (i.e. without populating the current tables, but I suspect that this step can be similarly sped up). These numbers were obtained using small country extracts (e.g. 1-20 MB in bz2 size), but I would guess that they hold up with full planet imports too. The main benefit comes from disabling the indexes; the COPY command seems less important.

The patch I have is quite ugly (and untested for correctness), as it breaks the levels of abstraction and has to hard code all the available indexes. So my question is: what would be the best way to do this cleanly? Given the speedups obtained and the time involved in imports, it seems like it might be worth it.

Kai

P.S. Would there be any objections to a patch to spit out some progress information in the XmlReader class?

_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
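For reference, a minimal sketch of the drop/recreate-index idea described above. The table and index names here are hypothetical placeholders (the real pgsql schema defines its own set), and a clean implementation would presumably discover them from pg_indexes rather than hard coding them, which is exactly the abstraction problem mentioned:

```java
import java.sql.Connection;
import java.sql.Statement;

// Sketch: wrap a bulk load by dropping indexes first and recreating
// them afterwards. Index names and definitions below are hypothetical.
public class BulkLoadSketch {

    // Hypothetical hard-coded index list; a cleaner approach would
    // query pg_indexes for the tables being loaded.
    static final String[][] INDEXES = {
        { "idx_nodes_id", "CREATE INDEX idx_nodes_id ON nodes (id)" },
        { "idx_ways_id",  "CREATE INDEX idx_ways_id ON ways (id)" },
    };

    static String dropStatement(String indexName) {
        return "DROP INDEX " + indexName;
    }

    // Called before the bulk load: drop every known index.
    static void disableIndexes(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            for (String[] idx : INDEXES) {
                st.execute(dropStatement(idx[0]));
            }
        }
    }

    // Called after the bulk load: rebuild each index from its DDL.
    static void enableIndexes(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            for (String[] idx : INDEXES) {
                st.execute(idx[1]);
            }
        }
    }

    public static void main(String[] args) {
        // No database here: just show the DDL that would be issued.
        for (String[] idx : INDEXES) {
            System.out.println(dropStatement(idx[0]));
        }
    }
}
```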
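And a sketch of the COPY side: the 8.4 JDBC driver exposes org.postgresql.copy.CopyManager, whose copyIn("COPY nodes FROM STDIN", reader) call streams rows in COPY text format. The snippet below only shows producing correctly escaped rows for that format (tab-separated, with backslash escapes and \N for null), since that is the part independent of a live database; the column layout is an assumption, not the actual schema:

```java
// Sketch: build rows in PostgreSQL COPY text format, suitable for
// streaming through CopyManager.copyIn("COPY <table> FROM STDIN", reader).
// The column layout in main() is hypothetical.
public class CopyFormat {

    // Escape one value for COPY text format: backslash, tab, newline and
    // carriage return must be escaped; a SQL NULL is written as \N.
    static String escape(String value) {
        if (value == null) {
            return "\\N";
        }
        return value.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    // Join escaped column values with tabs and terminate with a newline.
    static String row(String... cols) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) sb.append('\t');
            sb.append(escape(cols[i]));
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Hypothetical node row: id, lat, lon, and a NULL column.
        System.out.print(row("1", "51.5", "-0.1", null));
    }
}
```

Escaping matters here because a stray tab or newline in the data would otherwise be read as a column or row separator by the server.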
