Re: [osmosis-dev] Slow to import into a postgres api schema

Brett Henderson Mon, 22 Feb 2010 04:44:20 -0800

On Mon, Feb 22, 2010 at 12:31 AM, Kai Krueger <[email protected]> wrote:


> On 02/21/2010 11:43 AM, Brett Henderson wrote:
>
>>
>>
> Having a --write-api0.6-dump command might not be a bad idea. Although the
> --wd task is simpler from a user's point of view if it can be made to be
> equally fast.


Sorry, I wasn't clear there.  I wasn't suggesting we create a
--write-apidb-dump task.  I was just suggesting that there may be some
re-usable code in the --write-pgsql-dump task, because it already formats a
number of PostgreSQL data types into the COPY format.  That said, I haven't
looked at the new JDBC COPY support, so I'm not sure that even makes
sense.
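
To make "formats data types into the COPY format" concrete, here is a
minimal sketch of the escaping that PostgreSQL's COPY text format expects:
tab-separated fields, \N for NULL, and backslash escapes for backslash, tab,
newline and carriage return.  This is not the code from --write-pgsql-dump;
the class and method names (CopyRowFormatter, escapeField, formatRow) are
made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: formats values as one row of PostgreSQL COPY text format. */
final class CopyRowFormatter {

    /** Escapes a single field; null becomes \N, the default NULL marker. */
    static String escapeField(String value) {
        if (value == null) {
            return "\\N";
        }
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // backslash must come out doubled
                case '\t': out.append("\\t"); break;  // tab is the field delimiter
                case '\n': out.append("\\n"); break;  // newline is the row delimiter
                case '\r': out.append("\\r"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Joins escaped fields with tabs and terminates the row with a newline. */
    static String formatRow(List<String> fields) {
        return fields.stream()
                .map(CopyRowFormatter::escapeField)
                .collect(Collectors.joining("\t")) + "\n";
    }

    public static void main(String[] args) {
        // Hypothetical row: id, a name containing a tab, a NULL, a path with a backslash.
        System.out.print(formatRow(Arrays.asList("42", "Main\tStreet", null, "C:\\osm")));
    }
}
```

Rows built this way could be written to a dump file or streamed to the
server, depending on how the COPY support in the JDBC driver is used.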


>
>
>
>>
>>    As a proof of concept, I added statements into disableIndexes to
>>    manually drop each index and then recreate them in enableIndexes.
>>    Together with using the COPY command (supported in the postgres 8.4
>>    JDBC driver), my initial experiments show a speedup of 3 - 4 times on
>>    the initial population of the tables (i.e. without populating the
>>    current tables, but I suspect that this step can be similarly sped
>>    up). These numbers were obtained using small country extracts (e.g.
>>    1 - 20 MB in bz2 size), but I would guess that they hold up with the
>>    full planet imports too.
>>
>>    The main benefit comes from disabling the indices, and the COPY
>>    command seems less important.
>>
>>
>>    The patch I have is quite ugly (and untested for correctness), as it
>>    breaks the levels of abstraction and has to hard code all the available
>>    indices. So my question is, what would be the best way to do this in a
>>    clean way? Looking at the speedups obtained and the time involved in
>>    imports, it seems like it might be worth it.
>>
>>
>> If it's truly 3-4 times faster then it's worth a lot of effort.  I don't
>> have a lot of time to get involved in this myself though, so if you have
>> some time to write a maintainable patch, then I'd be very grateful.  My
>> only ask is that you stick around to get it working and provide support
>> until it is proven stable.
>>
>
> I have very little time myself at the moment. But perhaps I will be able to
> come up with something. Perhaps not too soon though. At least not a
> maintainable patch that integrates nicely with the modular structure of
> osmosis.


Okay.  See how you go, and post whatever you come up with back to the list.
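
On the hard-coded index list: one possible way to keep it in a single place
is to hold the index definitions in one map and derive both the DROP and
the CREATE statements from it.  A rough sketch follows; the index and
column names are illustrative only and are not taken from the actual apidb
schema, and the class name IndexScript is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: derives drop/recreate SQL from one list of index definitions. */
final class IndexScript {

    /** Index name -> "table (columns)". Illustrative names, not the real schema. */
    static Map<String, String> exampleIndexes() {
        Map<String, String> indexes = new LinkedHashMap<>();
        indexes.put("nodes_tile_idx", "nodes (tile)");
        indexes.put("nodes_timestamp_idx", "nodes (timestamp)");
        return indexes;
    }

    /** Statements to run before the bulk load. */
    static List<String> dropStatements(Map<String, String> indexes) {
        List<String> sql = new ArrayList<>();
        for (String name : indexes.keySet()) {
            sql.add("DROP INDEX IF EXISTS " + name);
        }
        return sql;
    }

    /** Statements to run after the bulk load, in the same order. */
    static List<String> createStatements(Map<String, String> indexes) {
        List<String> sql = new ArrayList<>();
        for (Map.Entry<String, String> e : indexes.entrySet()) {
            sql.add("CREATE INDEX " + e.getKey() + " ON " + e.getValue());
        }
        return sql;
    }

    public static void main(String[] args) {
        Map<String, String> indexes = exampleIndexes();
        dropStatements(indexes).forEach(System.out::println);
        System.out.println("-- bulk load happens here --");
        createStatements(indexes).forEach(System.out::println);
    }
}
```

Each generated statement would then be executed through an ordinary JDBC
Statement.  Primary keys and unique constraints would need separate
handling, which this sketch ignores.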


>
>>    * I'd prefer to make this a new task within Osmosis.  The current
>>      one is called --write-apidb I think.  You could create a new one
>>      called something like --write-apidb-fast.  If you can get it
>>      stable and it works well, then we can point --write-apidb at your
>>      task and delete the current one.
>>
>
> Oh, that would mean I would actually have to understand this wonderful
> extensible structure of osmosis rather than just hack some random bits and
> pieces into it... ;-)


In case you don't know already, creating a new task is a matter of modifying
TaskRegistrar to register your task factory, creating a new task factory
(use those called from TaskRegistrar as an example), and then creating a new
task class that implements the Sink interface (use the existing DB tasks as
an example).  How much of the rest of Osmosis you choose to use is largely
up to you, although more re-use is definitely a good thing from a
maintenance perspective.
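
The three pieces fit together roughly as in the sketch below.  Note that
this does not use the real Osmosis classes: Sink and Registrar here are
simplified stand-ins defined inside the example so that it compiles on its
own, the method names (process, complete, release) are only loosely
modelled on the real interface, and FastApidbWriter is a hypothetical task.
Check the actual TaskRegistrar and Sink sources for the real signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch only: simplified stand-ins, NOT the real Osmosis API. */
final class TaskSketch {

    /** Stand-in for the Sink interface; the real one receives entity containers. */
    interface Sink {
        void process(String entity);
        void complete();
        void release();
    }

    /** Hypothetical new task: buffers entities and "writes" them on complete(). */
    static final class FastApidbWriter implements Sink {
        private final List<String> pending = new ArrayList<>();
        private int written;

        @Override public void process(String entity) { pending.add(entity); }

        @Override public void complete() {
            written += pending.size(); // a real task would flush to the database here
            pending.clear();
        }

        @Override public void release() { pending.clear(); }

        int getWritten() { return written; }
    }

    /** Stand-in for TaskRegistrar: maps a task name to a factory for that task. */
    static final class Registrar {
        private final Map<String, Supplier<Sink>> factories = new HashMap<>();

        void register(String name, Supplier<Sink> factory) {
            factories.put(name, factory);
        }

        Sink create(String name) {
            Supplier<Sink> factory = factories.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("Unknown task: " + name);
            }
            return factory.get();
        }
    }

    public static void main(String[] args) {
        Registrar registrar = new Registrar();
        registrar.register("write-apidb-fast", FastApidbWriter::new);

        Sink sink = registrar.create("write-apidb-fast");
        sink.process("node 1");
        sink.process("way 2");
        sink.complete();
        sink.release();
        System.out.println("written: " + ((FastApidbWriter) sink).getWritten());
    }
}
```

In Osmosis itself the factory is also where command-line arguments for the
task get read, which the stand-in Registrar above leaves out.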

Cheers,
Brett

_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
