On Fri, Feb 18, 2011 at 10:20 PM, Frederik Ramm <frede...@remote.org> wrote:

> Igor,
>
>
> On 02/18/11 11:40, Igor Podolskiy wrote:
>
>> just a random thought: what's wrong with using --dataset-bounding-box?
> Importing the planet file into a database and doing a bunch of queries
> against it is equivalent to creating a single disk buffer for all bb tasks
>> (the database _is_ the disk buffer, if you want). This still isn't very
>> elegant as it requires two passes (import into DB and export to PBFs)
>> but is IMHO more elegant than selection lists.
>>
>
> I currently use bounding polygons, so I'd have to add a
> --dataset-bounding-polygon task for that, but that should be possible.
>
> The major problem is finding a database that would be able to import the
> data, index it, and fulfil my requests, all within less than 24 hours. From
> what I hear, PostGIS takes a week for the import step alone, but I want to
> produce fresh extracts every day. So I would have to import into PostGIS and
> then apply the daily diffs, but I could only do that if applying a daily
> diff and making all the bounding-polygon requests took much less than a
> day, and somehow I suspect that's not going to work.
>
>
>  Don't get me wrong, maybe there's a reason why dataset tasks are
>> unsuitable - but I've not found any... Maybe the dbb task itself needs
>> improvement - but in this case, this should be the way to go.
>>
>
> I doubt that a performant solution can ever be found with PostGIS but I'd
> love to hear from someone who can prove me wrong.
>

Processing a full day's worth of daily diffs is doable in much less than a
day.  The one time I did this, on a server with only average IO performance
(data spread across two disks), it took about 6 hours to process a day
of data.  Extracting continent-sized chunks from the database will probably
be a problem, though.  The schema works well for 1x1 or 2x2 degree bboxes,
but gets rapidly slower as the size of the returned data increases.
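For reference, the diff-application step could be sketched roughly like this
with osmosis and the pgsql snapshot schema (the file, database, and user
names below are placeholders, not from anyone's actual setup):

```shell
# Apply one day's worth of changes (day.osc.gz, a placeholder name) to a
# PostGIS-backed pgsql snapshot schema. Assumes the schema was created with
# the pgsql scripts shipped with osmosis; adjust credentials to taste.
osmosis --read-xml-change file=day.osc.gz \
        --write-pgsql-change database=osm user=osm password=secret
```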

Given that each of your extracts covers a large percentage of the overall
planet, doing multiple passes across the entire planet will be tough to beat
performance-wise.
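For comparison, the sequential-pass approach can even produce several
polygon extracts in a single read of the planet file, using osmosis' tee
task (polygon and output file names here are made-up examples):

```shell
# One sequential pass over the planet, producing two polygon extracts at
# once: --tee duplicates the stream, and each copy is filtered and written
# independently.
osmosis --read-pbf file=planet.osm.pbf \
        --tee outputCount=2 \
        --bounding-polygon file=europe.poly --write-pbf file=europe.osm.pbf \
        --bounding-polygon file=africa.poly --write-pbf file=africa.osm.pbf
```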

In other words, a database is best (and the only feasible option) for small
up-to-date extracts, but sequential reads are still best for large extracts
where multi-hour time lags are acceptable.  I suspect that lots of
optimisation is possible on the database side, because for one thing the
PostGIS data types take up humongous amounts of disk space.  Perhaps someone
with lots of time and some custom data types could improve on this ...

Brett
_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev
