[osmosis-dev] Improving completeWays/completeRelations performance

Frederik Ramm Fri, 18 Feb 2011 00:58:13 -0800

Hi,

in the long run I'd like to change the Geofabrik extracts so that they have the completeWays/completeRelations feature enabled. It's a pain because that totally breaks the elegant and well-performing streaming mode in Osmosis, but it would really make the extracts more usable, and more in line with what people get from the API.

My biggest concern is the disk space used for temporary storage. If I read things correctly, a temporary storage of the input stream is created for each --bb or --bp task. So if you do something like


osmosis --rb planet --tee 5
  --bb ... --wb europe
  --bb ... --wb asia
  --bb ... --wb america
  --bb ... --wb australia
  --bb ... --wb africa

then you will temporarily have 5 copies of the planet file lying around. If there were only one copy, I could still hope to use Linux file system buffers and a lot of RAM to soften the negative impact of file storage, but five copies will kill performance for sure.

I wonder if there is a way to at least reduce this to *one* temporary storage. The easiest thing I could imagine would be a new "multi-bb" (or "multi-bp") task that basically combines the tee and bb. That would be less elegant and would probably also be less efficient because it would not use multiple threads, but it could easily use one shared temporary storage.

But I've been thinking: With the high performance of PBF reading, a two-pass operation should become possible. Simply read the input file twice, determining which objects to copy in pass 1, and actually copying them in pass 2. I'm just not sure how that could be made to fit in Osmosis. One way could be creating a special type of file, a "selection list", from a given entity stream. A new task "--write-selection-list" would dump the IDs of all nodes, ways, and relations that were either present or referenced in the entity stream:


osmosis --rb planet --tee 5
  --bb ... --write-selection-list europe.sel
  --bb ... --write-selection-list asia.sel
  --bb ... --write-selection-list america.sel
  --bb ... --write-selection-list australia.sel
  --bb ... --write-selection-list africa.sel

Then, in a second pass, one would use a new task "--apply-selection-list" to actually filter the objects:


osmosis --rb planet --tee 5
  --apply-selection-list europe.sel --wb europe
  --apply-selection-list asia.sel --wb asia
  ...
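
To make the two-pass idea concrete, here is a minimal Python sketch on a toy in-memory "planet". It is not Osmosis code: the classes and the functions bbox_filter, build_selection and apply_selection are hypothetical names made up for illustration, bbox_filter is only a simplified stand-in for a streaming --bb without completeWays, and a real implementation would read the PBF file twice instead of iterating over a list.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: int
    lon: float
    lat: float


@dataclass
class Way:
    id: int
    node_ids: list


@dataclass
class Relation:
    id: int
    members: list  # (member_type, member_id); member_type is "node", "way" or "relation"


def kind(entity):
    """Return "node", "way" or "relation" for an entity."""
    return type(entity).__name__.lower()


def bbox_filter(entities, bbox):
    """Simplified stand-in for a streaming --bb (assumes nodes, ways, relations order)."""
    left, bottom, right, top = bbox
    nodes, ways = set(), set()
    for e in entities:
        if isinstance(e, Node):
            if left <= e.lon <= right and bottom <= e.lat <= top:
                nodes.add(e.id)
                yield e
        elif isinstance(e, Way):
            if any(n in nodes for n in e.node_ids):
                ways.add(e.id)
                yield e
        elif any((t == "node" and i in nodes) or (t == "way" and i in ways)
                 for t, i in e.members):
            yield e


def build_selection(stream):
    """Pass 1: collect IDs of everything present or referenced in the stream."""
    sel = {"node": set(), "way": set(), "relation": set()}
    for e in stream:
        sel[kind(e)].add(e.id)
        if isinstance(e, Way):
            sel["node"].update(e.node_ids)
        elif isinstance(e, Relation):
            for member_type, member_id in e.members:
                sel[member_type].add(member_id)
    return sel


def apply_selection(entities, sel):
    """Pass 2: copy exactly those entities whose ID is in the selection."""
    for e in entities:
        if e.id in sel[kind(e)]:
            yield e


planet = [
    Node(1, 5, 5), Node(2, 20, 5), Node(3, 30, 30), Node(4, 31, 31), Node(5, 50, 50),
    Way(10, [1, 2]),   # crosses the bounding box edge
    Way(11, [3, 4]),   # fully outside, but a member of relation 100
    Way(12, [5]),      # fully outside and unrelated
    Relation(100, [("node", 1), ("way", 11)]),
]
bbox = (0, 0, 10, 10)  # left, bottom, right, top

selection = build_selection(bbox_filter(planet, bbox))  # first read of the input
extract = list(apply_selection(planet, selection))      # second read of the input
extract_ids = [(kind(e), e.id) for e in extract]
```

In this toy data, node 2 is pulled in because way 10 uses it, and way 11 is pulled in through relation 100, while nodes 3 and 4 (used only by way 11) are not selected.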

The selection lists would be quite big, and would for efficiency have to be fully kept in memory, so the above jobs could probably eat 20 GB of RAM easily (1.5 billion objects, 64-bit IDs, hash table overhead). Also, what I have sketched above would be able to give you


* all nodes in the bounding box
* all ways using any of these nodes
* all nodes used by any of these ways even if outside
* all relations using any of these nodes or ways
o all nodes and ways used by any of these relations even if outside
o but NOT all nodes used by a way drawn in through a relation.

(The points marked "*" are what the API does; the API does not do the "o" marked points even though users could be interested in them.)
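
The 20 GB figure mentioned earlier can be sanity-checked with rough arithmetic. The sketch below assumes 8 bytes per ID and a hash table that is 60% full; the load factor is my assumption for illustration, not a measured property of any particular implementation, and the function name is hypothetical.

```python
def selection_memory_gb(num_ids, bytes_per_id=8, load_factor=0.6):
    """Rough size of an in-memory hash set of IDs, in GB (10**9 bytes)."""
    raw_bytes = num_ids * bytes_per_id   # the 64-bit IDs themselves
    return raw_bytes / load_factor / 1e9  # empty slots inflate the table


raw_gb = 1_500_000_000 * 8 / 1e9                    # 12 GB of raw IDs
estimate_gb = selection_memory_gb(1_500_000_000)    # roughly 20 GB with overhead
```

A denser structure such as a bitmap indexed by ID would need far less memory, at the cost of assuming the ID space is reasonably compact.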


Does anybody have any thoughts about this; maybe a different approach still?

Bye
Frederik

_______________________________________________
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev
