Hi All,

I've been playing with the PBF reader implementation in Osmosis to see if I can improve its performance.

The nice thing about the PBF format is that the data stream is broken into coarse chunks that can be processed by multiple threads without thread synchronisation becoming a major overhead. I've just checked in a new --read-pbf-fast implementation which does just that. It is a complete rewrite of the existing PBF implementation (it was easier to rewrite than to retrofit threading into the current implementation).

From an end-user perspective, it is similar to the existing --read-pbf task but has an additional argument called "workers" which defines the number of worker threads to use for processing. It defaults to 1, but increasing it to match your number of cores gives a significant performance boost. On my quad-core system (no hyper-threading) I get a 2-3x speedup when just reading the file and discarding the contents. Real-world usage with a longer pipeline will be less dramatic.

My command line in testing looks like this:

    osmosis --read-pbf-fast myfile.pbf workers=4 --b bufferCapacity=10000 --wn

A large buffer is very important. The task implementation uses the master thread both to split the input stream *and* to send results to the sink, so time spent in downstream tasks holds up the splitting. Using a buffer is essential if you have any downstream tasks connected.

It's only available in Git for now, and is documented in the development version of the Detailed Usage page:

http://wiki.openstreetmap.org/wiki/Osmosis/Detailed_Usage_0.41#--read-pbf-fast_.28--rbf.29

Brett
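
P.S. For anyone who wants a picture of the threading model, here is a minimal sketch of the general idea. It is not the actual Osmosis code: the class name, the decode() helper and the use of strings as stand-ins for PBF blobs are all made up for illustration, and unlike a real reader it queues every chunk up front rather than bounding the work in flight. The master thread hands chunks to a worker pool, then collects the decoded results in input order and passes them to the sink itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ChunkedReaderSketch {

    // Hypothetical stand-in for decoding one PBF blob into entities.
    static List<String> decode(String blob) {
        List<String> entities = new ArrayList<>();
        for (String part : blob.split(",")) {
            entities.add(part.trim());
        }
        return entities;
    }

    // Runs on the "master" thread: dispatches chunks, then feeds the sink.
    static void read(List<String> blobs, int workers, Consumer<String> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Hand each chunk to the pool; workers decode independently.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String blob : blobs) {
                Callable<List<String>> task = () -> decode(blob);
                pending.add(pool.submit(task));
            }
            // Collect results in input order and send them downstream.
            // The sink is called from this thread, so a slow sink blocks here.
            for (Future<List<String>> future : pending) {
                for (String entity : future.get()) {
                    sink.accept(entity);
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        read(Arrays.asList("n1, n2", "w1", "r1, r2"), 4, out::add);
        System.out.println(out); // prints [n1, n2, w1, r1, r2]
    }
}
```

Because the sink is called from the same thread that dispatches the chunks, a slow downstream task holds everything up, which is why putting a buffer after the reader matters so much.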

_______________________________________________
osmosis-dev mailing list
http://lists.openstreetmap.org/listinfo/osmosis-dev
