Re: [osmosis-dev] Merge huge count of files

Igor Podolskiy Fri, 30 Sep 2011 10:21:43 -0700

Hi Rüdiger,

On 29.09.2011 15:28, Gubler, Ruediger wrote:

> I have to merge a huge count of files. Doing this in one osmosis call
> creates thousands of threads and stops the rest of the system working well.
> Is it possible and efficient to split the giant merge into smaller pieces?
> What is the best strategy to merge a huge count (e.g. 100x100 matrix)
> together with a minimum of needed memory?
> Must the whole dataset fit in the memory?

Memory isn't the problem with merges; the only thing worth mentioning that merge stores in memory are the buffers. Those are either very small (20 entities in the 0.39 release) or can be set to a more appropriate value on the command line (in current trunk, or HEAD now that it's in git ;)). Other than that, --merge just looks at the next entities on the input streams and chooses one of them to pass through downstream.
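To illustrate why memory stays flat, here is a minimal Python sketch of a two-way streaming merge. It is not the actual Osmosis code: the (type_rank, id, payload) tuples, the "type then id" ordering, and the rule that the second input wins when both inputs contain the same entity are assumptions made for this example only.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical entity shape: (type_rank, id, payload),
# with type_rank 0 = node, 1 = way, 2 = relation.
Entity = Tuple[int, int, str]


def merge_sorted(a: Iterable[Entity], b: Iterable[Entity]) -> Iterator[Entity]:
    """Merge two sorted entity streams, holding one entity per input."""
    it_a, it_b = iter(a), iter(b)
    head_a = next(it_a, None)
    head_b = next(it_b, None)

    while head_a is not None and head_b is not None:
        key_a, key_b = head_a[:2], head_b[:2]
        if key_a < key_b:
            yield head_a
            head_a = next(it_a, None)
        elif key_b < key_a:
            yield head_b
            head_b = next(it_b, None)
        else:
            # Same entity on both inputs: this sketch simply keeps the
            # one from the second input (an assumption, see above).
            yield head_b
            head_a = next(it_a, None)
            head_b = next(it_b, None)

    # One input is exhausted; pass the rest of the other one through.
    while head_a is not None:
        yield head_a
        head_a = next(it_a, None)
    while head_b is not None:
        yield head_b
        head_b = next(it_b, None)
```

Only the two "head" entities are alive at any time, no matter how large the inputs are, which is the point made above.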

I think the limit you're hitting is the threads - thousands of threads aren't healthy for a Java process (or any process for that matter, if we're talking "real", heavyweight threads).

What merge strategy you choose shouldn't matter very much - just don't merge too many files in the same pass. Every reader needs a thread and every merge needs a thread. So if you merge 8 files at a time you have 8 readers and 7 merges (each --merge joins two inputs), i.e. 15 threads, which should be fine on a 4 core CPU. You would need a whole lot of passes with 10000 files, though...
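A rough sketch of how such a batched merge could be planned, in Python. It only builds the argument lists and does not run anything. The intermediate file names (pass0_0.osm and so on) and the helper names are made up for this example; --read-xml, --merge and --write-xml are the regular Osmosis tasks.

```python
from typing import List


def merge_command(inputs: List[str], output: str) -> List[str]:
    """Build one osmosis call that merges `inputs` into `output`."""
    if not inputs:
        raise ValueError("need at least one input file")
    args = ["osmosis"]
    for i, path in enumerate(inputs):
        args += ["--read-xml", path]
        if i > 0:
            # Each additional input is joined to the result so far.
            args.append("--merge")
    args += ["--write-xml", output]
    return args


def plan_passes(files: List[str], batch_size: int = 8) -> List[List[List[str]]]:
    """Return one list of osmosis commands per pass."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    passes = []
    current = list(files)
    level = 0
    while len(current) > 1:
        commands, outputs = [], []
        for start in range(0, len(current), batch_size):
            batch = current[start:start + batch_size]
            if len(batch) == 1:
                # A lone leftover file is carried over to the next pass.
                outputs.append(batch[0])
                continue
            # Hypothetical naming scheme for intermediate results.
            out = f"pass{level}_{start // batch_size}.osm"
            commands.append(merge_command(batch, out))
            outputs.append(out)
        passes.append(commands)
        current = outputs
        level += 1
    return passes
```

With 10000 input files and 8 files per call, this plan has 5 passes with 1250, 157, 20, 3 and 1 osmosis calls, 1431 calls in total. The calls within one pass are independent of each other, so you can decide yourself how many of them to run at the same time.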

Also, I would really recommend that you use the current HEAD (you can grab the newest build from the build server [1] if you don't want to compile it yourself) since the default input buffer size is way too small for current hardware. If your buffer size is set to 20, you spend a _lot_ of time switching between the reader threads and the merge thread.

Just another thought: if your XML files are guaranteed to be fully disjoint (no entity ever occurs in two different files; non-overlapping bounding boxes are generally _not_ enough), you could more or less just concatenate them (modulo XML header and such) and then sort the result. This should be equivalent to a merge. It would be very simple (and very fast) to do with any SAX parser/writer in whatever language, and for the sorting you can use Osmosis with --sort. But, again, your files absolutely need to be disjoint or else bad things will happen.
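A minimal sketch of the concatenation step with Python's standard SAX modules, assuming plain OSM XML input. The class and function names, the generator attribute, and the choice to drop each file's top-level bound/bounds element are assumptions for this example, not something Osmosis prescribes.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Top-level elements that make no sense once files are glued together.
SKIPPED = ("bound", "bounds")


class OsmBodyCopier(xml.sax.ContentHandler):
    """Forward everything inside <osm>...</osm> to a shared writer."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.depth = 0

    def _skip(self, name):
        # Depth 1 is the <osm> root itself, depth 2 its direct children.
        return self.depth == 1 or (self.depth == 2 and name in SKIPPED)

    def startElement(self, name, attrs):
        self.depth += 1
        if not self._skip(name):
            self.out.startElement(name, attrs)

    def endElement(self, name):
        skip = self._skip(name)
        self.depth -= 1
        if not skip:
            self.out.endElement(name)

    def characters(self, content):
        if self.depth >= 1:
            self.out.characters(content)


def concatenate(paths, out_stream):
    """Write one <osm> document containing the bodies of all `paths`."""
    out = XMLGenerator(out_stream, encoding="utf-8")
    out.startDocument()
    out.startElement("osm", {"version": "0.6", "generator": "concat-sketch"})
    handler = OsmBodyCopier(out)
    for path in paths:
        xml.sax.parse(path, handler)
    out.endElement("osm")
    out.endDocument()
```

The output is not sorted yet; running it through something like "osmosis --read-xml all.osm --sort --write-xml sorted.osm" afterwards takes care of that. Note that the sketch does nothing to detect duplicates, so the disjointness requirement still applies.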


Hope that helps
Igor
