Hi Rüdiger,

On 29.09.2011 15:28, Gubler, Ruediger wrote:
> I have to merge a huge count of files. Doing this in one osmosis call
> creates thousands of threads and stops the rest of the system working well.
> Is it possible and efficient to split the giant merge into smaller pieces?
> What is the best strategy to merge a huge count (e.g. 100x100 matrix)
> together with a minimum of needed memory?
> Must the whole dataset fit in the memory?

Memory isn't the problem with merges; the only thing worth mentioning that --merge keeps in memory is its input buffers. Those are either very small (20 entities in the 0.39 release) or can be set to a more appropriate value on the command line (in current trunk, or HEAD now that it's in git ;)). Other than that, --merge just looks at the next entity on each input stream and chooses one of them to pass downstream.

I think the limit you're hitting is the number of threads - thousands of threads aren't healthy for a Java process (or any process for that matter, if we're talking "real", heavyweight threads).

What merge strategy you choose shouldn't matter very much - just don't merge too many files in the same pass. Every reader needs a thread and every merge needs a thread. So if you merge 8 files at a time you have 8 readers and 7 merges (each --merge joins two streams), so 15 threads, which should be fine on a 4 core CPU. You would need a whole lot of passes with 10000 files, though...
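
To get a feeling for how many passes that means, here is a rough Python sketch that plans the merge rounds and builds one Osmosis command line per batch. It assumes the usual idiom of n --rx readers followed by n-1 --merge tasks and a single --wx writer; the helper names (merge_command, plan_rounds) and the intermediate file names (pass0_0.osm and so on) are made up for illustration, and nothing is executed here.

```python
def merge_command(inputs, output):
    """Build the argument list for one Osmosis call that merges `inputs`."""
    args = ["osmosis"]
    for path in inputs:
        args += ["--rx", path]                # one reader (and thread) per file
    args += ["--merge"] * (len(inputs) - 1)   # each --merge joins two streams
    args += ["--wx", output]
    return args


def plan_rounds(files, batch_size=8):
    """Return a list of rounds; each round is a list of Osmosis commands.

    Intermediate file names are hypothetical placeholders.
    """
    rounds = []
    current = list(files)
    level = 0
    while len(current) > 1:
        commands, produced = [], []
        for start in range(0, len(current), batch_size):
            batch = current[start:start + batch_size]
            if len(batch) == 1:
                # A lone leftover file needs no merge; carry it to the next round.
                produced.append(batch[0])
                continue
            out = f"pass{level}_{start // batch_size}.osm"
            commands.append(merge_command(batch, out))
            produced.append(out)
        rounds.append(commands)
        current = produced
        level += 1
    return rounds


if __name__ == "__main__":
    tiles = [f"tile_{i}.osm" for i in range(10000)]  # hypothetical input names
    for number, commands in enumerate(plan_rounds(tiles)):
        print(f"round {number}: {len(commands)} osmosis calls")
```

With 10000 inputs and 8 files per call this plans five rounds of 1250, 157, 20, 3 and 1 Osmosis calls. Each command list could be handed to subprocess.run one after another, so that only one small pipeline is alive at a time.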

Also, I would really recommend that you use the current HEAD (you can grab the newest build from the build server [1] if you don't want to compile yourself) since the default input buffer size in 0.39 is way too small for current hardware. If your buffer size is set to 20, you spend a _lot_ of time switching between the reader threads and the merge thread.

Just another thought: if your XML files are guaranteed to be fully disjoint (no entity ever occurs in two different files; non-overlapping bounding boxes are generally _not_ enough), you could more or less just concatenate them (modulo XML header and such) and then sort the result. This should be equivalent to a merge. It would be very simple (and very fast) to do with any SAX parser/writer in whatever language, and for the sorting you can use Osmosis with --sort. But, again, your files absolutely need to be disjoint or else bad things will happen.
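
A minimal sketch of that concatenation step, using Python's standard xml.sax module. Several things here are my assumptions rather than anything Osmosis prescribes: the input is plain OSM XML with a single <osm> root and all data in attributes, per-file <bound>/<bounds> elements are simply dropped, the new root is written with version="0.6", and the names concat_osm and _BodyCopier are invented for the example. It does not check that the files are disjoint.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Per-file bounding boxes make no sense in the combined file, so skip them.
SKIPPED = {"bound", "bounds"}


class _BodyCopier(xml.sax.ContentHandler):
    """Copy everything below the root <osm> element of one file to `out`."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.depth = 0

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth > 1 and name not in SKIPPED:
            self.out.startElement(name, attrs)

    def endElement(self, name):
        if self.depth > 1 and name not in SKIPPED:
            self.out.endElement(name)
            if self.depth == 2:
                # Put each top-level entity on its own line.
                self.out.characters("\n")
        self.depth -= 1

    # Character data is ignored: OSM XML carries its data in attributes.


def concat_osm(sources, dest):
    """Write the entities of all `sources` under a single <osm> root.

    `sources` are file names or binary file objects; `dest` is a text stream.
    """
    out = XMLGenerator(dest, encoding="utf-8")
    out.startDocument()
    out.startElement("osm", {"version": "0.6", "generator": "concat-sketch"})
    out.characters("\n")
    for source in sources:
        xml.sax.parse(source, _BodyCopier(out))
    out.endElement("osm")
    out.endDocument()
```

The result is unsorted, so it still has to go through something like "osmosis --rx concatenated.osm --sort --wx sorted.osm" (file names made up) before it is equivalent to a merge.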

Hope that helps
Igor

_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
