Well, if you're not starting your timer until after you initialize your array, the VM manager shouldn't be causing you to get slow results with larger numbers of threads. One thing to remember, though, is that just doing a new or malloc() or similar really only gives you a pointer to virtual memory reserved for your process. It doesn't give your process actual physical memory until it tries to modify its contents. That's true on most any modern OS. (You'd be surprised how much that can get in your way. Compare reading a gigabyte file into memory from a 4-Gbps Fibre Channel SAN two ways: once into a buffer you've already run memset() over, and once straight into a new buffer off the heap that hasn't been touched.)
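If you want to see the first-touch cost for yourself, here's a minimal sketch, assuming a POSIX system with clock_gettime(). The names now_seconds(), timed_fill() and compare_first_touch() are made up for illustration. It fills a fresh malloc() buffer twice and times each pass. The first pass pays for the page faults, and the second runs over pages that are already mapped, so it will usually be faster, though by how much depends on your OS and allocator.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Current reading of the monotonic clock, in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time a single memset() pass over buf, in seconds. */
static double timed_fill(unsigned char *buf, size_t len, int value)
{
    double start = now_seconds();
    memset(buf, value, len);
    return now_seconds() - start;
}

/*
 * Hypothetical helper: allocate len bytes and fill them twice.
 * *first includes the cost of faulting in every page; *second runs
 * over pages the process already owns.
 * Returns 0 on success, -1 on failure.
 */
int compare_first_touch(size_t len, double *first, double *second)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1;

    *first = timed_fill(buf, len, 1);
    *second = timed_fill(buf, len, 2);

    /* Read a byte back so the stores cannot be optimized away. */
    volatile unsigned char last = buf[len - 1];
    int ok = (last == 2);

    free(buf);
    return ok ? 0 : -1;
}
```

Call compare_first_touch() with something like 256MB and print the two times. Older systems may need -lrt for clock_gettime().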
There's also the possibility that with a large number of threads you're overwhelming the virtual memory manager as it tries to create memory pages for each of your threads. Depending on how each thread goes about processing, they may not modify their stack memory until they actually start processing data. If all your threads begin processing at about the same time and they all suddenly need access to their stack memory, the virtual memory manager is going to be overwhelmed. You can eliminate this problem by explicitly creating the stack memory for each thread yourself, and doing a memset() on the block before starting the respective thread. (And memset() is going to be a lot faster than any loop you write yourself.)
If not that, you could also be seeing a constant processing time for smaller numbers of threads because your process is limited by the amount of memory you're going through, which if I read your description correctly is a constant. If my guess is true, doubling your RAM from 256MB to 512MB should cause the flat part of your performance plot to double from about 10 seconds to about 20 seconds, and then you should see your times increasing once you get above 4K threads or so. Above that number of threads, you're handling smaller and smaller chunks of data, and I suspect your message processing might be more limiting than the amount of data you're having all your threads process.
And yes, those hypotheses are based on the thread scheduler not being your problem. What does running 16K threads get you on a 2-CPU host? Resources on any computer are limited, and oversubscribing any one of them will slow you down.
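The stack suggestion above could look something like the following. This is a rough sketch, assuming POSIX threads. start_on_prefaulted_stack() and worker() are hypothetical names, not anything from your code. It allocates a page-aligned block, runs memset() over it so every page is already backed by physical memory, and hands it to the new thread with pthread_attr_setstack().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical worker: stands in for the real per-thread processing. */
void *worker(void *arg)
{
    int *value = arg;
    *value += 1;
    return NULL;
}

/*
 * Hypothetical helper: start a thread on a caller-allocated stack whose
 * pages have all been touched before the thread runs.
 * On success returns 0 and stores the stack block in *stack_out; the
 * caller must free() it after pthread_join(). On failure returns an
 * error number.
 */
int start_on_prefaulted_stack(pthread_t *tid,
                              void *(*start_routine)(void *), void *arg,
                              size_t stack_size, void **stack_out)
{
    void *stack = NULL;
    pthread_attr_t attr;
    long page = sysconf(_SC_PAGESIZE);
    int rc;

    if (page <= 0)
        return -1;

    /* Page-aligned block to serve as the thread's stack. */
    rc = posix_memalign(&stack, (size_t)page, stack_size);
    if (rc != 0)
        return rc;

    /* Touch every page now, before the thread exists. */
    memset(stack, 0, stack_size);

    rc = pthread_attr_init(&attr);
    if (rc == 0) {
        rc = pthread_attr_setstack(&attr, stack, stack_size);
        if (rc == 0)
            rc = pthread_create(tid, &attr, start_routine, arg);
        pthread_attr_destroy(&attr);
    }
    if (rc != 0) {
        free(stack);
        return rc;
    }

    *stack_out = stack;
    return 0;
}
```

Two things to keep in mind. The stack size has to be at least PTHREAD_STACK_MIN, and a stack you supply yourself gets no guard page, so an overflow will silently run into whatever memory sits next to it.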
