Hi all,

it seems like thread_reaper can't keep up with massive thread creation. I'm currently analyzing a problem with our SAN virtualization: IO stalls for about 40 seconds whenever the segkp cache fills up. I tried increasing segkpsize from the default 2 GB to 8 GB, but that did not help. I just have to do more IO, or run the IO a little longer, and the hang occurs again.

Yes, there's a big problem with the virtualization agent running on our machines. My test machine is a T5440 with 4x 1.4GHz and 128 GB RAM. For every single IO (= interrupt), the agent creates a thread to map the blocks from virtual to physical. These threads are very short-lived, as they exit immediately after handling one IO. I know this should be done with worker threads, or with threads that each handle a batch of IOs/interrupts...
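To make the worker-thread alternative concrete, here is a minimal user-space sketch in Python. It is only an illustration of the pattern, not the agent's code and not kernel code: `map_block`, `run_pool`, the pool size, and the fake mapping are all hypothetical. The point is that a fixed set of long-lived threads serves every request, so no stack is allocated or freed per IO.

```python
import queue
import threading

NUM_WORKERS = 4      # hypothetical pool size, not a tuned value
_STOP = object()     # sentinel that tells a worker to exit


def map_block(virtual_block):
    """Hypothetical stand-in for the virtual-to-physical block mapping."""
    return virtual_block + 1000


def worker(requests, results):
    # A worker lives for the whole run and handles many IOs in turn.
    while True:
        item = requests.get()
        if item is _STOP:
            break
        results.put((item, map_block(item)))


def run_pool(blocks, num_workers=NUM_WORKERS):
    """Map all blocks with a fixed pool; return (mapping, threads created)."""
    requests = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(requests, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for block in blocks:
        requests.put(block)
    for _ in threads:
        requests.put(_STOP)          # one sentinel per worker
    for t in threads:
        t.join()
    mapped = {}
    while not results.empty():       # safe: all workers have exited
        virtual, physical = results.get()
        mapped[virtual] = physical
    return mapped, len(threads)
```

With this shape, `run_pool(range(100000))` still creates only `NUM_WORKERS` threads, whereas the one-thread-per-IO design would create 100000.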
So if I run filebench with 8 KB unbuffered writes on one of these virtualized volumes, the test performs 1669114 ops in 6 minutes (~4600 IO/s on average, with a peak of 15,000 IO/s). segkp is 8 GB. Using DTrace, I can see the number of threads reaped by thread_reaper and created by the agent, both per second and in total. The two values are close, but they are not equal. Once filebench stops I don't see any new threads from the agent, but thread_reaper keeps running for about 30 seconds, reaping ~2500 threads per second. If I stop the tracing directly after filebench has finished, I see this result:

    --- TOTALS ---
    threads reaped         1506682
    smv threads created    1669150

As I said, filebench did 1669114 ops in total, and we can see them here again (1669150). But there's a difference of 162468 between reaped and created threads! As we are on a SPARC machine with an 8k page size and 3 pages per thread, that is about 3.7 GB of freeable space.

I expected the reaper to keep up with thread creation, but I was wrong. If the test does more IO (= more threads) and runs longer, segkp fills up completely and gets locked in order to free threads. This causes an IO hang of about 40 seconds.

From my point of view, the algorithm of the thread_reaper is not optimal. It should be possible to create this number of threads in a very short time without running into problems with full caches.

Is there anything else I can do to tune the thread_reaper? Any comment regarding this problem is welcome. I am willing to test your suggestions or to provide more detailed information, just let me know.

Thank you all for your help!

Best regards,
Thomas

--
This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
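The backlog figures in this post can be checked with a few lines of arithmetic. The following is a minimal sketch in Python using only the numbers quoted above (8k pages, 3 pages per thread, 8 GB segkp, a 6-minute run). The function names are made up for illustration. The fill-time estimate rests on two extra assumptions that are not verified here: that the deficit between created and reaped threads grows at a constant rate, and that nothing else occupies segkp.

```python
PAGE_SIZE = 8 * 1024          # bytes, SPARC 8k page
PAGES_PER_THREAD = 3          # segkp pages per thread, as stated in the post
SEGKP_SIZE = 8 * 2**30        # bytes, segkp configured to 8 GB
RUN_SECONDS = 6 * 60          # approximate filebench run time

CREATED = 1669150             # "smv threads created" from the totals
REAPED = 1506682              # "threads reaped" from the totals


def backlog_threads(created, reaped):
    """Threads that exited but were not yet reaped."""
    return created - reaped


def backlog_bytes(threads):
    """segkp space held by the unreaped threads."""
    return threads * PAGES_PER_THREAD * PAGE_SIZE


def seconds_until_full(threads, run_seconds, segkp_size):
    """Rough time until segkp is full, assuming a constant deficit rate
    and no other segkp consumers (both are assumptions)."""
    slots = segkp_size // (PAGES_PER_THREAD * PAGE_SIZE)
    deficit_per_second = threads / run_seconds
    return slots / deficit_per_second


backlog = backlog_threads(CREATED, REAPED)
print("unreaped threads:", backlog)
print("held in segkp (GiB):", round(backlog_bytes(backlog) / 2**30, 2))
print("est. seconds until full:",
      round(seconds_until_full(backlog, RUN_SECONDS, SEGKP_SIZE)))
```

Under those assumptions the 8 GB segkp would fill after roughly 13 minutes at this load, which is at least consistent with the hang appearing only when the test runs longer or does more IO.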