Adrian Cockcroft writes:
> I think this is a good forum to discuss a systematic performance issue with
> swap. The problem has been there for a long time, I tried to get people
> interested in doing something about it around ten years ago, I left Sun in
> 2004 and don't even use Sun's at my current job, but it would be nice if
> someone figured out how to do a comprehensive redesign of swapfs that solves
> the performance, observability and management problems that were designed in
> around 1990. Since there is an easy workaround (just add RAM) it was not
> considered a high priority issue when I complained.
What can we propose, then? Assume the application text is preserved in memory. For swap-in of application data, what I see happening is: an application becomes runnable; an instruction loads some data into a register; the corresponding page had been swapped out earlier, so we take a page fault. We wait a disk rotation, fill the memory page and resume execution. This proceeds at one fault per disk service time (a vague number; the track buffer can help here).

It seems the only way to improve on this is to have 'something' that reduces the number of faults. That implies that on one page fault we should issue an I/O bigger than 8K, one that also brings in the anon data we expect to be faulted in next. Moreover, we have to determine at swap-out time what the swap-in sequence is likely to be.

Clearly, for sparsely used anonymous segments there is not much hope. But for densely used segments (meaning: if I use a page, I'm likely to use adjacent pages in the near future) there is a ray of hope. Instead of paging out one page at a time, we could group a set of 64K or 128K together. The scanner today is set up to detect pages that have not been recently used (through the two-handed clock mechanism). At some point it holds one page that it wants to swap out. The new scheme would say: in the interest of future swap-in performance, decide now that pages from the same segment adjacent to the one the scanner is holding are also candidates for swap-out, even though the scanner has not yet swept them. So instead of swapping out individual pages, we would swap out a kluster of pages from the same segment as soon as one page in the kluster is scanned as 'not recently used'.

In the best case the gains are proportional to the kluster size (8X or more). In the bad case, we swap out pages that are still live. Maybe we could swap out klusters like this but only release the individual pages to the freelist when the scanner decides so.
In this scheme, is there an easy way to decide what counts as a densely used anonymous segment? Should we consider all segments as such, or try to be more clever? Does Linux manage to issue big I/Os on pagefault-induced swap-ins? Do we need to do anything special for tmpfs files?

-r

> My opinion of Solaris swap - "If you think you understand how it works, you
> weren't looking closely enough". See this
> http://www.itworld.com/Comp/2377/UIR980701perf/
>
> Adrian

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org