What is confusing me in this case is that the speeds seem to be
unreasonably slow. I would expect, for instance, the speed of swap access
to scale with the speed of the disk, CPU, etc.

In this case the speed of swap looks like it has stayed at about what it
was on an Ultra 1 while everything else has shot ahead, which makes me
suspect there is a specific limit in there that could easily be removed,
or tuned so that it isn't this visible.
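For what it's worth, the asymmetry Adrian describes below - big sequential
writes on the way out, one random page-sized read per fault on the way back
in - is easy to see even outside the swap path with a rough sketch like the
following. This is only an illustration, not the actual swap code path: the
scratch-file location, the default size, and the 4 KB "page" size are
arbitrary choices, and the file has to be bigger than physical RAM or the
page cache hides the seek cost.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>

#define PAGE 4096                       /* pretend "page" size */

static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char** argv)
{
    /* number of 4K pages; pick enough that the file exceeds physical RAM */
    const long npages = (argc > 1) ? atol(argv[1]) : 1024L * 1024;
    const char* path = "/var/tmp/seekdemo.dat";  /* arbitrary scratch file */
    char buf[PAGE];

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(buf, 0xa5, sizeof(buf));

    /* "Swap-out": page-sized writes laid down sequentially. */
    double t0 = seconds();
    for (long i = 0; i < npages; i++) {
        if (write(fd, buf, PAGE) != PAGE) {
            perror("write");
            return 1;
        }
    }
    fsync(fd);
    printf("sequential writes: %.1f s\n", seconds() - t0);

    /* "Swap-in": one page at a time, from effectively random offsets. */
    t0 = seconds();
    for (long i = 0; i < npages; i++) {
        off_t off = (off_t)(lrand48() % npages) * PAGE;
        if (pread(fd, buf, PAGE, off) != PAGE) {
            perror("pread");
            return 1;
        }
    }
    printf("random page reads:  %.1f s\n", seconds() - t0);

    close(fd);
    unlink(path);
    return 0;
}

The write pass streams; the read pass pays roughly one seek per page even
though it moves the same amount of data, which is the behavior being
described below.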
Thanks,

-Peter

On Wed, Aug 15, 2007 at 03:49:33PM -0700, adrian cockcroft wrote:
> I've seen this before and it's expected behavior; this is my explanation:
>
> Pages are written using large sequential writes to swap, in physical
> memory scanner order, so they are guaranteed to be randomly jumbled on
> disk. This means that swap is as fast as possible for writes and as slow
> as possible for reads, which will be random seeks one page at a time.
> It's been this way forever. Anything that swaps/pages out will be
> horribly slow on the way back in.
>
> Add enough RAM to never swap, or possibly mount a real disk or a solid
> state disk for /tmp.
>
> Adrian
>
> On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
> >
> > On Wed, Aug 15, 2007 at 04:29:54PM -0400, Jim Mauro wrote:
> > >
> > > What would be interesting here is the paging statistics during your
> > > test case. What does "vmstat 1" and "vmstat -p 1" look like while
> > > you're generating this behavior?
> > >
> > > Is it really the case that reading/writing from swap is slow, or
> > > simply that the system on the whole is slow because it's dealing with
> > > a sustained memory deficit?
> >
> > It tends to look something like this:
> >
> > $ vmstat 1
> >  kthr      memory            page            disk          faults      cpu
> >  r b w   swap  free  re  mf pi po fr de sr cd cd m1 m1   in   sy   cs us sy id
> >  0 0 0  19625708 3285120 1  4    1 1 1 0 6   1   1 11 11  455  260  247  4 0 96
> >  0 1 70 16001276  645628 2 27 3428 0 0 0 0 442 447  0  0 3012  516 1982 97 3  0
> >  0 1 70 16001276  642208 0  0 3489 0 0 0 0 437 432  0  0 3074  381 2002 97 3  0
> >  0 1 70 16001276  638964 0  0 3343 0 0 0 0 417 417  0  0 2997  350 1914 98 2  0
> >  0 1 70 16001276  635504 0  0 3442 0 0 0 0 430 434  0  0 3067  536 2016 97 3  0
> >  0 1 70 16001276  632076 0  0 3434 0 0 0 0 429 425  0  0 3164  885 2125 97 3  0
> >  0 1 70 16001276  628548 0  0 3549 0 0 0 0 445 445  0  0 3185  582 2105 97 3  0
> >  0 1 70 16001276  625104 0  0 3459 0 0 0 0 463 469  0  0 3376  594 2100 97 3  0
> >
> > $ vmstat -p 1
> >      memory           page          executable      anonymous      filesystem
> >    swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
> > 19625616 3285052  1   4   1   0   6    0    0    0    0    0    0    1    0    1
> > 16001244  440392 21  31   0   0   0    0    0    0    0    0    0 2911    0    0
> > 16001244  437120 21   0   0   0   0    0    0    0    0    0    0 3188    0    0
> > 16001244  433592 14   0   0   0   0    0    0    0    0    0    0 3588    0    0
> > 16001244  429732 28   0   0   0   0    0    0    0    0    0    0 3712    0    0
> > 16001244  426036 18   0   0   0   0    0    0    0    0    0    0 3679    0    0
> > 16001244  422448  2   0   0   0   0    0    0    0    0    0    0 3468    0    0
> > 16001244  418980  5   0   0   0   0    0    0    0    0    0    0 3435    0    0
> > 16001244  416012  8   0   0   0   0    0    0    0    0    0    0 2855    0    0
> > 16001244  412648  8   0   0   0   0    0    0    0    0    0    0 3256    0    0
> > 16001244  409292 31   0   0   0   0    0    0    0    0    0    0 3426    0    0
> > 16001244  405760 10   0   0   0   0    0    0    0    0    0    0 3602    0    0
> >
> > > Also, I'd like to understand better what you're looking to optimize
> > > for. In general, "tuning" for swap is a pointless exercise (and it's
> > > not my contention that that is what you're looking to do - I'm not
> > > actually sure), because the IO performance of the swap device is
> > > really a second order effect of having a memory working set size
> > > larger than physical RAM, which means the kernel spends a lot of time
> > > doing memory management things.
> >
> > I think we're trying to optimize for usage of swap having as little
> > impact as possible. With multiple large Java processes needing to run
> > in as little time as possible, and with the business demands that exist
> > making it impossible to keep overall RSS below real memory 100% of the
> > time, we want to minimize the impact of page-ins.
> >
> > > The poor behavior of swap may really just be a symptom of other
> > > activities related to memory management.
> >
> > Possibly.
> >
> > > What kind of machine is this, and what does CPU utilization look like
> > > when you're inducing this behavior?
> >
> > These are a variety of systems: IBM 360, Sun v20z, and x4100 (we have
> > M1s and M2s; I personally have only tested on M1 systems). This
> > behavior seems consistent on all of them.
> >
> > The program we're using to pin memory is this:
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > int main(int argc, char** argv)
> > {
> >     if (argc != 2) {
> >         printf("Bad args\n");
> >         return 1;
> >     }
> >
> >     const int count = atoi(argv[1]);
> >     if (count <= 3) {
> >         printf("Bad count: %s\n", argv[1]);
> >         return 1;
> >     }
> >
> >     // Malloc
> >     const int nints = count >> 2;
> >     int* buf = (int*)malloc(count);
> >     if (buf == NULL) {
> >         perror("Failed to malloc");
> >         return 1;
> >     }
> >
> >     // Init
> >     for (int i = 0; i < nints; i++) {
> >         buf[i] = rand();
> >     }
> >
> >     // Maintain working set
> >     for (;;) {
> >         for (int i = 0; i < nints; i++) {
> >             buf[i]++;
> >         }
> >         //sleep(1);
> >     }
> >
> >     return 0;
> > }
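(Aside on reproducing this: the program above builds with any C99 compiler,
e.g. something like

    gcc -std=c99 -o pinmem pinmem.c
    ./pinmem 2000000000

where "pinmem.c" and the byte count are just example names/values. Since the
count is parsed with atoi() into an int, a single instance tops out a little
under 2 GB; starting a few copies side by side is an easy way to push the
combined working set past physical memory while watching "vmstat -p 1" in
another window.)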
> >
> > Nothing too complex. Reads and writes to /tmp and /var/tmp in our tests
> > were all done with dd.
> >
> > I am following up with Sun support on this, but in the meantime I am
> > curious if you or anyone else out there sees the same behavior?
> >
> > Thanks,
> >
> > -Peter
> >
> > --
> > The 5 year plan:
> > In five years we'll make up another plan.
> > Or just re-use this one.

--
The 5 year plan:
In five years we'll make up another plan.
Or just re-use this one.

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org