Are you running Solaris 10u3+? If so, this problem may be due to 6583268 "tmpfs tries too hard to reserve memory" <http://monaco.sfbay/detail.jsp?cr=6583268>.
This is currently fixed in Nevada. I guess it will be backported to an S10 patch.

-Prakash

adrian cockcroft wrote:
> How fast do disks turn? You get one page per revolution. Adding more
> swap disks would only help if there was more than one thread trying to
> read the data. Ultra 1 had a nice fast 7200rpm SCSI disk...
>
> Adrian
>
> On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
> > What is confusing me in this case is that the speeds seem to be
> > unreasonably slow. I would expect, for instance, that the speed of
> > swap access would scale with the speed of the disk, CPU, etc.
> >
> > In this case, the speed of swap looks like it has stayed at about what
> > it was on an Ultra 1 while everything else has shot ahead, leading me
> > to feel that there's a specific limit which could be easily removed or
> > tuned to make this less visible.
> >
> > Thanks,
> >
> > -Peter
> >
> > On Wed, Aug 15, 2007 at 03:49:33PM -0700, adrian cockcroft wrote:
> > I've seen this before and it's expected behavior; this is my
> > explanation:
> >
> > Pages are written to swap with large sequential writes, in physical
> > memory scanner order, so they are guaranteed to be randomly jumbled on
> > disk. This means that swap is as fast as possible for writes and as
> > slow as possible for reads, which will be random seeks one page at a
> > time. It's been this way forever. Anything that swaps/pages out will
> > be horribly slow on the way back in.
> > Add enough RAM to never swap, or possibly mount a real disk or a
> > solid state disk for /tmp.
> >
> > Adrian
> >
> > On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
> > > On Wed, Aug 15, 2007 at 04:29:54PM -0400, Jim Mauro wrote:
> > > > What would be interesting here is the paging statistics during
> > > > your test case. What do "vmstat 1" and "vmstat -p 1" look like
> > > > while you're generating this behavior?
> > > >
> > > > Is it really the case that reading/writing from swap is slow, or
> > > > simply that the system on the whole is slow because it's dealing
> > > > with a sustained memory deficit?
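Adrian's one-page-per-revolution figure above is easy to put numbers on. Here is a back-of-the-envelope sketch; the 7200 rpm spindle speed, ~7 ms average seek, and 4 KB page size are illustrative assumptions, not measurements from these machines:

    #include <stdio.h>

    /* Rough bound on swap-in throughput when every page-in is a single
     * random disk read.  All constants are assumptions for illustration. */
    int main(void)
    {
        const double rpm        = 7200.0;        /* assumed spindle speed   */
        const double page_kb    = 4.0;           /* assumed page size (KB)  */
        const double revs_per_s = rpm / 60.0;    /* 120 revolutions/s       */

        /* Best case: one page per revolution. */
        printf("rotation-limited: %3.0f pages/s, %4.0f KB/s\n",
               revs_per_s, revs_per_s * page_kb);

        /* More realistic: add an average seek to half a rotation. */
        const double svc_ms = 7.0 + 0.5 * 1000.0 / revs_per_s;
        printf("seek-limited:     %3.0f pages/s, %4.0f KB/s\n",
               1000.0 / svc_ms, 1000.0 / svc_ms * page_kb);
        return 0;
    }

Either way the answer is a few hundred KB/s, a couple of orders of magnitude below the same disk's streaming bandwidth, which is why swap-in speed can look stuck in the Ultra 1 era while everything else has sped up.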
> > > It tends to look something like this:
> > >
> > > $ vmstat 1
> > >  kthr      memory            page            disk          faults      cpu
> > >  r b w   swap    free   re mf   pi po fr de sr cd  cd  m1 m1   in   sy   cs us sy id
> > >  0 0 0  19625708 3285120 1  4    1  1  1  0  6   1   1 11 11  455  260  247  4  0 96
> > >  0 1 70 16001276  645628 2 27 3428  0  0  0  0 442 447  0  0 3012  516 1982 97  3  0
> > >  0 1 70 16001276  642208 0  0 3489  0  0  0  0 437 432  0  0 3074  381 2002 97  3  0
> > >  0 1 70 16001276  638964 0  0 3343  0  0  0  0 417 417  0  0 2997  350 1914 98  2  0
> > >  0 1 70 16001276  635504 0  0 3442  0  0  0  0 430 434  0  0 3067  536 2016 97  3  0
> > >  0 1 70 16001276  632076 0  0 3434  0  0  0  0 429 425  0  0 3164  885 2125 97  3  0
> > >  0 1 70 16001276  628548 0  0 3549  0  0  0  0 445 445  0  0 3185  582 2105 97  3  0
> > >  0 1 70 16001276  625104 0  0 3459  0  0  0  0 463 469  0  0 3376  594 2100 97  3  0
> > >
> > > $ vmstat -p 1
> > >      memory           page          executable      anonymous      filesystem
> > >    swap     free   re mf fr de sr  epi epo epf  api apo apf  fpi  fpo fpf
> > >  19625616 3285052  1  4  1  0  6    0   0   0    0   0   0     1   0   1
> > >  16001244  440392 21 31  0  0  0    0   0   0    0   0   0  2911   0   0
> > >  16001244  437120 21  0  0  0  0    0   0   0    0   0   0  3188   0   0
> > >  16001244  433592 14  0  0  0  0    0   0   0    0   0   0  3588   0   0
> > >  16001244  429732 28  0  0  0  0    0   0   0    0   0   0  3712   0   0
> > >  16001244  426036 18  0  0  0  0    0   0   0    0   0   0  3679   0   0
> > >  16001244  422448  2  0  0  0  0    0   0   0    0   0   0  3468   0   0
> > >  16001244  418980  5  0  0  0  0    0   0   0    0   0   0  3435   0   0
> > >  16001244  416012  8  0  0  0  0    0   0   0    0   0   0  2855   0   0
> > >  16001244  412648  8  0  0  0  0    0   0   0    0   0   0  3256   0   0
> > >  16001244  409292 31  0  0  0  0    0   0   0    0   0   0  3426   0   0
> > >  16001244  405760 10  0  0  0  0    0   0   0    0   0   0  3602   0   0
> > >
> > > > Also, I'd like to understand better what you're looking to
> > > > optimize for. In general, "tuning" for swap is a pointless
> > > > exercise (and it's not my contention that that is what you're
> > > > looking to do - I'm not actually sure), because the I/O
> > > > performance of the swap device is really a second-order effect of
> > > > having a memory working set larger than physical RAM, which means
> > > > the kernel spends a lot of time doing memory management.
> > >
> > > I think we're trying to optimize for usage of swap having as little
> > > impact as possible. With multiple large Java processes needing to
> > > run in as little time as possible, and with business demands that
> > > make it impossible to keep the overall RSS below real memory 100%
> > > of the time, we want to minimize the impact of page-ins.
> > >
> > > > The poor behavior of swap may really be just a symptom of other
> > > > activities related to memory management.
> > >
> > > Possibly.
> > >
> > > > What kind of machine is this, and what does CPU utilization look
> > > > like when you're inducing this behavior?
> > >
> > > These are a variety of systems: IBM 360, Sun v20z, and x4100 (we
> > > have M1s and M2s; I personally have only tested on M1 systems).
> > > This behavior seems consistent on all of them.
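A quick cross-check of the vmstat sample above supports the one-I/O-per-page picture. A sketch only; it assumes the two cd devices are the swap disks and a 4 KB base page size:

    #include <stdio.h>

    /* Divide page-in bandwidth by disk operations per second, using one
     * representative row from the vmstat output above.  If each page-in
     * costs one disk read, the quotient should land near the page size. */
    int main(void)
    {
        const double pi_kb_per_s = 3428.0;         /* vmstat 'pi' column (KB/s) */
        const double disk_ops    = 442.0 + 447.0;  /* 'cd' columns (ops/s)      */

        printf("KB per disk read: %.1f\n", pi_kb_per_s / disk_ops);  /* ~3.9 */
        return 0;
    }

About 3.9 KB per disk read, i.e. roughly one 4 KB page per I/O: each disk is doing ~450 random reads a second and throughput is capped accordingly, which is exactly the pattern Adrian describes.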
> > > The program we're using to pin memory is this:
> > >
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <unistd.h>
> > >
> > > int main(int argc, char** argv)
> > > {
> > >     if (argc != 2) {
> > >         printf("Bad args\n");
> > >         return 1;
> > >     }
> > >
> > >     const int count = atoi(argv[1]);
> > >     if (count <= 3) {
> > >         printf("Bad count: %s\n", argv[1]);
> > >         return 1;
> > >     }
> > >
> > >     // Malloc
> > >     const int nints = count >> 2;
> > >     int* buf = (int*)malloc(count);
> > >     if (buf == NULL) {
> > >         perror("Failed to malloc");
> > >         return 1;
> > >     }
> > >
> > >     // Init
> > >     for (int i = 0; i < nints; i++) {
> > >         buf[i] = rand();
> > >     }
> > >
> > >     // Maintain working set
> > >     for (;;) {
> > >         for (int i = 0; i < nints; i++) {
> > >             buf[i]++;
> > >         }
> > >         //sleep(1);
> > >     }
> > >
> > >     return 0;
> > > }
> > >
> > > Nothing too complex. Reads and writes to /tmp and /var/tmp in our
> > > tests were all done with dd.
> > >
> > > I am following up with Sun support on this, but in the meantime I
> > > am curious whether you or anyone else out there sees the same
> > > behavior.
> > >
> > > Thanks,
> > >
> > > -Peter
> > >
> > > --
> > > The 5 year plan:
> > > In five years we'll make up another plan.
> > > Or just re-use this one.
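To reproduce the setup, something like the following should work; the binary name, allocation size, and dd flags are illustrative guesses rather than the exact commands from the thread. Note that atoi() into an int caps each pinning process at just under 2 GB, so several instances may be needed to push total RSS past physical memory:

    $ gcc -std=c99 -o pinmem pinmem.c      # or: cc -xc99=all -o pinmem pinmem.c
    $ ./pinmem 1073741824 &                # pin ~1 GB of anonymous memory
    $ dd if=/dev/zero of=/tmp/big bs=1024k count=4096   # push 4 GB through tmpfs
    $ dd if=/tmp/big of=/dev/null bs=1024k              # read back under memory pressure

Running "vmstat -p 1" alongside the read-back shows whether the page-ins land in the api or fpi columns, which helps separate anonymous swap-in cost from tmpfs read-back cost.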