What is confusing me in this case is that the speeds seem unreasonably
slow. I would expect, for instance, the speed of swap access to scale
with the speed of the disk, CPU, etc.

In this case, swap speed looks like it has stayed about where it was on
an Ultra 1 while everything else has shot ahead, which makes me suspect
there is a specific limit somewhere that could be easily removed or
tuned so that this is not as visible.
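
For reference, the asymmetry Adrian describes below (big sequential
writes out to swap, single-page random reads back in) is easy to
approximate from user land. The following is just a rough sketch, not a
measurement of the real swap path: it writes a scratch file
sequentially one page at a time and then reads pages back in random
order. The file name and sizes are made up for the example, and on a
machine with plenty of free memory the reads will be satisfied from the
page cache, so the file needs to be larger than RAM before the seek
cost really shows up.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

#define PAGESZ  8192            /* assume an 8K page for the example */
#define NPAGES  (64 * 1024)     /* 64K pages = 512MB scratch file */

/* Wall-clock seconds; good enough for a rough comparison. */
static double now_secs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    const char *path = "/var/tmp/seekdemo";  /* made-up scratch path */
    char page[PAGESZ];
    off_t off;
    double t0;
    int fd, i;

    memset(page, 0xa5, sizeof (page));

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Sequential page-sized writes, like the pageout side. */
    t0 = now_secs();
    for (i = 0; i < NPAGES; i++) {
        if (write(fd, page, PAGESZ) != PAGESZ) {
            perror("write");
            return 1;
        }
    }
    printf("sequential write: %.2f s\n", now_secs() - t0);

    /* Random single-page reads, like faulting those pages back in. */
    srand48(1);
    t0 = now_secs();
    for (i = 0; i < NPAGES; i++) {
        off = (off_t)(lrand48() % NPAGES) * PAGESZ;
        if (pread(fd, page, PAGESZ, off) != PAGESZ) {
            perror("pread");
            return 1;
        }
    }
    printf("random read:      %.2f s\n", now_secs() - t0);

    (void) close(fd);
    (void) unlink(path);
    return 0;
}

If the layout explanation is right, the random-read pass should be
dramatically slower than the write pass once the data no longer fits in
memory.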

Thanks,

-Peter

On Wed, Aug 15, 2007 at 03:49:33PM -0700, adrian cockcroft wrote:
> I've seen this before and it's expected behavior; this is my explanation:
> 
> Pages are written to swap using large sequential writes, in physical memory
> scanner order, so they are guaranteed to be randomly jumbled on disk. This
> means that swap is as fast as possible for writes and as slow as possible
> for reads, which will be random seeks one page at a time. It's been this way
> forever. Anything that swaps/pages out will be horribly slow on the way back
> in.
> Add enough RAM to never swap, or possibly mount a real disk or a solid-state
> disk for /tmp.
> 
> Adrian
> 
> On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
> >
> > On Wed, Aug 15, 2007 at 04:29:54PM -0400, Jim Mauro wrote:
> > >
> > > What would be interesting here is the paging statistics during your
> > > test case. What do "vmstat 1" and "vmstat -p 1" look like while you're
> > > generating this behavior?
> > >
> > > Is it really the case that reading/writing from swap is slow, or simply
> > > that the system on the whole is slow because it's dealing with a
> > > sustained memory deficit?
> >
> > It tends to look something like this:
> >
> > $ vmstat 1
> >  kthr      memory            page            disk          faults      cpu
> >  r b w   swap  free  re  mf pi po fr de sr cd cd m1 m1   in   sy   cs us sy id
> >  0 0 0 19625708 3285120 1 4  1  1  1  0  6  1  1 11 11  455  260  247  4  0 96
> >  0 1 70 16001276 645628 2 27 3428 0 0 0  0 442 447 0 0 3012  516 1982 97  3  0
> >  0 1 70 16001276 642208 0 0 3489 0 0  0  0 437 432 0 0 3074  381 2002 97  3  0
> >  0 1 70 16001276 638964 0 0 3343 0 0  0  0 417 417 0 0 2997  350 1914 98  2  0
> >  0 1 70 16001276 635504 0 0 3442 0 0  0  0 430 434 0 0 3067  536 2016 97  3  0
> >  0 1 70 16001276 632076 0 0 3434 0 0  0  0 429 425 0 0 3164  885 2125 97  3  0
> >  0 1 70 16001276 628548 0 0 3549 0 0  0  0 445 445 0 0 3185  582 2105 97  3  0
> >  0 1 70 16001276 625104 0 0 3459 0 0  0  0 463 469 0 0 3376  594 2100 97  3  0
> >
> > $ vmstat -p 1
> >      memory           page          executable      anonymous      filesystem
> >    swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
> >  19625616 3285052  1   4   1   0   6    0    0    0    0    0    0    1    0    1
> >  16001244  440392 21  31   0   0   0    0    0    0    0    0    0 2911    0    0
> >  16001244  437120 21   0   0   0   0    0    0    0    0    0    0 3188    0    0
> >  16001244  433592 14   0   0   0   0    0    0    0    0    0    0 3588    0    0
> >  16001244  429732 28   0   0   0   0    0    0    0    0    0    0 3712    0    0
> >  16001244  426036 18   0   0   0   0    0    0    0    0    0    0 3679    0    0
> >  16001244  422448  2   0   0   0   0    0    0    0    0    0    0 3468    0    0
> >  16001244  418980  5   0   0   0   0    0    0    0    0    0    0 3435    0    0
> >  16001244  416012  8   0   0   0   0    0    0    0    0    0    0 2855    0    0
> >  16001244  412648  8   0   0   0   0    0    0    0    0    0    0 3256    0    0
> >  16001244  409292 31   0   0   0   0    0    0    0    0    0    0 3426    0    0
> >  16001244  405760 10   0   0   0   0    0    0    0    0    0    0 3602    0    0
> >
> >
> > > Also, I'd like to understand better what you're looking to optimize for.
> > > In general, "tuning" for swap is a pointless exercise (and it's not my
> > > contention that that is what you're looking to do - I'm not actually
> > > sure), because the IO performance of the swap device is really a
> > > second-order effect of having a memory working set size larger than
> > > physical RAM, which means the kernel spends a lot of time doing memory
> > > management things.
> >
> > I think we're trying to optimize so that using swap has as little impact
> > as possible.  With multiple large Java processes needing to run in as
> > little time as possible, and with business demands that make it impossible
> > to keep the overall RSS below real memory 100% of the time, we want to
> > minimize the impact of page-ins.
> >
> > > The poor behavior of swap may really be just a symptom of other
> > > activities related to memory management.
> >
> > Possibly.
> >
> > > What kind of machine is this, and what does CPU utilization look like
> > > when you're inducing this behavior?
> >
> > These are a variety of systems: IBM 360, Sun v20z, and x4100 (we have
> > m1's and m2's; I personally have only tested on m1 systems). This
> > behavior seems consistent on all of them.
> >
> > The program we're using to pin memory is this:
> >
> >
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > int main(int argc, char** argv)
> > {
> >     if (argc != 2) {
> >         printf("Bad args\n");
> >         return 1;
> >     }
> >
> >     const int count = atoi(argv[1]);
> >     if (count <= 3) {
> >         printf("Bad count: %s\n", argv[1]);
> >         return 1;
> >     }
> >
> >     // Malloc
> >     const int nints = count >> 2;
> >     int* buf = (int*)malloc(count);
> >     if (buf == NULL) {
> >         perror("Failed to malloc");
> >         return 1;
> >     }
> >
> >     // Init
> >     for (int i=0; i < nints; i++) {
> >         buf[i] = rand();
> >     }
> >
> >     // Maintain working set
> >     for (;;) {
> >         for (int i=0; i < nints; i++) {
> >             buf[i]++;
> >         }
> >         //sleep(1);
> >     }
> >
> >     return 0;
> > }
> >
> > Nothing too complex. Reads and writes to /tmp and /var/tmp in our
> > tests were all done with dd.
> >
> > I am following up with Sun support on this, but in the meantime I am
> > curious whether you or anyone else out there sees the same behavior.
> >
> > Thanks,
> >
> > -Peter
> >
> > --
> > The 5 year plan:
> > In five years we'll make up another plan.
> > Or just re-use this one.
> >

-- 
The 5 year plan:
In five years we'll make up another plan.
Or just re-use this one.

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
