That is an internal link to the bug report. Here is the external link: http://bugs.opensolaris.org/view_bug.do?bug_id=6583268
-Prakash.

Prakash Sangappa wrote:
> Are you running Solaris 10u3+? In that case this problem may be due to
>
>   6583268 tmpfs tries too hard to reserve memory
>   <http://monaco.sfbay/detail.jsp?cr=6583268>
>
> This is currently fixed in Nevada. I guess it will be back-ported in an
> S10 patch.
>
> -Prakash.
>
> adrian cockcroft wrote:
>
>> How fast do disks turn? You get one page per revolution. Adding more
>> swap disks would only help if there were more than one thread trying
>> to read the data. The Ultra 1 had a nice fast 7200 rpm SCSI disk...
>>
>> Adrian
>>
>> On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
>>
>> What is confusing me in this case is that the speeds seem
>> unreasonably slow. I would expect, for instance, the speed of swap
>> access to scale with the speed of the disk, CPU, etc.
>>
>> In this case, the speed of swap looks like it has stayed about what
>> it was on an Ultra 1 while everything else has shot ahead, leading me
>> to feel there is a specific limit in there that could be easily
>> removed or tuned so this is not as visible.
>>
>> Thanks,
>>
>> -Peter
>>
>> On Wed, Aug 15, 2007 at 03:49:33PM -0700, adrian cockcroft wrote:
>> > I've seen this before and it's expected behavior; this is my
>> > explanation:
>> >
>> > Pages are written to swap using large sequential writes, in
>> > physical-memory scanner order, so they are guaranteed to be randomly
>> > jumbled on disk. This means that swap is as fast as possible for
>> > writes and as slow as possible for reads, which will be random seeks
>> > one page at a time. It's been this way forever. Anything that
>> > swaps/pages out will be horribly slow on the way back in.
>> >
>> > Add enough RAM to never swap, or possibly mount a real disk or a
>> > solid-state disk for /tmp.
>> >
>> > Adrian
>> >
>> > On 8/15/07, Peter C. Norton <[EMAIL PROTECTED]> wrote:
>> > >
>> > > On Wed, Aug 15, 2007 at 04:29:54PM -0400, Jim Mauro wrote:
>> > > >
>> > > > What would be interesting here is the paging statistics during
>> > > > your test case. What do "vmstat 1" and "vmstat -p 1" look like
>> > > > while you're generating this behavior?
>> > > >
>> > > > Is it really the case that reading/writing from swap is slow, or
>> > > > simply that the system on the whole is slow because it's dealing
>> > > > with a sustained memory deficit?
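[A quick back-of-envelope check of Adrian's one-page-per-revolution
point above. The 7200 rpm figure is his; the 8 KB page size and the
single-reader assumption are illustrative, not from the thread:

#include <stdio.h>

/* If a randomly-read swap page comes back at one page per disk
 * revolution, single-threaded page-in bandwidth is just rotational
 * speed times page size, no matter how fast the rest of the machine
 * has become. */
int main(void)
{
    const double rpm          = 7200.0;      /* Adrian's disk speed    */
    const double page_bytes   = 8192.0;      /* assumed 8 KB page size */
    const double revs_per_sec = rpm / 60.0;  /* 120 revolutions/second */

    const double bytes_per_sec = revs_per_sec * page_bytes;
    printf("single-threaded page-in: ~%.0f KB/s\n", bytes_per_sec / 1024.0);
    return 0;
}

That works out to roughly 960 KB/s, which can be compared with the pi
column (KB paged in per second) in the vmstat output that follows.]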
>> > >
>> > > It tends to look something like this:
>> > >
>> > > $ vmstat 1
>> > >  kthr      memory            page            disk          faults      cpu
>> > >  r b w    swap    free  re mf   pi po fr de sr  cd  cd m1 m1   in  sy   cs us sy id
>> > >  0 0 0  19625708 3285120 1  4    1  1  1  0  6   1   1 11 11  455 260  247  4  0 96
>> > >  0 1 70 16001276  645628 2 27 3428  0  0  0  0 442 447  0  0 3012 516 1982 97  3  0
>> > >  0 1 70 16001276  642208 0  0 3489  0  0  0  0 437 432  0  0 3074 381 2002 97  3  0
>> > >  0 1 70 16001276  638964 0  0 3343  0  0  0  0 417 417  0  0 2997 350 1914 98  2  0
>> > >  0 1 70 16001276  635504 0  0 3442  0  0  0  0 430 434  0  0 3067 536 2016 97  3  0
>> > >  0 1 70 16001276  632076 0  0 3434  0  0  0  0 429 425  0  0 3164 885 2125 97  3  0
>> > >  0 1 70 16001276  628548 0  0 3549  0  0  0  0 445 445  0  0 3185 582 2105 97  3  0
>> > >  0 1 70 16001276  625104 0  0 3459  0  0  0  0 463 469  0  0 3376 594 2100 97  3  0
>> > >
>> > > $ vmstat -p 1
>> > >      memory           page          executable      anonymous      filesystem
>> > >    swap    free   re mf fr de sr  epi epo epf  api apo apf  fpi fpo fpf
>> > >  19625616 3285052  1  4  1  0  6    0   0   0    0   0   0    1   0   1
>> > >  16001244  440392 21 31  0  0  0    0   0   0    0   0   0 2911   0   0
>> > >  16001244  437120 21  0  0  0  0    0   0   0    0   0   0 3188   0   0
>> > >  16001244  433592 14  0  0  0  0    0   0   0    0   0   0 3588   0   0
>> > >  16001244  429732 28  0  0  0  0    0   0   0    0   0   0 3712   0   0
>> > >  16001244  426036 18  0  0  0  0    0   0   0    0   0   0 3679   0   0
>> > >  16001244  422448  2  0  0  0  0    0   0   0    0   0   0 3468   0   0
>> > >  16001244  418980  5  0  0  0  0    0   0   0    0   0   0 3435   0   0
>> > >  16001244  416012  8  0  0  0  0    0   0   0    0   0   0 2855   0   0
>> > >  16001244  412648  8  0  0  0  0    0   0   0    0   0   0 3256   0   0
>> > >  16001244  409292 31  0  0  0  0    0   0   0    0   0   0 3426   0   0
>> > >  16001244  405760 10  0  0  0  0    0   0   0    0   0   0 3602   0   0
>> > >
>> > > > Also, I'd like to understand better what you're looking to
>> > > > optimize for. In general, "tuning" for swap is a pointless
>> > > > exercise (and it's not my contention that that is what you're
>> > > > looking to do; I'm not actually sure), because the IO performance
>> > > > of the swap device is really a second-order effect of having a
>> > > > memory working-set size larger than physical RAM, which means the
>> > > > kernel spends a lot of time doing memory management.
>> > >
>> > > I think we're trying to optimize so that use of swap has as little
>> > > impact as possible. With multiple large Java processes needing to
>> > > run in as little time as possible, and with business demands that
>> > > make it impossible to keep the overall RSS under real memory 100%
>> > > of the time, we want to minimize the impact of page-ins.
>> > >
>> > > > The poor behavior of swap may really be just a symptom of other
>> > > > activities related to memory management.
>> > >
>> > > Possibly.
>> > >
>> > > > What kind of machine is this, and what does CPU utilization look
>> > > > like when you're inducing this behavior?
>> > >
>> > > These are a variety of systems: IBM 360, Sun v20z, and x4100 (we
>> > > have M1s and M2s; I have personally only tested on M1 systems).
>> > > This behavior seems consistent on all of them.
>> > >
>> > > The program we're using to pin memory is this:
>> > >
>> > > #include <stdio.h>
>> > > #include <stdlib.h>
>> > > #include <unistd.h>
>> > >
>> > > int main(int argc, char** argv)
>> > > {
>> > >     if (argc != 2) {
>> > >         printf("Bad args\n");
>> > >         return 1;
>> > >     }
>> > >
>> > >     /* Buffer size in bytes, from the command line. */
>> > >     const int count = atoi(argv[1]);
>> > >     if (count <= 3) {
>> > >         printf("Bad count: %s\n", argv[1]);
>> > >         return 1;
>> > >     }
>> > >
>> > >     /* Malloc: count bytes hold count >> 2 four-byte ints. */
>> > >     const int nints = count >> 2;
>> > >     int* buf = (int*)malloc(count);
>> > >     if (buf == NULL) {
>> > >         perror("Failed to malloc");
>> > >         return 1;
>> > >     }
>> > >
>> > >     /* Init: write every word so all pages are really allocated. */
>> > >     for (int i = 0; i < nints; i++) {
>> > >         buf[i] = rand();
>> > >     }
>> > >
>> > >     /* Maintain working set: rewrite the whole buffer forever. */
>> > >     for (;;) {
>> > >         for (int i = 0; i < nints; i++) {
>> > >             buf[i]++;
>> > >         }
>> > >         /* sleep(1); */
>> > >     }
>> > >
>> > >     return 0;
>> > > }
>> > >
>> > > Nothing too complex. Reads and writes to /tmp and /var/tmp in our
>> > > tests were all done with dd.
>> > >
>> > > I am following up with Sun support on this, but in the meantime I
>> > > am curious whether you or anyone else out there sees the same
>> > > behavior.
>> > >
>> > > Thanks,
>> > >
>> > > -Peter
>> > >
>> > > --
>> > > The 5 year plan:
>> > > In five years we'll make up another plan.
>> > > Or just re-use this one.

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
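[For concreteness, the tests described above might be driven with
commands along these lines; the file and program names, sizes, and
compiler flags are illustrative guesses, not taken from the thread:

$ gcc -std=c99 -o pinmem pinmem.c    # build the pinning program above
$ ./pinmem 2000000000 &              # pin ~2 GB of anonymous memory
$ dd if=/dev/zero of=/tmp/bigfile bs=1024k count=2048   # write 2 GB to tmpfs-backed /tmp
$ dd if=/tmp/bigfile of=/dev/null bs=1024k              # read it back: the slow, seek-bound path

Since the program sizes its buffer with atoi(), a single instance tops
out at INT_MAX bytes, about 2 GB; larger working sets need several
instances running at once.]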